pub(super) struct HashJoinStream {Show 18 fields
partition: usize,
schema: Arc<Schema>,
on_right: Vec<PhysicalExprRef>,
filter: Option<JoinFilter>,
join_type: JoinType,
right: SendableRecordBatchStream,
random_state: RandomState,
join_metrics: BuildProbeJoinMetrics,
column_indices: Vec<ColumnIndex>,
null_equality: NullEquality,
state: HashJoinStreamState,
build_side: BuildSide,
batch_size: usize,
hashes_buffer: Vec<u64>,
right_side_ordered: bool,
bounds_accumulator: Option<Arc<SharedBoundsAccumulator>>,
bounds_waiter: Option<OnceFut<()>>,
mode: PartitionMode,
}Expand description
[Stream] for super::HashJoinExec that does the actual join.
This stream:
- Collecting the build side (left input) into a hash map
- Iterating over the probe side (right input) in streaming fashion
- Looking up matches against the hash table and applying join filters
- Producing joined [
RecordBatch]es incrementally - Emitting unmatched rows for outer/semi/anti joins in the final stage
Fields§
§partition: usizePartition identifier for debugging and determinism
schema: Arc<Schema>Input schema
on_right: Vec<PhysicalExprRef>equijoin columns from the right (probe side)
filter: Option<JoinFilter>optional join filter
join_type: JoinTypetype of the join (left, right, semi, etc)
right: SendableRecordBatchStreamright (probe) input
random_state: RandomStateRandom state used for hashing initialization
join_metrics: BuildProbeJoinMetricsMetrics
column_indices: Vec<ColumnIndex>Information of index and left / right placement of columns
null_equality: NullEqualityDefines the null equality for the join.
state: HashJoinStreamStateState of the stream
build_side: BuildSideBuild side
batch_size: usizeMaximum output batch size
hashes_buffer: Vec<u64>Scratch space for computing hashes
right_side_ordered: boolSpecifies whether the right side has an ordering to potentially preserve
bounds_accumulator: Option<Arc<SharedBoundsAccumulator>>Shared bounds accumulator for coordinating dynamic filter updates (optional)
bounds_waiter: Option<OnceFut<()>>Optional future to signal when bounds have been reported by all partitions and the dynamic filter has been updated
mode: PartitionModePartitioning mode to use
Implementations§
Source§impl HashJoinStream
impl HashJoinStream
pub(super) fn new( partition: usize, schema: Arc<Schema>, on_right: Vec<PhysicalExprRef>, filter: Option<JoinFilter>, join_type: JoinType, right: SendableRecordBatchStream, random_state: RandomState, join_metrics: BuildProbeJoinMetrics, column_indices: Vec<ColumnIndex>, null_equality: NullEquality, state: HashJoinStreamState, build_side: BuildSide, batch_size: usize, hashes_buffer: Vec<u64>, right_side_ordered: bool, bounds_accumulator: Option<Arc<SharedBoundsAccumulator>>, mode: PartitionMode, ) -> Self
Sourcefn poll_next_impl(
&mut self,
cx: &mut Context<'_>,
) -> Poll<Option<Result<RecordBatch>>>
fn poll_next_impl( &mut self, cx: &mut Context<'_>, ) -> Poll<Option<Result<RecordBatch>>>
Separate implementation function that unpins the HashJoinStream so
that partial borrows work correctly
Sourcefn wait_for_partition_bounds_report(
&mut self,
cx: &mut Context<'_>,
) -> Poll<Result<StatefulStreamResult<Option<RecordBatch>>>>
fn wait_for_partition_bounds_report( &mut self, cx: &mut Context<'_>, ) -> Poll<Result<StatefulStreamResult<Option<RecordBatch>>>>
Optional step to wait until bounds have been reported by all partitions. This state is only entered if a bounds accumulator is present.
§Why wait?
The dynamic filter is only built once all partitions have reported their bounds. If we do not wait here, the probe-side scan may start before the filter is ready. This can lead to the probe-side scan missing the opportunity to apply the filter and skip reading unnecessary data.
Sourcefn collect_build_side(
&mut self,
cx: &mut Context<'_>,
) -> Poll<Result<StatefulStreamResult<Option<RecordBatch>>>>
fn collect_build_side( &mut self, cx: &mut Context<'_>, ) -> Poll<Result<StatefulStreamResult<Option<RecordBatch>>>>
Collects build-side data by polling OnceFut future from initialized build-side
Updates build-side to Ready, and state to FetchProbeSide
Sourcefn fetch_probe_batch(
&mut self,
cx: &mut Context<'_>,
) -> Poll<Result<StatefulStreamResult<Option<RecordBatch>>>>
fn fetch_probe_batch( &mut self, cx: &mut Context<'_>, ) -> Poll<Result<StatefulStreamResult<Option<RecordBatch>>>>
Fetches next batch from probe-side
If non-empty batch has been fetched, updates state to ProcessProbeBatchState,
otherwise updates state to ExhaustedProbeSide
Sourcefn process_probe_batch(
&mut self,
) -> Result<StatefulStreamResult<Option<RecordBatch>>>
fn process_probe_batch( &mut self, ) -> Result<StatefulStreamResult<Option<RecordBatch>>>
Joins current probe batch with build-side data and produces batch with matched output
Updates state to FetchProbeBatch
Sourcefn process_unmatched_build_batch(
&mut self,
) -> Result<StatefulStreamResult<Option<RecordBatch>>>
fn process_unmatched_build_batch( &mut self, ) -> Result<StatefulStreamResult<Option<RecordBatch>>>
Processes unmatched build-side rows for certain join types and produces output batch
Updates state to Completed
Trait Implementations§
Source§impl RecordBatchStream for HashJoinStream
impl RecordBatchStream for HashJoinStream
Source§impl Stream for HashJoinStream
impl Stream for HashJoinStream
Source§type Item = Result<RecordBatch, DataFusionError>
type Item = Result<RecordBatch, DataFusionError>
Auto Trait Implementations§
impl Freeze for HashJoinStream
impl !RefUnwindSafe for HashJoinStream
impl Send for HashJoinStream
impl !Sync for HashJoinStream
impl Unpin for HashJoinStream
impl !UnwindSafe for HashJoinStream
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
§impl<T> Instrument for T
impl<T> Instrument for T
§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more§impl<T> PolicyExt for Twhere
T: ?Sized,
impl<T> PolicyExt for Twhere
T: ?Sized,
§impl<T> StreamExt for Twhere
T: Stream + ?Sized,
impl<T> StreamExt for Twhere
T: Stream + ?Sized,
§fn next(&mut self) -> Next<'_, Self>where
Self: Unpin,
fn next(&mut self) -> Next<'_, Self>where
Self: Unpin,
§fn into_future(self) -> StreamFuture<Self>
fn into_future(self) -> StreamFuture<Self>
§fn map<T, F>(self, f: F) -> Map<Self, F>
fn map<T, F>(self, f: F) -> Map<Self, F>
§fn enumerate(self) -> Enumerate<Self>where
Self: Sized,
fn enumerate(self) -> Enumerate<Self>where
Self: Sized,
§fn filter<Fut, F>(self, f: F) -> Filter<Self, Fut, F>
fn filter<Fut, F>(self, f: F) -> Filter<Self, Fut, F>
§fn filter_map<Fut, T, F>(self, f: F) -> FilterMap<Self, Fut, F>
fn filter_map<Fut, T, F>(self, f: F) -> FilterMap<Self, Fut, F>
§fn then<Fut, F>(self, f: F) -> Then<Self, Fut, F>
fn then<Fut, F>(self, f: F) -> Then<Self, Fut, F>
§fn collect<C>(self) -> Collect<Self, C>
fn collect<C>(self) -> Collect<Self, C>
§fn unzip<A, B, FromA, FromB>(self) -> Unzip<Self, FromA, FromB>
fn unzip<A, B, FromA, FromB>(self) -> Unzip<Self, FromA, FromB>
§fn concat(self) -> Concat<Self>
fn concat(self) -> Concat<Self>
§fn count(self) -> Count<Self>where
Self: Sized,
fn count(self) -> Count<Self>where
Self: Sized,
§fn fold<T, Fut, F>(self, init: T, f: F) -> Fold<Self, Fut, T, F>
fn fold<T, Fut, F>(self, init: T, f: F) -> Fold<Self, Fut, T, F>
§fn any<Fut, F>(self, f: F) -> Any<Self, Fut, F>
fn any<Fut, F>(self, f: F) -> Any<Self, Fut, F>
true if any element in stream satisfied a predicate. Read more§fn all<Fut, F>(self, f: F) -> All<Self, Fut, F>
fn all<Fut, F>(self, f: F) -> All<Self, Fut, F>
true if all element in stream satisfied a predicate. Read more§fn flatten(self) -> Flatten<Self>where
Self::Item: Stream,
Self: Sized,
fn flatten(self) -> Flatten<Self>where
Self::Item: Stream,
Self: Sized,
§fn flatten_unordered(
self,
limit: impl Into<Option<usize>>,
) -> FlattenUnorderedWithFlowController<Self, ()>
fn flatten_unordered( self, limit: impl Into<Option<usize>>, ) -> FlattenUnorderedWithFlowController<Self, ()>
§fn flat_map_unordered<U, F>(
self,
limit: impl Into<Option<usize>>,
f: F,
) -> FlatMapUnordered<Self, U, F>
fn flat_map_unordered<U, F>( self, limit: impl Into<Option<usize>>, f: F, ) -> FlatMapUnordered<Self, U, F>
StreamExt::map] but flattens nested Streams
and polls them concurrently, yielding items in any order, as they made
available. Read more§fn scan<S, B, Fut, F>(self, initial_state: S, f: F) -> Scan<Self, S, Fut, F>
fn scan<S, B, Fut, F>(self, initial_state: S, f: F) -> Scan<Self, S, Fut, F>
StreamExt::fold] that holds internal state
and produces a new stream. Read more§fn skip_while<Fut, F>(self, f: F) -> SkipWhile<Self, Fut, F>
fn skip_while<Fut, F>(self, f: F) -> SkipWhile<Self, Fut, F>
true. Read more§fn take_while<Fut, F>(self, f: F) -> TakeWhile<Self, Fut, F>
fn take_while<Fut, F>(self, f: F) -> TakeWhile<Self, Fut, F>
true. Read more§fn take_until<Fut>(self, fut: Fut) -> TakeUntil<Self, Fut>
fn take_until<Fut>(self, fut: Fut) -> TakeUntil<Self, Fut>
§fn for_each<Fut, F>(self, f: F) -> ForEach<Self, Fut, F>
fn for_each<Fut, F>(self, f: F) -> ForEach<Self, Fut, F>
§fn for_each_concurrent<Fut, F>(
self,
limit: impl Into<Option<usize>>,
f: F,
) -> ForEachConcurrent<Self, Fut, F>
fn for_each_concurrent<Fut, F>( self, limit: impl Into<Option<usize>>, f: F, ) -> ForEachConcurrent<Self, Fut, F>
§fn take(self, n: usize) -> Take<Self>where
Self: Sized,
fn take(self, n: usize) -> Take<Self>where
Self: Sized,
n items of the underlying stream. Read more§fn skip(self, n: usize) -> Skip<Self>where
Self: Sized,
fn skip(self, n: usize) -> Skip<Self>where
Self: Sized,
n items of the underlying stream. Read more§fn catch_unwind(self) -> CatchUnwind<Self>where
Self: Sized + UnwindSafe,
fn catch_unwind(self) -> CatchUnwind<Self>where
Self: Sized + UnwindSafe,
§fn boxed<'a>(self) -> Pin<Box<dyn Stream<Item = Self::Item> + Send + 'a>>
fn boxed<'a>(self) -> Pin<Box<dyn Stream<Item = Self::Item> + Send + 'a>>
§fn boxed_local<'a>(self) -> Pin<Box<dyn Stream<Item = Self::Item> + 'a>>where
Self: Sized + 'a,
fn boxed_local<'a>(self) -> Pin<Box<dyn Stream<Item = Self::Item> + 'a>>where
Self: Sized + 'a,
§fn buffered(self, n: usize) -> Buffered<Self>
fn buffered(self, n: usize) -> Buffered<Self>
§fn buffer_unordered(self, n: usize) -> BufferUnordered<Self>
fn buffer_unordered(self, n: usize) -> BufferUnordered<Self>
§fn zip<St>(self, other: St) -> Zip<Self, St>where
St: Stream,
Self: Sized,
fn zip<St>(self, other: St) -> Zip<Self, St>where
St: Stream,
Self: Sized,
§fn chain<St>(self, other: St) -> Chain<Self, St>where
St: Stream<Item = Self::Item>,
Self: Sized,
fn chain<St>(self, other: St) -> Chain<Self, St>where
St: Stream<Item = Self::Item>,
Self: Sized,
§fn peekable(self) -> Peekable<Self>where
Self: Sized,
fn peekable(self) -> Peekable<Self>where
Self: Sized,
peek method. Read more§fn chunks(self, capacity: usize) -> Chunks<Self>where
Self: Sized,
fn chunks(self, capacity: usize) -> Chunks<Self>where
Self: Sized,
§fn ready_chunks(self, capacity: usize) -> ReadyChunks<Self>where
Self: Sized,
fn ready_chunks(self, capacity: usize) -> ReadyChunks<Self>where
Self: Sized,
§fn forward<S>(self, sink: S) -> Forward<Self, S>where
S: Sink<Self::Ok, Error = Self::Error>,
Self: Sized + TryStream,
fn forward<S>(self, sink: S) -> Forward<Self, S>where
S: Sink<Self::Ok, Error = Self::Error>,
Self: Sized + TryStream,
§fn inspect<F>(self, f: F) -> Inspect<Self, F>
fn inspect<F>(self, f: F) -> Inspect<Self, F>
§fn left_stream<B>(self) -> Either<Self, B>where
B: Stream<Item = Self::Item>,
Self: Sized,
fn left_stream<B>(self) -> Either<Self, B>where
B: Stream<Item = Self::Item>,
Self: Sized,
§fn right_stream<B>(self) -> Either<B, Self>where
B: Stream<Item = Self::Item>,
Self: Sized,
fn right_stream<B>(self) -> Either<B, Self>where
B: Stream<Item = Self::Item>,
Self: Sized,
§fn poll_next_unpin(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>where
Self: Unpin,
fn poll_next_unpin(&mut self, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>where
Self: Unpin,
Stream::poll_next] on Unpin
stream types.§fn select_next_some(&mut self) -> SelectNextSome<'_, Self>where
Self: Unpin + FusedStream,
fn select_next_some(&mut self) -> SelectNextSome<'_, Self>where
Self: Unpin + FusedStream,
§impl<S, T, E> TryStream for S
impl<S, T, E> TryStream for S
§impl<S> TryStreamExt for Swhere
S: TryStream + ?Sized,
impl<S> TryStreamExt for Swhere
S: TryStream + ?Sized,
§fn err_into<E>(self) -> ErrInto<Self, E>
fn err_into<E>(self) -> ErrInto<Self, E>
§fn map_ok<T, F>(self, f: F) -> MapOk<Self, F>
fn map_ok<T, F>(self, f: F) -> MapOk<Self, F>
§fn map_err<E, F>(self, f: F) -> MapErr<Self, F>
fn map_err<E, F>(self, f: F) -> MapErr<Self, F>
§fn and_then<Fut, F>(self, f: F) -> AndThen<Self, Fut, F>
fn and_then<Fut, F>(self, f: F) -> AndThen<Self, Fut, F>
f. Read more§fn or_else<Fut, F>(self, f: F) -> OrElse<Self, Fut, F>
fn or_else<Fut, F>(self, f: F) -> OrElse<Self, Fut, F>
f. Read more§fn inspect_ok<F>(self, f: F) -> InspectOk<Self, F>
fn inspect_ok<F>(self, f: F) -> InspectOk<Self, F>
§fn inspect_err<F>(self, f: F) -> InspectErr<Self, F>
fn inspect_err<F>(self, f: F) -> InspectErr<Self, F>
§fn into_stream(self) -> IntoStream<Self>where
Self: Sized,
fn into_stream(self) -> IntoStream<Self>where
Self: Sized,
§fn try_next(&mut self) -> TryNext<'_, Self>where
Self: Unpin,
fn try_next(&mut self) -> TryNext<'_, Self>where
Self: Unpin,
§fn try_for_each<Fut, F>(self, f: F) -> TryForEach<Self, Fut, F>
fn try_for_each<Fut, F>(self, f: F) -> TryForEach<Self, Fut, F>
§fn try_skip_while<Fut, F>(self, f: F) -> TrySkipWhile<Self, Fut, F>
fn try_skip_while<Fut, F>(self, f: F) -> TrySkipWhile<Self, Fut, F>
true. Read more§fn try_take_while<Fut, F>(self, f: F) -> TryTakeWhile<Self, Fut, F>
fn try_take_while<Fut, F>(self, f: F) -> TryTakeWhile<Self, Fut, F>
true. Read more§fn try_for_each_concurrent<Fut, F>(
self,
limit: impl Into<Option<usize>>,
f: F,
) -> TryForEachConcurrent<Self, Fut, F>
fn try_for_each_concurrent<Fut, F>( self, limit: impl Into<Option<usize>>, f: F, ) -> TryForEachConcurrent<Self, Fut, F>
§fn try_collect<C>(self) -> TryCollect<Self, C>
fn try_collect<C>(self) -> TryCollect<Self, C>
§fn try_chunks(self, capacity: usize) -> TryChunks<Self>where
Self: Sized,
fn try_chunks(self, capacity: usize) -> TryChunks<Self>where
Self: Sized,
§fn try_ready_chunks(self, capacity: usize) -> TryReadyChunks<Self>where
Self: Sized,
fn try_ready_chunks(self, capacity: usize) -> TryReadyChunks<Self>where
Self: Sized,
§fn try_filter<Fut, F>(self, f: F) -> TryFilter<Self, Fut, F>
fn try_filter<Fut, F>(self, f: F) -> TryFilter<Self, Fut, F>
§fn try_filter_map<Fut, F, T>(self, f: F) -> TryFilterMap<Self, Fut, F>
fn try_filter_map<Fut, F, T>(self, f: F) -> TryFilterMap<Self, Fut, F>
§fn try_flatten_unordered(
self,
limit: impl Into<Option<usize>>,
) -> TryFlattenUnordered<Self>
fn try_flatten_unordered( self, limit: impl Into<Option<usize>>, ) -> TryFlattenUnordered<Self>
§fn try_flatten(self) -> TryFlatten<Self>
fn try_flatten(self) -> TryFlatten<Self>
§fn try_fold<T, Fut, F>(self, init: T, f: F) -> TryFold<Self, Fut, T, F>
fn try_fold<T, Fut, F>(self, init: T, f: F) -> TryFold<Self, Fut, T, F>
§fn try_concat(self) -> TryConcat<Self>
fn try_concat(self) -> TryConcat<Self>
§fn try_buffer_unordered(self, n: usize) -> TryBufferUnordered<Self>where
Self::Ok: TryFuture<Error = Self::Error>,
Self: Sized,
fn try_buffer_unordered(self, n: usize) -> TryBufferUnordered<Self>where
Self::Ok: TryFuture<Error = Self::Error>,
Self: Sized,
§fn try_buffered(self, n: usize) -> TryBuffered<Self>where
Self::Ok: TryFuture<Error = Self::Error>,
Self: Sized,
fn try_buffered(self, n: usize) -> TryBuffered<Self>where
Self::Ok: TryFuture<Error = Self::Error>,
Self: Sized,
§fn try_poll_next_unpin(
&mut self,
cx: &mut Context<'_>,
) -> Poll<Option<Result<Self::Ok, Self::Error>>>where
Self: Unpin,
fn try_poll_next_unpin(
&mut self,
cx: &mut Context<'_>,
) -> Poll<Option<Result<Self::Ok, Self::Error>>>where
Self: Unpin,
TryStream::try_poll_next] on Unpin
stream types.§fn into_async_read(self) -> IntoAsyncRead<Self>
fn into_async_read(self) -> IntoAsyncRead<Self>
AsyncBufRead. Read more§fn try_all<Fut, F>(self, f: F) -> TryAll<Self, Fut, F>
fn try_all<Fut, F>(self, f: F) -> TryAll<Self, Fut, F>
Err is encountered or if an Ok item is found
that does not satisfy the predicate. Read more