collect_left_input

Function collect_left_input 

Source
async fn collect_left_input(
    random_state: RandomState,
    left_stream: SendableRecordBatchStream,
    on_left: Vec<PhysicalExprRef>,
    metrics: BuildProbeJoinMetrics,
    reservation: MemoryReservation,
    with_visited_indices_bitmap: bool,
    probe_threads_count: usize,
    should_compute_bounds: bool,
) -> Result<JoinLeftData>
Expand description

Collects all batches from the left (build) side stream and creates a hash map for joining.

This function is responsible for:

  1. Consuming the entire left stream and collecting all batches into memory
  2. Building a hash map from the join key columns for efficient probe operations
  3. Computing bounds for dynamic filter pushdown (if enabled)
  4. Preparing visited indices bitmap for certain join types

§Parameters

  • random_state - Random state for consistent hashing across partitions
  • left_stream - Stream of record batches from the build side
  • on_left - Physical expressions for the left side join keys
  • metrics - Metrics collector for tracking memory usage and row counts
  • reservation - Memory reservation tracker for the hash table and data
  • with_visited_indices_bitmap - Whether to track visited indices (for outer joins)
  • probe_threads_count - Number of threads that will probe this hash table
  • should_compute_bounds - Whether to compute min/max bounds for dynamic filtering

§Dynamic Filter Coordination

When should_compute_bounds is true, this function computes the min/max bounds for each join key column but does NOT update the dynamic filter. Instead, the bounds are stored in the returned JoinLeftData and later coordinated by SharedBoundsAccumulator to ensure all partitions contribute their bounds before updating the filter exactly once.

§Returns

JoinLeftData containing the hash map, consolidated batch, join key values, visited indices bitmap, and computed bounds (if requested).