Module stream_join_utils


This file contains common subroutines for symmetric hash joins, used both in join calculations and in optimization rules.

Structs

PruningJoinHashMap
The PruningJoinHashMap is similar to a regular JoinHashMap, but with the capability of pruning elements in an efficient manner. This structure is particularly useful for cases where it's necessary to remove elements from the map based on their buffer order.
SortedFilterExpr
The SortedFilterExpr object represents a sorted filter expression. It contains the following information: The origin expression, the filter expression, an interval encapsulating expression bounds, and a stable index identifying the expression in the expression DAG.
StreamJoinMetrics
Metrics for HashJoinExec
StreamJoinSideMetrics
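
To make "pruning based on buffer order" concrete for PruningJoinHashMap, here is a minimal sketch using only the standard library. `ToyPruningMap` and its methods (`insert`, `prune_front`, `probe`) are hypothetical names invented for this illustration; the sketch does not reproduce the real structure's memory layout or API. It only shows the idea of dropping the oldest buffered rows while keeping the remaining rows reachable by key.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified illustration of a hash map that can prune
/// its oldest entries. Not the actual PruningJoinHashMap layout.
struct ToyPruningMap {
    /// Join key -> absolute row indices, kept in insertion (buffer) order.
    rows: HashMap<u64, VecDeque<usize>>,
    /// Number of rows pruned from the front of the buffer so far.
    pruned: usize,
    /// Number of rows inserted so far.
    inserted: usize,
}

impl ToyPruningMap {
    fn new() -> Self {
        Self { rows: HashMap::new(), pruned: 0, inserted: 0 }
    }

    /// Appends a row with the given key and returns its absolute index.
    fn insert(&mut self, key: u64) -> usize {
        let idx = self.inserted;
        self.rows.entry(key).or_default().push_back(idx);
        self.inserted += 1;
        idx
    }

    /// Removes the `prune_length` oldest rows that are still buffered.
    fn prune_front(&mut self, prune_length: usize) {
        // Never prune past the rows that actually exist.
        let cutoff = (self.pruned + prune_length).min(self.inserted);
        self.rows.retain(|_, queue| {
            // Each queue is ordered, so old rows sit at the front.
            while queue.front().map_or(false, |&i| i < cutoff) {
                queue.pop_front();
            }
            // Drop keys that no longer have any buffered rows.
            !queue.is_empty()
        });
        self.pruned = cutoff;
    }

    /// Returns the absolute indices of the buffered rows matching `key`.
    fn probe(&self, key: u64) -> Vec<usize> {
        self.rows
            .get(&key)
            .map(|queue| queue.iter().copied().collect())
            .unwrap_or_default()
    }
}
```

Because each per-key queue is ordered by insertion, pruning only ever inspects the front of a queue, which is what makes removal by buffer order cheap in this toy version.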

Functions

build_filter_input_order
This function is used to build the filter expression based on the sort order of input columns.
calculate_filter_expr_intervals
Calculate the filter expression intervals.
check_filter_expr_contains_sort_information 🔒
combine_two_batches
convert_filter_columns 🔒
Convert a physical expression into a filter expression using the given column mapping information.
convert_sort_expr_with_filter_schema
This function analyzes PhysicalSortExpr graphs with respect to output orderings (sorting) properties. This is necessary since monotonically increasing and/or decreasing expressions are required when using join filter expressions for data pruning purposes.
get_pruning_anti_indices
Get the anti join indices from the visited hash set.
get_pruning_semi_indices
This method creates a boolean buffer from the visited rows hash set and the indices of the pruned record batch slice.
map_origin_col_to_filter_col
Create a one to one mapping from main columns to filter columns using filter column indices. A column index looks like:
prepare_sorted_exprs
Prepares and sorts expressions based on a given filter, left and right schemas, and sort expressions.
record_visited_indices
Records the visited indices from the input PrimitiveArray of type T into the given hash set visited. This function will insert the indices (offset by offset) into the visited hash set.
update_filter_expr_interval
This is a subroutine of the function calculate_filter_expr_intervals. It constructs the current interval using the given batch and updates the filter expression (i.e. sorted_expr) with this interval.
update_sorted_exprs_with_node_indices 🔒
Updates sorted filter expressions with corresponding node indices from the expression interval graph.
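
The relationship between record_visited_indices, get_pruning_semi_indices, and get_pruning_anti_indices can be sketched with plain standard-library types. The functions below (`record_visited`, `semi_indices`, `anti_indices`) are hypothetical stand-ins written for this illustration: they take slices and return `Vec<usize>` rather than Arrow arrays, and their signatures are assumptions, not the module's real ones. The sketch only shows the offset bookkeeping described above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `record_visited_indices`: inserts every
/// index, shifted by `offset`, into the `visited` set.
fn record_visited(visited: &mut HashSet<usize>, offset: usize, indices: &[u32]) {
    for &i in indices {
        visited.insert(i as usize + offset);
    }
}

/// Semi-join flavor: positions inside the pruned slice
/// `[offset, offset + prune_length)` that were visited.
fn semi_indices(prune_length: usize, offset: usize, visited: &HashSet<usize>) -> Vec<usize> {
    (0..prune_length)
        .filter(|i| visited.contains(&(i + offset)))
        .collect()
}

/// Anti-join flavor: positions inside the pruned slice that were
/// never visited.
fn anti_indices(prune_length: usize, offset: usize, visited: &HashSet<usize>) -> Vec<usize> {
    (0..prune_length)
        .filter(|i| !visited.contains(&(i + offset)))
        .collect()
}
```

In this sketch the returned positions are relative to the start of the pruned slice, while the `visited` set stores absolute positions; adding `offset` on both the recording and the lookup side keeps the two consistent.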