Module utils


Join related functionality used both on logical and physical plans

Re-exports

pub use crate::joins::JoinOn;
pub use crate::joins::JoinOnRef;

Structs

BatchSplitter 🔒
Splits large batches into smaller batches with a maximum number of rows.
BuildProbeJoinMetrics 🔒
Metrics for build & probe joins
ColumnIndex
Information about the index and placement (left or right) of the columns
JoinFilter
Filter applied before join output. Fields are crate-public to allow downstream implementations to experiment with custom joins.
NoopBatchTransformer 🔒
A batch transformer that does nothing.
OnceAsync 🔒
A OnceAsync runs an async closure once, where multiple calls to OnceAsync::try_once return a OnceFut that resolves to the result of the same computation.
OnceFut 🔒
A OnceFut represents a shared asynchronous computation that will be evaluated once for all clones, with OnceFut::get providing a non-consuming interface to drive the underlying Future to completion.
PartialJoinStatistics 🔒
A shared state between statistic aggregators for a join operation.
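
To make the BatchSplitter description concrete, here is a minimal sketch of the row arithmetic behind splitting a batch into pieces with a row limit. The helper `split_ranges` is hypothetical and is not part of this module; it only computes `(offset, length)` pairs, on the assumption that each pair would then be used to take a zero-copy slice of the original batch.

```rust
/// Returns `(offset, length)` pairs that cover `num_rows` rows in chunks of
/// at most `max_rows` rows each. Hypothetical helper for illustration only.
fn split_ranges(num_rows: usize, max_rows: usize) -> Vec<(usize, usize)> {
    assert!(max_rows > 0, "max_rows must be positive");
    let mut ranges = Vec::new();
    let mut offset = 0;
    while offset < num_rows {
        // The last chunk may be shorter than `max_rows`.
        let len = max_rows.min(num_rows - offset);
        ranges.push((offset, len));
        offset += len;
    }
    ranges
}
```

For example, `split_ranges(10, 4)` yields three ranges of 4, 4, and 2 rows, and an empty input yields no ranges at all.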

Enums

OnceFutState 🔒
StatefulStreamResult
Represents the result of a stateful operation.

Traits

BatchTransformer 🔒
Trait for incrementally generating Join output.
JoinHashMapType
Maps a u64 hash value based on the build side ["on" values] to a list of indices with this key's value.
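
The mapping described for JoinHashMapType (a hash value to a list of build-side row indices) can be sketched with a chained layout. This is an illustrative stand-in under stated assumptions, not the trait's actual implementation: the type `ChainedJoinMap` and its methods are hypothetical, and it uses `std::collections::HashMap` for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a join hash map. `heads` maps a hash value to
/// the most recently inserted row (stored as index + 1), and `next[i]` links
/// row `i` to the previous row with the same hash (0 ends the chain).
struct ChainedJoinMap {
    heads: HashMap<u64, u64>,
    next: Vec<u64>,
}

impl ChainedJoinMap {
    /// Creates an empty map sized for `num_rows` build-side rows.
    fn with_rows(num_rows: usize) -> Self {
        Self { heads: HashMap::new(), next: vec![0; num_rows] }
    }

    /// Records that build-side row `row` hashes to `hash`.
    fn insert(&mut self, hash: u64, row: usize) {
        let prev = self.heads.insert(hash, row as u64 + 1).unwrap_or(0);
        self.next[row] = prev;
    }

    /// Returns all build-side rows with this hash, most recent first.
    fn rows_for(&self, hash: u64) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cur = self.heads.get(&hash).copied().unwrap_or(0);
        while cur != 0 {
            let row = (cur - 1) as usize;
            out.push(row);
            cur = self.next[row];
        }
        out
    }
}
```

Because rows are grouped by hash rather than by key, a real join still has to compare the actual "on" values of each candidate row to rule out hash collisions.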

Functions

adjust_indices_by_join_type 🔒
Takes the matched indices for the left and right sides and adjusts them according to the join type.
adjust_right_output_partitioning
Adjusts the right output partitioning to the new column index.
append_probe_indices_in_order 🔒
Appends probe indices in order by considering the given build indices.
append_right_indices 🔒
Appends right indices to left indices based on the specified order mode.
apply_join_filter_to_indices 🔒
asymmetric_join_output_partitioning 🔒
build_batch_empty_build_side 🔒
Returns a new [RecordBatch] resulting from a join where the build/left side is empty. The resulting batch has [Schema] schema.
build_batch_from_indices 🔒
Returns a new [RecordBatch] by combining the left and right according to indices. The resulting batch has [Schema] schema.
build_join_schema
Creates a schema for a join operation. The fields from the left side are first
build_range_bitmap 🔒
calculate_join_output_ordering
Calculate the output ordering of a given join operation.
check_join_is_valid
Checks whether the schemas "left" and "right" and columns "on" represent a valid join. They are valid whenever their columns' intersection equals the set on.
check_join_set_is_valid 🔒
Checks whether the sets left, right and on compose a valid join. They are valid whenever their intersection equals the set on.
compare_join_arrays
Get comparison result of two rows of join arrays
eq_dyn_null 🔒
equal_rows_arr 🔒
estimate_disjoint_inputs 🔒
Estimates if inputs are non-overlapping, using input statistics. If inputs are disjoint, returns zero estimation, otherwise returns None.
estimate_inner_join_cardinality 🔒
Estimate the inner join cardinality by using the basic building blocks of column-level statistics and the total row count. This is a very naive and very conservative implementation that can quickly give up if there are not enough input statistics.
estimate_join_cardinality 🔒
estimate_join_statistics 🔒
Estimate the statistics for the given join's output.
get_anti_indices 🔒
Returns range indices which are not present in input_indices.
get_final_indices_from_bit_map 🔒
At the end of join execution, uses the bitmap of matched indices to generate the final left and right indices.
get_final_indices_from_shared_bitmap 🔒
get_mark_indices 🔒
get_semi_indices 🔒
Returns the intersection of range and input_indices, omitting duplicates.
max_distinct_count 🔒
Estimate the number of maximum distinct values that can be present in the given column from its statistics. If distinct_count is available, uses it directly. Otherwise, if the column is numeric and has min/max values, it estimates the maximum distinct count from those. Otherwise, the num_rows is used.
need_produce_result_in_final 🔒
Some join types (join_type) need to maintain a bitmap of the matched indices for the left side and use that bitmap to generate part of the join result.
need_produce_right_in_final 🔒
Whether to use a bitmap to track the 'joined' status of each row in each incoming right batch.
output_join_field 🔒
Returns the output field given the input field. Outer joins may insert nulls even if the input was not null.
reorder_output_after_swap
When the order of the join inputs is changed, the output order of columns must remain the same.
swap_join_projection
This function swaps the given join's projection.
swap_reverting_projection 🔒
When the order of the join is changed, the output order of columns must remain the same.
symmetric_join_output_partitioning 🔒
update_hash
Updates hash_map with new entries from batch evaluated against the expressions on using offset as a start value for batch row indices.
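
The descriptions of get_semi_indices and get_anti_indices in the list above can be illustrated with a small bitmap-based sketch. The functions `build_bitmap`, `semi_indices`, and `anti_indices` below are hypothetical stand-ins that work on plain `usize` values; the module's own functions have different signatures and operate on Arrow arrays.

```rust
use std::ops::Range;

/// Marks which positions of `range` occur in `input_indices`.
/// Indices outside `range` are ignored. Hypothetical helper.
fn build_bitmap(range: &Range<usize>, input_indices: &[usize]) -> Vec<bool> {
    let mut seen = vec![false; range.len()];
    for &idx in input_indices {
        if range.contains(&idx) {
            seen[idx - range.start] = true;
        }
    }
    seen
}

/// Intersection of `range` and `input_indices`, without duplicates.
fn semi_indices(range: Range<usize>, input_indices: &[usize]) -> Vec<usize> {
    let seen = build_bitmap(&range, input_indices);
    let start = range.start;
    range.filter(|idx| seen[*idx - start]).collect()
}

/// Indices of `range` that do not appear in `input_indices`.
fn anti_indices(range: Range<usize>, input_indices: &[usize]) -> Vec<usize> {
    let seen = build_bitmap(&range, input_indices);
    let start = range.start;
    range.filter(|idx| !seen[*idx - start]).collect()
}
```

Both results come out in ascending order because they are produced by walking the range, and the bitmap is what removes duplicates from the semi result.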

Type Aliases

OnceFutPending 🔒
The shared future type used internally within OnceAsync