Module enforce_distribution

Module enforce_distribution 

Source
Expand description

EnforceDistribution optimizer rule inspects the physical plan with respect to distribution requirements and adds RepartitionExecs to satisfy them when necessary. If increasing parallelism is beneficial (and also desirable according to the configuration), this rule increases partition counts in the physical plan.

StructsΒ§

EnforceDistribution
The EnforceDistribution rule ensures that distribution requirements are met. In doing so, this rule will increase the parallelism in the plan by introducing repartitioning operators to the physical plan.
JoinKeyPairs πŸ”’
RepartitionRequirementStatus πŸ”’
A struct to keep track of repartition requirements for each child node.

FunctionsΒ§

add_hash_on_top πŸ”’
Adds a hash repartition operator:
add_merge_on_top πŸ”’
Adds a SortPreservingMergeExec or a CoalescePartitionsExec operator on top of the given plan node to satisfy a single partition requirement while preserving ordering constraints.
add_roundrobin_on_top πŸ”’
Adds RoundRobin repartition operator to the plan increase parallelism.
adjust_input_keys_ordering
When the physical planner creates the Joins, the ordering of join keys is from the original query. That might not match with the output partitioning of the join node’s children A Top-Down process will use this method to adjust children’s output partitioning based on the parent key reordering requirements:
ensure_distribution
This function checks whether we need to add additional data exchange operators to satisfy distribution requirements. Since this function takes care of such requirements, we should avoid manually adding data exchange operators in other places.
expected_expr_positions πŸ”’
Return the expected expressions positions. For example, the current expressions are [β€˜c’, β€˜a’, β€˜a’, b’], the expected expressions are [β€˜b’, β€˜c’, β€˜a’, β€˜a’],
extract_join_keys πŸ”’
get_repartition_requirement_status πŸ”’
Calculates the RepartitionRequirementStatus for each children to generate consistent and sensible (in terms of performance) distribution requirements. As an example, a hash join’s left (build) child might produce
new_join_conditions πŸ”’
remove_dist_changing_operators πŸ”’
Updates the physical plan inside DistributionContext so that distribution changing operators are removed from the top. If they are necessary, they will be added in subsequent stages.
reorder_aggregate_keys
reorder_current_join_keys πŸ”’
Reorder the current join keys ordering based on either left partition or right partition
reorder_join_keys_to_inputs
When the physical planner creates the Joins, the ordering of join keys is from the original query. That might not match with the output partitioning of the join node’s children This method will try to change the ordering of the join keys to match with the partitioning of the join nodes’ children. If it can not match with both sides, it will try to match with one, either the left side or the right side.
reorder_partitioned_join_keys
replace_order_preserving_variants
Updates the DistributionContext if preserving ordering while changing partitioning is not helpful or desirable.
shift_right_required πŸ”’
try_reorder πŸ”’
update_children πŸ”’

Type AliasesΒ§

DistributionContext
Keeps track of distribution changing operators (like RepartitionExec, SortPreservingMergeExec, CoalescePartitionsExec) and their ancestors. Using this information, we can optimize distribution of the plan if/when necessary.
PlanWithKeyRequirements
Keeps track of parent required key orderings.