SortPreservingMergeExec

Struct SortPreservingMergeExec 

Source
pub struct SortPreservingMergeExec {
    input: Arc<dyn ExecutionPlan>,
    expr: LexOrdering,
    metrics: ExecutionPlanMetricsSet,
    fetch: Option<usize>,
    cache: PlanProperties,
    enable_round_robin_repartition: bool,
}
Expand description

Sort preserving merge execution plan

§Overview

This operator implements a K-way merge. It is used to merge multiple sorted streams into a single sorted stream and is highly optimized.

§Inputs:

  1. A list of sort expressions
  2. An input plan, where each partition is sorted with respect to these sort expressions.

§Output:

  1. A single partition that is also sorted with respect to the expressions

§Diagram

┌─────────────────────────┐
│ ┌───┬───┬───┬───┐       │
│ │ A │ B │ C │ D │ ...   │──┐
│ └───┴───┴───┴───┘       │  │
└─────────────────────────┘  │  ┌───────────────────┐    ┌───────────────────────────────┐
  Stream 1                   │  │                   │    │ ┌───┬───╦═══╦───┬───╦═══╗     │
                             ├─▶│SortPreservingMerge│───▶│ │ A │ B ║ B ║ C │ D ║ E ║ ... │
                             │  │                   │    │ └───┴─▲─╩═══╩───┴───╩═══╝     │
┌─────────────────────────┐  │  └───────────────────┘    └─┬─────┴───────────────────────┘
│ ╔═══╦═══╗               │  │
│ ║ B ║ E ║     ...       │──┘                             │
│ ╚═══╩═══╝               │              Stable sort if `enable_round_robin_repartition=false`:
└─────────────────────────┘              the merged stream places equal rows from stream 1
  Stream 2


 Input Partitions                                          Output Partition
   (sorted)                                                  (sorted)

§Error Handling

If any of the input partitions return an error, the error is propagated to the output and inputs are not polled again.

Fields§

§input: Arc<dyn ExecutionPlan>§expr: LexOrdering§metrics: ExecutionPlanMetricsSet§fetch: Option<usize>§cache: PlanProperties§enable_round_robin_repartition: bool

Implementations§

Source§

impl SortPreservingMergeExec

Source

pub fn new( expr: LexOrdering, input: Arc<dyn ExecutionPlan>, ) -> SortPreservingMergeExec

Create a new sort execution plan

Source

pub fn with_fetch(self, fetch: Option<usize>) -> SortPreservingMergeExec

Sets the number of rows to fetch

Source

pub fn with_round_robin_repartition( self, enable_round_robin_repartition: bool, ) -> SortPreservingMergeExec

Sets the selection strategy of tied winners of the loser tree algorithm

If true (the default) equal output rows are placed in the merged stream in round robin fashion. This approach consumes input streams at more even rates when there are many rows with the same sort key.

If false, equal output rows are always placed in the merged stream in the order of the inputs, resulting in potentially slower execution but a stable output order.

Source

pub fn input(&self) -> &Arc<dyn ExecutionPlan>

Input schema

Source

pub fn expr(&self) -> &LexOrdering

Sort expressions

Source

pub fn fetch(&self) -> Option<usize>

Fetch

Trait Implementations§

Source§

impl Clone for SortPreservingMergeExec

Source§

fn clone(&self) -> SortPreservingMergeExec

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for SortPreservingMergeExec

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter. Read more
Source§

impl DisplayAs for SortPreservingMergeExec

Source§

fn fmt_as( &self, t: DisplayFormatType, f: &mut Formatter<'_>, ) -> Result<(), Error>

Format according to DisplayFormatType, used when verbose representation looks different from the default one Read more
Source§

impl ExecutionPlan for SortPreservingMergeExec

Source§

fn as_any(&self) -> &(dyn Any + 'static)

Return a reference to Any that can be used for downcasting

Source§

fn with_fetch(&self, limit: Option<usize>) -> Option<Arc<dyn ExecutionPlan>>

Sets the number of rows to fetch

Source§

fn try_swapping_with_projection( &self, projection: &ProjectionExec, ) -> Result<Option<Arc<dyn ExecutionPlan>>, DataFusionError>

Tries to swap the projection with its input SortPreservingMergeExec. If this is possible, it returns the new SortPreservingMergeExec whose child is a projection. Otherwise, it returns None.

Source§

fn name(&self) -> &'static str

Short name for the ExecutionPlan, such as ‘DataSourceExec’. Read more
Source§

fn properties(&self) -> &PlanProperties

Return properties of the output of the ExecutionPlan, such as output ordering(s), partitioning information etc. Read more
Source§

fn fetch(&self) -> Option<usize>

Gets the fetch count for the operator, None means there is no fetch.
Source§

fn required_input_distribution(&self) -> Vec<Distribution>

Specifies the data distribution requirements for all the children for this ExecutionPlan, By default it’s [Distribution::UnspecifiedDistribution] for each child,
Source§

fn benefits_from_input_partitioning(&self) -> Vec<bool>

Specifies whether the ExecutionPlan benefits from increased parallelization at its input for each child. Read more
Source§

fn required_input_ordering(&self) -> Vec<Option<OrderingRequirements>>

Specifies the ordering required for all of the children of this ExecutionPlan. Read more
Source§

fn maintains_input_order(&self) -> Vec<bool>

Returns false if this ExecutionPlan’s implementation may reorder rows within or between partitions. Read more
Source§

fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>

Get a list of children ExecutionPlans that act as inputs to this plan. The returned list will be empty for leaf nodes such as scans, will contain a single value for unary nodes, or two values for binary nodes (such as joins).
Source§

fn with_new_children( self: Arc<SortPreservingMergeExec>, children: Vec<Arc<dyn ExecutionPlan>>, ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError>

Returns a new ExecutionPlan where all existing children were replaced by the children, in order
Source§

fn execute( &self, partition: usize, context: Arc<TaskContext>, ) -> Result<Pin<Box<dyn RecordBatchStream<Item = Result<RecordBatch, DataFusionError>> + Send>>, DataFusionError>

Begin execution of partition, returning a Stream of RecordBatches. Read more
Source§

fn metrics(&self) -> Option<MetricsSet>

Return a snapshot of the set of Metrics for this ExecutionPlan. If no Metrics are available, return None. Read more
Source§

fn statistics(&self) -> Result<Statistics, DataFusionError>

👎Deprecated since 48.0.0: Use partition_statistics method instead
Returns statistics for this ExecutionPlan node. If statistics are not available, should return Statistics::new_unknown (the default), not an error. Read more
Source§

fn partition_statistics( &self, _partition: Option<usize>, ) -> Result<Statistics, DataFusionError>

Returns statistics for a specific partition of this ExecutionPlan node. If statistics are not available, should return Statistics::new_unknown (the default), not an error. If partition is None, it returns statistics for the entire plan.
Source§

fn supports_limit_pushdown(&self) -> bool

Returns true if a limit can be safely pushed down through this ExecutionPlan node. Read more
Source§

fn static_name() -> &'static str
where Self: Sized,

Short name for the ExecutionPlan, such as ‘DataSourceExec’. Like name but can be called without an instance.
Source§

fn schema(&self) -> Arc<Schema>

Get the schema for this execution plan
Source§

fn check_invariants(&self, check: InvariantLevel) -> Result<(), DataFusionError>

Returns an error if this individual node does not conform to its invariants. These invariants are typically only checked in debug mode. Read more
Source§

fn reset_state( self: Arc<Self>, ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError>

Reset any internal state within this ExecutionPlan. Read more
Source§

fn repartitioned( &self, _target_partitions: usize, _config: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>, DataFusionError>

If supported, attempt to increase the partitioning of this ExecutionPlan to produce target_partitions partitions. Read more
Source§

fn cardinality_effect(&self) -> CardinalityEffect

Gets the effect on cardinality, if known
Source§

fn gather_filters_for_pushdown( &self, _phase: FilterPushdownPhase, parent_filters: Vec<Arc<dyn PhysicalExpr>>, _config: &ConfigOptions, ) -> Result<FilterDescription, DataFusionError>

Collect filters that this node can push down to its children. Filters that are being pushed down from parents are passed in, and the node may generate additional filters to push down. For example, given the plan FilterExec -> HashJoinExec -> DataSourceExec, what will happen is that we recurse down the plan calling ExecutionPlan::gather_filters_for_pushdown: Read more
Source§

fn handle_child_pushdown_result( &self, _phase: FilterPushdownPhase, child_pushdown_result: ChildPushdownResult, _config: &ConfigOptions, ) -> Result<FilterPushdownPropagation<Arc<dyn ExecutionPlan>>, DataFusionError>

Handle the result of a child pushdown. This method is called as we recurse back up the plan tree after pushing filters down to child nodes via ExecutionPlan::gather_filters_for_pushdown. It allows the current node to process the results of filter pushdown from its children, deciding whether to absorb filters, modify the plan, or pass filters back up to its parent. Read more
Source§

fn with_new_state( &self, _state: Arc<dyn Any + Sync + Send>, ) -> Option<Arc<dyn ExecutionPlan>>

Injects arbitrary run-time state into this execution plan, returning a new plan instance that incorporates that state if it is relevant to the concrete node implementation. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

§

impl<T> Instrument for T

§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided [Span], returning an Instrumented wrapper. Read more
§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
§

impl<T> PolicyExt for T
where T: ?Sized,

§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns [Action::Follow] only if self and other return Action::Follow. Read more
§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns [Action::Follow] if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

§

fn vzip(self) -> V

§

impl<T> WithSubscriber for T

§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a [WithDispatch] wrapper. Read more
§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a [WithDispatch] wrapper. Read more
§

impl<T> ErasedDestructor for T
where T: 'static,