Struct ExternalSorter

struct ExternalSorter {
    schema: SchemaRef,
    expr: LexOrdering,
    batch_size: usize,
    sort_in_place_threshold_bytes: usize,
    in_mem_batches: Vec<RecordBatch>,
    in_progress_spill_file: Option<(InProgressSpillFile, usize)>,
    finished_spill_files: Vec<SortedSpillFile>,
    metrics: ExternalSorterMetrics,
    runtime: Arc<RuntimeEnv>,
    reservation: MemoryReservation,
    spill_manager: SpillManager,
    merge_reservation: MemoryReservation,
    sort_spill_reservation_bytes: usize,
}

Sorts an arbitrarily sized, unsorted stream of RecordBatches into a total order. Depending on the input size and the memory manager configuration, it writes intermediate results to disk (“spills”) using the Arrow IPC format.

§Algorithm

1. Get a new, non-empty batch from the input.
2. Check with the memory manager whether there is sufficient space to buffer the batch in memory.

2.1 If memory is sufficient, buffer the batch in memory and go to 1.

2.2 If no more memory is available, sort all buffered batches and spill them to a file, then buffer the new batch in memory and go to 1.

3. When the input is exhausted, merge all in-memory batches and spill files to produce a total order.
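The loop above can be modelled in a few lines of plain Rust. This is a minimal sketch, not ExternalSorter itself: a batch is assumed to be a single Vec<i64> column, memory is counted as 8 bytes per value against a fixed limit instead of a MemoryReservation, and a "spill file" is simply a sorted Vec kept aside. ToyExternalSorter and its methods are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a RecordBatch: one column of i64 sort keys.
type Batch = Vec<i64>;

/// Toy model of the insert / spill / merge loop (assumption: 8 bytes per
/// value, spills kept in memory as sorted Vecs rather than Arrow IPC files).
struct ToyExternalSorter {
    memory_limit_bytes: usize,
    used_bytes: usize,
    in_mem_batches: Vec<Batch>,
    spills: Vec<Batch>,
}

impl ToyExternalSorter {
    fn new(memory_limit_bytes: usize) -> Self {
        Self { memory_limit_bytes, used_bytes: 0, in_mem_batches: Vec::new(), spills: Vec::new() }
    }

    /// Steps 1 and 2: buffer the batch, spilling first if it would not fit.
    fn insert_batch(&mut self, batch: Batch) {
        if batch.is_empty() {
            return; // step 1 only considers non-empty batches
        }
        let size = batch.len() * std::mem::size_of::<i64>();
        // Step 2.2: spill only if something is buffered; an oversized single
        // batch is simply buffered in this sketch.
        if self.used_bytes + size > self.memory_limit_bytes && !self.in_mem_batches.is_empty() {
            self.sort_and_spill();
        }
        self.used_bytes += size; // step 2.1 (or the new batch after a spill)
        self.in_mem_batches.push(batch);
    }

    /// Sorts everything buffered into one run and "writes" it out.
    fn sort_and_spill(&mut self) {
        let mut run: Batch = self.in_mem_batches.drain(..).flatten().collect();
        run.sort_unstable();
        self.spills.push(run);
        self.used_bytes = 0;
    }

    /// Step 3: k-way merge of the spilled runs and the sorted in-memory data.
    fn sort(mut self) -> Batch {
        let mut in_mem: Batch = self.in_mem_batches.drain(..).flatten().collect();
        in_mem.sort_unstable();
        let mut runs = self.spills;
        runs.push(in_mem);
        // Min-heap of (next value, run index, position within that run).
        let mut heap = BinaryHeap::new();
        for (r, run) in runs.iter().enumerate() {
            if let Some(&v) = run.first() {
                heap.push(Reverse((v, r, 0usize)));
            }
        }
        let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
        while let Some(Reverse((v, r, i))) = heap.pop() {
            out.push(v);
            if let Some(&next) = runs[r].get(i + 1) {
                heap.push(Reverse((next, r, i + 1)));
            }
        }
        out
    }
}
```

With a 32-byte limit, inserting [2, 3], [1, 4], [2, 1] and [4, 3] spills once (the third batch does not fit), and sort() then returns [1, 1, 2, 2, 3, 3, 4, 4].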

§When data fits in available memory

If there is sufficient memory, the data is sorted in memory to produce the output:

   ┌─────┐
   │  2  │
   │  3  │
   │  1  │─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
   │  4  │
   │  2  │                  │
   └─────┘                  ▼
   ┌─────┐
   │  1  │              In memory
   │  4  │─ ─ ─ ─ ─ ─▶ sort/merge  ─ ─ ─ ─ ─▶  total sorted output
   │  1  │
   └─────┘                  ▲
     ...                    │

   ┌─────┐                  │
   │  4  │
   │  3  │─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
   └─────┘

in_mem_batches
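Sorting in memory by several sort expressions amounts to comparing rows lexicographically. A common columnar approach, assumed here, is to compute a permutation of row indices once and then gather every column with it. The sketch below is std-only and hypothetical: SortKey, lex_sort_indices and take are illustrative names, not Arrow's sort kernels, and each SortKey stands in for one entry of the LexOrdering.

```rust
use std::cmp::Ordering;

/// Hypothetical sort key: one i64 column plus its direction.
struct SortKey<'a> {
    column: &'a [i64],
    descending: bool,
}

/// Returns the row permutation that orders rows lexicographically by `keys`.
fn lex_sort_indices(keys: &[SortKey<'_>], num_rows: usize) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..num_rows).collect();
    indices.sort_by(|&a, &b| {
        // Compare key by key; the first non-equal key decides the order.
        for key in keys {
            let ord = key.column[a].cmp(&key.column[b]);
            let ord = if key.descending { ord.reverse() } else { ord };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    });
    indices
}

/// Gathers a column in permutation order.
fn take(column: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| column[i]).collect()
}
```

For columns a = [2, 1, 2, 1] (ascending) and b = [10, 20, 30, 40] (descending), the permutation is [3, 1, 2, 0], and gathering b with it yields [40, 20, 30, 10].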

§When data does not fit in available memory

When memory is exhausted, data is first sorted and written to one or more spill files on disk:

   ┌─────┐                               .─────────────────.
   │  2  │                              (                   )
   │  3  │                              │`─────────────────'│
   │  1  │─ ─ ─ ─ ─ ─ ─                 │  ┌────┐           │
   │  4  │             │                │  │ 1  │░          │
   │  2  │                              │  │... │░          │
   └─────┘             ▼                │  │ 4  │░  ┌ ─ ─   │
   ┌─────┐                              │  └────┘░    1  │░ │
   │  1  │         In memory            │   ░░░░░░  │    ░░ │
   │  4  │─ ─ ▶   sort/merge    ─ ─ ─ ─ ┼ ─ ─ ─ ─ ─▶ ... │░ │
   │  1  │     and write to file        │           │    ░░ │
   └─────┘                              │             4  │░ │
     ...               ▲                │           └░─░─░░ │
                       │                │            ░░░░░░ │
   ┌─────┐                              │.─────────────────.│
   │  4  │             │                (                   )
   │  3  │─ ─ ─ ─ ─ ─ ─                  `─────────────────'
   └─────┘

in_mem_batches                                  spills
                                        (file on disk in Arrow
                                              IPC format)
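The spill step can be sketched as sorting the buffered batches into a single run and appending it to a file as several chunks of at most batch_size rows. For brevity this sketch assumes a hypothetical length-prefixed little-endian layout (a u32 row count followed by i64 values) in place of Arrow IPC; any std::io::Write or std::io::Read can serve as the "file". write_chunk, read_chunks and sort_and_spill are illustrative names.

```rust
use std::io::{self, Read, Write};

/// Appends one sorted chunk: u32 row count, then the i64 values (little-endian).
fn write_chunk<W: Write>(w: &mut W, chunk: &[i64]) -> io::Result<()> {
    w.write_all(&(chunk.len() as u32).to_le_bytes())?;
    for v in chunk {
        w.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}

/// Reads chunks until the input ends (a clean EOF before a length prefix).
fn read_chunks<R: Read>(r: &mut R) -> io::Result<Vec<Vec<i64>>> {
    let mut chunks = Vec::new();
    let mut len_buf = [0u8; 4];
    loop {
        match r.read_exact(&mut len_buf) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let n = u32::from_le_bytes(len_buf) as usize;
        let mut chunk = Vec::with_capacity(n);
        let mut val_buf = [0u8; 8];
        for _ in 0..n {
            r.read_exact(&mut val_buf)?;
            chunk.push(i64::from_le_bytes(val_buf));
        }
        chunks.push(chunk);
    }
    Ok(chunks)
}

/// Sorts all buffered batches into one run and writes it in `batch_size` chunks.
fn sort_and_spill<W: Write>(
    in_mem_batches: &mut Vec<Vec<i64>>,
    batch_size: usize,
    w: &mut W,
) -> io::Result<()> {
    let mut run: Vec<i64> = in_mem_batches.drain(..).flatten().collect();
    run.sort_unstable();
    for chunk in run.chunks(batch_size) {
        write_chunk(w, chunk)?;
    }
    Ok(())
}
```

Writing the batches [2, 3, 1], [4, 2] and [1, 4, 1] with batch_size = 3 into a std::io::Cursor and reading it back yields the chunks [1, 1, 1], [2, 2, 3] and [4, 4]: one sorted run, split into several batches.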

Once the input is completely read, the spill files are read and merged with any in-memory batches to produce a single total sorted output:

  .─────────────────.
 (                   )
 │`─────────────────'│
 │  ┌────┐           │
 │  │ 1  │░          │
 │  │... │─ ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─
 │  │ 4  │░ ┌────┐   │           │
 │  └────┘░ │ 1  │░  │           ▼
 │   ░░░░░░ │    │░  │
 │          │... │─ ─│─ ─ ─ ▶ merge  ─ ─ ─▶  total sorted output
 │          │    │░  │
 │          │ 4  │░  │           ▲
 │          └────┘░  │           │
 │           ░░░░░░  │
 │.─────────────────.│           │
 (                   )
  `─────────────────'            │
        spills
                                 │

                                 │

    ┌─────┐                      │
    │  1  │
    │  4  │─ ─ ─ ─               │
    └─────┘       │
      ...                   In memory
                  └ ─ ─ ─▶  sort/merge
    ┌─────┐
    │  4  │                      ▲
    │  3  │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
    └─────┘

 in_mem_batches
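Because each spill file and the sorted in-memory data are already ordered, the final step is a k-way merge that only needs to hold the current head of each input. This is a minimal sketch over plain iterators, assuming i64 keys and an eager Vec of output batches in place of a SendableRecordBatchStream; merge_sorted_streams is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges already-sorted streams (e.g. spill readers and a sorted in-memory
/// run) and returns output batches of at most `batch_size` rows.
fn merge_sorted_streams<I>(mut streams: Vec<I>, batch_size: usize) -> Vec<Vec<i64>>
where
    I: Iterator<Item = i64>,
{
    // Min-heap of (head value, stream index): one entry per live stream.
    let mut heap = BinaryHeap::new();
    for (idx, s) in streams.iter_mut().enumerate() {
        if let Some(v) = s.next() {
            heap.push(Reverse((v, idx)));
        }
    }
    let mut output = Vec::new();
    let mut current = Vec::with_capacity(batch_size);
    while let Some(Reverse((v, idx))) = heap.pop() {
        current.push(v);
        // Refill from the stream that produced the smallest value.
        if let Some(next) = streams[idx].next() {
            heap.push(Reverse((next, idx)));
        }
        // Emit a full output batch.
        if current.len() == batch_size {
            output.push(std::mem::replace(&mut current, Vec::with_capacity(batch_size)));
        }
    }
    if !current.is_empty() {
        output.push(current);
    }
    output
}
```

Merging [1, 4, 7], [2, 5, 8] and [3, 6, 9] with batch_size = 4 produces [1, 2, 3, 4], [5, 6, 7, 8] and [9].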

Fields§

§schema: SchemaRef

Schema of the output (and the input)

§expr: LexOrdering

Sort expressions

§batch_size: usize

The target number of rows for output batches

§sort_in_place_threshold_bytes: usize

If the total size of the buffered in-memory batches is below this threshold, the data is concatenated and sorted in place rather than sorted per batch and merged.
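The choice described above can be sketched as follows. This assumes 8 bytes per i64 value as the size measure and uses a simple pairwise merge for the larger case; sort_buffered and merge_two are hypothetical helpers, and a real implementation would merge many sorted runs at once rather than folding them pairwise.

```rust
/// Merges two sorted slices into one sorted Vec.
fn merge_two(a: &[i64], b: &[i64]) -> Vec<i64> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Sorts buffered batches into one total order, picking the strategy by size.
fn sort_buffered(batches: &[Vec<i64>], sort_in_place_threshold_bytes: usize) -> Vec<i64> {
    let total_bytes: usize = batches
        .iter()
        .map(|b| b.len() * std::mem::size_of::<i64>())
        .sum();
    if total_bytes < sort_in_place_threshold_bytes {
        // Small input: concatenate once and sort in place.
        let mut all: Vec<i64> = batches.concat();
        all.sort_unstable();
        all
    } else {
        // Larger input: sort each batch independently, then merge the runs.
        let runs: Vec<Vec<i64>> = batches
            .iter()
            .map(|b| {
                let mut run = b.clone();
                run.sort_unstable();
                run
            })
            .collect();
        runs.iter().fold(Vec::new(), |acc, run| merge_two(&acc, run))
    }
}
```

Both branches return the same ordering; the threshold only decides which one is cheaper for the amount of buffered data.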

§in_mem_batches: Vec<RecordBatch>

Unsorted input batches stored in the memory buffer

§in_progress_spill_file: Option<(InProgressSpillFile, usize)>

During external sorting, in-memory intermediate data will be appended to this file incrementally. Once finished, this file will be moved to Self::finished_spill_files.

This is a tuple of:

InProgressSpillFile - the file that is being written to
max_record_batch_memory - the maximum memory usage of a single batch in this spill file
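A hypothetical model of this tuple: the writer appends batches one at a time and keeps a running maximum of their sizes, so the finished file carries both. The struct name InProgressSpill, the Vec standing in for the file, and the 8-bytes-per-value size measure are assumptions. One plausible use of the recorded maximum, assumed here rather than stated in the source, is estimating how much memory reading a single batch back from the file will need during the merge.

```rust
/// Hypothetical stand-in for `(InProgressSpillFile, usize)`.
struct InProgressSpill {
    batches: Vec<Vec<i64>>,         // stands in for the file being written
    max_record_batch_memory: usize, // largest single batch seen, in bytes
}

impl InProgressSpill {
    fn new() -> Self {
        Self { batches: Vec::new(), max_record_batch_memory: 0 }
    }

    /// Appends one batch and updates the running maximum batch size.
    fn append(&mut self, batch: Vec<i64>) {
        let batch_bytes = batch.len() * std::mem::size_of::<i64>();
        self.max_record_batch_memory = self.max_record_batch_memory.max(batch_bytes);
        self.batches.push(batch);
    }

    /// Finishing hands over the file contents and the recorded maximum.
    fn finish(self) -> (Vec<Vec<i64>>, usize) {
        (self.batches, self.max_record_batch_memory)
    }
}
```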

§finished_spill_files: Vec<SortedSpillFile>

If data has previously been spilled, this holds the spill files (in Arrow IPC format). Within a single spill file, the data may be split into multiple batches, which are ordered by the sort keys.

§metrics: ExternalSorterMetrics

Runtime metrics

§runtime: Arc<RuntimeEnv>

A handle to the runtime to get spill files

§reservation: MemoryReservation

Reservation for in_mem_batches

§spill_manager: SpillManager

§merge_reservation: MemoryReservation

Reservation for the merging of in-memory batches. If the sort might spill, sort_spill_reservation_bytes will be pre-reserved to ensure there is some space for this sort/merge.

§sort_spill_reservation_bytes: usize

How much memory to reserve for performing in-memory sort/merges prior to spilling.
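The interplay between the two reservations can be sketched with a toy pool. This is an illustration under assumptions, not DataFusion's MemoryPool or MemoryReservation API: ToyPool, Outcome and reserve_or_spill are hypothetical, and the spill itself is reduced to releasing the buffered bytes. The caller first pre-reserves sort_spill_reservation_bytes for the merge, then buffers batches until a reservation fails, at which point the buffer is spilled and the batch is retried.

```rust
/// Hypothetical memory pool with a fixed byte limit.
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    /// Grows usage by `bytes` if it fits, otherwise reports failure.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!("cannot reserve {bytes} bytes: {} of {} used", self.used, self.limit));
        }
        self.used += bytes;
        Ok(())
    }

    fn shrink(&mut self, bytes: usize) {
        self.used -= bytes;
    }
}

/// What happened to a batch when we tried to buffer it.
#[derive(Debug, PartialEq)]
enum Outcome {
    Buffered,
    SpilledThenBuffered,
}

/// Buffers `batch_bytes`, spilling the current buffer first if the pool is full.
/// Assumes the caller already pre-reserved merge memory in `pool`, so the
/// sort/merge before a spill does not need a fresh reservation.
fn reserve_or_spill(pool: &mut ToyPool, buffered: &mut usize, batch_bytes: usize) -> Result<Outcome, String> {
    if pool.try_grow(batch_bytes).is_ok() {
        *buffered += batch_bytes;
        return Ok(Outcome::Buffered);
    }
    // "Spill": release the buffered bytes back to the pool.
    pool.shrink(*buffered);
    *buffered = 0;
    // May still fail if a single batch exceeds what is left.
    pool.try_grow(batch_bytes)?;
    *buffered = batch_bytes;
    Ok(Outcome::SpilledThenBuffered)
}
```

With a 100-byte pool and 20 bytes pre-reserved, two 40-byte batches are buffered; a third batch of 30 bytes triggers a spill, after which the pool holds 50 bytes (20 pre-reserved plus 30 buffered).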


Implementations§

impl ExternalSorter

pub fn new(partition_id: usize, schema: SchemaRef, expr: LexOrdering, batch_size: usize, sort_spill_reservation_bytes: usize, sort_in_place_threshold_bytes: usize, spill_compression: SpillCompression, metrics: &ExecutionPlanMetricsSet, runtime: Arc<RuntimeEnv>) -> Result<Self>

async fn insert_batch(&mut self, input: RecordBatch) -> Result<()>

fn spilled_before(&self) -> bool

async fn sort(&mut self) -> Result<SendableRecordBatchStream>

fn used(&self) -> usize

fn spilled_bytes(&self) -> usize

fn spilled_rows(&self) -> usize

fn spill_count(&self) -> usize

async fn consume_and_spill_append(&mut self, globally_sorted_batches: &mut Vec<RecordBatch>) -> Result<()>

async fn spill_finish(&mut self) -> Result<()>

fn organize_stringview_arrays(globally_sorted_batches: &mut Vec<RecordBatch>) -> Result<()>

async fn sort_and_spill_in_mem_batches(&mut self) -> Result<()>

fn in_mem_sort_stream(&mut self, metrics: BaselineMetrics) -> Result<SendableRecordBatchStream>

fn sort_batch_stream(&self, batch: RecordBatch, metrics: BaselineMetrics, reservation: MemoryReservation, split: bool) -> Result<SendableRecordBatchStream>

fn reserve_memory_for_merge(&mut self) -> Result<()>

async fn reserve_memory_for_batch_and_maybe_spill(&mut self, input: &RecordBatch) -> Result<()>

fn err_with_oom_context(e: DataFusionError) -> DataFusionError

Trait Implementations§

impl Debug for ExternalSorter

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations§

impl Freeze for ExternalSorter

impl !RefUnwindSafe for ExternalSorter

impl Send for ExternalSorter

impl Sync for ExternalSorter

impl Unpin for ExternalSorter

impl !UnwindSafe for ExternalSorter

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> PolicyExt for T where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P> where T: Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P> where T: Policy<B, E>, P: Policy<B, E>,

impl<T> Same for T

type Output = T

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self> where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>

impl<T> ErasedDestructor for T where T: 'static,
