struct TopKHeap {
k: usize,
batch_size: usize,
inner: BinaryHeap<TopKRow>,
store: RecordBatchStore,
owned_bytes: usize,
}Expand description
This structure keeps at most the smallest k items, using the
[arrow::row] format for sort keys. While it is called “topK” for
values like 1, 2, 3, 4, 5 the “top 3” really means the
smallest 3 , 1, 2, 3, not the largest 3 3, 4, 5.
Using the Row format handles things such as ascending vs
descending and nulls first vs nulls last.
Fields§
§k: usizeThe maximum number of elements to store in this heap.
batch_size: usizeThe target number of rows for output batches
inner: BinaryHeap<TopKRow>Storage for up at most k items using a BinaryHeap. Reversed
so that the smallest k so far is on the top
store: RecordBatchStoreStorage the original row values (TopKRow only has the sort key)
owned_bytes: usizeThe size of all owned data held by this heap
Implementations§
Source§impl TopKHeap
impl TopKHeap
fn new(k: usize, batch_size: usize) -> Self
Sourcepub fn register_batch(&mut self, batch: RecordBatch) -> RecordBatchEntry
pub fn register_batch(&mut self, batch: RecordBatch) -> RecordBatchEntry
Register a [RecordBatch] with the heap, returning the
appropriate entry
Sourcepub fn insert_batch_entry(&mut self, entry: RecordBatchEntry)
pub fn insert_batch_entry(&mut self, entry: RecordBatchEntry)
Insert a RecordBatchEntry created by a previous call to
Self::register_batch into storage.
Sourcefn max(&self) -> Option<&TopKRow>
fn max(&self) -> Option<&TopKRow>
Returns the largest value stored by the heap if there are k items, otherwise returns None. Remember this structure is keeping the “smallest” k values
Sourcefn add(
&mut self,
batch_entry: &mut RecordBatchEntry,
row: impl AsRef<[u8]>,
index: usize,
)
fn add( &mut self, batch_entry: &mut RecordBatchEntry, row: impl AsRef<[u8]>, index: usize, )
Adds row to this heap. If inserting this new item would
increase the size past k, removes the previously smallest
item.
Sourcepub fn emit(&mut self) -> Result<Option<RecordBatch>>
pub fn emit(&mut self) -> Result<Option<RecordBatch>>
Returns the values stored in this heap, from values low to
high, as a single [RecordBatch], resetting the inner heap
Sourcepub fn emit_with_state(&mut self) -> Result<(Option<RecordBatch>, Vec<TopKRow>)>
pub fn emit_with_state(&mut self) -> Result<(Option<RecordBatch>, Vec<TopKRow>)>
Returns the values stored in this heap, from values low to
high, as a single [RecordBatch], and a sorted vec of the
current heap’s contents
Sourcepub fn maybe_compact(&mut self) -> Result<()>
pub fn maybe_compact(&mut self) -> Result<()>
Compact this heap, rewriting all stored batches into a single input batch
fn get_threshold_values( &self, sort_exprs: &[PhysicalSortExpr], ) -> Result<Option<Vec<ScalarValue>>>
Auto Trait Implementations§
impl Freeze for TopKHeap
impl !RefUnwindSafe for TopKHeap
impl Send for TopKHeap
impl Sync for TopKHeap
impl Unpin for TopKHeap
impl !UnwindSafe for TopKHeap
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
§impl<T> Instrument for T
impl<T> Instrument for T
§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more