struct MinMaxBytesState {
min_max: Vec<Option<Vec<u8>>>,
data_type: DataType,
total_data_bytes: usize,
}
Stores internal Min/Max state for “bytes” types.
This implementation is general and stores the minimum/maximum for each group in an individual byte array, which balances allocations and memory fragmentation (aka garbage).
                 ┌─────────────────────────────────┐
 ┌─────┐    ┌───▶│Option<Vec<u8>> (["A"])          │───────────▶ "A"
 │  0  │────┘    └─────────────────────────────────┘
 ├─────┤         ┌─────────────────────────────────┐
 │  1  │────────▶│Option<Vec<u8>> (["Z"])          │───────────▶ "Z"
 └─────┘         └─────────────────────────────────┘
   ...                          ...                               ...
 ┌─────┐         ┌─────────────────────────────────┐
 │ N-2 │────────▶│Option<Vec<u8>> (["A"])          │───────────▶ "A"
 ├─────┤         └─────────────────────────────────┘
 │ N-1 │────┐    ┌─────────────────────────────────┐
 └─────┘    └───▶│Option<Vec<u8>> (["Q"])          │───────────▶ "Q"
                 └─────────────────────────────────┘
min_max: Vec<Option<Vec<u8>>>

Note that for StringViewArray and BinaryViewArray, there are potentially
more efficient implementations (e.g. by managing a string data buffer
directly), but then garbage collection, memory management, and final array
construction become more complex.
See discussion on https://github.com/apache/datafusion/issues/6906
Fields
min_max: Vec<Option<Vec<u8>>>
The minimum/maximum value for each group
data_type: DataType
The data type of the array
total_data_bytes: usize
The total bytes of the string data (for pre-allocating the final array, and tracking memory usage)
Implementations
impl MinMaxBytesState
Implement the MinMaxBytesAccumulator with a comparison function for comparing strings
fn new(data_type: DataType) -> Self
Create a new MinMaxBytesState
Arguments:
data_type: The data type of the arrays that will be passed to this accumulator
fn set_value(&mut self, group_index: usize, new_val: &[u8])
Set the specified group to the given value, updating memory usage appropriately
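As a rough illustration of how setting a group's value can keep the byte total in sync, here is a minimal sketch. `SetValueSketch` is a hypothetical stand-in that mirrors only the `min_max` and `total_data_bytes` fields; it is not the DataFusion type, and the real method body may differ.

```rust
/// Hypothetical stand-in for the per-group state (not the DataFusion type).
struct SetValueSketch {
    /// One optional byte buffer per group.
    min_max: Vec<Option<Vec<u8>>>,
    /// Running total of the bytes held across all groups.
    total_data_bytes: usize,
}

impl SetValueSketch {
    /// Store `new_val` for `group_index` and adjust the byte total.
    /// Panics if `group_index` is out of bounds.
    fn set_value(&mut self, group_index: usize, new_val: &[u8]) {
        match self.min_max[group_index].as_mut() {
            None => {
                // First value for this group: allocate a buffer and count its bytes.
                self.min_max[group_index] = Some(new_val.to_vec());
                self.total_data_bytes += new_val.len();
            }
            Some(existing) => {
                // Reuse the existing allocation; swap the old length for the new one.
                self.total_data_bytes -= existing.len();
                self.total_data_bytes += new_val.len();
                existing.clear();
                existing.extend_from_slice(new_val);
            }
        }
    }
}
```

Reusing the existing `Vec<u8>` when a group already has a value is one way to limit reallocation; it is a design choice of this sketch rather than a documented guarantee.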
fn update_batch<'a, F, I>(
    &mut self,
    iter: I,
    group_indices: &[usize],
    total_num_groups: usize,
    cmp: F,
) -> Result<()>
Updates the min/max values for the given string values.
cmp is the comparison function to use, called like cmp(new_val, existing_val).
It returns true if new_val should replace existing_val.
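The role of the comparison function can be sketched as follows. `update_groups` is a hypothetical free function operating on a bare `Vec<Option<Vec<u8>>>`; its trait bounds (`FnMut(&[u8], &[u8]) -> bool` and an iterator of `Option<&[u8]>`) are assumptions, since the page does not show the `where` clause, and it skips the memory accounting and error handling of the real method.

```rust
/// Hypothetical helper: fold per-row values into per-group slots.
/// `cmp(new_val, existing_val)` returns true when `new_val` should win.
fn update_groups<'a, F, I>(
    min_max: &mut Vec<Option<Vec<u8>>>,
    iter: I,
    group_indices: &[usize],
    total_num_groups: usize,
    mut cmp: F,
) where
    F: FnMut(&[u8], &[u8]) -> bool,
    I: IntoIterator<Item = Option<&'a [u8]>>,
{
    // Make sure every group has a slot; unseen groups start out as None.
    min_max.resize(total_num_groups, None);

    for (new_val, &group_index) in iter.into_iter().zip(group_indices) {
        // Null inputs never change the stored value.
        if let Some(new_val) = new_val {
            let replace = match &min_max[group_index] {
                None => true,
                Some(existing) => cmp(new_val, existing.as_slice()),
            };
            if replace {
                min_max[group_index] = Some(new_val.to_vec());
            }
        }
    }
}
```

Under these assumptions, passing `|new, old| new < old` yields a minimum per group, and `|new, old| new > old` yields a maximum.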
fn emit_to(&mut self, emit_to: EmitTo) -> (usize, Vec<Option<Vec<u8>>>)
Emits the specified min_max values
Returns (data_capacity, min_maxes), updating the current value of total_data_bytes
data_capacity: the total length of all strings and their contents
min_maxes: the actual min/max values for each group
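One way the emit step could update the byte total is sketched below. `EmitSketch` is a local stand-in for DataFusion's `EmitTo` (modelling only an "all groups" and a "first n groups" case), and `EmitStateSketch` is a hypothetical reduced state; neither is the real type, and the actual implementation may compute the capacity differently.

```rust
/// Local stand-in for the emit mode (hypothetical, not DataFusion's `EmitTo`).
enum EmitSketch {
    /// Emit every group.
    All,
    /// Emit only the first `n` groups.
    First(usize),
}

/// Hypothetical reduced state: per-group values plus a running byte total.
struct EmitStateSketch {
    min_max: Vec<Option<Vec<u8>>>,
    total_data_bytes: usize,
}

impl EmitStateSketch {
    /// Returns (data_capacity, min_maxes) and subtracts the emitted bytes
    /// from `total_data_bytes`. `First(n)` panics if `n` exceeds the group count.
    fn emit_to(&mut self, emit_to: EmitSketch) -> (usize, Vec<Option<Vec<u8>>>) {
        match emit_to {
            EmitSketch::All => (
                // Take both fields, leaving 0 and an empty Vec behind.
                std::mem::take(&mut self.total_data_bytes),
                std::mem::take(&mut self.min_max),
            ),
            EmitSketch::First(n) => {
                // Keep groups n.. in the state and hand back groups 0..n.
                let remaining = self.min_max.split_off(n);
                let emitted = std::mem::replace(&mut self.min_max, remaining);
                let emitted_bytes: usize = emitted
                    .iter()
                    .map(|v| v.as_ref().map_or(0, |bytes| bytes.len()))
                    .sum();
                self.total_data_bytes -= emitted_bytes;
                (emitted_bytes, emitted)
            }
        }
    }
}
```

The returned byte count is what a caller could use to pre-size the data buffer of the final array.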
fn size(&self) -> usize
Auto Trait Implementations
impl Freeze for MinMaxBytesState
impl RefUnwindSafe for MinMaxBytesState
impl Send for MinMaxBytesState
impl Sync for MinMaxBytesState
impl Unpin for MinMaxBytesState
impl UnwindSafe for MinMaxBytesState
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.