Struct ParquetAccessPlan 

Source
pub struct ParquetAccessPlan {
    row_groups: Vec<RowGroupAccess>,
}

A selection of rows and row groups within a ParquetFile to decode.

A ParquetAccessPlan is used to limit the row groups and data pages a DataSourceExec will read and decode to improve performance.

Note that page-level pruning based on ArrowPredicate is applied after all of these selections.

§Example

For example, given a Parquet file with 4 row groups, a ParquetAccessPlan can specify skipping row groups 0 and 2, scanning a range of rows in row group 1, and scanning all rows in row group 3, as follows:

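// RowSelection and RowSelector come from the parquet crate.
// ParquetAccessPlan is assumed to be in scope already; its import path
// depends on the DataFusion version in use.
use parquet::arrow::arrow_reader::{RowSelection, RowSelector};

// Default to scan all row groups
let mut access_plan = ParquetAccessPlan::new_all(4);
access_plan.skip(0); // skip row group 0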
// Use parquet reader RowSelector to specify scanning rows 100-200 and 350-400
// in a row group that has 1000 rows
let row_selection = RowSelection::from(vec![
   RowSelector::skip(100),
   RowSelector::select(100),
   RowSelector::skip(150),
   RowSelector::select(50),
   RowSelector::skip(600),  // skip last 600 rows
]);
access_plan.scan_selection(1, row_selection);
access_plan.skip(2); // skip row group 2
// row group 3 is scanned by default

The resulting plan would look like:

┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐

│                   │  SKIP

└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
 Row Group 0
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
 ┌────────────────┐    SCAN ONLY ROWS
│└────────────────┘ │  100-200
 ┌────────────────┐    350-400
│└────────────────┘ │
 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
 Row Group 1
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                       SKIP
│                   │

└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
 Row Group 2
┌───────────────────┐
│                   │  SCAN ALL ROWS
│                   │
│                   │
└───────────────────┘
 Row Group 3
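
The selector arithmetic for row group 1 can be checked with a small, self-contained sketch. The Selector enum and the helpers total_rows, selected_ranges and example_selectors are hypothetical stand-ins written for illustration; they are not the parquet crate's RowSelector or RowSelection. Ranges are half-open, so "rows 100-200" appears as 100..200.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the parquet reader's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Selector {
    Skip(usize),
    Select(usize),
}

/// Total number of rows covered by the selectors (skipped plus selected).
fn total_rows(selectors: &[Selector]) -> usize {
    selectors
        .iter()
        .map(|s| match s {
            Selector::Skip(n) | Selector::Select(n) => *n,
        })
        .sum()
}

/// Half-open row ranges that the selectors ask the reader to decode.
fn selected_ranges(selectors: &[Selector]) -> Vec<Range<usize>> {
    let mut offset = 0;
    let mut ranges = Vec::new();
    for s in selectors {
        match *s {
            // Skipped rows only advance the position within the row group.
            Selector::Skip(n) => offset += n,
            // Selected rows produce a range and advance the position.
            Selector::Select(n) => {
                ranges.push(offset..offset + n);
                offset += n;
            }
        }
    }
    ranges
}

/// The selectors used for row group 1 in the example above.
fn example_selectors() -> Vec<Selector> {
    vec![
        Selector::Skip(100),
        Selector::Select(100),
        Selector::Skip(150),
        Selector::Select(50),
        Selector::Skip(600),
    ]
}
```

Under these assumptions the five selectors account for all 1000 rows of the row group, which matters because into_overall_row_selection reports an error when a selection's row count does not match the row group metadata.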

Fields§

§row_groups: Vec<RowGroupAccess>

How to access the i-th row group

Implementations§

Source§

impl ParquetAccessPlan

Source

pub fn new_all(row_group_count: usize) -> Self

Create a new ParquetAccessPlan that scans all row groups

Source

pub fn new_none(row_group_count: usize) -> Self

Create a new ParquetAccessPlan that scans no row groups

Source

pub fn new(row_groups: Vec<RowGroupAccess>) -> Self

Create a new ParquetAccessPlan from the specified RowGroupAccesses

Source

pub fn set(&mut self, idx: usize, access: RowGroupAccess)

Set the i-th row group to the specified RowGroupAccess

Source

pub fn skip(&mut self, idx: usize)

Skip the i-th row group (it will not be scanned)

Source

pub fn scan(&mut self, idx: usize)

Scan the i-th row group

Source

pub fn should_scan(&self, idx: usize) -> bool

Return true if the i-th row group should be scanned

Source

pub fn scan_selection(&mut self, idx: usize, selection: RowSelection)

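Set to scan only the RowSelection in the specified row group.

The behavior depends on the row group's existing access: a row group marked RowGroupAccess::Skip stays skipped, a RowGroupAccess::Scan is narrowed to the rows in the RowSelection, and an existing RowGroupAccess::Selection is replaced by the intersection of the existing and new selections.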
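
A minimal sketch of how the update could depend on the existing access, assuming that a skipped row group is left untouched, a fully scanned row group is narrowed to the new selection, and an existing selection is intersected with the new one. The Access enum and apply_selection function are hypothetical and model a selection as a per-row boolean mask; they are not DataFusion's RowGroupAccess or the parquet crate's RowSelection.

```rust
/// Simplified, hypothetical model of `RowGroupAccess`; a selection is a
/// per-row mask here rather than a parquet `RowSelection`.
#[derive(Debug, Clone, PartialEq)]
enum Access {
    Skip,
    Scan,
    Selection(Vec<bool>),
}

/// Assumed update rule when a new selection is applied to a row group.
fn apply_selection(existing: &Access, new: Vec<bool>) -> Access {
    match existing {
        // A skipped row group stays skipped.
        Access::Skip => Access::Skip,
        // A fully scanned row group is narrowed to the new selection.
        Access::Scan => Access::Selection(new),
        // Two selections combine by intersection: a row survives only if
        // both masks select it.
        Access::Selection(old) => Access::Selection(
            old.iter().zip(new.iter()).map(|(a, b)| *a && *b).collect(),
        ),
    }
}
```

One consequence of this rule is that calling scan_selection can only shrink the set of rows to read; it never re-enables a row group that was previously skipped.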

Source

pub fn into_overall_row_selection( self, row_group_meta_data: &[RowGroupMetaData], ) -> Result<Option<RowSelection>>

Return an overall RowSelection, if needed

This is used to compute the row selection for the parquet reader. See ArrowReaderBuilder::with_row_selection for more details.

§Returns

The returned selection represents which rows to scan across all row groups that are not skipped.

§Notes

If there are no RowGroupAccess::Selections, the overall row selection is None because each row group is either entirely skipped or scanned, which is covered by Self::row_group_indexes.

If there are any RowGroupAccess::Selection, an overall row selection is returned for all the rows in the row groups that are not skipped. Thus it includes a Select selection for any RowGroupAccess::Scan.

§Errors

Returns an error if any specified row selection does not cover the same number of rows as its corresponding entry in row_group_meta_data.

§Example: No Selections

Given an access plan like this:

  RowGroupAccess::Scan (scan all row group 0)
  RowGroupAccess::Skip (skip row group 1)
  RowGroupAccess::Scan (scan all row group 2)
  RowGroupAccess::Scan (scan all row group 3)

The overall row selection would be None because there are no RowGroupAccess::Selections. The row group indexes returned by Self::row_group_indexes would be 0, 2, 3.

§Example: With Selections

Given an access plan like this:

  RowGroupAccess::Scan (scan all row group 0)
  RowGroupAccess::Skip (skip row group 1)
  RowGroupAccess::Selection (skip 50, scan 50, skip 900) (scan rows 50-100 in row group 2)
  RowGroupAccess::Scan (scan all row group 3)

Assuming each row group has 1000 rows, the resulting row selection would be the rows to scan in row groups 0, 2 and 3:

 RowSelector::select(1000) (scan all rows in row group 0)
 RowSelector::skip(50)     (skip first 50 rows in row group 2)
 RowSelector::select(50)   (scan rows 50-100 in row group 2)
 RowSelector::skip(900)    (skip last 900 rows in row group 2)
 RowSelector::select(1000) (scan all rows in row group 3)

Note there is no entry for the (entirely) skipped row group 1.

The row group indexes returned by Self::row_group_indexes would still be 0, 2, 3.
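
The rules in the notes and the two examples above can be sketched as a self-contained function. Selector, Access and overall_selection are hypothetical simplifications written for illustration: the real method takes RowGroupMetaData and returns a parquet RowSelection, whereas this sketch takes plain row counts, returns a list of selectors, and uses a String as its error type.

```rust
/// Hypothetical stand-in for the parquet reader's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Selector {
    Skip(usize),
    Select(usize),
}

/// Simplified, hypothetical model of `RowGroupAccess`.
#[derive(Debug, Clone, PartialEq)]
enum Access {
    Skip,
    Scan,
    Selection(Vec<Selector>),
}

/// Sketch of the overall-selection rules; `rows_per_group[i]` plays the
/// role of the row count stored in the i-th row group's metadata.
fn overall_selection(
    plan: &[Access],
    rows_per_group: &[usize],
) -> Result<Option<Vec<Selector>>, String> {
    // Without any Selection, every row group is wholly scanned or wholly
    // skipped, so the row group indexes alone describe the plan.
    if !plan.iter().any(|a| matches!(a, Access::Selection(_))) {
        return Ok(None);
    }
    let mut out = Vec::new();
    for (idx, (access, &rows)) in plan.iter().zip(rows_per_group).enumerate() {
        match access {
            // Entirely skipped row groups contribute no entry.
            Access::Skip => {}
            // Fully scanned row groups contribute one select of all rows.
            Access::Scan => out.push(Selector::Select(rows)),
            Access::Selection(selectors) => {
                let covered: usize = selectors
                    .iter()
                    .map(|s| match s {
                        Selector::Skip(n) | Selector::Select(n) => *n,
                    })
                    .sum();
                // A selection must account for every row in its row group.
                if covered != rows {
                    return Err(format!(
                        "row group {}: selection covers {} rows, expected {}",
                        idx, covered, rows
                    ));
                }
                out.extend(selectors.iter().copied());
            }
        }
    }
    Ok(Some(out))
}
```

In this sketch the skipped row group leaves no trace in the output, so the overall selection only lines up with the file when it is paired with the list of row group indexes to scan.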

Source

pub fn row_group_index_iter(&self) -> impl Iterator<Item = usize> + '_

Return an iterator over the row group indexes that should be scanned

Source

pub fn row_group_indexes(&self) -> Vec<usize>

Return a vec of all row group indexes to scan

Source

pub fn len(&self) -> usize

Return the total number of row groups (not the total number of groups to scan)

Source

pub fn is_empty(&self) -> bool

Return true if there are no row groups
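
The difference between the total number of row groups and the row groups that will be scanned can be illustrated with a small sketch. Access and Plan are hypothetical miniatures written for illustration, not DataFusion's types; the sketch assumes, as in the examples above, that a row group with a selection counts as scanned.

```rust
/// Simplified, hypothetical model of `RowGroupAccess`; the payload of a
/// selection is omitted because only the scan/skip decision matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Access {
    Skip,
    Scan,
    Selection,
}

/// Hypothetical miniature of an access plan.
struct Plan {
    row_groups: Vec<Access>,
}

impl Plan {
    /// Total number of row groups, including skipped ones.
    fn len(&self) -> usize {
        self.row_groups.len()
    }

    /// True only when the plan describes zero row groups.
    fn is_empty(&self) -> bool {
        self.row_groups.is_empty()
    }

    /// Indexes of the row groups that are at least partially scanned.
    fn row_group_indexes(&self) -> Vec<usize> {
        self.row_groups
            .iter()
            .enumerate()
            .filter_map(|(idx, access)| match access {
                Access::Skip => None,
                Access::Scan | Access::Selection => Some(idx),
            })
            .collect()
    }
}
```

Under these assumptions a plan that skips every row group still has a non-zero len and is not empty, even though it yields no indexes to scan.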

Source

pub fn inner(&self) -> &[RowGroupAccess]

Get a reference to the inner accesses

Source

pub fn into_inner(self) -> Vec<RowGroupAccess>

Convert into the inner row group accesses

Trait Implementations§

Source§

impl Clone for ParquetAccessPlan

Source§

fn clone(&self) -> ParquetAccessPlan

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for ParquetAccessPlan

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl PartialEq for ParquetAccessPlan

Source§

fn eq(&self, other: &ParquetAccessPlan) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl StructuralPartialEq for ParquetAccessPlan

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

§

impl<T> Instrument for T

§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
§

impl<T> PolicyExt for T
where T: ?Sized,

§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.
§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

§

fn vzip(self) -> V

§

impl<T> WithSubscriber for T

§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.
§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

§

impl<T> ErasedDestructor for T
where T: 'static,