pub struct ParquetAccessPlan {
row_groups: Vec<RowGroupAccess>,
}
A selection of rows and row groups within a ParquetFile to decode.
A ParquetAccessPlan is used to limit the row groups and data pages a DataSourceExec
will read and decode to improve performance.
Note that page-level pruning based on ArrowPredicate is applied after all of these selections.
§Example
For example, given a Parquet file with 4 row groups, a ParquetAccessPlan
can be used to specify skipping row groups 0 and 2, scanning a range of rows
in row group 1, and scanning all rows in row group 3 as follows:
// Default to scan all row groups
let mut access_plan = ParquetAccessPlan::new_all(4);
access_plan.skip(0); // skip row group 0
// Use parquet reader RowSelector to specify scanning rows 100-200 and 350-400
// in a row group that has 1000 rows
let row_selection = RowSelection::from(vec![
RowSelector::skip(100),
RowSelector::select(100),
RowSelector::skip(150),
RowSelector::select(50),
RowSelector::skip(600), // skip last 600 rows
]);
access_plan.scan_selection(1, row_selection);
access_plan.skip(2); // skip row group 2
// row group 3 is scanned by default

The resulting plan would look like:
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐

│                   │  SKIP

└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
Row Group 0
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
 ┌────────────────┐    SCAN ONLY ROWS
│└────────────────┘ │  100-200
 ┌────────────────┐    350-400
│└────────────────┘ │
 ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
Row Group 1
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                       SKIP
│                   │

└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
Row Group 2
┌───────────────────┐
│                   │  SCAN ALL ROWS
│                   │
│                   │
└───────────────────┘
Row Group 3
Fields§
row_groups: Vec<RowGroupAccess>
Implementations§
impl ParquetAccessPlan
pub fn new_all(row_group_count: usize) -> ParquetAccessPlan
Create a new ParquetAccessPlan that scans all row groups
pub fn new_none(row_group_count: usize) -> ParquetAccessPlan
Create a new ParquetAccessPlan that scans no row groups
pub fn new(row_groups: Vec<RowGroupAccess>) -> ParquetAccessPlan
Create a new ParquetAccessPlan from the specified RowGroupAccesses
pub fn set(&mut self, idx: usize, access: RowGroupAccess)
Set the i-th row group to the specified RowGroupAccess
pub fn should_scan(&self, idx: usize) -> bool
Return true if the i-th row group should be scanned
pub fn scan_selection(&mut self, idx: usize, selection: RowSelection)
Set to scan only the RowSelection in the specified row group.
Behavior depends on the existing access:
- RowGroupAccess::Skip: does nothing
- RowGroupAccess::Scan: updates to scan only the rows in the RowSelection
- RowGroupAccess::Selection: updates to scan only the intersection of the existing selection and the new selection
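The three cases above can be sketched with a simplified stand-in model. The `Access` enum and the free function `scan_selection` below are hypothetical names for illustration, and a selection is modeled as one `bool` per row rather than a parquet `RowSelection`; this is not DataFusion's implementation.

```rust
/// Simplified stand-in for `RowGroupAccess` (an assumption of this sketch).
/// A selection is one bool per row: true means "scan this row".
#[derive(Debug, Clone, PartialEq)]
enum Access {
    Skip,
    Scan,
    Selection(Vec<bool>),
}

/// Applies a new selection to an existing access, following the three
/// documented cases.
fn scan_selection(existing: Access, new_mask: Vec<bool>) -> Access {
    match existing {
        // A skipped row group stays skipped.
        Access::Skip => Access::Skip,
        // A fully scanned row group is narrowed to the new selection.
        Access::Scan => Access::Selection(new_mask),
        // Two selections combine by intersection: a row survives only if
        // both the old and the new selection include it.
        Access::Selection(old_mask) => Access::Selection(
            old_mask
                .iter()
                .zip(new_mask.iter())
                .map(|(a, b)| *a && *b)
                .collect(),
        ),
    }
}
```

Because selections only ever narrow, repeated calls can reduce the set of rows to scan but never re-enable a row or a skipped row group.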
pub fn into_overall_row_selection(
self,
row_group_meta_data: &[RowGroupMetaData],
) -> Result<Option<RowSelection>, DataFusionError>
Return an overall RowSelection, if needed
This is used to compute the row selection for the parquet reader. See
ArrowReaderBuilder::with_row_selection for more details.
Returns
- None if there are no RowGroupAccess::Selections
- Some(selection) if there are RowGroupAccess::Selections
The returned selection represents which rows to scan across all row groups that are not skipped.
§Notes
If there are no RowGroupAccess::Selections, the overall row
selection is None because each row group is either entirely skipped or
scanned, which is covered by Self::row_group_indexes.
If there are any RowGroupAccess::Selection, an overall row selection
is returned for all the rows in the row groups that are not skipped.
Thus it includes a Select selection for any RowGroupAccess::Scan.
§Errors
Returns an error if any specified row selection does not specify
the same number of rows as in its corresponding row_group_metadata.
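The error condition amounts to a row-count check. The sketch below uses a hypothetical `Run` enum as a stand-in for parquet's `RowSelector`, and `rows_covered` and `check_selection` are illustrative helpers, not part of the DataFusion API.

```rust
/// Simplified stand-in for a `RowSelector`: a run of rows to skip or select.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Run {
    Skip(usize),
    Select(usize),
}

/// Total number of rows a selection describes (skipped plus selected).
fn rows_covered(selection: &[Run]) -> usize {
    selection
        .iter()
        .map(|run| match run {
            Run::Skip(n) | Run::Select(n) => *n,
        })
        .sum()
}

/// Returns an error if the selection does not account for every row
/// in the row group.
fn check_selection(selection: &[Run], rows_in_group: usize) -> Result<(), String> {
    let covered = rows_covered(selection);
    if covered == rows_in_group {
        Ok(())
    } else {
        Err(format!(
            "selection covers {covered} rows but the row group has {rows_in_group}"
        ))
    }
}
```

For the selection in the struct-level example, the runs are 100 + 100 + 150 + 50 + 600 = 1000, which matches a 1000-row row group, so the check would pass.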
§Example: No Selections
Given an access plan like this
RowGroupAccess::Scan (scan all row group 0)
RowGroupAccess::Skip (skip row group 1)
RowGroupAccess::Scan (scan all row group 2)
RowGroupAccess::Scan (scan all row group 3)

The overall row selection would be None because there are no
RowGroupAccess::Selections. The row group indexes
returned by Self::row_group_indexes would be 0, 2, 3.
§Example: With Selections
Given an access plan like this:
RowGroupAccess::Scan (scan all row group 0)
RowGroupAccess::Skip (skip row group 1)
RowGroupAccess::Selection (skip 50, scan 50, skip 900) (scan rows 50-100 in row group 2)
RowGroupAccess::Scan (scan all row group 3)

Assuming each row group has 1000 rows, the resulting row selection would be the rows to scan in row groups 0, 2 and 3:
RowSelector::select(1000) (scan all rows in row group 0)
RowSelector::skip(50) (skip first 50 rows in row group 2)
RowSelector::select(50) (scan rows 50-100 in row group 2)
RowSelector::skip(900) (skip last 900 rows in row group 2)
RowSelector::select(1000) (scan all rows in row group 3)

Note there is no entry for the (entirely) skipped row group 1.
The row group indexes returned by Self::row_group_indexes would
still be 0, 2, 3.
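Both examples can be reproduced with a small stand-in model. The names `Run`, `Access`, `overall_selection` and `indexes_to_scan` are hypothetical, and the sketch takes row counts as a plain slice instead of `RowGroupMetaData`. It also omits the row-count validation and does not merge adjacent runs, which the real method may do.

```rust
/// Simplified stand-in for a `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Run {
    Skip(usize),
    Select(usize),
}

/// Simplified stand-in for `RowGroupAccess`.
#[derive(Debug, Clone, PartialEq)]
enum Access {
    Skip,
    Scan,
    Selection(Vec<Run>),
}

/// Builds one selection spanning every row group that is not skipped,
/// or returns None when no row group carries a selection.
fn overall_selection(plan: &[Access], rows_per_group: &[usize]) -> Option<Vec<Run>> {
    if !plan.iter().any(|a| matches!(a, Access::Selection(_))) {
        return None;
    }
    let mut runs = Vec::new();
    for (access, &rows) in plan.iter().zip(rows_per_group) {
        match access {
            // Skipped row groups contribute no entry at all.
            Access::Skip => {}
            // Fully scanned row groups contribute one Select over all rows.
            Access::Scan => runs.push(Run::Select(rows)),
            // Partial row groups contribute their own runs unchanged.
            Access::Selection(sel) => runs.extend(sel.iter().copied()),
        }
    }
    Some(runs)
}

/// Indexes of the row groups that are not skipped.
fn indexes_to_scan(plan: &[Access]) -> Vec<usize> {
    plan.iter()
        .enumerate()
        .filter(|(_, a)| !matches!(a, Access::Skip))
        .map(|(i, _)| i)
        .collect()
}
```

The overall selection is expressed relative to the row groups that are actually read, which is why the skipped row group has no entry and why the selection is paired with the list of row group indexes.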
pub fn row_group_index_iter(&self) -> impl Iterator<Item = usize>
Return an iterator over the row group indexes that should be scanned
pub fn row_group_indexes(&self) -> Vec<usize>
Return a vec of all row group indexes to scan
pub fn len(&self) -> usize
Return the total number of row groups (not the total number of groups to scan)
pub fn inner(&self) -> &[RowGroupAccess]
Get a reference to the inner accesses
pub fn into_inner(self) -> Vec<RowGroupAccess>
Convert into the inner row group accesses
Trait Implementations§
impl Clone for ParquetAccessPlan
fn clone(&self) -> ParquetAccessPlan
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for ParquetAccessPlan
impl PartialEq for ParquetAccessPlan
impl StructuralPartialEq for ParquetAccessPlan
Auto Trait Implementations§
impl Freeze for ParquetAccessPlan
impl RefUnwindSafe for ParquetAccessPlan
impl Send for ParquetAccessPlan
impl Sync for ParquetAccessPlan
impl Unpin for ParquetAccessPlan
impl UnwindSafe for ParquetAccessPlan
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where
T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.