Crate datafusion_datasource_parquet

Re-exports

pub use access_plan::ParquetAccessPlan;
pub use access_plan::RowGroupAccess;
pub use file_format::*;

Modules

access_plan
file_format
ParquetFormat: Parquet FileFormat abstractions
metadata
DFParquetMetadata for fetching Parquet file metadata, statistics and schema information.
metrics 🔒
opener 🔒
ParquetOpener for opening Parquet files
page_filter 🔒
Contains code to filter entire pages
reader 🔒
ParquetFileReaderFactory and DefaultParquetFileReaderFactory for low level control of parquet file readers
row_filter 🔒
Utilities to push down DataFusion filter predicates (any DataFusion PhysicalExpr that evaluates to a [BooleanArray]) to the Parquet decoder level in arrow-rs.
row_group_filter 🔒
source
ParquetSource implementation for reading parquet files
writer 🔒

Structs

CachedParquetFileReader
Implements [AsyncFileReader] for a Parquet file in object storage. Reads the file metadata from the [FileMetadataCache], if available, otherwise reads it directly from the file and then updates the cache.
CachedParquetFileReaderFactory
Implementation of ParquetFileReaderFactory supporting the caching of footer and page metadata. Reads and updates the [FileMetadataCache] with the [ParquetMetaData] data. This reader always loads the entire metadata (including page index, unless the file is encrypted), even if not required by the current query, to ensure it is always available for those that need it.
CachedParquetMetaData
Wrapper to implement [FileMetadata] for [ParquetMetaData].
DefaultParquetFileReaderFactory
Default implementation of ParquetFileReaderFactory
PagePruningAccessPlanFilter
Filters a ParquetAccessPlan based on the Parquet PageIndex, if present
ParquetFileMetrics
Stores metrics about the parquet execution for a particular parquet file.
ParquetFileReader
Implements [AsyncFileReader] for a parquet file in object storage.
RowGroupAccessPlanFilter
Reduces the ParquetAccessPlan based on row group level metadata.
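The idea behind reducing an access plan from row-group-level metadata can be sketched in isolation. The example below is a minimal illustration, not this crate's API: `Access`, `RowGroupStats`, and `prune_range` are hypothetical names, and the only metadata assumed is a min/max pair for a single integer column. Every row group starts as a candidate for scanning and is marked as skipped when its statistics show that no row can match the range predicate.

```rust
/// Hypothetical, simplified stand-in for a per-row-group access decision.
/// (Not the crate's `RowGroupAccess` type.)
#[derive(Debug, Clone, PartialEq)]
enum Access {
    Skip,
    Scan,
}

/// Hypothetical min/max statistics for one integer column in one row group.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Builds one decision per row group for the predicate `lo <= col <= hi`.
/// A row group is scanned only if its [min, max] range overlaps [lo, hi];
/// otherwise no row in it can match, so it is skipped.
fn prune_range(stats: &[RowGroupStats], lo: i64, hi: i64) -> Vec<Access> {
    stats
        .iter()
        .map(|s| {
            if s.max >= lo && s.min <= hi {
                Access::Scan
            } else {
                Access::Skip
            }
        })
        .collect()
}
```

Pruning of this kind is conservative: a row group marked `Scan` may still contain no matching rows, but a row group marked `Skip` is guaranteed to contain none.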

Traitsยง

ParquetFileReaderFactory
Interface for reading parquet files.
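The caching behaviour described for CachedParquetFileReader (read the metadata from the cache if available, otherwise read it from the file and then update the cache) is a read-through cache. The sketch below shows that pattern only; `Metadata`, `MetadataCache`, and `get_or_load` are hypothetical names chosen for illustration and do not correspond to the crate's [FileMetadataCache] or [ParquetMetaData] types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for parsed Parquet footer metadata.
#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    num_row_groups: usize,
}

/// Hypothetical read-through cache keyed by file path.
#[derive(Default)]
struct MetadataCache {
    entries: HashMap<String, Metadata>,
    /// Counts how many times the loader actually ran (cache misses).
    loads: usize,
}

impl MetadataCache {
    /// Returns the cached metadata for `path` if present. On a miss, runs
    /// `load` (standing in for reading the file footer), stores the result,
    /// and returns it.
    fn get_or_load<F>(&mut self, path: &str, load: F) -> Metadata
    where
        F: FnOnce() -> Metadata,
    {
        if let Some(hit) = self.entries.get(path) {
            return hit.clone();
        }
        let loaded = load();
        self.loads += 1;
        self.entries.insert(path.to_string(), loaded.clone());
        loaded
    }
}
```

In this sketch the loader closure runs at most once per path, so a second request for the same file is served from the map without touching the file again.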

Functions

build_row_filter
Build a [RowFilter] from the given predicate Expr if possible
can_expr_be_pushed_down_with_schemas
Recurses through expr as a tree, finds all columns, and checks whether any of them would prevent the expression from being pushed down as a predicate. Returns false if any column would, and true otherwise. Note that the schema passed in here is not the physical file schema (which is not available at that point in time); it is the schema of the table that this expression is being evaluated against, minus any projected columns and partition columns.
plan_to_parquet
Executes a query and writes the results to a partitioned Parquet file.
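The recursive check described for can_expr_be_pushed_down_with_schemas can be illustrated with a toy expression tree. This is a sketch under stated assumptions rather than the function's real signature: `Expr` and `can_push_down` are hypothetical, and the only blocking condition modelled is a column that is missing from the supplied schema (as a partition column would be).

```rust
use std::collections::HashSet;

/// Hypothetical, minimal expression tree (not DataFusion's PhysicalExpr).
#[allow(dead_code)] // the literal payload is carried but never inspected here
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, Box<Expr>),
}

/// Walks the tree and returns false as soon as any referenced column is
/// absent from `schema_columns`; literals never block pushdown.
fn can_push_down(expr: &Expr, schema_columns: &HashSet<&str>) -> bool {
    match expr {
        Expr::Column(name) => schema_columns.contains(name.as_str()),
        Expr::Literal(_) => true,
        Expr::Binary(left, right) => {
            can_push_down(left, schema_columns) && can_push_down(right, schema_columns)
        }
    }
}
```

With a schema containing only `a` and `b`, an expression over `a` and a literal passes the check, while one that also references a partition column such as `part` does not.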