pub struct DFParquetMetadata<'a> {
    store: &'a dyn ObjectStore,
    object_meta: &'a ObjectMeta,
    metadata_size_hint: Option<usize>,
    decryption_properties: Option<Arc<FileDecryptionProperties>>,
    file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
    pub coerce_int96: Option<TimeUnit>,
}
Handles fetching the schema, metadata, and statistics of a Parquet file from an object store.
This component is exposed for low-level integrations through
ParquetFileReaderFactory.
Fields
store: &'a dyn ObjectStore
object_meta: &'a ObjectMeta
metadata_size_hint: Option<usize>
decryption_properties: Option<Arc<FileDecryptionProperties>>
file_metadata_cache: Option<Arc<dyn FileMetadataCache>>
coerce_int96: Option<TimeUnit>
Time unit to which INT96 timestamps are coerced.
Implementations
impl<'a> DFParquetMetadata<'a>
pub fn new(store: &'a dyn ObjectStore, object_meta: &'a ObjectMeta) -> Self
pub fn with_metadata_size_hint(self, metadata_size_hint: Option<usize>) -> Self
Set the metadata size hint, in bytes.
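A metadata size hint lets the reader request a larger suffix of the file up front, so that in the common case the footer and the encoded metadata arrive in a single object-store request instead of two. The sketch below shows only the range arithmetic and is an assumption, not DataFusion's or the parquet crate's code: initial_suffix_range and FOOTER_SIZE are hypothetical names. The one fixed fact it relies on is the Parquet layout, which ends with a 4-byte little-endian metadata length followed by the 4-byte magic PAR1.

```rust
use std::ops::Range;

/// Size of the Parquet footer: a 4-byte little-endian metadata length
/// followed by the 4-byte magic "PAR1".
const FOOTER_SIZE: u64 = 8;

/// Hypothetical helper: the byte range of the first suffix request.
/// With a hint, ask for the last `hint` bytes (never less than the footer);
/// without one, ask only for the footer. The range is clamped to the file.
fn initial_suffix_range(file_size: u64, metadata_size_hint: Option<usize>) -> Range<u64> {
    let wanted = metadata_size_hint
        .map(|hint| (hint as u64).max(FOOTER_SIZE))
        .unwrap_or(FOOTER_SIZE);
    let start = file_size.saturating_sub(wanted);
    start..file_size
}
```

With a hint of 64 bytes on a 1000-byte file, the first request covers bytes 936..1000; without a hint, only the 8-byte footer is read, and the metadata length stored there drives a second request.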
pub fn with_decryption_properties(
    self,
    decryption_properties: Option<Arc<FileDecryptionProperties>>,
) -> Self
Set the properties used to decrypt an encrypted Parquet file.
pub fn with_file_metadata_cache(
    self,
    file_metadata_cache: Option<Arc<dyn FileMetadataCache>>,
) -> Self
Set the file metadata cache.
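A file metadata cache lets repeated scans of the same file skip the footer fetch and decode. The sketch below is a hypothetical in-memory stand-in, not the FileMetadataCache trait: ObjectInfo, DecodedMetadata, SimpleMetadataCache, and get_or_fetch are illustrative names. It assumes an entry is reused only while the object's size and last-modified time still match, so an overwritten file is fetched again.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical subset of object metadata used to detect changed files.
#[derive(Clone, Debug, PartialEq)]
struct ObjectInfo {
    location: String,
    size: u64,
    last_modified: u64, // e.g. seconds since the Unix epoch
}

/// Hypothetical decoded metadata; a real cache would hold ParquetMetaData.
#[derive(Debug)]
struct DecodedMetadata {
    num_rows: i64,
}

/// Entries are keyed by location and carry the ObjectInfo they were built from.
#[derive(Default)]
struct SimpleMetadataCache {
    entries: Mutex<HashMap<String, (ObjectInfo, Arc<DecodedMetadata>)>>,
}

impl SimpleMetadataCache {
    /// Return the cached value only if size and modification time still match.
    fn get(&self, meta: &ObjectInfo) -> Option<Arc<DecodedMetadata>> {
        let entries = self.entries.lock().unwrap();
        match entries.get(&meta.location) {
            Some((cached_meta, value)) if cached_meta == meta => Some(Arc::clone(value)),
            _ => None, // missing or stale
        }
    }

    fn put(&self, meta: &ObjectInfo, value: Arc<DecodedMetadata>) {
        let mut entries = self.entries.lock().unwrap();
        entries.insert(meta.location.clone(), (meta.clone(), value));
    }
}

/// Consult the optional cache first; otherwise run `fetch` and store the result.
fn get_or_fetch(
    cache: Option<&SimpleMetadataCache>,
    meta: &ObjectInfo,
    fetch: impl FnOnce() -> DecodedMetadata,
) -> Arc<DecodedMetadata> {
    if let Some(cache) = cache {
        if let Some(hit) = cache.get(meta) {
            return hit;
        }
    }
    let value = Arc::new(fetch());
    if let Some(cache) = cache {
        cache.put(meta, Arc::clone(&value));
    }
    value
}
```

Passing None for the cache mirrors leaving the cache unset: every call fetches.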
pub fn with_coerce_int96(self, time_unit: Option<TimeUnit>) -> Self
Set the time unit to which INT96 timestamps are coerced.
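INT96 is a legacy Parquet timestamp encoding: each 12-byte value holds the nanoseconds within the day as a little-endian 8-byte integer, followed by the Julian day number as a little-endian 4-byte integer. Expressed as nanoseconds since the Unix epoch, such values overflow an i64 outside roughly the years 1677 to 2262, while coarser units cover a much wider range. The sketch below converts one value into a chosen unit; the TimeUnit enum and int96_to_unix are hypothetical stand-ins, and flooring sub-unit remainders is a choice made here, not necessarily what DataFusion does.

```rust
/// Hypothetical mirror of Arrow's TimeUnit, for this sketch only.
#[derive(Clone, Copy, Debug)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Julian day number of 1970-01-01, the Unix epoch.
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Decode an INT96 value (8-byte LE nanos-of-day, then 4-byte LE Julian day)
/// into a Unix timestamp in `unit`. Returns None on i64 overflow.
fn int96_to_unix(bytes: [u8; 12], unit: TimeUnit) -> Option<i64> {
    let mut nanos_le = [0u8; 8];
    nanos_le.copy_from_slice(&bytes[0..8]);
    let mut day_le = [0u8; 4];
    day_le.copy_from_slice(&bytes[8..12]);
    let nanos_of_day = i64::from_le_bytes(nanos_le);
    let julian_day = i64::from(i32::from_le_bytes(day_le));

    let nanos_per_unit: i64 = match unit {
        TimeUnit::Second => 1_000_000_000,
        TimeUnit::Millisecond => 1_000_000,
        TimeUnit::Microsecond => 1_000,
        TimeUnit::Nanosecond => 1,
    };
    let units_per_day = NANOS_PER_DAY / nanos_per_unit;
    let days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH;

    // Scale whole days directly in the target unit, so a coarser unit
    // overflows much later than nanoseconds would.
    days_since_epoch
        .checked_mul(units_per_day)?
        .checked_add(nanos_of_day.div_euclid(nanos_per_unit))
}
```

For a value 400,000 days after the epoch, the nanosecond result overflows and the function returns None, while the microsecond result fits comfortably.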
pub async fn fetch_metadata(&self) -> Result<Arc<ParquetMetaData>>
Fetch the Parquet metadata from the remote object store.
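Fetching metadata starts at the end of the file. The last 8 bytes are a 4-byte little-endian length of the encoded FileMetaData followed by the magic PAR1 (files with an encrypted footer end in PARE instead, which this sketch does not handle). Given a prefetched tail, a reader can tell whether the whole metadata is already in hand or how many more bytes it must request. FooterCheck and check_tail are hypothetical names; this is not the parquet crate's reader.

```rust
/// Magic bytes ending a Parquet file with a plaintext footer.
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";

/// Hypothetical outcome of inspecting a prefetched file tail.
#[derive(Debug, PartialEq)]
enum FooterCheck {
    /// The tail already contains `metadata_len` metadata bytes before the footer.
    Complete { metadata_len: usize },
    /// Another request is needed for this many additional bytes.
    NeedMore { missing: usize },
}

/// Read the 8-byte footer at the end of `tail` and compare the declared
/// metadata length with the bytes available before the footer.
fn check_tail(tail: &[u8]) -> Result<FooterCheck, String> {
    if tail.len() < 8 {
        return Err("tail is shorter than the 8-byte footer".to_string());
    }
    let footer = &tail[tail.len() - 8..];
    if footer[4..8] != PARQUET_MAGIC[..] {
        return Err("missing PAR1 magic".to_string());
    }
    let mut len_le = [0u8; 4];
    len_le.copy_from_slice(&footer[0..4]);
    let metadata_len = u32::from_le_bytes(len_le) as usize;
    let available = tail.len() - 8;
    if metadata_len <= available {
        Ok(FooterCheck::Complete { metadata_len })
    } else {
        Ok(FooterCheck::NeedMore { missing: metadata_len - available })
    }
}
```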
pub async fn fetch_schema(&self) -> Result<Schema>
Read and parse the schema of the Parquet file
pub(crate) async fn fetch_schema_with_location(&self) -> Result<(Path, Schema)>
Return a (path, schema) tuple by fetching the schema from the Parquet file.
pub async fn fetch_statistics(
    &self,
    table_schema: &SchemaRef,
) -> Result<Statistics>
Fetch the metadata from the Parquet file via Self::fetch_metadata and convert
the statistics in the metadata using Self::statistics_from_parquet_metadata
pub fn statistics_from_parquet_metadata(
    metadata: &ParquetMetaData,
    table_schema: &SchemaRef,
) -> Result<Statistics>
Convert statistics in ParquetMetaData into Statistics using StatisticsConverter.
The statistics are calculated for each column in the table schema using the row group statistics in the parquet metadata.
Key behaviors:
- Extracts row counts and byte sizes from all row groups
- Applies schema type coercions to align file schema with table schema
- Collects and aggregates statistics across row groups when available
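The aggregation across row groups can be sketched with plain integers. RowGroup, ColumnChunkStats, ColumnSummary, and aggregate are hypothetical; they stand in for Parquet row group metadata and StatisticsConverter, and only i64 columns are modeled. Row counts and byte sizes are summed, min/max combine as the minimum of minimums and the maximum of maximums, null counts are summed, and (a choice made for this sketch only) any aggregate whose input is missing in some row group becomes unknown.

```rust
/// Hypothetical statistics for one column chunk in one row group.
#[derive(Clone, Copy)]
struct ColumnChunkStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
}

/// Hypothetical subset of row group metadata.
struct RowGroup {
    num_rows: u64,
    total_byte_size: u64,
    columns: Vec<ColumnChunkStats>, // one entry per column
}

/// File-level aggregate for one column; None means "unknown".
#[derive(Debug, PartialEq)]
struct ColumnSummary {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
}

/// Returns (row count, byte size, per-column summaries).
fn aggregate(row_groups: &[RowGroup], num_columns: usize) -> (u64, u64, Vec<ColumnSummary>) {
    let num_rows: u64 = row_groups.iter().map(|rg| rg.num_rows).sum();
    let byte_size: u64 = row_groups.iter().map(|rg| rg.total_byte_size).sum();
    let columns: Vec<ColumnSummary> = (0..num_columns)
        .map(|c| {
            let chunks: Vec<ColumnChunkStats> = row_groups.iter().map(|rg| rg.columns[c]).collect();
            ColumnSummary {
                // collect::<Option<Vec<_>>>() yields None if any chunk lacks the value.
                min: chunks.iter().map(|s| s.min).collect::<Option<Vec<_>>>()
                    .and_then(|v| v.into_iter().min()),
                max: chunks.iter().map(|s| s.max).collect::<Option<Vec<_>>>()
                    .and_then(|v| v.into_iter().max()),
                // Summing Options is likewise None if any element is None.
                null_count: chunks.iter().map(|s| s.null_count).sum(),
            }
        })
        .collect();
    (num_rows, byte_size, columns)
}
```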
When there are no statistics:
If the Parquet file doesn’t contain any statistics (has_statistics is false), the function returns a Statistics object with:
- Exact row count
- Exact byte size
- All column statistics marked as unknown via Statistics::unknown_column(&table_schema)
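The no-statistics case reduces to a small constructor. This sketch uses a hypothetical two-variant Precision (DataFusion's enum also has an Inexact variant) and hypothetical ColumnStats, FileStats, and stats_without_column_statistics names. Note that the number of unknown column entries comes from the table schema, not from the file.

```rust
/// Hypothetical, simplified mirror of DataFusion's Precision.
#[derive(Clone, Debug, PartialEq)]
enum Precision<T> {
    Exact(T),
    Absent,
}

#[derive(Clone, Debug, PartialEq)]
struct ColumnStats {
    min: Precision<i64>,
    max: Precision<i64>,
    null_count: Precision<u64>,
}

impl ColumnStats {
    /// Nothing is known about the column.
    fn unknown() -> Self {
        ColumnStats { min: Precision::Absent, max: Precision::Absent, null_count: Precision::Absent }
    }
}

#[derive(Debug, PartialEq)]
struct FileStats {
    num_rows: Precision<u64>,
    total_byte_size: Precision<u64>,
    columns: Vec<ColumnStats>,
}

/// Row count and byte size stay exact (they come from row group metadata);
/// every table-schema column is reported as unknown.
fn stats_without_column_statistics(
    num_rows: u64,
    total_byte_size: u64,
    num_table_columns: usize,
) -> FileStats {
    FileStats {
        num_rows: Precision::Exact(num_rows),
        total_byte_size: Precision::Exact(total_byte_size),
        columns: (0..num_table_columns).map(|_| ColumnStats::unknown()).collect(),
    }
}
```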
When only some columns have statistics:
For columns with statistics:
- Min/max values are properly extracted and represented as Precision::Exact
- Null counts are calculated by summing across row groups
For columns without statistics:
- For min/max, there are two situations:
  - If the column is not in the Arrow schema, min/max values are set to Precision::Absent
  - If the column is in the Arrow schema but not in the Parquet schema because of schema evolution, min/max values are set to Precision::Exact(null)
- Null counts are set to Precision::Exact(num_rows) (conservatively assuming all values could be null)
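The per-column rules above can be written as a single match. In this sketch, ColumnInFile, ColumnStats, column_stats, and the two-variant Precision are hypothetical; min/max are modeled as Precision<Option<i64>> so that Exact(None) plays the role of Precision::Exact(null). It follows the rules as stated here rather than DataFusion's source.

```rust
/// Hypothetical, simplified mirror of DataFusion's Precision.
#[derive(Clone, Debug, PartialEq)]
enum Precision<T> {
    Exact(T),
    Absent,
}

/// Hypothetical description of how a table column relates to the file.
enum ColumnInFile {
    /// Present with row group statistics: (min, max, null_count) per row group.
    WithStatistics(Vec<(i64, i64, u64)>),
    /// Not found in the Arrow schema.
    NotInArrowSchema,
    /// In the Arrow schema but absent from the Parquet schema (schema evolution).
    NotInParquetSchema,
}

/// Exact(None) stands for a null min/max value.
#[derive(Debug, PartialEq)]
struct ColumnStats {
    min: Precision<Option<i64>>,
    max: Precision<Option<i64>>,
    null_count: Precision<u64>,
}

fn column_stats(col: &ColumnInFile, num_rows: u64) -> ColumnStats {
    match col {
        ColumnInFile::WithStatistics(groups) => ColumnStats {
            min: Precision::Exact(groups.iter().map(|g| g.0).min()),
            max: Precision::Exact(groups.iter().map(|g| g.1).max()),
            null_count: Precision::Exact(groups.iter().map(|g| g.2).sum::<u64>()),
        },
        ColumnInFile::NotInArrowSchema => ColumnStats {
            min: Precision::Absent,
            max: Precision::Absent,
            null_count: Precision::Exact(num_rows),
        },
        ColumnInFile::NotInParquetSchema => ColumnStats {
            min: Precision::Exact(None),
            max: Precision::Exact(None),
            null_count: Precision::Exact(num_rows),
        },
    }
}
```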
Auto Trait Implementations
impl<'a> Freeze for DFParquetMetadata<'a>
impl<'a> !RefUnwindSafe for DFParquetMetadata<'a>
impl<'a> Send for DFParquetMetadata<'a>
impl<'a> Sync for DFParquetMetadata<'a>
impl<'a> Unpin for DFParquetMetadata<'a>
impl<'a> !UnwindSafe for DFParquetMetadata<'a>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.