pub struct ListingTable {
table_paths: Vec<ListingTableUrl>,
file_schema: SchemaRef,
table_schema: SchemaRef,
schema_source: SchemaSource,
options: ListingOptions,
definition: Option<String>,
collected_statistics: FileStatisticsCache,
constraints: Constraints,
column_defaults: HashMap<String, Expr>,
schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>,
expr_adapter_factory: Option<Arc<dyn PhysicalExprAdapterFactory>>,
}
Built-in TableProvider that reads data from one or more files as a single table.
The files are read using an [ObjectStore] instance, for example from
local files or objects from AWS S3.
§Features:
- Reading multiple files as a single table
- Hive style partitioning (e.g., directories named date=2024-06-01)
- Merging schemas from files with compatible but not identical schemas (see ListingTableConfig::file_schema)
- limit, filter and projection pushdown for formats that support it (e.g., Parquet)
- Statistics collection and pruning based on file metadata
- Pre-existing sort order (see ListingOptions::file_sort_order)
- Metadata caching to speed up repeated queries (see FileMetadataCache)
- Statistics caching (see FileStatisticsCache)
§Reading Directories and Hive Style Partitioning
For example, given the table1 directory (or object store prefix)
table1
├── file1.parquet
└── file2.parquet

A ListingTable would read the files file1.parquet and file2.parquet as
a single table, merging the schemas if the files have compatible but not
identical schemas.
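The schema-merging behavior described above can be sketched with a small, self-contained example. This is a conceptual illustration only (plain string names and types rather than DataFusion's Arrow Schema machinery), and unlike DataFusion it does no type coercion: columns with the same name must agree exactly.

```rust
use std::collections::BTreeMap;

// Merge the fields of several file schemas into one table schema.
// Fields with the same name must have the same (string) type here;
// the real implementation also handles type coercion and nullability.
fn merge_schemas(schemas: &[Vec<(&str, &str)>]) -> Result<Vec<(String, String)>, String> {
    let mut merged: BTreeMap<String, String> = BTreeMap::new();
    for schema in schemas {
        for (name, ty) in schema {
            match merged.get(*name) {
                None => {
                    merged.insert((*name).to_string(), (*ty).to_string());
                }
                Some(existing) if existing == ty => {} // already present, compatible
                Some(existing) => {
                    return Err(format!(
                        "incompatible types for column {name}: {existing} vs {ty}"
                    ));
                }
            }
        }
    }
    Ok(merged.into_iter().collect())
}

fn main() {
    // file1.parquet has (a, b); file2.parquet has (b, c): the merged
    // table schema contains the union of columns (a, b, c).
    let file1 = vec![("a", "Int64"), ("b", "Utf8")];
    let file2 = vec![("b", "Utf8"), ("c", "Float64")];
    let merged = merge_schemas(&[file1, file2]).unwrap();
    assert_eq!(merged.len(), 3);
}
```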
Given the table2 directory (or object store prefix)
table2
├── date=2024-06-01
│ ├── file3.parquet
│ └── file4.parquet
└── date=2024-06-02
└── file5.parquet

A ListingTable would read the files file3.parquet, file4.parquet, and
file5.parquet as a single table, again merging schemas if necessary.
Given the hive style partitioning structure (e.g., directories named
date=2024-06-01 and date=2024-06-02), ListingTable also adds a date
column when reading the table:
- The files in table2/date=2024-06-01 will have the value 2024-06-01
- The files in table2/date=2024-06-02 will have the value 2024-06-02
If the query has a predicate like WHERE date = '2024-06-01'
only the corresponding directory will be read.
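The derivation of partition column values from hive style paths can be sketched as follows. This is an illustrative std-only sketch, not DataFusion's internal path-parsing code: it simply treats every `key=value` path segment as a partition column.

```rust
// Extract hive-style partition values (key=value path segments) from a
// file path, the way a listing-style table derives partition columns.
fn partition_values(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| {
            // Segments without '=' (e.g. the file name) yield None and are skipped.
            let (key, value) = segment.split_once('=')?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

fn main() {
    let path = "table2/date=2024-06-01/file3.parquet";
    assert_eq!(
        partition_values(path),
        vec![("date".to_string(), "2024-06-01".to_string())]
    );
}
```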
§See Also
- ListingTableConfig: Configuration options
- DataSourceExec: the ExecutionPlan used by ListingTable
§Caching Metadata
Some formats, such as Parquet, use the FileMetadataCache to cache file
metadata that is needed to execute but expensive to read, such as row
groups and statistics. The cache is scoped to the SessionContext and can
be configured via the runtime config options.
§Example: Read a directory of parquet files using a ListingTable
async fn get_listing_table(session: &dyn Session) -> Result<Arc<dyn TableProvider>> {
let table_path = "/path/to/parquet";
// Parse the path
let table_path = ListingTableUrl::parse(table_path)?;
// Create default parquet options
let file_format = ParquetFormat::new();
let listing_options = ListingOptions::new(Arc::new(file_format))
.with_file_extension(".parquet");
// Resolve the schema
let resolved_schema = listing_options
.infer_schema(session, &table_path)
.await?;
let config = ListingTableConfig::new(table_path)
.with_listing_options(listing_options)
.with_schema(resolved_schema);
// Create a new TableProvider
let provider = Arc::new(ListingTable::try_new(config)?);
Ok(provider)
}
Fields§
§table_paths: Vec<ListingTableUrl>
§file_schema: SchemaRef
file_schema contains only the columns physically stored in the data files themselves.
- Represents the actual fields found in files like Parquet, CSV, etc.
- Used when reading the raw data from files
§table_schema: SchemaRef
table_schema combines file_schema + partition columns
- Partition columns are derived from directory paths (not stored in files)
- These are columns like "year=2022/month=01" in paths like /data/year=2022/month=01/file.parquet
§schema_source: SchemaSource
Indicates how the schema was derived (inferred or explicitly specified)
§options: ListingOptions
Options used to configure the listing table, such as the file format and partitioning information
§definition: Option<String>
The SQL definition for this table, if any
§collected_statistics: FileStatisticsCache
Cache for collected file statistics
§constraints: Constraints
Constraints applied to this table
§column_defaults: HashMap<String, Expr>
Column default expressions for columns that are not physically present in the data files
§schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>
Optional SchemaAdapterFactory for creating schema adapters
§expr_adapter_factory: Option<Arc<dyn PhysicalExprAdapterFactory>>
Optional PhysicalExprAdapterFactory for creating physical expression adapters
Implementations§
impl ListingTable
pub fn try_new(config: ListingTableConfig) -> Result<Self>
Create a new ListingTable
See documentation and example on ListingTable and ListingTableConfig
pub fn with_constraints(self, constraints: Constraints) -> Self
Assign constraints
pub fn with_column_defaults(
    self,
    column_defaults: HashMap<String, Expr>,
) -> Self
Assign column defaults
pub fn with_cache(self, cache: Option<FileStatisticsCache>) -> Self
Set the FileStatisticsCache used to cache parquet file statistics.
Setting a statistics cache on the SessionContext can avoid refetching statistics
multiple times in the same session.
If None, creates a new DefaultFileStatisticsCache scoped to this query.
pub fn with_definition(self, definition: Option<String>) -> Self
Specify the SQL definition for this table, if any
pub fn table_paths(&self) -> &Vec<ListingTableUrl>
Get a reference to the table paths
pub fn options(&self) -> &ListingOptions
Get a reference to the listing options
pub fn schema_source(&self) -> SchemaSource
Get the schema source
pub fn with_schema_adapter_factory(
    self,
    schema_adapter_factory: Arc<dyn SchemaAdapterFactory>,
) -> Self
Set the SchemaAdapterFactory for this ListingTable
The schema adapter factory is used to create schema adapters that can handle schema evolution and type conversions when reading files with different schemas than the table schema.
§Example: Adding Schema Evolution Support
let table_with_evolution = table
    .with_schema_adapter_factory(Arc::new(DefaultSchemaAdapterFactory));

See ListingTableConfig::with_schema_adapter_factory for an example of a custom SchemaAdapterFactory.
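What a schema adapter conceptually does can be shown with a std-only sketch (this is not DataFusion's SchemaAdapter API): project a file's row into the table schema, filling columns the file lacks with nulls, here modeled as Option<String>.

```rust
// Conceptual sketch of schema adaptation: map each table column to the
// matching file column if present, otherwise produce a null (None).
fn adapt_row(
    table_columns: &[&str],
    file_columns: &[&str],
    file_row: &[&str],
) -> Vec<Option<String>> {
    table_columns
        .iter()
        .map(|col| {
            file_columns
                .iter()
                .position(|c| c == col)          // where is this column in the file?
                .map(|i| file_row[i].to_string()) // present: take the file's value
        })
        .collect()
}

fn main() {
    // Table schema has (a, b, c); this particular file only has (a, c).
    let row = adapt_row(&["a", "b", "c"], &["a", "c"], &["1", "x"]);
    assert_eq!(row, vec![Some("1".to_string()), None, Some("x".to_string())]);
}
```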
pub fn schema_adapter_factory(&self) -> Option<&Arc<dyn SchemaAdapterFactory>>
Get the SchemaAdapterFactory for this table
fn create_schema_adapter(&self) -> Box<dyn SchemaAdapter>
Creates a schema adapter for mapping between file and table schemas
Uses the configured schema adapter factory if available, otherwise falls back to the default implementation.
fn create_file_source_with_schema_adapter(&self) -> Result<Arc<dyn FileSource>>
Creates a file source and applies schema adapter factory if available
pub fn try_create_output_ordering(
    &self,
    execution_props: &ExecutionProps,
) -> Result<Vec<LexOrdering>>
If file_sort_order is specified, creates the appropriate physical expressions
impl ListingTable
pub async fn list_files_for_scan<'a>(
    &'a self,
    ctx: &'a dyn Session,
    filters: &'a [Expr],
    limit: Option<usize>,
) -> Result<(Vec<FileGroup>, Statistics)>
Get the list of files for a scan as well as the file level statistics. The list is grouped to let the execution plan know how the files should be distributed to different threads / executors.
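The grouping step described above can be sketched with one simple strategy, round-robin assignment, shown here std-only. This is only an illustrative sketch: DataFusion's actual grouping also accounts for file sizes and target partition counts.

```rust
// Distribute a list of files across `target_partitions` groups
// round-robin, so each thread/executor receives a group to scan.
fn group_files(files: Vec<String>, target_partitions: usize) -> Vec<Vec<String>> {
    let n = target_partitions.max(1); // guard against zero partitions
    let mut groups: Vec<Vec<String>> = vec![Vec::new(); n];
    for (i, file) in files.into_iter().enumerate() {
        groups[i % n].push(file);
    }
    groups
}

fn main() {
    let files = vec![
        "f1.parquet".to_string(),
        "f2.parquet".to_string(),
        "f3.parquet".to_string(),
    ];
    let groups = group_files(files, 2);
    assert_eq!(groups[0], vec!["f1.parquet".to_string(), "f3.parquet".to_string()]);
    assert_eq!(groups[1], vec!["f2.parquet".to_string()]);
}
```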
async fn do_collect_statistics(
    &self,
    ctx: &dyn Session,
    store: &Arc<dyn ObjectStore>,
    part_file: &PartitionedFile,
) -> Result<Arc<Statistics>>
Collects statistics for a given partitioned file.
This method first checks if the statistics for the given file are already cached. If they are, it returns the cached statistics. If they are not, it infers the statistics from the file and stores them in the cache.
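The check-then-populate pattern described above can be sketched with std types only. StatsCache and its u64 "statistics" are stand-ins for illustration, not DataFusion's FileStatisticsCache API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Minimal statistics cache keyed by file path. `u64` stands in for a
// real Statistics value.
struct StatsCache {
    inner: Mutex<HashMap<String, Arc<u64>>>,
}

impl StatsCache {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    // Return cached statistics for `path`, or run `compute` (inferring
    // them from the file) and store the result for later lookups.
    fn get_or_compute(&self, path: &str, compute: impl FnOnce() -> u64) -> Arc<u64> {
        let mut map = self.inner.lock().unwrap();
        if let Some(stats) = map.get(path) {
            return Arc::clone(stats); // cache hit: no recomputation
        }
        let stats = Arc::new(compute()); // cache miss: infer and insert
        map.insert(path.to_string(), Arc::clone(&stats));
        stats
    }
}

fn main() {
    let cache = StatsCache::new();
    let mut computed = 0;
    let s1 = cache.get_or_compute("file1.parquet", || { computed += 1; 42 });
    let s2 = cache.get_or_compute("file1.parquet", || { computed += 1; 42 });
    assert_eq!(*s1, 42);
    assert_eq!(*s2, 42);
    assert_eq!(computed, 1); // statistics were inferred only once
}
```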
Trait Implementations§
impl Clone for ListingTable
fn clone(&self) -> ListingTable
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for ListingTable
impl TableProvider for ListingTable
fn as_any(&self) -> &dyn Any
Returns the table provider as Any so that it can be downcast to a specific implementation.
fn constraints(&self) -> Option<&Constraints>
fn table_type(&self) -> TableType
fn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>(
    &'life0 self,
    state: &'life1 dyn Session,
    projection: Option<&'life2 Vec<usize>>,
    filters: &'life3 [Expr],
    limit: Option<usize>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
    'life2: 'async_trait,
    'life3: 'async_trait,
Create an ExecutionPlan for scanning the table with optionally
specified projection, filter and limit.
fn scan_with_args<'a, 'life0, 'life1, 'async_trait>(
    &'life0 self,
    state: &'life1 dyn Session,
    args: ScanArgs<'a>,
) -> Pin<Box<dyn Future<Output = Result<ScanResult>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'a: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
Create an ExecutionPlan for scanning the table using structured arguments.
fn supports_filters_pushdown(
    &self,
    filters: &[&Expr],
) -> Result<Vec<TableProviderFilterPushDown>>
fn get_table_definition(&self) -> Option<&str>
fn insert_into<'life0, 'life1, 'async_trait>(
    &'life0 self,
    state: &'life1 dyn Session,
    input: Arc<dyn ExecutionPlan>,
    insert_op: InsertOp,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
Return an ExecutionPlan to insert data into this table, if supported.
fn get_column_default(&self, column: &str) -> Option<&Expr>
fn get_logical_plan(&self) -> Option<Cow<'_, LogicalPlan>>
Get the LogicalPlan of this table, if available.
fn statistics(&self) -> Option<Statistics>
Auto Trait Implementations§
impl Freeze for ListingTable
impl !RefUnwindSafe for ListingTable
impl Send for ListingTable
impl Sync for ListingTable
impl Unpin for ListingTable
impl !UnwindSafe for ListingTable