Struct ListingTable

Source
pub struct ListingTable {
    table_paths: Vec<ListingTableUrl>,
    file_schema: SchemaRef,
    table_schema: SchemaRef,
    schema_source: SchemaSource,
    options: ListingOptions,
    definition: Option<String>,
    collected_statistics: FileStatisticsCache,
    constraints: Constraints,
    column_defaults: HashMap<String, Expr>,
    schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>,
    expr_adapter_factory: Option<Arc<dyn PhysicalExprAdapterFactory>>,
}

Built-in TableProvider that reads data from one or more files as a single table.

The files are read using an [ObjectStore] instance, for example from local files or objects from AWS S3.

§Features:

  • Reading multiple files as a single table
  • Hive style partitioning (e.g., directories named date=2024-06-01)
  • Merges schemas from files with compatible but not identical schemas (see ListingTableConfig::file_schema)
  • limit, filter and projection pushdown for formats that support it (e.g., Parquet)
  • Statistics collection and pruning based on file metadata
  • Pre-existing sort order (see ListingOptions::file_sort_order)
  • Metadata caching to speed up repeated queries (see FileMetadataCache)
  • Statistics caching (see FileStatisticsCache)

§Reading Directories and Hive Style Partitioning

For example, given the table1 directory (or object store prefix)

table1
 ├── file1.parquet
 └── file2.parquet

A ListingTable would read the files file1.parquet and file2.parquet as a single table, merging the schemas if the files have compatible but not identical schemas.

Given the table2 directory (or object store prefix)

table2
 ├── date=2024-06-01
 │    ├── file3.parquet
 │    └── file4.parquet
 └── date=2024-06-02
      └── file5.parquet

A ListingTable would read the files file3.parquet, file4.parquet, and file5.parquet as a single table, again merging schemas if necessary.

Given the hive style partitioning structure (e.g., directories named date=2024-06-01 and date=2024-06-02), ListingTable also adds a date column when reading the table:

  • The files in table2/date=2024-06-01 will have the value 2024-06-01
  • The files in table2/date=2024-06-02 will have the value 2024-06-02

If the query has a predicate like WHERE date = '2024-06-01' only the corresponding directory will be read.
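
For example, a partitioned directory like table2 can be registered through a SessionContext so that the date predicate prunes directories. A minimal sketch, assuming a Parquet layout; the table name, path, and partition column type are illustrative:

use std::sync::Arc;
use datafusion::arrow::datatypes::DataType;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::error::Result;
use datafusion::prelude::*;

async fn register_partitioned_table(ctx: &SessionContext) -> Result<()> {
    // Declare the Hive partition column so `date` appears in the table schema
    let listing_options = ListingOptions::new(Arc::new(ParquetFormat::new()))
        .with_file_extension(".parquet")
        .with_table_partition_cols(vec![("date".to_string(), DataType::Utf8)]);

    // "/path/to/table2" is a placeholder path
    ctx.register_listing_table("table2", "/path/to/table2", listing_options, None, None)
        .await?;

    // Only the table2/date=2024-06-01 directory is read for this query
    ctx.sql("SELECT * FROM table2 WHERE date = '2024-06-01'")
        .await?
        .show()
        .await?;
    Ok(())
}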

§See Also

  1. ListingTableConfig: Configuration options
  2. DataSourceExec: ExecutionPlan used by ListingTable

§Caching Metadata

Some formats, such as Parquet, use the FileMetadataCache to cache file metadata that is needed to execute but expensive to read, such as row groups and statistics. The cache is scoped to the SessionContext and can be configured via the runtime config options.

§Example: Read a directory of parquet files using a ListingTable

use std::sync::Arc;
use datafusion::catalog::Session;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::{
    ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
};
use datafusion::datasource::TableProvider;
use datafusion::error::Result;

async fn get_listing_table(session: &dyn Session) -> Result<Arc<dyn TableProvider>> {
    let table_path = "/path/to/parquet";

    // Parse the path
    let table_path = ListingTableUrl::parse(table_path)?;

    // Create default parquet options
    let file_format = ParquetFormat::new();
    let listing_options = ListingOptions::new(Arc::new(file_format))
        .with_file_extension(".parquet");

    // Resolve the schema
    let resolved_schema = listing_options
        .infer_schema(session, &table_path)
        .await?;

    let config = ListingTableConfig::new(table_path)
        .with_listing_options(listing_options)
        .with_schema(resolved_schema);

    // Create a new TableProvider
    let provider = Arc::new(ListingTable::try_new(config)?);

    Ok(provider)
}

Fields§

§table_paths: Vec<ListingTableUrl>

§file_schema: SchemaRef

file_schema contains only the columns physically stored in the data files themselves:

  • Represents the actual fields found in files like Parquet, CSV, etc.
  • Used when reading the raw data from files

§table_schema: SchemaRef

table_schema combines file_schema + partition columns:

  • Partition columns are derived from directory paths (not stored in the files)
  • These are columns like year and month, encoded in path segments such as /data/year=2022/month=01/file.parquet

§schema_source: SchemaSource

Indicates how the schema was derived (inferred or explicitly specified)

§options: ListingOptions

Options used to configure the listing table such as the file format and partitioning information

§definition: Option<String>

The SQL definition for this table, if any

§collected_statistics: FileStatisticsCache

Cache for collected file statistics

§constraints: Constraints

Constraints applied to this table

§column_defaults: HashMap<String, Expr>

Column default expressions for columns that are not physically present in the data files

§schema_adapter_factory: Option<Arc<dyn SchemaAdapterFactory>>

Optional SchemaAdapterFactory for creating schema adapters

§expr_adapter_factory: Option<Arc<dyn PhysicalExprAdapterFactory>>

Optional PhysicalExprAdapterFactory for creating physical expression adapters

Implementations§

Source§

impl ListingTable

Source

pub fn try_new(config: ListingTableConfig) -> Result<Self>

Create new ListingTable

See documentation and example on ListingTable and ListingTableConfig

Source

pub fn with_constraints(self, constraints: Constraints) -> Self

Assign constraints
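
For example, a primary key constraint on the first column can be attached as in the following sketch (the column index is an assumption; Constraints::new_unverified records the constraint without checking the data):

use datafusion::common::{Constraint, Constraints};
use datafusion::datasource::listing::ListingTable;

fn with_primary_key(table: ListingTable) -> ListingTable {
    // Declare column 0 as a primary key; DataFusion trusts this rather than verifying it
    table.with_constraints(Constraints::new_unverified(vec![
        Constraint::PrimaryKey(vec![0]),
    ]))
}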

Source

pub fn with_column_defaults(self, column_defaults: HashMap<String, Expr>) -> Self

Assign column defaults
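
A sketch of supplying a default expression for a column that is not present in the data files (the column name and literal value are illustrative):

use std::collections::HashMap;
use datafusion::datasource::listing::ListingTable;
use datafusion::prelude::lit;

fn with_defaults(table: ListingTable) -> ListingTable {
    // Inserts that omit `priority` fall back to the literal 0
    table.with_column_defaults(HashMap::from([("priority".to_string(), lit(0i64))]))
}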

Source

pub fn with_cache(self, cache: Option<FileStatisticsCache>) -> Self

Set the FileStatisticsCache used to cache parquet file statistics.

Setting a statistics cache on the SessionContext can avoid refetching statistics multiple times in the same session.

If None, creates a new DefaultFileStatisticsCache scoped to this query.
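
A sketch of sharing one statistics cache across scans, assuming the DefaultFileStatisticsCache re-exported under datafusion::execution::cache::cache_unit (the exact path may vary between versions):

use std::sync::Arc;
use datafusion::datasource::listing::ListingTable;
use datafusion::execution::cache::cache_unit::DefaultFileStatisticsCache;

fn with_shared_stats_cache(table: ListingTable) -> ListingTable {
    // Reuse the same cache for every scan instead of re-deriving file statistics each time
    table.with_cache(Some(Arc::new(DefaultFileStatisticsCache::default())))
}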

Source

pub fn with_definition(self, definition: Option<String>) -> Self

Specify the SQL definition for this table, if any

Source

pub fn table_paths(&self) -> &Vec<ListingTableUrl>

Get paths ref

Source

pub fn options(&self) -> &ListingOptions

Get options ref

Source

pub fn schema_source(&self) -> SchemaSource

Get the schema source

Source

pub fn with_schema_adapter_factory(self, schema_adapter_factory: Arc<dyn SchemaAdapterFactory>) -> Self

Set the SchemaAdapterFactory for this ListingTable

The schema adapter factory is used to create schema adapters that can handle schema evolution and type conversions when reading files with different schemas than the table schema.

§Example: Adding Schema Evolution Support
let table_with_evolution = table
    .with_schema_adapter_factory(Arc::new(DefaultSchemaAdapterFactory));

See ListingTableConfig::with_schema_adapter_factory for an example of custom SchemaAdapterFactory.

Source

pub fn schema_adapter_factory(&self) -> Option<&Arc<dyn SchemaAdapterFactory>>

Get the SchemaAdapterFactory for this table

Source

fn create_schema_adapter(&self) -> Box<dyn SchemaAdapter>

Creates a schema adapter for mapping between file and table schemas

Uses the configured schema adapter factory if available, otherwise falls back to the default implementation.

Source

fn create_file_source_with_schema_adapter(&self) -> Result<Arc<dyn FileSource>>

Creates a file source and applies schema adapter factory if available

Source

pub fn try_create_output_ordering(&self, execution_props: &ExecutionProps) -> Result<Vec<LexOrdering>>

If file_sort_order is specified, creates the appropriate physical expressions
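
The sort order itself is declared via ListingOptions::with_file_sort_order; a brief sketch (the column name is an assumption):

use std::sync::Arc;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::prelude::col;

// Declare that every file is already sorted by `event_time` ascending (nulls last),
// so DataFusion can avoid re-sorting when this ordering is required
let listing_options = ListingOptions::new(Arc::new(ParquetFormat::new()))
    .with_file_sort_order(vec![vec![col("event_time").sort(true, false)]]);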

Source§

impl ListingTable

Source

pub async fn list_files_for_scan<'a>(&'a self, ctx: &'a dyn Session, filters: &'a [Expr], limit: Option<usize>) -> Result<(Vec<FileGroup>, Statistics)>

Get the list of files for a scan as well as the file level statistics. The list is grouped to let the execution plan know how the files should be distributed to different threads / executors.

Source

async fn do_collect_statistics(&self, ctx: &dyn Session, store: &Arc<dyn ObjectStore>, part_file: &PartitionedFile) -> Result<Arc<Statistics>>

Collects statistics for a given partitioned file.

This method first checks if the statistics for the given file are already cached. If they are, it returns the cached statistics. If they are not, it infers the statistics from the file and stores them in the cache.

Trait Implementations§

Source§

impl Clone for ListingTable

Source§

fn clone(&self) -> ListingTable

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for ListingTable

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl TableProvider for ListingTable

Source§

fn as_any(&self) -> &dyn Any

Returns the table provider as Any so that it can be downcast to a specific implementation.
Source§

fn schema(&self) -> SchemaRef

Get a reference to the schema for this table
Source§

fn constraints(&self) -> Option<&Constraints>

Get a reference to the constraints of the table. Returns: Read more
Source§

fn table_type(&self) -> TableType

Get the type of this table for metadata/catalog purposes.
Source§

fn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, state: &'life1 dyn Session, projection: Option<&'life2 Vec<usize>>, filters: &'life3 [Expr], limit: Option<usize>, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Create an ExecutionPlan for scanning the table with optionally specified projection, filter and limit, described below. Read more
Source§

fn scan_with_args<'a, 'life0, 'life1, 'async_trait>( &'life0 self, state: &'life1 dyn Session, args: ScanArgs<'a>, ) -> Pin<Box<dyn Future<Output = Result<ScanResult>> + Send + 'async_trait>>
where Self: 'async_trait, 'a: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Create an ExecutionPlan for scanning the table using structured arguments. Read more
Source§

fn supports_filters_pushdown(&self, filters: &[&Expr]) -> Result<Vec<TableProviderFilterPushDown>>

Specify if DataFusion should provide filter expressions to the TableProvider to apply during the scan. Read more
Source§

fn get_table_definition(&self) -> Option<&str>

Get the create statement used to create this table, if available.
Source§

fn insert_into<'life0, 'life1, 'async_trait>( &'life0 self, state: &'life1 dyn Session, input: Arc<dyn ExecutionPlan>, insert_op: InsertOp, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Return an ExecutionPlan to insert data into this table, if supported. Read more
Source§

fn get_column_default(&self, column: &str) -> Option<&Expr>

Get the default value for a column, if available.
Source§

fn get_logical_plan(&self) -> Option<Cow<'_, LogicalPlan>>

Get the LogicalPlan of this table, if available.
Source§

fn statistics(&self) -> Option<Statistics>

Get statistics for this table, if available. Although not presently used in mainline DataFusion, this allows implementation-specific behavior for downstream repositories, in conjunction with specialized optimizer rules, to perform operations such as re-ordering of joins.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

§

impl<T> Instrument for T

§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided [Span], returning an Instrumented wrapper. Read more
§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
§

impl<T> PolicyExt for T
where T: ?Sized,

§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns [Action::Follow] only if self and other return Action::Follow. Read more
§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns [Action::Follow] if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

§

fn vzip(self) -> V

§

impl<T> WithSubscriber for T

§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a [WithDispatch] wrapper. Read more
§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a [WithDispatch] wrapper. Read more
§

impl<T> ErasedDestructor for T
where T: 'static,