Struct TableSchema 

pub struct TableSchema {
    file_schema: SchemaRef,
    table_partition_cols: Vec<FieldRef>,
    table_schema: SchemaRef,
}

Helper to hold table schema information for partitioned data sources.

When reading partitioned data (such as Hive-style partitioning), a table’s schema consists of two parts:

  1. File schema: The schema of the actual data files on disk
  2. Partition columns: Columns that are encoded in the directory structure, not stored in the files themselves

§Example: Partitioned Table

Consider a table with the following directory structure:

/data/date=2025-10-10/region=us-west/data.parquet
/data/date=2025-10-11/region=us-east/data.parquet

In this case:

  • File schema: The schema of data.parquet files (e.g., [user_id, amount])
  • Partition columns: [date, region] extracted from the directory path
  • Table schema: The full schema combining both (e.g., [user_id, amount, date, region])
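To make the path-to-column mapping concrete, here is a minimal std-only sketch. The helpers `parse_hive_partitions` and `table_columns` are hypothetical and are not part of DataFusion; they only illustrate how `key=value` directory segments could yield partition column names and values, assuming no segment contains a `/` inside a value.

```rust
/// Hypothetical helper (not a DataFusion API): extract `key=value`
/// partition segments from a Hive-style path, in directory order.
fn parse_hive_partitions(path: &str) -> Vec<(String, String)> {
    // Drop the file name; only directory segments can encode partitions.
    let dir = match path.rsplit_once('/') {
        Some((dir, _file_name)) => dir,
        None => return Vec::new(),
    };
    dir.split('/')
        // Segments without `=` (such as `data`) are ordinary directories.
        .filter_map(|segment| segment.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Hypothetical helper: column names of the full table, with file columns
/// first and partition columns after them.
fn table_columns(file_columns: &[&str], partitions: &[(String, String)]) -> Vec<String> {
    file_columns
        .iter()
        .map(|name| name.to_string())
        .chain(partitions.iter().map(|(key, _)| key.clone()))
        .collect()
}
```

Applied to the first path above, this yields the pairs (date, 2025-10-10) and (region, us-west), and combining them with the file columns gives the four-column table schema.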

§When to Use

Use TableSchema when:

  • Reading partitioned data sources (Parquet, CSV, etc.) that use Hive-style partitioning
  • You need to efficiently access different schema representations without reconstructing them
  • You want to avoid repeatedly concatenating file and partition schemas

For non-partitioned data, or when a single schema representation is enough, it is simpler to work directly with Arrow’s Schema or SchemaRef.

§Performance

This struct pre-computes and caches the full table schema at construction time, so each accessor returns a reference to an existing value instead of allocating or rebuilding a schema on every call.
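The caching pattern can be sketched with a toy model built only on the standard library. `CachedSchema` and `ColumnsRef` are illustrative stand-ins (a shared list of column names in place of Arrow's `SchemaRef`); this is not DataFusion's implementation, only an assumption-laden picture of "build once, borrow afterwards".

```rust
use std::sync::Arc;

/// Simplified stand-in for Arrow's `SchemaRef`: a shared list of column names.
type ColumnsRef = Arc<Vec<String>>;

/// Toy model of the caching pattern; NOT DataFusion's actual struct.
struct CachedSchema {
    file_columns: ColumnsRef,
    partition_columns: Vec<String>,
    /// Built once in `new`, then only borrowed.
    table_columns: ColumnsRef,
}

impl CachedSchema {
    fn new(file_columns: ColumnsRef, partition_columns: Vec<String>) -> Self {
        // The concatenation (and its allocation) happens here, exactly once.
        let combined: Vec<String> = file_columns
            .iter()
            .chain(partition_columns.iter())
            .cloned()
            .collect();
        Self {
            file_columns,
            partition_columns,
            table_columns: Arc::new(combined),
        }
    }

    /// Borrow the file-only columns.
    fn file_columns(&self) -> &ColumnsRef {
        &self.file_columns
    }

    /// Borrow the partition columns.
    fn partition_columns(&self) -> &Vec<String> {
        &self.partition_columns
    }

    /// Borrow the cached full column list; nothing is rebuilt per call.
    fn table_columns(&self) -> &ColumnsRef {
        &self.table_columns
    }
}
```

Callers that need ownership can use `Arc::clone` on the returned reference, which bumps a reference count rather than copying the column list.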

Fields§

§file_schema: SchemaRef

The schema of the data files themselves, without partition columns.

For example, if your Parquet files contain [user_id, amount], this field holds that schema.

§table_partition_cols: Vec<FieldRef>

Columns that are derived from the directory structure (partitioning scheme).

For Hive-style partitioning like /date=2025-10-10/region=us-west/, this contains the date and region fields.

These columns are NOT present in the data files but are appended to each row during query execution based on the file’s location.
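As a rough picture of what "appended to each row" means, the sketch below uses a hypothetical row-oriented helper, `append_partition_values`. It is not how DataFusion does it internally (query engines usually operate on columnar batches); it only shows that every row read from one file receives the same partition values.

```rust
/// Hypothetical, row-oriented illustration (not a DataFusion API).
/// Each inner `Vec<String>` is one row read from a single data file.
fn append_partition_values(
    file_rows: Vec<Vec<String>>,
    partition_values: &[String],
) -> Vec<Vec<String>> {
    file_rows
        .into_iter()
        .map(|mut row| {
            // Every row from the same file gets the same partition values,
            // placed after the columns that came from the file itself.
            row.extend_from_slice(partition_values);
            row
        })
        .collect()
}
```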

§table_schema: SchemaRef

The complete table schema: file_schema columns followed by partition columns.

This is pre-computed during construction by concatenating file_schema and table_partition_cols, so it can be returned as a cheap reference.

Implementations§

impl TableSchema

pub fn new(file_schema: SchemaRef, table_partition_cols: Vec<FieldRef>) -> Self

Create a new TableSchema from a file schema and partition columns.

The table schema is automatically computed by appending the partition columns to the file schema.

If both the file schema and the partition columns are available at construction time, prefer this method over chaining TableSchema::from_file_schema and TableSchema::with_table_partition_cols, since it avoids re-computing the table schema.

§Arguments

  • file_schema - Schema of the data files (without partition columns)
  • table_partition_cols - Partition columns to append to each row

§Example

// Import paths are assumed from the usual crate layout; adjust them to your dependencies.
use std::sync::Arc;
use arrow::datatypes::{DataType, Field, Schema};
use datafusion_datasource::TableSchema;

// Schema of the data files themselves
let file_schema = Arc::new(Schema::new(vec![
    Field::new("user_id", DataType::Int64, false),
    Field::new("amount", DataType::Float64, false),
]));

// Columns derived from the directory structure
let partition_cols = vec![
    Arc::new(Field::new("date", DataType::Utf8, false)),
    Arc::new(Field::new("region", DataType::Utf8, false)),
];

let table_schema = TableSchema::new(file_schema, partition_cols);

// Table schema will have 4 columns: user_id, amount, date, region
assert_eq!(table_schema.table_schema().fields().len(), 4);

pub fn from_file_schema(file_schema: SchemaRef) -> Self

Create a new TableSchema with no partition columns.

If the partition columns are known at construction time, prefer TableSchema::new, since it avoids re-computing the table schema.

pub fn with_table_partition_cols(self, partition_cols: Vec<FieldRef>) -> Self

Add partition columns to an existing TableSchema, returning a new instance.

If the partition columns are known at construction time, prefer TableSchema::new over chaining TableSchema::from_file_schema into TableSchema::with_table_partition_cols, since it avoids re-computing the table schema.
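The cost difference between the two construction paths can be sketched with a toy model. `ToySchema`, its `builds` counter, and the helper `names` are all hypothetical; the model assumes, as the text above implies, that each constructor or builder step recomputes the combined schema. It only covers the chained case that starts with no partition columns and says nothing about how the real method treats partition columns that are already set.

```rust
/// Toy model with plain column names; NOT DataFusion's implementation.
#[derive(Debug)]
struct ToySchema {
    file_columns: Vec<String>,
    partition_columns: Vec<String>,
    table_columns: Vec<String>,
    /// How many times `table_columns` has been built for this value.
    builds: usize,
}

/// File columns followed by partition columns.
fn concat(file: &[String], partitions: &[String]) -> Vec<String> {
    file.iter().chain(partitions.iter()).cloned().collect()
}

impl ToySchema {
    /// One-step construction: the combined list is built once.
    fn new(file_columns: Vec<String>, partition_columns: Vec<String>) -> Self {
        let table_columns = concat(&file_columns, &partition_columns);
        Self { file_columns, partition_columns, table_columns, builds: 1 }
    }

    /// First half of the chained path: builds a combined list with no partitions.
    fn from_file_columns(file_columns: Vec<String>) -> Self {
        Self::new(file_columns, Vec::new())
    }

    /// Second half of the chained path: the combined list is built again.
    fn with_partition_columns(mut self, partition_columns: Vec<String>) -> Self {
        self.table_columns = concat(&self.file_columns, &partition_columns);
        self.partition_columns = partition_columns;
        self.builds += 1;
        self
    }
}

/// Convenience for the example: turn string literals into owned names.
fn names(columns: &[&str]) -> Vec<String> {
    columns.iter().map(|c| c.to_string()).collect()
}
```

In this model both paths end with the same column list, but the chained path builds it twice while `new` builds it once.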

pub fn file_schema(&self) -> &SchemaRef

Get the file schema (without partition columns).

This is the schema of the actual data files on disk.

pub fn table_partition_cols(&self) -> &Vec<FieldRef>

Get the table partition columns.

These are the columns derived from the directory structure that will be appended to each row during query execution.

pub fn table_schema(&self) -> &SchemaRef

Get the full table schema (file schema + partition columns).

This is the complete schema that will be seen by queries, combining both the columns from the files and the partition columns.
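Because the table schema lists file columns first and partition columns after them, a column's position is enough to tell where it comes from. The sketch below shows that arithmetic; `ColumnOrigin` and `column_origin` are hypothetical names, and the two lengths stand in for `file_schema().fields().len()` and `table_partition_cols().len()`.

```rust
/// Hypothetical classification of a table-schema column (not a DataFusion type).
#[derive(Debug, PartialEq)]
enum ColumnOrigin {
    /// Stored in the data file, at this index of the file schema.
    File(usize),
    /// Derived from the directory path, at this index of the partition columns.
    Partition(usize),
}

/// Map an index into the full table schema back to its source,
/// assuming the layout "file columns, then partition columns".
fn column_origin(index: usize, file_len: usize, partition_len: usize) -> Option<ColumnOrigin> {
    if index < file_len {
        Some(ColumnOrigin::File(index))
    } else if index < file_len + partition_len {
        Some(ColumnOrigin::Partition(index - file_len))
    } else {
        // Out of range for the table schema.
        None
    }
}
```

With the example schema [user_id, amount, date, region], index 1 maps to the file column amount and index 2 maps to the first partition column, date.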

Trait Implementations§

impl Clone for TableSchema

fn clone(&self) -> TableSchema

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TableSchema

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
