Execution plans that read file formats
Modules§
- `arrow`: Reexports the `datafusion_datasource_arrow::source` module, containing Arrow-based `FileSource`.
- `avro`: Reexports the `datafusion_datasource_avro::source` module, containing Avro-based `FileSource`.
- `csv`: Reexports the `datafusion_datasource_csv::source` module, containing CSV-based `FileSource`.
- `json`: Reexports the `datafusion_datasource_json::source` module, containing JSON-based `FileSource`.
- `parquet`: Reexports the `datafusion_datasource_parquet` crate, containing Parquet-based `FileSource`.
Structs§
- `ArrowOpener`: The struct that implements the `FileOpener` trait for Arrow files.
- `ArrowSource`: Arrow configuration struct that is given to `DataSourceExec`. Does not hold anything special, since `FileScanConfig` is sufficient for Arrow.
- `AvroSource`: Holds the extra configuration that is necessary for opening Avro files.
- `CsvOpener`: A `FileOpener` that opens a CSV file and yields a `FileOpenFuture`.
- `CsvSource`: A config for `CsvOpener`.
- `FileGroup`: Represents a group of partitioned files that will be processed by a single thread. Maintains optional statistics across all files in the group.
- `FileGroupPartitioner`: Repartitions input files into `target_partitions` partitions if the total file size exceeds `repartition_file_min_size`.
- `FileScanConfig`: The base configuration for a `DataSourceExec`, the physical plan for any given file format.
- `FileScanConfigBuilder`: A builder for `FileScanConfig`s.
- `FileSinkConfig`: The base configuration to provide when creating a physical plan for writing to any given file format.
- `FileStream`: A stream that iterates record batch by record batch, file over file.
- `JsonOpener`: A `FileOpener` that opens a JSON file and yields a `FileOpenFuture`.
- `JsonSource`: Holds the extra configuration that is necessary for `JsonOpener`.
- `ParquetFileMetrics`: Stores metrics about the Parquet execution for a particular Parquet file.
- `ParquetSource`: Execution plan for reading one or more Parquet files.
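The `FileGroupPartitioner` entry describes size-based repartitioning. The sketch below is a simplified, standard-library-only model of that idea and is not DataFusion's implementation: `FileRange` and `repartition_by_size` are hypothetical names, files are modeled as `(path, size)` pairs, and the "exceeds" threshold is read strictly.

```rust
/// A byte range of one file (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    path: String,
    start: u64,
    end: u64, // exclusive
}

/// Split `files` (path, size in bytes) into at most `target_partitions`
/// groups of roughly equal byte size. Returns `None` when the total size
/// does not exceed `min_size`, meaning "leave the input as it is".
fn repartition_by_size(
    files: &[(String, u64)],
    target_partitions: usize,
    min_size: u64,
) -> Option<Vec<Vec<FileRange>>> {
    let total: u64 = files.iter().map(|(_, size)| *size).sum();
    if target_partitions == 0 || total <= min_size {
        return None;
    }
    // Ceiling division so that the groups together cover every byte.
    let n = target_partitions as u64;
    let per_group = (total + n - 1) / n;

    let mut groups: Vec<Vec<FileRange>> = vec![Vec::new()];
    let mut room = per_group; // bytes still available in the current group
    for (path, size) in files {
        let mut start = 0u64;
        while start < *size {
            if room == 0 {
                // Current group is full: open the next one.
                groups.push(Vec::new());
                room = per_group;
            }
            let take = room.min(*size - start);
            groups.last_mut().unwrap().push(FileRange {
                path: path.clone(),
                start,
                end: start + take,
            });
            start += take;
            room -= take;
        }
    }
    Some(groups)
}
```

With a 100-byte file and a 60-byte file, `target_partitions = 4`, and a small threshold, this model yields four 40-byte groups, one of which spans the tail of the first file and the head of the second. A real partitioner also has to respect format-specific split points, which this sketch ignores.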
Enums§
- `OnError`: Describes the behavior of the `FileStream` if file opening or scanning fails.
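To illustrate what an error policy of this kind controls, here is a minimal standard-library-only sketch. The enum below is a local stand-in, not the DataFusion type; the variant names `Fail` and `Skip` are assumptions, since this page does not list them, and `scan_all` is a hypothetical helper that models each file as either a list of values or an error message.

```rust
/// Local stand-in for an error policy; not imported from DataFusion.
/// The variant names are assumptions made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OnError {
    /// Stop at the first failing file and surface its error.
    Fail,
    /// Ignore the failing file and continue with the next one.
    Skip,
}

/// Hypothetical helper: "scan" each file in order, where a file is modeled
/// as `Ok(values)` or `Err(message)`.
fn scan_all(
    files: &[Result<Vec<u32>, String>],
    on_error: OnError,
) -> Result<Vec<u32>, String> {
    let mut out = Vec::new();
    for file in files {
        match (file, on_error) {
            (Ok(values), _) => out.extend(values.iter().copied()),
            (Err(e), OnError::Fail) => return Err(e.clone()),
            (Err(_), OnError::Skip) => continue,
        }
    }
    Ok(out)
}
```

Under the skip policy the scan returns whatever the readable files produced; under the fail policy the first error ends the scan and nothing is returned.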
Traits§
- `FileOpener`: Generic API for opening a file using an `ObjectStore` and resolving to a stream of `RecordBatch`.
- `FileSink`: General behaviors for files that do `DataSink` operations.
- `FileSource`: File-format-specific behaviors for elements in `DataSource`.
- `ParquetFileReaderFactory`: Interface for reading Parquet files.
Functions§
- `wrap_partition_type_in_dict`: Converts a type to one suitable for use as a `ListingTable` partition column. Returns `Dictionary(UInt16, val_type)`, a reasonable trade-off between the number of distinct partition values supported and space efficiency.
- `wrap_partition_value_in_dict`: Converts a `ScalarValue` of a partition column to the dictionary-wrapped type described in the documentation of `wrap_partition_type_in_dict`.
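The trade-off behind `Dictionary(UInt16, val_type)` can be seen in a small model of dictionary encoding. This is a standard-library-only sketch, not Arrow's or DataFusion's implementation: `DictColumn` and `dict_encode` are hypothetical names, and values are modeled as strings.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded column: each distinct value is stored
/// once, and every row holds only a 2-byte key into `values`.
struct DictColumn {
    keys: Vec<u16>,
    values: Vec<String>,
}

/// Dictionary-encode `rows` with `u16` keys. Returns `None` if there are
/// more distinct values than a `u16` key can address (65,536).
fn dict_encode(rows: &[&str]) -> Option<DictColumn> {
    let mut index: HashMap<&str, u16> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys: Vec<u16> = Vec::with_capacity(rows.len());
    for &row in rows {
        let key = match index.get(row).copied() {
            Some(k) => k,
            None => {
                // The next free key is the current dictionary length.
                if values.len() > u16::MAX as usize {
                    return None; // key space exhausted
                }
                let k = values.len() as u16;
                values.push(row.to_string());
                index.insert(row, k);
                k
            }
        };
        keys.push(key);
    }
    Some(DictColumn { keys, values })
}
```

Partition columns repeat the same value for every row of a file, so storing a 2-byte key per row instead of the full value saves space, at the cost of capping the number of distinct values that one dictionary can hold.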
Type Aliases§
- `FileOpenFuture`: A fallible future that resolves to a stream of `RecordBatch`.