CsvReadOptions

Struct CsvReadOptions 

pub struct CsvReadOptions<'a> {
    pub has_header: bool,
    pub delimiter: u8,
    pub quote: u8,
    pub terminator: Option<u8>,
    pub escape: Option<u8>,
    pub comment: Option<u8>,
    pub newlines_in_values: bool,
    pub schema: Option<&'a Schema>,
    pub schema_infer_max_records: usize,
    pub file_extension: &'a str,
    pub table_partition_cols: Vec<(String, DataType)>,
    pub file_compression_type: FileCompressionType,
    pub file_sort_order: Vec<Vec<SortExpr>>,
    pub null_regex: Option<String>,
    pub truncated_rows: bool,
}

Options that control the reading of CSV files.

Note that this structure is supplied when a datasource is created and cannot vary from statement to statement. For settings that can vary from statement to statement, see ConfigOptions.

Fields§

§has_header: bool

Does the CSV file have a header?

If schema inference is run on a file with no headers, default column names are created.

§delimiter: u8

The column delimiter. Defaults to b','.

§quote: u8

The quote character. Defaults to b'"'.

§terminator: Option<u8>

An optional terminator character. Defaults to None (CRLF).

§escape: Option<u8>

An optional escape character. Defaults to None.

§comment: Option<u8>

An optional comment byte. If set, lines beginning with this byte are ignored.
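The effect of the comment byte can be modelled with the standard library alone. The sketch below is not DataFusion's parser; `strip_comment_lines` is a hypothetical helper that only illustrates the rule stated above: a line is dropped when its first byte equals the configured byte, and `None` keeps every line.

```rust
/// Hypothetical helper (not part of DataFusion): drop every line whose
/// first byte equals `comment`. `None` disables comment handling.
fn strip_comment_lines(input: &str, comment: Option<u8>) -> Vec<&str> {
    input
        .lines()
        .filter(|line| match comment {
            // Keep the line unless it starts with the comment byte.
            Some(byte) => line.as_bytes().first() != Some(&byte),
            // No comment byte configured: keep everything.
            None => true,
        })
        .collect()
}
```

Only the first byte of a line is inspected, so a `#` that appears later in a line leaves that line untouched in this model.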

§newlines_in_values: bool

Specifies whether newlines in (quoted) values are supported.

Parsing newlines in quoted values may be affected by execution behaviour such as parallel file scanning. Setting this to true ensures that newlines in values are parsed successfully, but may reduce performance.

The default behaviour depends on the datafusion.catalog.newlines_in_values setting.

§schema: Option<&'a Schema>

An optional schema representing the CSV files. If None, the CSV reader tries to infer the schema from the data in the file.

§schema_infer_max_records: usize

Max number of rows to read from CSV files for schema inference if needed. Defaults to DEFAULT_SCHEMA_INFER_MAX_RECORD.
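To illustrate why the record limit matters, here is a deliberately simplified model of bounded schema inference. It is not the algorithm DataFusion uses; `ColType` and `infer_types` are hypothetical names, and only two types are distinguished. The point it shows is that rows beyond the limit do not influence the inferred types.

```rust
/// Hypothetical two-type model; real inference distinguishes many more types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Int64,
    Utf8,
}

/// Inspect at most `max_records` rows. A column stays Int64 only while every
/// inspected value parses as an integer; otherwise it falls back to Utf8.
fn infer_types(rows: &[Vec<&str>], num_columns: usize, max_records: usize) -> Vec<ColType> {
    let mut types = vec![ColType::Int64; num_columns];
    for row in rows.iter().take(max_records) {
        for (i, field) in row.iter().enumerate().take(num_columns) {
            if field.parse::<i64>().is_err() {
                types[i] = ColType::Utf8;
            }
        }
    }
    types
}
```

In this model, a non-numeric value that first appears after the inspected rows goes unnoticed, which is the usual trade-off of a small inference limit: faster inference, but a schema that later rows may not fit.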

§file_extension: &'a str

File extension; only files with this extension are selected for data input. Defaults to FileType::CSV.get_ext().as_str().
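As a rough illustration of extension-based selection, the sketch below filters a list of paths by suffix. Treating the extension as a plain string suffix is an assumption of this sketch, and `select_by_extension` is a hypothetical helper rather than a DataFusion function.

```rust
/// Hypothetical helper: keep only the paths that end with `file_extension`.
/// Suffix matching is an assumption of this sketch.
fn select_by_extension<'p>(paths: &[&'p str], file_extension: &str) -> Vec<&'p str> {
    paths
        .iter()
        .copied()
        .filter(|path| path.ends_with(file_extension))
        .collect()
}
```

Under this model, a compressed file such as `b.csv.gz` is not selected by `.csv`; the extension passed in would have to cover the full suffix.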

§table_partition_cols: Vec<(String, DataType)>

Partition Columns

§file_compression_type: FileCompressionType

File compression type

§file_sort_order: Vec<Vec<SortExpr>>

Indicates how the file is sorted

§null_regex: Option<String>

Optional regex to match null values

§truncated_rows: bool

Whether to allow truncated rows when parsing. By default this is set to false, and parsing returns an error if the CSV rows have different lengths. When set to true, records with fewer than the expected number of columns are accepted and the missing columns are filled with nulls. If a missing column is not nullable in the schema, an error is still returned.
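The rules for truncated rows can be written down as a small stand-alone model. This is a sketch of the behaviour described above, not DataFusion's reader: `RowError` and `read_row` are hypothetical, values are kept as strings, and rejecting rows with too many fields is an assumption of the sketch.

```rust
#[derive(Debug, PartialEq)]
enum RowError {
    /// The row length does not match the schema and cannot be padded.
    WrongLength { expected: usize, found: usize },
    /// A missing column would need a null, but the column is not nullable.
    MissingNonNullable { column: usize },
}

/// `nullable[i]` says whether column `i` of the schema accepts nulls.
fn read_row(
    fields: &[&str],
    nullable: &[bool],
    truncated_rows: bool,
) -> Result<Vec<Option<String>>, RowError> {
    let expected = nullable.len();
    let found = fields.len();
    // Short rows are only tolerated when `truncated_rows` is enabled.
    if found > expected || (found < expected && !truncated_rows) {
        return Err(RowError::WrongLength { expected, found });
    }
    // Every column that has to be padded must be nullable.
    if let Some(column) = (found..expected).find(|&i| !nullable[i]) {
        return Err(RowError::MissingNonNullable { column });
    }
    let mut row: Vec<Option<String>> = fields.iter().map(|f| Some(f.to_string())).collect();
    row.resize(expected, None); // fill the missing columns with nulls
    Ok(row)
}
```

With a schema of three columns where only the first is non-nullable, a two-field row is padded with one null when `truncated_rows` is true and rejected when it is false.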

Implementations§

impl<'a> CsvReadOptions<'a>

pub fn new() -> Self

Create a CSV read option with default presets

pub fn has_header(self, has_header: bool) -> Self

Configure has_header setting

pub fn comment(self, comment: u8) -> Self

Specify comment char to use for CSV read

pub fn delimiter(self, delimiter: u8) -> Self

Specify delimiter to use for CSV read

pub fn quote(self, quote: u8) -> Self

Specify quote to use for CSV read

pub fn terminator(self, terminator: Option<u8>) -> Self

Specify terminator to use for CSV read

pub fn escape(self, escape: u8) -> Self

Specify escape character to use for CSV read

pub fn newlines_in_values(self, newlines_in_values: bool) -> Self

Specifies whether newlines in (quoted) values are supported.

Parsing newlines in quoted values may be affected by execution behaviour such as parallel file scanning. Setting this to true ensures that newlines in values are parsed successfully, but may reduce performance.

The default behaviour depends on the datafusion.catalog.newlines_in_values setting.

pub fn file_extension(self, file_extension: &'a str) -> Self

Specify the file extension for CSV file selection

pub fn delimiter_option(self, delimiter: Option<u8>) -> Self

Configure the delimiter setting from an Option; a None value is ignored and leaves the current delimiter unchanged.
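The methods in this impl block take `self` by value and return `Self`, so they chain. The stand-in type below mirrors that pattern for two of the settings. `MiniCsvOptions` is hypothetical and is not the real `CsvReadOptions`; its defaults (`has_header` true, comma delimiter) are assumptions of the sketch. It also shows how a `None` passed to `delimiter_option` leaves the earlier value in place.

```rust
/// Hypothetical stand-in for the builder pattern used by the real type.
#[derive(Debug, Clone, PartialEq)]
struct MiniCsvOptions {
    has_header: bool,
    delimiter: u8,
}

impl MiniCsvOptions {
    /// Defaults chosen for this sketch only.
    fn new() -> Self {
        Self { has_header: true, delimiter: b',' }
    }

    fn has_header(mut self, has_header: bool) -> Self {
        self.has_header = has_header;
        self
    }

    fn delimiter(mut self, delimiter: u8) -> Self {
        self.delimiter = delimiter;
        self
    }

    /// `None` is ignored, so a previously configured delimiter survives.
    fn delimiter_option(mut self, delimiter: Option<u8>) -> Self {
        if let Some(d) = delimiter {
            self.delimiter = d;
        }
        self
    }
}
```

A chain such as `MiniCsvOptions::new().has_header(false).delimiter(b';').delimiter_option(None)` ends with the `;` delimiter intact, which is convenient when the delimiter comes from an optional user setting.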

pub fn schema(self, schema: &'a Schema) -> Self

Specify schema to use for CSV read

pub fn table_partition_cols( self, table_partition_cols: Vec<(String, DataType)>, ) -> Self

Specify table_partition_cols for partition pruning

pub fn schema_infer_max_records(self, max_records: usize) -> Self

Configure number of max records to read for schema inference

pub fn file_compression_type( self, file_compression_type: FileCompressionType, ) -> Self

Configure file compression type

pub fn file_sort_order(self, file_sort_order: Vec<Vec<SortExpr>>) -> Self

Configure the known sort order of the file, if any

pub fn null_regex(self, null_regex: Option<String>) -> Self

Configure the null parsing regex.

pub fn truncated_rows(self, truncated_rows: bool) -> Self

Configure whether to allow truncated rows when parsing. By default this is set to false, and parsing returns an error if the CSV rows have different lengths. When set to true, records with fewer than the expected number of columns are accepted and the missing columns are filled with nulls. If a missing column is not nullable in the schema, an error is still returned.

Trait Implementations§

impl<'a> Clone for CsvReadOptions<'a>

fn clone(&self) -> CsvReadOptions<'a>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Default for CsvReadOptions<'_>

fn default() -> Self

Returns the “default value” for a type.

impl ReadOptions<'_> for CsvReadOptions<'_>

fn to_listing_options( &self, config: &SessionConfig, table_options: TableOptions, ) -> ListingOptions

Helper to convert these user-facing options to ListingTable options.

fn get_resolved_schema<'life0, 'life1, 'async_trait>( &'life0 self, config: &'life1 SessionConfig, state: SessionState, table_path: ListingTableUrl, ) -> Pin<Box<dyn Future<Output = Result<SchemaRef>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Infer and resolve the schema from the files/sources provided.

fn _get_resolved_schema<'life0, 'async_trait>( &'a self, config: &'life0 SessionConfig, state: SessionState, table_path: ListingTableUrl, schema: Option<&'a Schema>, ) -> Pin<Box<dyn Future<Output = Result<SchemaRef>> + Send + 'async_trait>>
where Self: Sync + 'async_trait, 'a: 'async_trait, 'life0: 'async_trait,

Helper function to reduce repetitive code. Infers the schema from the sources if one is not provided. Infinite data sources are not supported through this function.

Auto Trait Implementations§

impl<'a> Freeze for CsvReadOptions<'a>

impl<'a> !RefUnwindSafe for CsvReadOptions<'a>

impl<'a> Send for CsvReadOptions<'a>

impl<'a> Sync for CsvReadOptions<'a>

impl<'a> Unpin for CsvReadOptions<'a>

impl<'a> !UnwindSafe for CsvReadOptions<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,