output_single_parquet_file_parallelized


async fn output_single_parquet_file_parallelized(
    object_store_writer: Box<dyn AsyncWrite + Send + Unpin>,
    data: Receiver<RecordBatch>,
    output_schema: Arc<Schema>,
    parquet_props: &WriterProperties,
    skip_arrow_metadata: bool,
    parallel_options: ParallelParquetWriterOptions,
    pool: Arc<dyn MemoryPool>,
) -> Result<ParquetMetaData>

Parallelizes the serialization of a single Parquet file by first serializing N independent RecordBatch streams, in parallel, into RowGroups held in memory. A separate task then stitches these independent RowGroups together and streams the resulting large, single Parquet file to an ObjectStore in multiple parts.
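The fan-out / ordered fan-in pattern described above can be sketched with only the Rust standard library. This is an illustrative sketch under loud assumptions, not the real implementation. It uses OS threads instead of async tasks. A plain `Vec<i32>` stands in for a RecordBatch. Little-endian byte concatenation stands in for Parquet encoding, so there is no header magic, footer, or ParquetMetaData. Any `std::io::Write` sink stands in for the ObjectStore writer. The `pool: Arc<dyn MemoryPool>` and `parallel_options` parameters are not modeled. The names `Batch`, `serialize_row_group`, `write_single_file_parallel`, and `max_in_flight` are hypothetical.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for a RecordBatch.
type Batch = Vec<i32>;

// Hypothetical stand-in for encoding one stream of batches into an in-memory
// "row group": each value is appended as 4 little-endian bytes.
fn serialize_row_group(batches: Vec<Batch>) -> Vec<u8> {
    let mut buf = Vec::new();
    for batch in batches {
        for v in batch {
            buf.extend_from_slice(&v.to_le_bytes());
        }
    }
    buf
}

// Serializes each input stream on its own thread, while a single "stitcher"
// thread writes the finished row groups to `sink` in the original stream order.
// `max_in_flight` bounds the channel of pending row groups, which caps how much
// serialized data can wait in memory (a rough analogue of backpressure).
fn write_single_file_parallel<W: Write + Send + 'static>(
    streams: Vec<Vec<Batch>>,
    mut sink: W,
    max_in_flight: usize,
) -> io::Result<W> {
    let (tx, rx) = mpsc::sync_channel::<JoinHandle<Vec<u8>>>(max_in_flight);

    // Stitcher: receives join handles in submission order and waits on each one,
    // so row groups land in order even if later ones finish serializing first.
    let stitcher = thread::spawn(move || -> io::Result<W> {
        for handle in rx {
            let row_group = handle.join().expect("serializer thread panicked");
            sink.write_all(&row_group)?;
        }
        sink.flush()?;
        Ok(sink)
    });

    // Fan-out: one serializer thread per stream. `send` blocks when the
    // channel is full, which throttles how many streams are started.
    for stream in streams {
        let handle = thread::spawn(move || serialize_row_group(stream));
        if tx.send(handle).is_err() {
            break; // stitcher exited early (e.g. an I/O error); stop producing
        }
    }
    drop(tx); // close the channel so the stitcher's loop ends

    stitcher.join().expect("stitcher thread panicked")
}

fn main() {
    let streams = vec![vec![vec![1, 2], vec![3]], vec![vec![4]], vec![vec![5, 6]]];
    let file = write_single_file_parallel(streams, Vec::new(), 2).unwrap();
    println!("wrote {} bytes", file.len());
}
```

Passing join handles through the bounded channel, rather than finished bytes, is what lets serialization of later row groups overlap with writing earlier ones while still preserving row-group order in the output.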