Module containing helper methods/traits for dividing an input stream into multiple output files at execution time.
Functions
- compute_hive_style_file_path 🔒
- compute_partition_keys_by_row 🔒
- compute_take_arrays 🔒
- create_new_file_stream 🔒 - Helper for row count demuxer
- generate_file_path 🔒 - Helper for row count demuxer
- hive_style_partitions_demuxer 🔒 - Splits an input stream based on the distinct values of a set of columns. Assumes standard hive style partition paths such as /col1=val1/col2=val2/outputfile.parquet
- remove_partition_by_columns 🔒
- row_count_demuxer 🔒 - Dynamically partitions input stream to achieve desired maximum rows per file
- start_demuxer_task - Splits a single [SendableRecordBatchStream] into a dynamically determined number of partitions at execution time.
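
The two demuxing strategies described above can be illustrated with a minimal, standard-library-only sketch. It is not the module's implementation: the real functions operate on Arrow record batches and async streams, and their signatures are not shown on this page. The names `hive_style_path`, `group_rows_by_partition`, and `rows_per_file` are hypothetical, and rows are modelled as plain string vectors purely for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds a hive-style path of the form
/// `<base>/col1=val1/col2=val2/<file_name>`.
fn hive_style_path(base: &str, partition: &[(&str, &str)], file_name: &str) -> String {
    // Drop any trailing slash so segments are joined by exactly one '/'.
    let mut path = base.trim_end_matches('/').to_string();
    for (col, val) in partition {
        path.push('/');
        path.push_str(col);
        path.push('=');
        path.push_str(val);
    }
    path.push('/');
    path.push_str(file_name);
    path
}

/// Hypothetical helper: groups row indices by the distinct values found in
/// the partition columns (given as column indices into each row).
fn group_rows_by_partition(
    rows: &[Vec<&str>],
    partition_cols: &[usize],
) -> BTreeMap<Vec<String>, Vec<usize>> {
    let mut groups: BTreeMap<Vec<String>, Vec<usize>> = BTreeMap::new();
    for (i, row) in rows.iter().enumerate() {
        // The key is the tuple of partition-column values for this row.
        let key: Vec<String> = partition_cols.iter().map(|&c| row[c].to_string()).collect();
        groups.entry(key).or_insert_with(Vec::new).push(i);
    }
    groups
}

/// Hypothetical helper: splits `total_rows` into per-file row counts so that
/// no file receives more than `max_rows_per_file` rows.
fn rows_per_file(total_rows: usize, max_rows_per_file: usize) -> Vec<usize> {
    assert!(max_rows_per_file > 0, "max_rows_per_file must be positive");
    let mut remaining = total_rows;
    let mut sizes = Vec::new();
    while remaining > 0 {
        let n = remaining.min(max_rows_per_file);
        sizes.push(n);
        remaining -= n;
    }
    sizes
}
```

In this sketch, each distinct key returned by `group_rows_by_partition` would map to one output path from `hive_style_path`, while `rows_per_file` models the row count strategy, where a new file is started once the current one reaches the row limit.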