Function get_files_with_limit 

async fn get_files_with_limit(
    files: impl Stream<Item = Result<PartitionedFile>>,
    limit: Option<usize>,
    collect_stats: bool,
) -> Result<(FileGroup, bool)>

Processes a stream of partitioned files and returns a FileGroup containing the files.

This function collects files from the provided stream until either:

  1. The stream is exhausted
  2. The accumulated number of rows exceeds the provided limit (if specified)
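The two stopping conditions can be illustrated with a minimal, synchronous sketch. It is not the function's actual implementation: `FileEntry` and `collect_with_limit` are hypothetical names, an `Iterator` stands in for the async `Stream`, a `Vec` stands in for `FileGroup`, and errors are plain `String`s. The sketch also assumes that the returned flag is set when files were left unread, which the description here does not confirm.

```rust
/// Hypothetical stand-in for `PartitionedFile`: a path plus an optional row count.
#[derive(Debug, Clone, PartialEq)]
struct FileEntry {
    path: String,
    /// `None` models a file whose statistics are not available.
    num_rows: Option<usize>,
}

/// Synchronous sketch of the collection loop (assumed behavior, not the real code).
fn collect_with_limit(
    files: impl Iterator<Item = Result<FileEntry, String>>,
    limit: Option<usize>,
    collect_stats: bool,
) -> Result<(Vec<FileEntry>, bool), String> {
    // Fusing makes it safe to call `next` again after the iterator has ended.
    let mut files = files.fuse();
    let mut group = Vec::new();
    // The running total is only known while statistics are being collected
    // and every file seen so far reported a row count.
    let mut total_rows: Option<usize> = if collect_stats { Some(0) } else { None };

    while let Some(item) = files.next() {
        // Propagate the first error found in the input.
        let file = item?;

        if collect_stats {
            total_rows = match (total_rows, file.num_rows) {
                (Some(total), Some(rows)) => Some(total + rows),
                _ => None,
            };
        }

        // The file is always added, even if it pushes the total past the limit.
        group.push(file);

        // Condition 2: stop once the known row total exceeds the limit.
        if let (Some(limit), Some(total)) = (limit, total_rows) {
            if total > limit {
                break;
            }
        }
    }

    // Assumption: anything still unread means the loop stopped early.
    let stopped_early = files.next().is_some();
    Ok((group, stopped_early))
}
```

Under these assumptions, three files with 10, 20, and 30 rows and a limit of 15 yield the first two files (the total of 30 exceeds 15 only after the second file is added), and the flag is `true` because the third file was never read.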

§Arguments

  • files - A stream of Result<PartitionedFile> items to process
  • limit - An optional row count limit. If provided, the function will stop collecting files once the accumulated number of rows exceeds this limit
  • collect_stats - Whether to collect and accumulate statistics from the files

§Returns

A Result containing a tuple: a FileGroup with the collected files, and a boolean indicating whether the statistics are inexact.

§Note

The function will continue processing files if statistics are not available or if the limit is not provided. If collect_stats is false, statistics won’t be accumulated but files will still be collected.
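The rule in this note can be condensed into a single predicate. The helper below is a hypothetical illustration, not part of the documented API: `limit_exceeded` is an invented name, and `known_rows` is assumed to be `None` whenever row counts are unavailable or `collect_stats` is false.

```rust
/// Hypothetical helper: returns true only when both a row total and a limit
/// are known and the total exceeds the limit. In every other case the
/// caller keeps collecting files.
fn limit_exceeded(known_rows: Option<usize>, limit: Option<usize>) -> bool {
    match (known_rows, limit) {
        (Some(rows), Some(limit)) => rows > limit,
        // Missing statistics or a missing limit never stops collection.
        _ => false,
    }
}
```

With this shape, a missing limit or missing statistics falls through to the same "keep going" branch, which matches the behavior the note describes.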