build_batch

Function build_batch 

fn build_batch(
    batch: &RecordBatch,
    schema: &SchemaRef,
    list_type_columns: &[ListUnnest],
    struct_column_indices: &HashSet<usize>,
    options: &UnnestOptions,
) -> Result<Option<RecordBatch>>

For each row in a RecordBatch, some list/struct columns need to be unnested.

  • For list columns: the values in each list are expanded into multiple rows. The number of output rows for an input row is the length of the longest list in that row, and shorter lists are padded with NULLs.
  • For struct columns: each struct column is expanded into one column per subfield.

Columns that don’t need to be unnested have their values repeated until the longest list length is reached.
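The list-column rule above can be sketched for a single row as follows. This is a minimal illustration on plain `Vec`s, not the Arrow-based code in `build_batch`; the function `unnest_row` and its parameters are hypothetical names introduced here, and options such as null handling in `UnnestOptions` are ignored.

```rust
/// Unnest the list cells of a single row (illustrative helper, not part of
/// the real API).
///
/// `lists` holds the cells of the list columns being unnested; `others` holds
/// the cells of the columns that are left as they are. One output row is
/// produced per position, up to the length of the longest list.
fn unnest_row(lists: &[Vec<i32>], others: &[i32]) -> Vec<(Vec<Option<i32>>, Vec<i32>)> {
    // The number of output rows is the longest list length in this row.
    let longest = lists.iter().map(Vec::len).max().unwrap_or(0);

    (0..longest)
        .map(|i| {
            // A shorter list yields None (NULL) once it runs out of values.
            let unnested: Vec<Option<i32>> =
                lists.iter().map(|list| list.get(i).copied()).collect();
            // Columns that are not unnested repeat their value on every row.
            (unnested, others.to_vec())
        })
        .collect()
}
```

For example, calling `unnest_row(&[vec![1, 2, 3], vec![10]], &[7])` produces three rows: the first list contributes 1, 2, 3, the second contributes 10 followed by two NULLs, and the value 7 is repeated on each row.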

Note: unnest behaves very differently in Postgres and DuckDB.

Consider this example:

  1. Postgres
    create table temp (
        i integer[][][], j integer[]
    );
    insert into temp values ('{{{1,2},{3,4}},{{5,6},{7,8}}}', '{1,2}');
    select unnest(i), unnest(j) from temp;

Result:

    1   1
    2   2
    3
    4
    5
    6
    7
    8

  2. DuckDB
    create table temp (i integer[][][], j integer[]);
    insert into temp values ([[[1,2],[3,4]],[[5,6],[7,8]]], [1,2]);
    select unnest(i,recursive:=true), unnest(j,recursive:=true) from temp;

Result:


    ┌────────────────────────────────────────────────┬────────────────────────────────────────────────┐
    │ unnest(i, "recursive" := CAST('t' AS BOOLEAN)) │ unnest(j, "recursive" := CAST('t' AS BOOLEAN)) │
    │                     int32                      │                     int32                      │
    ├────────────────────────────────────────────────┼────────────────────────────────────────────────┤
    │                                              1 │                                              1 │
    │                                              2 │                                              2 │
    │                                              3 │                                              1 │
    │                                              4 │                                              2 │
    │                                              5 │                                              1 │
    │                                              6 │                                              2 │
    │                                              7 │                                              1 │
    │                                              8 │                                              2 │
    └────────────────────────────────────────────────┴────────────────────────────────────────────────┘

The implementation below follows DuckDB’s behavior.
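One way to reproduce the DuckDB result shown above is to unnest level by level, from the deepest nesting level down to 1, repeating every cell that is not at the current level. The sketch below does this on a small hand-rolled `Value` type. All names in it (`Value`, `depth`, `unnest_level`, `unnest_recursive`) are hypothetical, and it derives the depth from the values themselves, which is an assumption made for illustration rather than a description of how `build_batch` obtains it.

```rust
/// A nested integer value: a scalar, a list of nested values, or NULL.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i32),
    List(Vec<Value>),
    Null,
}

/// Nesting depth of a value: 0 for scalars and NULL, 1 for a flat list, etc.
fn depth(v: &Value) -> usize {
    match v {
        Value::List(items) => 1 + items.iter().map(depth).max().unwrap_or(0),
        _ => 0,
    }
}

/// Unnest one level of every cell whose depth equals `level` (`level >= 1`).
/// Cells at any other depth are repeated unchanged on each output row.
fn unnest_level(rows: Vec<Vec<Value>>, level: usize) -> Vec<Vec<Value>> {
    let mut out: Vec<Vec<Value>> = Vec::new();
    for row in rows {
        // Lengths of the lists in this row that sit at the current level.
        let lens: Vec<usize> = row
            .iter()
            .filter(|cell| depth(cell) == level)
            .map(|cell| match cell {
                Value::List(items) => items.len(),
                _ => 0,
            })
            .collect();
        // No cell at this level: keep the row untouched.
        let Some(&longest) = lens.iter().max() else {
            out.push(row);
            continue;
        };
        for i in 0..longest {
            let new_row: Vec<Value> = row
                .iter()
                .map(|cell| match cell {
                    // Lists at this level contribute their i-th element,
                    // or NULL once they run out of elements.
                    Value::List(items) if depth(cell) == level => {
                        items.get(i).cloned().unwrap_or(Value::Null)
                    }
                    // Everything else is repeated as it is.
                    other => other.clone(),
                })
                .collect();
            out.push(new_row);
        }
    }
    out
}

/// Unnest a single row fully, from the deepest level down to level 1.
fn unnest_recursive(row: Vec<Value>) -> Vec<Vec<Value>> {
    let max_depth = row.iter().map(depth).max().unwrap_or(0);
    let mut rows = vec![row];
    for level in (1..=max_depth).rev() {
        rows = unnest_level(rows, level);
    }
    rows
}
```

Applied to the example row, `i` is the only cell at levels 3 and 2, so `j` is carried along as a whole list while `i` is split into four flat lists. At level 1 both `i` and `j` are flat lists of length 2 and are unnested together, which yields the eight rows (1, 1), (2, 2), (3, 1), (4, 2), and so on, matching the DuckDB table.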