fn build_batch(
batch: &RecordBatch,
schema: &SchemaRef,
list_type_columns: &[ListUnnest],
struct_column_indices: &HashSet<usize>,
options: &UnnestOptions,
) -> Result<Option<RecordBatch>>

For each row in a RecordBatch, some list/struct columns need to be unnested.
- For list columns: We will expand the values in each list into multiple rows. The number of output rows is the longest list length among these columns; shorter lists are padded with NULLs.
- For struct columns: We will expand the struct columns into multiple subfield columns.
For columns that don’t need to be unnested, repeat their values until reaching the longest length.
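To make that expansion rule concrete, here is a minimal, self-contained sketch over plain Rust vectors rather than Arrow arrays; the function unnest_row and its shape are invented for illustration and are not part of DataFusion's API:

/// Expand one input row. `lists` holds the values of the list columns being
/// unnested; `other` is the value of a column that is not unnested.
fn unnest_row(lists: &[Vec<i64>], other: i64) -> Vec<(Vec<Option<i64>>, i64)> {
    // The number of output rows is the longest list length among the unnested columns.
    let longest = lists.iter().map(|l| l.len()).max().unwrap_or(0);
    (0..longest)
        .map(|i| {
            // Shorter lists are padded with NULLs (None here).
            let row: Vec<Option<i64>> = lists.iter().map(|l| l.get(i).copied()).collect();
            // Columns that are not unnested repeat their value on every output row.
            (row, other)
        })
        .collect()
}

fn main() {
    // Two list columns of different lengths plus one ordinary column.
    for (list_values, other) in unnest_row(&[vec![1, 2, 3], vec![10]], 42) {
        // Prints:
        //   [Some(1), Some(10)] 42
        //   [Some(2), None] 42
        //   [Some(3), None] 42
        println!("{list_values:?} {other}");
    }
}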
Note: unnest behaves quite differently in Postgres and DuckDB. Take this example:
- Postgres
create table temp (
i integer[][][], j integer[]
);
insert into temp values ('{{{1,2},{3,4}},{{5,6},{7,8}}}', '{1,2}');
select unnest(i), unnest(j) from temp;

Result:
1   1
2   2
3
4
5
6
7
8

- DuckDB
create table temp (i integer[][][], j integer[]);
insert into temp values ([[[1,2],[3,4]],[[5,6],[7,8]]], [1,2]);
select unnest(i,recursive:=true), unnest(j,recursive:=true) from temp;

Result:
┌────────────────────────────────────────────────┬────────────────────────────────────────────────┐
│ unnest(i, "recursive" := CAST('t' AS BOOLEAN)) │ unnest(j, "recursive" := CAST('t' AS BOOLEAN)) │
│                     int32                      │                     int32                      │
├────────────────────────────────────────────────┼────────────────────────────────────────────────┤
│                       1                        │                       1                        │
│                       2                        │                       2                        │
│                       3                        │                       1                        │
│                       4                        │                       2                        │
│                       5                        │                       1                        │
│                       6                        │                       2                        │
│                       7                        │                       1                        │
│                       8                        │                       2                        │
└────────────────────────────────────────────────┴────────────────────────────────────────────────┘

The following implementation refers to DuckDB's implementation.
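Because the list handling follows DuckDB's recursive behavior, the sketch below illustrates what flattening a nested list by a given depth means. It is plain Rust over a toy Nested type invented for illustration, not DataFusion code; applied to the i column above with depth 3 it yields the eight scalars shown in the first column of the DuckDB result.

/// A toy nested-list value.
#[derive(Debug, Clone)]
enum Nested {
    Scalar(i64),
    List(Vec<Nested>),
}

/// Flatten `value` by `depth` levels, the way unnest(..., recursive := true)
/// keeps unnesting until scalars are reached.
fn flatten(value: &Nested, depth: usize) -> Vec<Nested> {
    match (value, depth) {
        // Depth exhausted, or already a scalar: keep the value as-is.
        (_, 0) | (Nested::Scalar(_), _) => vec![value.clone()],
        // Otherwise peel off one list level and recurse on each element.
        (Nested::List(items), _) => items
            .iter()
            .flat_map(|item| flatten(item, depth - 1))
            .collect(),
    }
}

fn main() {
    use Nested::*;
    // i = [[[1,2],[3,4]],[[5,6],[7,8]]] from the DuckDB example above.
    let i = List(vec![
        List(vec![
            List(vec![Scalar(1), Scalar(2)]),
            List(vec![Scalar(3), Scalar(4)]),
        ]),
        List(vec![
            List(vec![Scalar(5), Scalar(6)]),
            List(vec![Scalar(7), Scalar(8)]),
        ]),
    ]);
    // Flattening three levels deep yields the eight scalars 1..=8.
    println!("{:?}", flatten(&i, 3));
}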