Module hyperloglog

Module hyperloglog 

Source
Expand description

§HyperLogLog

hyperloglog is a module that contains a modified version of redis’s implementation with some modification based on strong assumption of usage within datafusion, so that function can be efficiently implemented.

Specifically, like Redis’s version, this HLL structure uses 214 = 16384 registers, which means the standard error is 1.04/(163840.5) = 0.8125%. Unlike Redis, the register takes up full u8 size instead of a raw int* and thus saves some tricky bit shifting techniques used in the original version. This results in a memory usage increase from 12Kib to 16Kib. Also only the dense version is adopted, so there’s no automatic conversion, largely to simplify the code.

This module also borrows some code structure from pdatastructs.rs.

Structs§

HyperLogLog 🔒

Constants§

HLL_P 🔒
The greater is P, the smaller the error.
HLL_P_MASK 🔒
Mask to obtain index into the registers
HLL_Q 🔒
The number of bits of the hash value used determining the number of leading zeros
NUM_REGISTERS 🔒
SEED 🔒
Fixed seed for the hashing so that values are consistent across runs

Functions§

hll_sigma 🔒
Helper function sigma as defined in “New cardinality estimation algorithms for HyperLogLog sketches” Otmar Ertl, arXiv:1702.01284
hll_tau 🔒
Helper function tau as defined in “New cardinality estimation algorithms for HyperLogLog sketches” Otmar Ertl, arXiv:1702.01284