Expand description
§HyperLogLog
hyperloglog is a module that contains a modified version
of redis’s implementation
with some modification based on strong assumption of usage
within datafusion, so that function can
be efficiently implemented.
Specifically, like Redis’s version, this HLL structure uses
214 = 16384 registers, which means the standard error is
1.04/(163840.5) = 0.8125%. Unlike Redis, the register takes
up full u8 size instead of a raw int* and thus saves some
tricky bit shifting techniques used in the original version.
This results in a memory usage increase from 12Kib to 16Kib.
Also only the dense version is adopted, so there’s no automatic
conversion, largely to simplify the code.
This module also borrows some code structure from pdatastructs.rs.
Structs§
Constants§
- HLL_P 🔒
- The greater is P, the smaller the error.
- HLL_
P_ 🔒MASK - Mask to obtain index into the registers
- HLL_Q 🔒
- The number of bits of the hash value used determining the number of leading zeros
- NUM_
REGISTERS 🔒 - SEED 🔒
- Fixed seed for the hashing so that values are consistent across runs