Skip to content

[NEW] Quicklist - Replace LZF with configurable LZ4 compression for LIST nodes #3526

@roshkhatri

Description

@roshkhatri

The problem/use-case that the feature addresses

Valkey's LIST type values uses the quicklists internally, which is a doubly-linked list of listpack nodes. When list-compress-depth > 0, presently the nodes are compressed using LZF algorithm. It is simple but slow as compared to other LZ4. LZ4 decompresses 3 to 4 times faster and we will be vendoring it as part of #3195

Approach

Swap lzf_compress/lzf_decompress for LZ4_compress_default/LZ4_decompress_safe in __quicklistCompressNode and __quicklistDecompressNode. Block API, no framing overhead.

Decompress before RDB save to avoid RDB format changes.

Quicklist LZ4 vs LZF Benchmark Results

Workload: 200 lists × 2000 elements × 512B, list-compress-depth 2

Memory

Algo Total Memory Per-List Savings vs Uncompressed
None 197.2 MB 1005 KB
LZF 13.4 MB 421 KB 93.2%
LZ4 12.6 MB 419 KB 93.6%

Operation Latency (pipeline batch, lower is better)

Operation None LZF LZ4 LZ4 vs LZF
LRANGE 0 -1 (100 lists) 124ms 218ms 141ms 1.5x faster
LRANGE 800-899 (500 ops) 33ms 59ms 36ms 1.6x faster
LINDEX (5000 interior) 45ms 85ms 53ms 1.6x faster
LSET (1000 interior modify) 6ms 14ms 7ms 1.9x faster
RPOP (5000 tail pops) 53ms 53ms 53ms 1.0x (uncompressed)

Throughput (valkey-benchmark, 50 clients)

Operation None LZF LZ4 LZ4 vs LZF
LPUSH (head) 102K rps 101K rps 100K rps 1.0x
LINDEX idx=500 102K rps 63K rps 98K rps 1.6x
LRANGE 800-899 29K rps 13K rps 27K rps 2.2x

Bulk Populate (200 × 2000 = 400K elements)

Algo Time vs LZF
None 745ms
LZF 873ms baseline
LZ4 670ms 1.3x faster

Key Takeaways

  • LZ4 is 1.5–2.2x faster than LZF for reads on compressed interior nodes
  • LZ4 compresses slightly better than LZF (93.6% vs 93.2% savings)
  • LZ4 nearly eliminates the compression penalty, LINDEX at 98K rps is within 4% of uncompressed (102K rps), vs LZF at 63K rps (38% penalty)
  • LZ4 writes are 1.3x faster too (670ms vs 873ms bulk populate)
  • Head/tail operations (LPUSH, RPOP) are unaffected — they access uncompressed nodes

Alternatives you've considered

ZSTD seems give similar performance to LZF.

Related
#3195: Streaming Compression for RDB and Replication (vendors LZ4)
#3423: Inline in-memory value compression (broader vision)

I can work on the design and implementation once we vendor LZ4 through #3195

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions