The problem/use-case that the feature addresses
Valkey's LIST type uses a quicklist internally, which is a doubly-linked list of listpack nodes. When list-compress-depth > 0, interior nodes are currently compressed with the LZF algorithm. LZF is simple but slow compared to LZ4, which decompresses 3 to 4 times faster; LZ4 will be vendored as part of #3195.
Approach
Swap lzf_compress/lzf_decompress for LZ4_compress_default/LZ4_decompress_safe in __quicklistCompressNode and __quicklistDecompressNode. This uses the LZ4 block API directly, so there is no framing overhead.
Nodes are decompressed before RDB save, so the RDB format is unchanged.
Quicklist LZ4 vs LZF Benchmark Results
Workload: 200 lists × 2000 elements × 512B, list-compress-depth 2
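The workload above can be approximated with stock tooling. A hedged reproduction sketch (assumes a locally running server; -r 200 spreads pushes across ~200 random key names, and absolute numbers will vary by hardware):

```shell
# Enable interior-node compression, then build roughly 200 lists
# of ~2000 elements x 512 bytes each.
valkey-cli config set list-compress-depth 2
valkey-benchmark -t lpush -n 400000 -r 200 -d 512 -c 50 -q

# Read-heavy probe that lands on compressed interior nodes.
valkey-benchmark -t lrange_500 -n 10000 -r 200 -c 50 -q

# Compare memory across runs (flush and repeat with
# list-compress-depth 0 for the uncompressed baseline).
valkey-cli info memory | grep used_memory_human
```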
Memory

| Algo | Total Memory | Per-List | Savings vs Uncompressed |
|------|--------------|----------|-------------------------|
| None | 197.2 MB | 1005 KB | — |
| LZF | 13.4 MB | 421 KB | 93.2% |
| LZ4 | 12.6 MB | 419 KB | 93.6% |
Operation Latency (pipeline batch, lower is better)
| Operation | None | LZF | LZ4 | LZ4 vs LZF |
|-----------|------|-----|-----|------------|
| LRANGE 0 -1 (100 lists) | 124ms | 218ms | 141ms | 1.5x faster |
| LRANGE 800-899 (500 ops) | 33ms | 59ms | 36ms | 1.6x faster |
| LINDEX (5000 interior) | 45ms | 85ms | 53ms | 1.6x faster |
| LSET (1000 interior modify) | 6ms | 14ms | 7ms | 1.9x faster |
| RPOP (5000 tail pops) | 53ms | 53ms | 53ms | 1.0x (uncompressed) |
Throughput (valkey-benchmark, 50 clients)
| Operation | None | LZF | LZ4 | LZ4 vs LZF |
|-----------|------|-----|-----|------------|
| LPUSH (head) | 102K rps | 101K rps | 100K rps | 1.0x |
| LINDEX idx=500 | 102K rps | 63K rps | 98K rps | 1.6x |
| LRANGE 800-899 | 29K rps | 13K rps | 27K rps | 2.2x |
Bulk Populate (200 × 2000 = 400K elements)
| Algo | Time | vs LZF |
|------|------|--------|
| None | 745ms | — |
| LZF | 873ms | baseline |
| LZ4 | 670ms | 1.3x faster |
Key Takeaways
- LZ4 is 1.5–2.2x faster than LZF for reads on compressed interior nodes
- LZ4 compresses slightly better than LZF (93.6% vs 93.2% savings)
- LZ4 nearly eliminates the compression penalty: LINDEX at 98K rps is within 4% of uncompressed (102K rps), vs LZF at 63K rps (a 38% penalty)
- LZ4 writes are 1.3x faster too (670ms vs 873ms bulk populate)
- Head/tail operations (LPUSH, RPOP) are unaffected — they access uncompressed nodes
Alternatives you've considered
ZSTD seems to give similar performance to LZF.
Related
#3195: Streaming Compression for RDB and Replication (vendors LZ4)
#3423: Inline in-memory value compression (broader vision)
I can work on the design and implementation once we vendor LZ4 through #3195