The KV Cache Bottleneck: How a 1984 Lemma Became the Foundation of LLM Compression

From JL Lemma to TurboQuant - Part 1 of 2

Ji Wook Hwang
Lead Engineer, AI Solutions Team

This is Part 1 of a two-part series tracing the line from a 1984 mathematical lemma to TurboQuant, the KV cache quantization method now being integrated into llama.cpp and vLLM. Part 1 covers the KV cache memory problem, the Johnson-Lindenstrauss theoretical foundation, and QJL, the first method to compress keys to a single bit while keeping inner-product estimates unbiased.
Part 2 will cover PolarQuant, TurboQuant, the RaBitQ dispute, and practical implementation lessons.
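To make the one-bit preview above concrete, here is a minimal sketch of the underlying idea: project a key with a Gaussian matrix, keep only the sign bits, and rescale by the key's norm. This is an illustration of the sign-of-projection estimator, not QJL's actual implementation; the names (`S`, `m`, `k_bits`) and the projection count are assumptions for the demo. The constant sqrt(π/2) is the inverse of E[|g|] for a standard normal, which is what makes the estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64          # embedding dimension
m = 100_000     # number of projections, large here only to make the Monte Carlo estimate tight

k = rng.normal(size=d)   # a "key" vector
q = rng.normal(size=d)   # a "query" vector

S = rng.normal(size=(m, d))   # Gaussian projection matrix (illustrative, not QJL's sketch)
k_bits = np.sign(S @ k)       # the key is stored as one bit per projection

# Unbiased estimate of the inner product <q, k>:
# E[sign(<s, k>) * <s, q>] = sqrt(2/pi) * <q, k> / ||k||, so rescale accordingly.
est = np.linalg.norm(k) * np.sqrt(np.pi / 2) * np.mean(k_bits * (S @ q))

print(q @ k, est)  # the estimate concentrates around the true inner product
```

The key point the series builds on: the stored representation of `k` is one bit per projection (plus its norm), yet the inner-product estimate has no systematic bias, only variance that shrinks as the number of projections grows.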


