The KV Cache Bottleneck: How a 1984 Lemma Became the Foundation of LLM Compression

From JL Lemma to TurboQuant - Part 1 of 2

Ji Wook Hwang
Lead Engineer, AI Solutions Team

This is Part 1 of a two-part series tracing the line from a 1984 mathematical lemma to TurboQuant, the KV cache quantization method now being integrated into llama.cpp and vLLM. Part 1 covers the KV cache memory problem, the Johnson-Lindenstrauss theoretical foundation, and QJL, the first method to compress keys to a single bit while keeping inner-product estimates unbiased.
Part 2 will cover PolarQuant, TurboQuant, the RaBitQ dispute, and practical implementation lessons.
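To make the one-bit preview above concrete, here is a minimal sketch of the underlying idea: project a key with a Gaussian matrix, keep only the sign bits, and rescale by the key's norm. This is an illustration of the sign-of-projection estimator, not QJL's actual implementation; the names (`S`, `m`, `k_bits`) and the projection count are assumptions for the demo. The constant sqrt(π/2) is the inverse of E[|g|] for a standard normal, which is what makes the estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64          # embedding dimension
m = 100_000     # number of projections, large here only to make the Monte Carlo estimate tight

k = rng.normal(size=d)   # a "key" vector
q = rng.normal(size=d)   # a "query" vector

S = rng.normal(size=(m, d))   # Gaussian projection matrix (illustrative, not QJL's sketch)
k_bits = np.sign(S @ k)       # the key is stored as one bit per projection

# Unbiased estimate of the inner product <q, k>:
# E[sign(<s, k>) * <s, q>] = sqrt(2/pi) * <q, k> / ||k||, so rescale accordingly.
est = np.linalg.norm(k) * np.sqrt(np.pi / 2) * np.mean(k_bits * (S @ q))

print(q @ k, est)  # the estimate concentrates around the true inner product
```

The key point the series builds on: the stored representation of `k` is one bit per projection (plus its norm), yet the inner-product estimate has no systematic bias, only variance that shrinks as the number of projections grows.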


