
The KV Cache Bottleneck: How a 1984 Lemma Became the Foundation of LLM Compression

  • Writer: Ji Wook Hwang
  • 5 days ago
  • 1 min read

From JL Lemma to TurboQuant - Part 1 of 2


Ji Wook Hwang, Lead Engineer, AI Solutions Team




This is Part 1 of a two-part series tracing the line from a 1984 mathematical lemma to TurboQuant, the KV cache quantization method now being integrated into llama.cpp and vLLM. Part 1 covers the KV cache memory problem, the Johnson-Lindenstrauss theoretical foundation, and QJL, the first method to compress keys to a single bit while keeping inner-product estimates unbiased.
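To preview the core idea behind one-bit key compression: project keys with a shared Gaussian matrix, keep only the sign of each projection, and leave queries at full precision. For a standard Gaussian vector s, E[sign(s·k)(s·q)] = √(2/π)·⟨k,q⟩/‖k‖, so a simple rescaling yields an unbiased inner-product estimate. Below is a minimal numerical sketch of that property, not QJL's actual implementation; the dimensions, sketch size, and vectors are illustrative, and the sketch size is exaggerated only to make the unbiasedness visible.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 64, 200_000  # embedding dim; sketch size (huge here only to shrink variance)

# Illustrative unit-norm key and query vectors.
k = rng.standard_normal(d); k /= np.linalg.norm(k)
q = rng.standard_normal(d); q /= np.linalg.norm(q)

S = rng.standard_normal((m, d))   # shared Gaussian projection (the JL sketch)
k_bits = np.sign(S @ k)           # key side: one bit per projected coordinate
q_proj = S @ q                    # query side: kept at full precision

# E[sign(s@k) * (s@q)] = sqrt(2/pi) * <k,q> / ||k||, so rescale to unbias:
est = np.sqrt(np.pi / 2) * np.linalg.norm(k) * np.mean(k_bits * q_proj)
true = float(k @ q)
print(f"true inner product: {true:.4f}, one-bit estimate: {est:.4f}")
```

With a realistic (small) sketch size the per-pair variance is larger, which is exactly the accuracy/memory trade-off the rest of the series examines.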


Part 2 will cover PolarQuant, TurboQuant, the RaBitQ dispute, and practical implementation lessons.

