TurboQuant compresses LLM key-value caches down to 3 bits per value: 6× memory reduction, up to 8× faster attention, and no degradation. (research.google)
in technology@lemmy.ml from yogthos@lemmy.ml on 25 Mar 22:17