TurboQuant compresses LLM key-value caches down to 3 bits per value: 6× memory reduction, up to 8× faster attention, and no degradation.
(research.google)
in technology@lemmy.ml from yogthos@lemmy.ml on 25 Mar 22:17