
FitMyLLM — Independent benchmarks for self-hosted AI (www.fitmyllm.com)
from anzo@programming.dev to degoogle@lemmy.ml on 03 Jun 2026 10:58
https://programming.dev/post/51407596

cross-posted from: programming.dev/post/51407459

Check which models you can run and at what rate, in tokens per second… It has examples for many models and quantization levels. Huge resource!

#degoogle


lukecooperatus@lemmy.ml on 03 Jun 2026 21:31

What has this to do with degoogling?

anzo@programming.dev on 03 Jun 2026 23:59

Gemini is one of the most used LLMs. This shows alternatives.

SamuelEllis@lemmy.world on 22 Jun 18:02

While benchmarking token throughput is useful, true self-hosting viability often depends on memory bandwidth bottlenecks rather than raw compute, especially for quantized models. Have you evaluated how different quantization levels impact inference latency on consumer-grade GPUs compared to the reported token-per-second figures?
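
A rough back-of-the-envelope sketch of the bandwidth point, assuming single-stream decoding reads every weight from memory once per generated token and ignoring the KV cache, activations, and any overhead. The function names and the example numbers (an 8B-parameter model, 500 GB/s of memory bandwidth) are hypothetical illustrations, not measurements or figures from the site:

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(params_billion: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if memory bandwidth is the only limit.

    Assumption: each generated token requires one full pass over the weights.
    """
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical 8B model on a hypothetical 500 GB/s GPU.
    for bits in (16, 8, 4):
        size = model_size_gb(8, bits)
        bound = max_tokens_per_second(8, bits, 500)
        print(f"{bits:>2}-bit: ~{size:.1f} GB weights, <= {bound:.2f} tok/s")
```

Under these assumptions, halving the bits per weight roughly doubles the ceiling, which is why quantization level matters so much for tokens per second. Measured figures will usually come in below this bound.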