OpenAI shenanigans in Cerebras IPO. They have a deal to pay a minimum of $10.53/m tokens (at 100% 24/7 capacity utilization), which is more like $40-$50/m in the real world. They charge $14/m
from humanspiral@lemmy.ca to technology@lemmy.ml on 15 May 01:39
https://lemmy.ca/post/64880124

Cerebras had a good IPO today. Their technology is very fast, but in terms of throughput, Nvidia NVL72 is 7x cheaper. FYI, Huawei's current and previous generations are even cheaper per token than Nvidia. One advantage Cerebras has, though, is that production does not depend on HBM3/4 RAM. A big disadvantage is that their software is difficult to work with, and they don't offer any models newer than about a year old, as they are slow to implement bleeding-edge optimizations. Long contexts are also disproportionately slow on Cerebras.

Cerebras's second customer is OpenAI. Under a $20B, 3-year lease for up to 750 MW of compute (equivalent to 6 years of 250 MW blocks), the most optimistic cost per token possible for OpenAI is $10.53/m, and that assumes 100% utilization. A realistic but still optimistic real-world figure is 20% capacity utilization, which works out to over $50/m. OpenAI is initially using the cluster to run Codex-Spark 5.3, for which they charge customers $14/m tokens. OpenAI also has the privilege of paying for all OPEX. Power alone, at just 7c/kWh, adds about 50c/m tokens in the ideal case.
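For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted above. It assumes the lease cost is fixed and spread over the tokens actually served, so cost per token scales inversely with utilization, and it reads "6 years of 250 MW blocks" as 1,500 MW-years of power draw. The function names and the 8,760 hours/year figure are my own assumptions, not anything from the deal terms.

```python
# Figures quoted in the post; everything else is derived from them.
LEASE_USD = 20e9              # $20B lease
BEST_CASE_USD_PER_M = 10.53   # quoted cost per million tokens at 100% utilization
MW_YEARS = 250 * 6            # "6 years of 250 MW blocks" (assumed to mean MW-years)
POWER_USD_PER_KWH = 0.07      # 7c/kWh
HOURS_PER_YEAR = 8760         # assumption: non-leap year, 24/7 draw


def lease_cost_per_million(utilization: float) -> float:
    """Lease cost per million tokens at a given utilization (0 < u <= 1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Fixed cost spread over fewer tokens: cost rises as 1/utilization.
    return BEST_CASE_USD_PER_M / utilization


def power_cost_per_million_ideal() -> float:
    """Power cost per million tokens at 100% utilization."""
    # Millions of tokens implied by the lease price and the best-case rate.
    implied_m_tokens = LEASE_USD / BEST_CASE_USD_PER_M
    total_kwh = MW_YEARS * 1000 * HOURS_PER_YEAR
    return total_kwh * POWER_USD_PER_KWH / implied_m_tokens


if __name__ == "__main__":
    for u in (1.0, 0.25, 0.20):
        print(f"{u:.0%} utilization: ${lease_cost_per_million(u):.2f}/m tokens")
    print(f"power, ideal case: ${power_cost_per_million_ideal():.2f}/m tokens")
```

Under those assumptions, 20% utilization comes to $52.65/m and 25% to $42.12/m, which is where the $40-$50/m range comes from, and ideal-case power lands at roughly 48c/m.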

Their first customer was G42, a group owned by the UAE monarchy. Even if the UAE has permission to buy Nvidia, Cerebras delivers faster, and the UAE helped with, and controls, the software. Apparently, Arabic has advantages on the chip, but they are still planning Nvidia-dominated expansions, guarded by Patriot air defense systems.

#technology
