Key Features or Updates
Lambda reports up to 29% higher GPT-OSS-120B throughput on NVIDIA Blackwell Ultra compared with prior Blackwell hardware. The company also says that upgrading CUDA from 12.9 to 13.1 improved throughput by up to 9% on the same hardware. On the routing side, Lambda's BLAZE optimization cut P99 time-to-first-token latency by 31% without requiring model retraining.
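Because the routing claim is stated as a P99 time-to-first-token (TTFT) figure, it helps to be precise about how that metric is measured. The sketch below is a minimal illustration, not Lambda's tooling: `stream_fn` is a hypothetical callable standing in for whatever streaming client your serving stack exposes, and the percentile math uses only the Python standard library.

```python
import time
import statistics

def measure_ttft(stream_fn, prompts):
    """Collect time-to-first-token (TTFT) samples, one per prompt.

    stream_fn is a placeholder for your serving client's streaming call:
    it takes a prompt and yields tokens as they arrive.
    """
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        for _token in stream_fn(prompt):
            # The first yielded token closes the TTFT window.
            samples.append(time.perf_counter() - start)
            break
    return samples

def ttft_p99(samples):
    """Return the 99th-percentile TTFT in seconds."""
    # quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]
```

Comparing this number before and after a routing change, on the same prompt mix, is what a claim like "31% lower P99 TTFT" implies.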
Impact on Cloud Costs & Architecture
These results show that software maturity can produce meaningful cost and latency gains even before hardware changes. For AI operators, that can improve the economics of serving large models at scale. The broader architectural takeaway is that inference performance depends on the full stack: GPU selection, runtime tuning, and routing strategy all matter.
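To make the economics concrete: with a fixed hourly instance price, cost per token scales roughly inversely with throughput. The back-of-the-envelope sketch below reuses only the reported 9% figure; the baseline throughput and hourly rate are made-up placeholders, not numbers from Lambda's benchmark.

```python
# Assumed placeholders, not benchmark data.
baseline_tps = 10_000   # tokens per second
hourly_cost = 50.0      # USD per hour for the instance

def cost_per_million_tokens(tps, usd_per_hour):
    tokens_per_hour = tps * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

base = cost_per_million_tokens(baseline_tps, hourly_cost)
faster = cost_per_million_tokens(baseline_tps * 1.09, hourly_cost)
print(f"baseline:        ${base:.3f} per 1M tokens")
print(f"+9% throughput:  ${faster:.3f} per 1M tokens "
      f"({1 - faster / base:.1%} cheaper)")
```

On these assumptions, a 9% throughput gain from a software upgrade alone trims cost per token by roughly 8%, before any hardware change.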
Next Steps
If you are planning an inference deployment, benchmark after every major software or driver upgrade instead of assuming the hardware is the only variable. Track throughput, tail latency, and token economics together. For capacity planning, use these kinds of results to decide whether you can squeeze more out of existing hardware before paying for a refresh.
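One lightweight way to track those three dimensions together is to keep a record per hardware/software combination and compare everything against a baseline. This is a minimal sketch assuming you already collect throughput and latency from your own load tests; the field names and labels are illustrative and not part of any vendor tooling.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkRecord:
    """One row per hardware/software combination you test."""
    label: str                 # e.g. "Blackwell Ultra + CUDA 13.1"
    throughput_tps: float      # aggregate tokens per second
    ttft_p99_s: float          # 99th-percentile time to first token
    cost_per_1m_tokens: float  # derived from instance pricing

def summarize(records):
    """Print each configuration's deltas against the first (baseline) record."""
    base = records[0]
    for r in records:
        tput_delta = r.throughput_tps / base.throughput_tps - 1
        cost_delta = r.cost_per_1m_tokens / base.cost_per_1m_tokens - 1
        print(f"{r.label:<35} "
              f"throughput {tput_delta:+.1%}  "
              f"TTFT p99 {r.ttft_p99_s * 1000:.0f} ms  "
              f"cost/1M {cost_delta:+.1%}")
```

Re-running the same comparison after every driver, CUDA, or runtime upgrade makes it clear whether the existing fleet still has headroom before a hardware refresh is justified.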