d-Matrix, a pioneer in low-latency AI inference compute for data centers, and Gimlet Labs, an applied AI research and product company, today announced that Gimlet is incorporating d-Matrix Corsair™ accelerators into the Gimlet Cloud alongside traditional GPUs to deliver 10x speedups for agentic AI inference workloads.
d-Matrix and Gimlet’s combined solution can deliver order-of-magnitude improvements in both inference latency and throughput per watt compared to traditional GPU-only deployments. The solution is ideal for latency-sensitive workloads, including speculative decoding, a technique commonly adopted in large-scale AI deployments to reduce latency.
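Speculative decoding, mentioned above, works by letting a small, cheap "draft" model propose several tokens ahead, which the large "target" model then verifies, keeping the longest accepted prefix. The sketch below is purely illustrative; both "models" are toy stand-ins, and all names (`draft_next`, `target_next`, `speculative_decode`) are hypothetical, not part of any d-Matrix or Gimlet API.

```python
# Toy sketch of speculative decoding. A cheap draft model proposes a run of
# k tokens; the expensive target model verifies them, and the longest
# accepted prefix is kept. In real systems the verifications run as a
# single batched forward pass on the target model.

def draft_next(context, k=4):
    # Hypothetical cheap draft model: propose k candidate next tokens.
    return [(context[-1] + i + 1) % 100 for i in range(k)]

def target_next(context):
    # Hypothetical expensive target model: the authoritative next token.
    return (context[-1] + 1) % 100

def speculative_decode(context, steps=8, k=4):
    tokens = list(context)
    while steps > 0:
        proposed = draft_next(tokens, k)
        accepted = []
        for tok in proposed:
            # Accept each proposed token only if the target model agrees.
            if tok == target_next(tokens + accepted):
                accepted.append(tok)
            else:
                break
        if not accepted:
            # Draft missed immediately: fall back to one target-model token.
            accepted = [target_next(tokens)]
        tokens.extend(accepted[:steps])
        steps -= len(accepted[:steps])
    return tokens
```

When the draft model agrees with the target often, each expensive verification pass yields several tokens instead of one, which is why the technique is attractive for latency-sensitive serving.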
With d-Matrix Corsair accelerators on Gimlet’s Cloud, workloads already well-optimized for agentic AI can achieve even greater performance gains, reaching token delivery speeds that support the industry-leading interactivity required by today’s most critical applications.
“Model providers are spending billions on inference, and the demand for fast tokens is higher than ever – but power remains a scarce resource,” said Zain Asgar, founder and CEO of Gimlet Labs. “d-Matrix hardware is the ideal solution for the phases of inference that GPUs waste energy on. By leveraging Corsair for use cases like speculative decoding, we can deliver dramatically faster performance for our customers for the same footprint.”
“From day one, d-Matrix has been uniquely focused on inference, founded on our belief that inference would not be a one-size-fits-all compute problem. As the only multi-silicon inference cloud, Gimlet is leading the industry with a fundamental new approach that delivers dramatic leaps forward in performance that homogeneous infrastructure simply cannot deliver,” said Sid Sheth, founder and CEO of d-Matrix. “With power limits capping how fast AI can advance, it’s imperative that AI service providers have the right tools for the right job and that we embrace doing more with less.”
Gimlet’s software stack is the first to intelligently divide and map agentic workloads across a variety of accelerators spanning multiple vendors, generations, and architectures, running each segment on the most suitable hardware. Gimlet’s data centers incorporate these different hardware types and connect them via high-speed interconnects to serve frontier labs and other AI-native companies.
d-Matrix Corsair’s unique memory-optimized architecture delivers high memory bandwidth and low latency, making it ideal for running memory-bound portions of the AI model. Corsair ships as a standard PCIe card with air cooling, which enables rapid deployments in existing data centers.
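The routing idea described above can be sketched in miniature: phases with high arithmetic intensity (compute-bound, such as prefill) go to GPUs, while phases dominated by memory traffic (bandwidth-bound, such as token-by-token decode) go to a memory-optimized accelerator. This is not Gimlet’s actual scheduler; the threshold, pool names, and workload figures below are all hypothetical.

```python
# Illustrative sketch of routing inference phases across heterogeneous
# accelerators by arithmetic intensity (flops per byte of memory traffic).
# Compute-bound phases -> GPU pool; memory-bandwidth-bound phases -> a
# memory-optimized accelerator pool. All numbers are made up.

POOLS = {"gpu": "compute-optimized", "memory_accel": "memory-optimized"}

def route(phase):
    # Prefill is dominated by large matrix multiplies (high flops/byte);
    # decode is dominated by weight and KV-cache reads (low flops/byte).
    # The 10 flops/byte cutoff is an arbitrary illustrative threshold.
    flops_per_byte = phase["flops"] / phase["bytes_moved"]
    return "gpu" if flops_per_byte > 10.0 else "memory_accel"

workload = [
    {"name": "prefill", "flops": 4e12, "bytes_moved": 2e10},  # ~200 flops/byte
    {"name": "decode",  "flops": 8e9,  "bytes_moved": 8e9},   # ~1 flop/byte
]

plan = {p["name"]: route(p) for p in workload}
```

The design point is simply that the decode phase, being bandwidth-bound, gains little from adding GPU compute, which is why placing it on memory-optimized hardware can improve both latency and throughput per watt.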