Redefining efficiency
Industry-best token throughput per TB/s.
Overcoming the Generative AI bottleneck: Memory Bandwidth
Performance approaching the physical limit — over 90% efficiency at scale.
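Because autoregressive decoding streams the model weights from memory for every generated token, peak token throughput is roughly bounded by memory bandwidth divided by the bytes moved per token. The sketch below illustrates that roofline; the model size, bandwidth, and efficiency figures are illustrative assumptions, not measured numbers for this product.

```python
# Rough roofline for memory-bound decode: every new token re-reads the model
# weights, so memory bandwidth caps throughput. Figures are illustrative only.

def max_tokens_per_second(bandwidth_tb_s: float,
                          weights_gb: float,
                          efficiency: float = 0.9) -> float:
    """Upper-bound decode rate (batch 1) when weight traffic dominates."""
    bytes_per_second = bandwidth_tb_s * 1e12   # TB/s -> bytes/s
    bytes_per_token = weights_gb * 1e9         # GB of weights read per token
    return efficiency * bytes_per_second / bytes_per_token

# Example: an 8 GB (8B-parameter, 8-bit) model on 1 TB/s of memory bandwidth.
print(f"{max_tokens_per_second(1.0, 8.0):.0f} tokens/s")  # ~112 tokens/s
```

Under this bound, token throughput per TB/s of bandwidth, rather than raw compute, is the efficiency metric being claimed here.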
Reduced power consumption and silicon cost
Maximum tokens per watt with minimum silicon area
From foundry to deployment
One core, multiple architectures
Discover a Generative AI inference solution that scales linearly.
Multiple memory technologies: DDR, LPDDR, HBM. Scalable blocks to match memory bandwidth (illustrated below).
Extreme scalability: from 250 MHz on FPGA. Generative AI with no limits.
Future-proof, target-agnostic architecture: 12nm, 7nm, and 3nm. One flexible architecture for any target.
An open design that adapts to each application: Edge & Cloud. One NPU to power your vision.
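As a back-of-the-envelope illustration of scalable blocks matched to memory bandwidth, the sketch below sizes a hypothetical number of compute blocks against the rough bandwidth class of each memory technology. All numbers are assumptions for illustration, not specifications of this NPU.

```python
# Illustrative sizing only: pair enough compute blocks with each memory
# technology so compute and bandwidth stay balanced. All figures are
# assumed ballpark values, not vendor specifications.

APPROX_BANDWIDTH_GB_S = {  # rough per-device bandwidth classes
    "DDR":   50,
    "LPDDR": 100,
    "HBM":   1000,
}

BLOCK_BANDWIDTH_GB_S = 25  # assumed bandwidth one compute block can consume

for memory, bandwidth in APPROX_BANDWIDTH_GB_S.items():
    blocks = max(1, round(bandwidth / BLOCK_BANDWIDTH_GB_S))
    print(f"{memory:>5}: ~{bandwidth} GB/s -> ~{blocks} compute block(s)")
```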
Minimalism as a principle
100% hardware-accelerated Transformer architecture.
Model support out of the box
Reconfigurable data pipeline through embedded RISC-V (sketched below)
Support future models without hardware redesign
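One way to read a reconfigurable data pipeline driven by an embedded RISC-V core is that the layer structure of a model lives in firmware rather than in fixed wiring. The sketch below is a hypothetical illustration of that idea; the descriptor names and dispatch call are invented for this example and do not describe the actual programming model of this NPU.

```python
# Hypothetical sketch: a decoder layer expressed as data that a small control
# core (e.g. an embedded RISC-V) dispatches to fixed-function blocks. A new
# Transformer variant then needs a firmware update, not a hardware redesign.

from dataclasses import dataclass, field

@dataclass
class Op:
    kind: str                                   # which hardware block to invoke
    params: dict = field(default_factory=dict)  # block configuration

# Example layer description (Llama-style): norm, attention, norm, gated MLP.
DECODER_LAYER = [
    Op("rmsnorm",   {"eps": 1e-5}),
    Op("attention", {"heads": 32, "kv_heads": 8, "rope": True}),
    Op("rmsnorm",   {"eps": 1e-5}),
    Op("mlp",       {"activation": "silu", "gated": True}),
]

def dispatch_to_block(kind: str, params: dict, x):
    """Placeholder for the real driver call into a fixed-function block."""
    print(f"run {kind} with {params}")
    return x

def run_layer(pipeline, x):
    """Control-core loop: walk the descriptor list and fire each block."""
    for op in pipeline:
        x = dispatch_to_block(op.kind, op.params, x)
    return x

run_layer(DECODER_LAYER, x=None)
```

A different norm placement, grouped-query attention, or a gated MLP then maps to a new descriptor list rather than new silicon.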
Edge AI inference ASIC
Llama, Phi, DeepSeek, Qwen ... all Transformer models on a small form-factor chip.
Generative AI everywhere you need it.
Reprogrammable. Your way.
Configure your own Generative AI accelerator for your Adaptive SoC.
Choose the device that best fits your needs.
Cloud AI inference ASIC
More tokens per second, lower cost per token. Leading cloud inference performance.
+100% vs. Nvidia H100
+50% vs. Google TPU v6e
Flexibility: The core of our technology
From the fastest edge inferencer at 9 mm² to HBM server chips: the performance your application needs.
Discover our custom NPU design services