GenAI IP

Embedded AI acceleration

Simple

Local

Reprogrammable

— Our IP —

The GenAI IP is the smallest version of our NPU, tailored to small devices such as FPGAs and Adaptive SoCs, where the maximum frequency is limited (≤ 250 MHz) and memory bandwidth is lower (≤ 100 GB/s).

For the highest-throughput versions of the RaiderChip NPU, please see the Kairós Edge AI and Aión Cloud AI ASIC devices. You can also draft your own AI accelerator in a few steps with a Custom NPU.

— Highest Performance —

Tokens per second (higher is better):

| Model | RaiderChip NPU (on FPGA) | NVIDIA Jetson Orin | RaiderChip advantage |
| --- | --- | --- | --- |
| Llama 3.2 1B Q4 | 82.23 tok/s | 39.11 tok/s | +110% |
| Qwen3 0.6B | 45.06 tok/s | 23.35 tok/s | +93% |
| Llama 3.2 3B Q4 | 37.87 tok/s | 19.11 tok/s | +98% |
| Qwen3 1.7B | 23.68 tok/s | 16.41 tok/s | +44% |
| DeepSeek R1 Llama 8B Q4 | 17.18 tok/s | 11.11 tok/s | +55% |
| DeepSeek R1 0528 8B Q4 | 16.28 tok/s | 10.43 tok/s | +56% |

Reference: "Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends"

Computer Engineering Research Group - University of Cantabria

— Outstanding Power Efficiency —

Tokens per Joule (higher is better):

| Model | RaiderChip NPU (on FPGA) | NVIDIA Jetson Orin | RaiderChip advantage |
| --- | --- | --- | --- |
| Llama 3.2 1B Q4 | 3.36 tok/J | 2.07 tok/J | +63% |
| Qwen3 0.6B | 1.95 tok/J | 1.50 tok/J | +30% |
| Llama 3.2 3B Q4 | 1.50 tok/J | 0.93 tok/J | +61% |
| Qwen3 1.7B | 0.99 tok/J | 0.88 tok/J | +13% |
| DeepSeek R1 Llama 8B Q4 | 0.68 tok/J | 0.51 tok/J | +32% |
| DeepSeek R1 0528 8B Q4 | 0.65 tok/J | 0.49 tok/J | +30% |

Reference: "Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends"

Computer Engineering Research Group - University of Cantabria

— GenAI NPU —

The core of Generative AI on FPGA

AMD presenting RaiderChip GenAI NPU Edge LLM demo in its booth at ISE 2025 (video by Charbax)

— Maximum flexibility —

GenAI NPU IP core

Variants, Performance and Size

| IP core variant | 1x | 2x | 4x | 8x |
| --- | --- | --- | --- | --- |
| Memory bandwidth (directly proportional to achievable AI speed) | 12 GB/s | 24 GB/s | 48 GB/s | 96 GB/s |
| AI inference speed, Llama 3.2-1B (4-bit quantization @ 250 MHz) | 16 tok/s | 32 tok/s | 58 tok/s | 95 tok/s |
| AI inference speed, Llama 3.1-8B (4-bit quantization @ 250 MHz) | 3 tok/s | 6 tok/s | 11 tok/s | 18 tok/s |
| IP core size: LUT | 74K | 110K | 181K | 321K |
| IP core size: REG | 141K | 187K | 277K | 458K |
| IP core size: DSP | 384 | 648 | 1176 | 2232 |
| IP core size: B/URAM | 19/256 | 19/256 | 19/256 | 19/256 |

Supported data formats (all variants): FP32, FP16, BF16, FP8, Q5_K (5-bit), Q4_K (4-bit)
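As the table notes, memory bandwidth sets the first-order ceiling on generation speed: decoding each token streams essentially the full set of weights from memory, so tokens per second cannot exceed bandwidth divided by the model's weight footprint. Here is a back-of-the-envelope sketch of that bound in Python; the parameter count and effective bits-per-weight used are illustrative assumptions, not RaiderChip specifications.

```python
# First-order estimate of bandwidth-bound decoding: generating one token
# streams essentially all model weights from memory, so
#   tokens/s <= memory_bandwidth / model_weight_bytes
# The parameter count and effective bits-per-weight below are illustrative
# assumptions, not RaiderChip specifications.

VARIANT_BW_GBPS = {"1x": 12, "2x": 24, "4x": 48, "8x": 96}  # from the table

def ceiling_tok_per_s(params_billions: float, bits_per_weight: float,
                      bw_gbps: float) -> float:
    weight_gb = params_billions * bits_per_weight / 8  # weight footprint in GB
    return bw_gbps / weight_gb

# Llama 3.2-1B (~1.24 B parameters) at ~4.5 effective bits/weight (Q4_K-like):
for variant, bw in VARIANT_BW_GBPS.items():
    print(f"{variant}: <= {ceiling_tok_per_s(1.24, 4.5, bw):.0f} tok/s")
# Ceilings come out at ~17/34/69/138 tok/s; the measured 16/32/58/95 tok/s
# in the table sit below them, as expected for a real memory subsystem.
```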

Our highly configurable hardware is compatible with a wide range of FPGAs.

Choose your device

We will do the rest!

Try our Technology Demonstrator

Download and run Generative AI models from HuggingFace, or try your own fine-tuned ones, on your premises

We provide everything needed to boot your own FPGA board as a GenAI accelerator!

A hands-on, interactive demo.

Chat directly with the model.

Evaluate latency, speed and response quality.
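To put numbers on latency and speed during the demo, the two figures worth logging are time-to-first-token and the steady-state decode rate. Below is a minimal, transport-agnostic timing sketch in Python; how you obtain the token stream from the board (serial console, network API, etc.) depends on your setup and is our assumption, not a documented RaiderChip interface.

```python
import time
from typing import Iterable, Tuple

def measure_decode(stream: Iterable[str]) -> Tuple[float, float]:
    """Return (time-to-first-token in seconds, decode rate in tok/s) for any
    iterable that yields tokens as the accelerator produces them."""
    t_start = time.perf_counter()
    t_first = None
    n_tokens = 0
    for _token in stream:
        n_tokens += 1
        if t_first is None:
            t_first = time.perf_counter()
    t_end = time.perf_counter()
    ttft = (t_first - t_start) if t_first is not None else float("nan")
    # Decode rate over the steady-state phase, excluding the first token.
    rate = (n_tokens - 1) / (t_end - t_first) if n_tokens > 1 else float("nan")
    return ttft, rate

# Stand-in stream for illustration; substitute your board's token stream.
print(measure_decode(iter(["Hello", ",", " world", "!"])))
```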

Multi-modal capabilities

Using the Moondream 2 VLM, running at 21 tok/s in our FPGA demo


Performance and Power

across AI models

Maximum tokens per memory bandwidth & minimum watts per token

| Model | Speed (tok/s, FP16) | Efficiency (tok/s per watt, FP16) | Speed (tok/s, Q4) | Efficiency (tok/s per watt, Q4) |
| --- | --- | --- | --- | --- |
| Meta Llama 2 7B | 6.91 | 0.27 | 19.77 | 0.78 |
| Meta Llama 3.1 8B | 6.22 | - | 18.14 | - |
| Meta Llama 3.2 1B | 34.43 | 1.39 | 93.03 | 3.72 |
| Meta Llama 3.2 3B | 12.69 | 0.5 | 41.47 | 1.64 |
| Google Gemma 3 1B | 30.94 | - | - | - |
| Alibaba Qwen 2.5 Coder 1.5B | 22.67 | 0.97 | 53.62 | 2.27 |
| Alibaba Qwen 3 32B | - | - | 4.40 | - |
| Alibaba Qwen 3 8B | 6.05 | 0.24 | 17.32 | 0.7 |
| Alibaba Qwen 3 4B | - | - | 26.06 | - |
| Alibaba Qwen 3 1.7B | 24.68 | 1.02 | 58.46 | 2.47 |
| Alibaba Qwen 3 0.6B | 48.73 | 2.1 | 106.74 | 4.68 |
| Microsoft Phi 2 2.7B | 9.04 | - | 16.64 | - |
| Microsoft Phi 3 mini 4B | 10.84 | - | - | - |
| Microsoft Phi 4 mini 4B | 8.22 | - | - | - |
| TII Falcon 3 1B | 31.78 | 1.28 | 87.75 | 3.48 |
| Fraunhofer Teuken 7B | - | - | 5.94 | - |
| DeepSeek R1 Distill Llama 8B | 6.22 | 0.25 | 18.13 | 0.7 |
| DeepSeek R1 Distill Qwen 14B | 2.98 | 0.12 | 9.11 | 0.37 |
| DeepSeek R1 Distill Qwen 1.5B | 22.68 | - | - | - |
| DeepSeek R1 0528 Qwen 3 8B | 6.05 | 0.24 | 17.31 | 0.68 |
| OpenAI Whisper Small | 107.4 | - | - | - |
| Vyvo-TTS 0.6B | 54 | - | - | - |
| Moondream 2 2B | 21 | - | - | - |
| FlowTransformer | 13K | - | - | - |
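A useful sanity check: since the table reports both speed (tok/s) and efficiency (tok/s per watt), dividing one by the other recovers the implied power draw. A quick cross-check in Python over three Q4 rows from the table:

```python
# The table gives speed (tok/s) and efficiency (tok/s per watt); dividing
# the two recovers the implied power draw: power_W = speed / efficiency.
rows = {  # (Q4 speed, Q4 efficiency), copied from the table above
    "Meta Llama 2 7B":     (19.77, 0.78),
    "Meta Llama 3.2 1B":   (93.03, 3.72),
    "Alibaba Qwen 3 0.6B": (106.74, 4.68),
}
for model, (tok_s, tok_s_per_w) in rows.items():
    print(f"{model}: ~{tok_s / tok_s_per_w:.1f} W")
# All three cluster around ~23-25 W: a single power envelope across models.
```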

Harness the power of the most complex LLMs locally,

ensuring sensitive data stays on-premises

Fully reprogrammable solutions

Flexible updates and adjustments, ensuring maximum technological adaptability.

Autonomy in remote environments

Stable operation: no dependence on network connectivity, with its variable latency and availability.

Offline Generative AI

Designed to operate independently: no security breaches, no subscriptions, no dependencies.

Any AI models, plus yours

Run commercially licensed as well as open-source LLMs, or deploy fine-tuned/post-trained versions tailored to your specific needs.

Get Started Today!

Contact us to begin evaluating our accelerators

See firsthand how our AI solutions transform your devices

Experience the future of Generative AI acceleration with RaiderChip