Kairós is edge:

All the intelligence in a single chip.

Unleash the power of GenAI wherever you need it.

Kairós in action

Edge demos powered by the RaiderChip NPU — voice, text, and vision models running locally in real time.

Kairós ASIC Prototype Demonstrator

Download RaiderChip Kairós-1200 Brochure

Ultra-efficient. Ultra-lightweight. Ultra-simple.

TSMC 7nm

Power 10 W

>2 TFLOPS FP32

Up to 1.2 GHz

Compact design. Unlimited autonomy.

Fully Standalone chip

100% Offline operation

No active cooling required

Ultra-low power per token


Designed for the Edge and embedded systems.

- We free Generative AI from the network and heavyweight infrastructure

You choose the end-product. Kairós adds intelligence, with no strings attached.

Models run where they never have before: locally.

- Kairós is the first embedded fully offline Generative AI accelerator

- No latency
- No network
- Maximum performance
- Ultra-low power consumption

Choose. Tweak. Run.

- Kairós supports AI models' native data formats: Download and Run

Transformers run locally with Kairós

All the Intelligence in one chip

Bring vision, language, reasoning and creation
to your devices

+ Preserve the precision of the trained model

Kairós offers top-tier performance while preserving every bit of precision, from Q4 up to FP32.

+ Run model inference with full control

Model intelligence, the way it was meant.

Download and run models
in their original format

Llama, Phi, Gemma, DeepSeek, Qwen, ...

+ Quantized versions

Kairós supports 4-bit and 5-bit Quantization

- Increased inference speed: +276% more tokens/s

- Run larger models: -75% memory usage
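As a rough sanity check on the memory figure, the sketch below estimates weight storage at different bit widths. It assumes a 16-bit baseline (the page does not state which baseline the -75% refers to) and counts weights only, ignoring quantization scales, KV cache, and activations. The function names and the 1-billion-parameter example are illustrative and not taken from RaiderChip documentation.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


def memory_reduction(baseline_bits: int, quantized_bits: int) -> float:
    """Fraction of weight memory saved when moving from baseline to quantized."""
    return 1.0 - quantized_bits / baseline_bits


if __name__ == "__main__":
    params = 1e9  # hypothetical 1B-parameter model (assumption)
    fp16 = weight_memory_gb(params, 16)  # assumed 16-bit baseline
    q4 = weight_memory_gb(params, 4)
    print(f"16-bit: {fp16:.2f} GB, 4-bit: {q4:.2f} GB")
    print(f"Saved: {memory_reduction(16, 4):.0%}")
```

Under these assumptions, going from 16 bits to 4 bits per weight cuts weight storage to one quarter, which matches the -75% quoted above.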

+ Fine-tuned

Kairós seamlessly supports post-trained
and fine-tuned models

Tokens / second, Meta Llama 3.2-1B (4-bit weights):

Qualcomm: 36.9
Nvidia: 43.9
RaiderChip: 193.1

More tokens, fewer resources: 🔗

Efficiency

→ 4.4x NVIDIA Jetson Orin Nano Super
→ 5.2x QUALCOMM Snapdragon 8 Elite Gen 5
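The two multipliers are consistent with the ratio of the tokens-per-second figures quoted on this page for Meta Llama 3.2-1B with 4-bit weights. A minimal sketch of that arithmetic follows; the dictionary and function names are illustrative.

```python
# Tokens/s for Meta Llama 3.2-1B (4-bit weights), as quoted on this page.
TOKENS_PER_SECOND = {
    "Qualcomm": 36.9,
    "Nvidia": 43.9,
    "RaiderChip": 193.1,
}


def speedup(reference: str, baseline: str) -> float:
    """Ratio of the reference throughput to the baseline throughput."""
    return TOKENS_PER_SECOND[reference] / TOKENS_PER_SECOND[baseline]


if __name__ == "__main__":
    for vendor in ("Nvidia", "Qualcomm"):
        print(f"RaiderChip vs {vendor}: {speedup('RaiderChip', vendor):.1f}x")
```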

Same model, smaller footprint, superior efficiency.


Higher efficiency per memory bandwidth

Full Hardware Architecture 🔗

TSMC 7nm FinFET

128-bit LPDDR5X memory

Embedded RISC-V CPU

>2 TFLOPS FP32 precision

10 Watts max


Seamless integration: 🔗

Sampling 2027

Plug & Play simplicity:

SPI + USB onboard

Designed for instant setup:

from boot to prototyping, everything works right out of the box

As easy as plugging in a USB device

Parallelization Engineering 🔗

Every resource, every cycle, always in use.

Batch size up to 16

Multimodal:

Audio, video, text and image simultaneously

Multi-user:

Up to 16 inferences in parallel

Fast Prefill:

Don't wait for the first token: process the input prompt up to 16x faster

Performance across AI models 🔗

Maximum tokens per Memory Bandwidth

Kairós offers industry-best token density per GB of memory and the lowest energy per token generated.

Model	Input prompt processing (tok/s, FP16)	Max speed per user (tok/s, FP16)	Max speed per user (tok/s, Q4)
Meta Llama 2 7B	142.5	11.16	36.36
Meta Llama 3.1 8B	145.38	10.02	34.32
Meta Llama 3.2 1B	817.5	60.0	193.14
Meta Llama 3.2 3B	330.12	23.28	78.72
Google Gemma 3 1B	-	65.22	-
Alibaba Qwen 2.5 Coder 1.5B	506.76	46.14	128.22
Alibaba Qwen 3 32B	-	-	7.92
Alibaba Qwen 3 14B	-	-	17.94
Alibaba Qwen 3 8B	134.88	9.9	33.66
Alibaba Qwen 3 4B	-	-	56.4
Alibaba Qwen 3 1.7B	509.1	42.72	139.44
Alibaba Qwen 3 0.6B	951.78	118.38	341.76
Microsoft Phi 2 2.7B	-	24.12	54.66
Microsoft Phi 3 mini 4B	-	19.62	-
Microsoft Phi 4 mini 4B	-	18.18	-
TII Falcon 3 1B	776.34	53.16	178.44
Fraunhofer Teuken 7B	155.04	9.9	-
DeepSeek R1 Distill Llama 8B	145.68	10.02	34.32
DeepSeek R1 Distill Qwen 14B	69.42	5.34	17.76
DeepSeek R1 Distill Qwen 1.5B	509.64	46.14	-
DeepSeek R1 0528 Qwen 3 8B	134.82	9.9	33.66
DeepCoder Preview 14B	-	-	17.76
OpenAI Whisper Small	-	311.4	-
Vyvo-TTS 0.6B	951.78	118.38	341.76
Moondream 2 2B	-	28	-
FlowTransformer	-	49K	-
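To read the prompt-processing column in practical terms, the sketch below turns the table's rates into an estimated time to first token. It is a deliberately simple model: it assumes the prompt is processed at the quoted prefill rate with no other overhead, and the 512-token prompt length is a hypothetical example. The function names are illustrative.

```python
def time_to_first_token_s(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Estimated seconds spent processing the prompt before the first output token."""
    return prompt_tokens / prefill_tok_per_s


def prefill_speedup(prefill_tok_per_s: float, decode_tok_per_s: float) -> float:
    """How much faster the prompt is processed than tokens are generated."""
    return prefill_tok_per_s / decode_tok_per_s


if __name__ == "__main__":
    # Meta Llama 3.2 1B, FP16 columns from the table above.
    prefill_rate = 817.5  # input prompt processing, tok/s
    decode_rate = 60.0    # max speed per user, tok/s
    prompt_len = 512      # hypothetical prompt length (assumption)

    ttft = time_to_first_token_s(prompt_len, prefill_rate)
    one_by_one = time_to_first_token_s(prompt_len, decode_rate)
    print(f"Estimated time to first token: {ttft:.2f} s")
    print(f"Same prompt at the per-user decode rate: {one_by_one:.2f} s")
    print(f"Prefill speedup: {prefill_speedup(prefill_rate, decode_rate):.1f}x")
```

For this row the ratio of the two FP16 columns is roughly 13.6x, which sits within the "up to 16x" Fast Prefill figure.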

-Scalable by design-

Kairós is built to grow with you

Stack as many units as your solution requires.

No complexity. Just the performance you choose, when you need it.

Start Small

A single chip provides 342 tokens/s running Alibaba Qwen-3 0.6B

Run Bigger, Smarter, more...

Spread the load across multiple chips and overcome hardware limits.

Kairós: the core of a New Generation
of AI-powered products 🔗

Choose the model.
Customize if you want...

Use it:
However, wherever and
whenever you want

On-premises AI is:

Total Privacy 🔒

True autonomy 🤖

No subscription 🚫

Offline ✈️


Full Privacy and Independence:

Harness the power of the most complex LLMs locally, ensuring sensitive data stays on-premises, free from cloud dependencies and third-party oversight.

Offline Operation:

Operate independently in remote environments without requiring network connectivity

Customizable Models:

Run fine-tuned models tailored to specific tasks, such as industrial control, home automation, or other specialized applications

Autonomy:

Always available without reliance on external networks or cloud AI providers

Cost-Effective:

Kairós runs cutting-edge open models locally. No variable subscriptions to third parties, no API fees.

Anytime, Anywhere Intelligence:

Deliver sophisticated AI assistants that run reliably even in remote locations with no internet coverage.

Security by Design:

Protect critical data and operations by running AI workloads entirely on your own premises. No cloud transfers, no third-party access — full control of your infrastructure and information.

Adding intelligence everywhere. 🔗

Automotive

An intelligent assistant always available on board

Thanks to its fully offline operation, Kairós guarantees reliable availability even in isolated areas without network coverage.

Home automation

Privacy without compromise.

A truly offline smart home.

Enjoy the full power of Generative AI, protecting the privacy of the ones you love most.

Industry

Monitor, diagnose, and act in real time without sending data outside your facility.

Local processing. Instant decisions.

No latency, no unnecessary data traffic.

Learning devices and smart toys

No connection. No compromise on privacy.

Educational materials and smart toys with minimal power consumption and fully local operation.

Get Started Today!

Contact us to begin evaluating our accelerators

See firsthand how our AI solutions transform your devices

Experience the future of Generative AI acceleration with RaiderChip
