Edge AI + Cloud AI – RaiderChip

Demonstrating Success in Cloud and Edge Applications

Edge Implementation

Our edge demo executes the Meta Llama 2, 3.0, 3.1 and Llama 3.2 models, as well as Microsoft Phi-2 LLM 2.7B model on cost-effective AMD Versal FPGA devices, achieving interactive speeds on the vanilla LLM even without quantization, this allows extracting the highest precision: achieving the maximum intelligence of the AI model or, optionally, using Quantization to obtain the maximum speed.

GenAI v1-Q running the Llama 3.2 1B LLM model with 4 bits Quantization on a low-cost Versal FPGA with LPDDR4 memory

GenAI v1-Q running the Llama 3.2 1B LLM model with 4 bits Quantization on a low-cost Versal FPGA with LPDDR4 memory

This demo does not need external CPUs, being perfectly suited for stand-alone, low-cost products.

Cloud Implementation

Integrated with AMD FPGAs in the AWS cloud, our cloud demonstrator leverages AWS F1 instances carrying AMD Virtex UltraScale+ FPGAs. This setup has enabled us to showcase a compelling and engaging AI chat model (Llama 2-7B), accessible via a web server for remote interactions, mirroring ChatGPT-style applications.