---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- gemma
- npu
- igpu
- amd-ryzen-ai
- quantized
pipeline_tag: text-generation
model-index:
- name: πŸ¦„ NPU+iGPU Quantized Gemma 3 27B Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: custom
      name: NPU+iGPU Benchmark
    metrics:
    - type: throughput
      value: "Real NPU+iGPU acceleration"
      name: Hardware Acceleration
    - type: model_size
      value: "26GB quantized (from 102GB original)"
      name: Model Size
---

# πŸ¦„ Gemma 3 27B NPU+iGPU Quantized

## πŸš€ Advanced NPU+iGPU Implementation

This quantized Gemma 3 27B model demonstrates hybrid hardware acceleration on AMD Ryzen AI platforms: attention runs on the NPU Phoenix while feed-forward layers run on the AMD Radeon 780M iGPU.

### βœ… **Production Status**
- **Status**: βœ… **PRODUCTION READY**
- **Server**: Operational OpenAI v1 API server
- **Hardware**: Real NPU Phoenix + AMD Radeon 780M
- **Size**: 26GB quantized (74% reduction from 102GB)
- **Format**: Safetensors layer-by-layer streaming
- **API**: OpenAI v1 compatible

## 🎯 **Quick Start**

### Using with Unicorn Execution Engine

```bash
# Clone the framework
git clone https://github.com/magicunicorn/unicorn-execution-engine.git
cd unicorn-execution-engine

# Download this model
huggingface-cli download magicunicorn/gemma-3-27b-npu-quantized

# Start production server
source activate-uc1-ai-py311.sh
python real_2025_gemma27b_server.py

# Server runs on http://localhost:8009
# Model: "gemma-3-27b-it-npu-igpu-real"
```
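Once the server is running, any OpenAI v1 client can talk to it. The stdlib-only sketch below assembles and sends a standard chat-completion request against the local endpoint; the endpoint path and model name follow the steps above, but the helper functions are illustrative, not part of the framework.

```python
import json
import urllib.request

API_URL = "http://localhost:8009/v1/chat/completions"  # local server from the steps above
MODEL = "gemma-3-27b-it-npu-igpu-real"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI v1 chat-completion payload (illustrative helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the request to the local NPU+iGPU server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python client works the same way if you point its `base_url` at `http://localhost:8009/v1`.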

### Using with OpenWebUI

Add a new OpenAI-compatible connection in OpenWebUI with:

- **URL**: http://localhost:8009
- **Model**: gemma-3-27b-it-npu-igpu-real
- **API**: OpenAI v1 compatible

## πŸ”§ **Hardware Requirements**

### **Minimum Requirements**
- **NPU**: AMD Ryzen AI NPU Phoenix (16 TOPS)
- **iGPU**: AMD Radeon 780M (RDNA3 architecture)
- **Memory**: 32GB+ DDR5 RAM (96GB recommended)
- **Storage**: 30GB+ for model files
- **OS**: Ubuntu 25.04+ with Linux 6.14+ (HMA support)

### **Software Requirements**
- **Unicorn Execution Engine**: Latest version
- **MLIR-AIE2**: Included in framework
- **Vulkan Drivers**: Latest AMD drivers
- **XRT Runtime**: /opt/xilinx/xrt

## 🎯 **Performance**

### **Benchmark Results**
- **Hardware**: Real NPU + iGPU acceleration
- **Attention**: NPU Phoenix (16 TOPS) 
- **FFN**: AMD Radeon 780M (200+ GFLOPS)
- **Memory**: Layer-by-layer streaming
- **Quality**: Full 27B parameter model preserved
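The layer-by-layer streaming pattern can be sketched in pure Python: a loader yields one layer at a time, so only that layer's weights need to be resident while the hidden state flows through. The loader and layer functions below are toy stand-ins, not the engine's API; in the real pipeline each step reads one layer from the safetensors file and dispatches it to the NPU or iGPU.

```python
from typing import Callable, Iterator

# A layer loader yields (name, layer_fn) pairs one at a time, so only a
# single layer's weights are in memory at once.
LayerLoader = Iterator[tuple[str, Callable[[list[float]], list[float]]]]

def stream_layers(names: list[str]) -> LayerLoader:
    """Toy stand-in for loading each layer's weights on demand."""
    for name in names:
        # The real engine would read this layer from the safetensors file,
        # run it on NPU/iGPU, and free the weights afterwards.
        yield name, lambda hidden, n=name: [h + 1.0 for h in hidden]

def run_streamed(hidden: list[float], loader: LayerLoader) -> list[float]:
    """Apply each layer as it is loaded; earlier layers are released."""
    for _, layer_fn in loader:
        hidden = layer_fn(hidden)
    return hidden

out = run_streamed([0.0, 0.0], stream_layers([f"layer_{i}" for i in range(3)]))
# Each of the 3 toy layers adds 1.0, so out == [3.0, 3.0]
```

Peak memory is bounded by the largest single layer rather than the full 26GB checkpoint, which is what makes the model fit alongside the OS in unified memory.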

### **Technical Specifications**
- **Parameters**: 27.4B (quantized)
- **Precision**: INT4/INT8 optimized for NPU+iGPU
- **Context Length**: 8192 tokens
- **Architecture**: Gemma 3 with grouped-query attention
- **Quantization**: Custom NPU+iGPU aware quantization

## πŸ“š **Technical Details**

### **Quantization Strategy**
- **NPU Layers**: INT8 symmetric quantization
- **iGPU Layers**: INT4 grouped quantization  
- **Memory Optimized**: Layer-by-layer streaming
- **Zero CPU Fallback**: Pure hardware acceleration
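The two schemes named above can be illustrated with a minimal pure-Python sketch: per-tensor symmetric INT8 for NPU layers, and per-group INT4 for iGPU layers. The group size and clipping details here are assumptions for illustration; the shipped quantizer is custom.

```python
def quantize_int8_symmetric(values):
    """Symmetric INT8: one scale per tensor, zero-point fixed at 0 (NPU scheme)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def quantize_int4_grouped(values, group_size=4):
    """Grouped INT4: one scale per small group of weights (iGPU scheme)."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        scale = max(abs(v) for v in chunk) / 7.0 or 1.0
        groups.append(([max(-7, min(7, round(v / scale))) for v in chunk], scale))
    return groups

def dequantize_grouped(groups):
    """Reconstruct approximate float weights from (ints, scale) groups."""
    return [qi * scale for q, scale in groups for qi in q]
```

Grouped scales keep INT4's quantization error local to each small block of weights, which is why the bulkier FFN layers tolerate the lower bit width.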

### **Hardware Acceleration**
- **NPU Phoenix**: Attention computation (16 TOPS)
- **AMD Radeon 780M**: FFN processing (RDNA3)
- **MLIR-AIE2**: Real NPU kernel compilation
- **Vulkan**: Direct iGPU compute shaders
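The attention/FFN split above amounts to a per-operator dispatch policy with no CPU path. The sketch below is purely illustrative of that policy; the real engine compiles NPU kernels with MLIR-AIE2 and issues Vulkan compute shaders rather than consulting a Python dictionary.

```python
# Illustrative per-operator routing: attention to the NPU, feed-forward
# to the iGPU. Device names here are labels, not driver identifiers.
DEVICE_MAP = {
    "attention": "npu_phoenix",
    "ffn": "radeon_780m",
}

def dispatch(op: str) -> str:
    """Pick the accelerator for an operator; no CPU fallback by design."""
    try:
        return DEVICE_MAP[op]
    except KeyError:
        raise ValueError(f"no hardware target for op {op!r} (zero CPU fallback)")
```

Failing loudly on an unmapped operator mirrors the "zero CPU fallback" guarantee: work either runs on an accelerator or not at all.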

## πŸ¦„ **About This Implementation**

This model demonstrates hybrid NPU+iGPU acceleration, showing that consumer AMD Ryzen AI hardware can serve a 27B-parameter language model without falling back to CPU compute.

**Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)  
**Date**: July 10, 2025  
**Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech)  
**Platform**: [Unicorn Commander](https://unicorncommander.com)  

## πŸ“– **Citation**

```bibtex
@software{unicorn_execution_engine_gemma_27b_2025,
  title={Gemma 3 27B NPU+iGPU Quantized: A Hybrid-Accelerated Large Language Model},
  author={Unicorn Commander},
  year={2025},
  url={https://huggingface.co/magicunicorn/gemma-3-27b-npu-quantized},
  note={Production NPU+iGPU quantized large language model}
}
```

## πŸ“š **Related Resources**

- **Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)
- **Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech)  
- **Platform**: [Unicorn Commander](https://unicorncommander.com)
- **Documentation**: Complete guides in framework repository

## πŸ”’ **License**

This model is released under the Apache 2.0 License, following the original Gemma 3 license terms.

---

*πŸ¦„ NPU+iGPU Large Language Model*  
*⚑ Powered by Unicorn Execution Engine*  
*🏒 Magic Unicorn Unconventional Technology & Stuff Inc*