---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- gemma
- npu
- igpu
- amd-ryzen-ai
- quantized
pipeline_tag: text-generation
model-index:
- name: πŸ¦„ NPU+iGPU Quantized Gemma 3 27B Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: custom
      name: NPU+iGPU Benchmark
    metrics:
    - type: throughput
      value: "Real NPU+iGPU acceleration"
      name: Hardware Acceleration
    - type: model_size
      value: "26GB quantized (from 102GB original)"
      name: Model Size
---

# πŸ¦„ Gemma 3 27B NPU+iGPU Quantized

## πŸš€ Advanced NPU+iGPU Implementation

This quantized Gemma 3 27B model demonstrates hybrid hardware acceleration on AMD Ryzen AI platforms: attention runs on the NPU Phoenix while feed-forward layers run on the AMD Radeon 780M iGPU.

### βœ… **Production Status**
- **Status**: βœ… **PRODUCTION READY**
- **Server**: Operational OpenAI v1 API server
- **Hardware**: Real NPU Phoenix + AMD Radeon 780M
- **Size**: 26GB quantized (74% reduction from 102GB)
- **Format**: Safetensors layer-by-layer streaming
- **API**: OpenAI v1 compatible

## 🎯 **Quick Start**

### Using with Unicorn Execution Engine

```bash
# Clone the framework
git clone https://github.com/magicunicorn/unicorn-execution-engine.git
cd unicorn-execution-engine

# Download this model
huggingface-cli download magicunicorn/gemma-3-27b-npu-quantized

# Start production server
source activate-uc1-ai-py311.sh
python real_2025_gemma27b_server.py

# Server runs on http://localhost:8009
# Model: "gemma-3-27b-it-npu-igpu-real"
```
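Once the server is running, any OpenAI v1 client can talk to it. The stdlib-only sketch below assembles and sends a standard chat-completion request against the local endpoint; the endpoint path and model name follow the steps above, but the helper functions are illustrative, not part of the framework.

```python
import json
import urllib.request

API_URL = "http://localhost:8009/v1/chat/completions"  # local server from the steps above
MODEL = "gemma-3-27b-it-npu-igpu-real"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI v1 chat-completion payload (illustrative helper)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the request to the local NPU+iGPU server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python client works the same way if you point its `base_url` at `http://localhost:8009/v1`.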

### Using with OpenWebUI

Add a new OpenAI-compatible connection in OpenWebUI with:

- **URL**: http://localhost:8009
- **Model**: gemma-3-27b-it-npu-igpu-real
- **API**: OpenAI v1 compatible

## πŸ”§ **Hardware Requirements**

### **Minimum Requirements**
- **NPU**: AMD Ryzen AI NPU Phoenix (16 TOPS)
- **iGPU**: AMD Radeon 780M (RDNA3 architecture)
- **Memory**: 32GB+ DDR5 RAM (96GB recommended)
- **Storage**: 30GB+ for model files
- **OS**: Ubuntu 25.04+ with Linux 6.14+ (HMA support)

### **Software Requirements**
- **Unicorn Execution Engine**: Latest version
- **MLIR-AIE2**: Included in framework
- **Vulkan Drivers**: Latest AMD drivers
- **XRT Runtime**: /opt/xilinx/xrt

## 🎯 **Performance**

### **Benchmark Results**
- **Hardware**: Real NPU + iGPU acceleration
- **Attention**: NPU Phoenix (16 TOPS) 
- **FFN**: AMD Radeon 780M (200+ GFLOPS)
- **Memory**: Layer-by-layer streaming
- **Quality**: Full 27B parameter model preserved
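The layer-by-layer streaming pattern can be sketched in pure Python: a loader yields one layer at a time, so only that layer's weights need to be resident while the hidden state flows through. The loader and layer functions below are toy stand-ins, not the engine's API; in the real pipeline each step reads one layer from the safetensors file and dispatches it to the NPU or iGPU.

```python
from typing import Callable, Iterator

# A layer loader yields (name, layer_fn) pairs one at a time, so only a
# single layer's weights are in memory at once.
LayerLoader = Iterator[tuple[str, Callable[[list[float]], list[float]]]]

def stream_layers(names: list[str]) -> LayerLoader:
    """Toy stand-in for loading each layer's weights on demand."""
    for name in names:
        # The real engine would read this layer from the safetensors file,
        # run it on NPU/iGPU, and free the weights afterwards.
        yield name, lambda hidden, n=name: [h + 1.0 for h in hidden]

def run_streamed(hidden: list[float], loader: LayerLoader) -> list[float]:
    """Apply each layer as it is loaded; earlier layers are released."""
    for _, layer_fn in loader:
        hidden = layer_fn(hidden)
    return hidden

out = run_streamed([0.0, 0.0], stream_layers([f"layer_{i}" for i in range(3)]))
# Each of the 3 toy layers adds 1.0, so out == [3.0, 3.0]
```

Peak memory is bounded by the largest single layer rather than the full 26GB checkpoint, which is what makes the model fit alongside the OS in unified memory.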

### **Technical Specifications**
- **Parameters**: 27.4B (quantized)
- **Precision**: INT4/INT8 optimized for NPU+iGPU
- **Context Length**: 8192 tokens
- **Architecture**: Gemma 3 with grouped-query attention
- **Quantization**: Custom NPU+iGPU aware quantization

## πŸ“š **Technical Details**

### **Quantization Strategy**
- **NPU Layers**: INT8 symmetric quantization
- **iGPU Layers**: INT4 grouped quantization  
- **Memory Optimized**: Layer-by-layer streaming
- **Zero CPU Fallback**: Pure hardware acceleration
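The two schemes named above can be illustrated with a minimal pure-Python sketch: per-tensor symmetric INT8 for NPU layers, and per-group INT4 for iGPU layers. The group size and clipping details here are assumptions for illustration; the shipped quantizer is custom.

```python
def quantize_int8_symmetric(values):
    """Symmetric INT8: one scale per tensor, zero-point fixed at 0 (NPU scheme)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def quantize_int4_grouped(values, group_size=4):
    """Grouped INT4: one scale per small group of weights (iGPU scheme)."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        scale = max(abs(v) for v in chunk) / 7.0 or 1.0
        groups.append(([max(-7, min(7, round(v / scale))) for v in chunk], scale))
    return groups

def dequantize_grouped(groups):
    """Reconstruct approximate float weights from (ints, scale) groups."""
    return [qi * scale for q, scale in groups for qi in q]
```

Grouped scales keep INT4's quantization error local to each small block of weights, which is why the bulkier FFN layers tolerate the lower bit width.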

### **Hardware Acceleration**
- **NPU Phoenix**: Attention computation (16 TOPS)
- **AMD Radeon 780M**: FFN processing (RDNA3)
- **MLIR-AIE2**: Real NPU kernel compilation
- **Vulkan**: Direct iGPU compute shaders
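The attention/FFN split above amounts to a per-operator dispatch policy with no CPU path. The sketch below is purely illustrative of that policy; the real engine compiles NPU kernels with MLIR-AIE2 and issues Vulkan compute shaders rather than consulting a Python dictionary.

```python
# Illustrative per-operator routing: attention to the NPU, feed-forward
# to the iGPU. Device names here are labels, not driver identifiers.
DEVICE_MAP = {
    "attention": "npu_phoenix",
    "ffn": "radeon_780m",
}

def dispatch(op: str) -> str:
    """Pick the accelerator for an operator; no CPU fallback by design."""
    try:
        return DEVICE_MAP[op]
    except KeyError:
        raise ValueError(f"no hardware target for op {op!r} (zero CPU fallback)")
```

Failing loudly on an unmapped operator mirrors the "zero CPU fallback" guarantee: work either runs on an accelerator or not at all.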

## πŸ¦„ **About This Implementation**

This model demonstrates hybrid NPU+iGPU acceleration, showing that consumer AMD Ryzen AI hardware can serve a 27B-parameter language model without falling back to CPU compute.

**Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)  
**Date**: July 10, 2025  
**Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech)  
**Platform**: [Unicorn Commander](https://unicorncommander.com)  

## πŸ“– **Citation**

```bibtex
@software{unicorn_execution_engine_gemma_27b_2025,
  title={Gemma 3 27B NPU+iGPU Quantized: A Hybrid-Accelerated Large Language Model},
  author={Unicorn Commander},
  year={2025},
  url={https://huggingface.co/magicunicorn/gemma-3-27b-npu-quantized},
  note={Production NPU+iGPU quantized large language model}
}
```

## πŸ“š **Related Resources**

- **Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)
- **Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech)  
- **Platform**: [Unicorn Commander](https://unicorncommander.com)
- **Documentation**: Complete guides in framework repository

## πŸ”’ **License**

This model is released under the Apache 2.0 License, following the original Gemma 3 license terms.

---

*πŸ¦„ NPU+iGPU Large Language Model*  
*⚑ Powered by Unicorn Execution Engine*  
*🏒 Magic Unicorn Unconventional Technology & Stuff Inc*