---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- gemma
- npu
- igpu
- amd-ryzen-ai
- quantized
pipeline_tag: text-generation
model-index:
- name: 🦄 NPU+iGPU Quantized Gemma 3 27B Model
results:
- task:
type: text-generation
name: Text Generation
dataset:
type: custom
name: NPU+iGPU Benchmark
metrics:
- type: throughput
value: "Real NPU+iGPU acceleration"
name: Hardware Acceleration
- type: model_size
value: "26GB quantized (from 102GB original)"
name: Model Size
---
# 🦄 Gemma 3 27B NPU+iGPU Quantized
## 🚀 Advanced NPU+iGPU Implementation
This quantized Gemma 3 27B model demonstrates hybrid hardware acceleration on AMD Ryzen AI platforms, running attention on the NPU Phoenix and the feed-forward layers on the AMD Radeon 780M iGPU.
### ✅ **Production Status**
- **Status**: ✅ **PRODUCTION READY**
- **Server**: Operational OpenAI v1 API server
- **Hardware**: Real NPU Phoenix + AMD Radeon 780M
- **Size**: 26GB quantized (74% reduction from 102GB)
- **Format**: Safetensors layer-by-layer streaming
- **API**: OpenAI v1 compatible
## 🎯 **Quick Start**
### Using with Unicorn Execution Engine
```bash
# Clone the framework
git clone https://github.com/magicunicorn/unicorn-execution-engine.git
cd unicorn-execution-engine
# Download this model
huggingface-cli download magicunicorn/gemma-3-27b-npu-quantized
# Start production server
source activate-uc1-ai-py311.sh
python real_2025_gemma27b_server.py
# Server runs on http://localhost:8009
# Model: "gemma-3-27b-it-npu-igpu-real"
```
### Using with OpenWebUI
```text
URL: http://localhost:8009
Model: gemma-3-27b-it-npu-igpu-real
API: OpenAI v1 Compatible
```
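### Calling the API Directly
Since the server speaks the OpenAI v1 protocol, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the official `openai` Python package; the `/v1` route and the placeholder API key are assumptions about the local server's defaults.
```python
# Minimal chat-completion request against the local NPU+iGPU server.
# Assumes the server exposes the standard /v1 route and ignores the API key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8009/v1",  # server from the Quick Start above
    api_key="not-needed",                 # local server; any placeholder works
)

response = client.chat.completions.create(
    model="gemma-3-27b-it-npu-igpu-real",
    messages=[{"role": "user", "content": "Summarize NPU offloading in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```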
## 🔧 **Hardware Requirements**
### **Minimum Requirements**
- **NPU**: AMD Ryzen AI NPU Phoenix (16 TOPS)
- **iGPU**: AMD Radeon 780M (RDNA3 architecture)
- **Memory**: 32GB+ DDR5 RAM (96GB recommended)
- **Storage**: 30GB+ for model files
- **OS**: Ubuntu 25.04+ with Linux 6.14+ (HMA support)
### **Software Requirements**
- **Unicorn Execution Engine**: Latest version
- **MLIR-AIE2**: Included in framework
- **Vulkan Drivers**: Latest AMD drivers
- **XRT Runtime**: /opt/xilinx/xrt
## 🎯 **Performance**
### **Benchmark Results**
- **Hardware**: Real NPU + iGPU acceleration
- **Attention**: NPU Phoenix (16 TOPS)
- **FFN**: AMD Radeon 780M (200+ GFLOPS)
- **Memory**: Layer-by-layer streaming (sketched below)
- **Quality**: Full 27B parameter model preserved
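Layer-by-layer streaming means tensors are pulled from the safetensors file on demand rather than materializing all 26GB at once. Here is a minimal sketch of that idea using the `safetensors` API; the file name and the print loop are illustrative, not the engine's actual loader.
```python
# Sketch: iterate a safetensors checkpoint one tensor at a time, so the
# full quantized model never has to be resident in memory simultaneously.
from safetensors import safe_open

def stream_tensors(path: str):
    with safe_open(path, framework="pt", device="cpu") as f:
        for name in f.keys():
            yield name, f.get_tensor(name)  # loads only this tensor

# Illustrative usage: hand each tensor to the NPU/iGPU, then let it go.
for name, tensor in stream_tensors("model.safetensors"):
    print(name, tuple(tensor.shape))
```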
### **Technical Specifications**
- **Parameters**: 27.4B (quantized)
- **Precision**: INT4/INT8 optimized for NPU+iGPU
- **Context Length**: 8192 tokens
- **Architecture**: Gemma 3 with grouped-query attention (see the sketch below)
- **Quantization**: Custom NPU+iGPU aware quantization
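For readers unfamiliar with grouped-query attention, the sketch below shows the core idea: several query heads share one key/value head, shrinking the KV cache. The head counts here are illustrative, not Gemma 3 27B's actual configuration.
```python
# Grouped-query attention sketch: 16 query heads share 4 K/V heads,
# so each K/V head serves a group of 4 query heads.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 1, 8, 64
n_q_heads, n_kv_heads = 16, 4

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand K/V so every query head attends against its group's shared head.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # (1, 16, 8, 64)
```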
## 📊 **Technical Details**
### **Quantization Strategy**
- **NPU Layers**: INT8 symmetric quantization
- **iGPU Layers**: INT4 grouped quantization (both modes sketched below)
- **Memory Optimized**: Layer-by-layer streaming
- **Zero CPU Fallback**: Pure hardware acceleration
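The two modes above follow standard formulations; here is a hedged sketch of each, with per-tensor scaling for INT8 symmetric and per-group scaling for INT4. The group size of 128 is an assumption, and a real INT4 format would pack two values per byte.
```python
# Sketch of the two quantization modes: INT8 symmetric (one scale per
# tensor) and INT4 grouped (one scale per group). Group size is assumed.
import torch

def quant_int8_symmetric(w: torch.Tensor):
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale  # dequantize with q.float() * scale

def quant_int4_grouped(w: torch.Tensor, group_size: int = 128):
    groups = w.reshape(-1, group_size)  # assumes numel divides evenly
    scale = groups.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(groups / scale), -7, 7).to(torch.int8)
    return q, scale  # real storage would pack two INT4 values per byte

w = torch.randn(256, 256)
q8, s8 = quant_int8_symmetric(w)
q4, s4 = quant_int4_grouped(w)
print(q8.dtype, q4.shape, s4.shape)  # int8, (512, 128), (512, 1)
```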
### **Hardware Acceleration**
- **NPU Phoenix**: Attention computation (16 TOPS)
- **AMD Radeon 780M**: FFN processing (RDNA3)
- **MLIR-AIE2**: Real NPU kernel compilation
- **Vulkan**: Direct iGPU compute shaders (layer split sketched below)
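Conceptually, each transformer layer is split at the attention/FFN boundary. The sketch below shows that routing; `npu_attention` and `vulkan_ffn` are hypothetical placeholders for the engine's MLIR-AIE2 and Vulkan kernels, not real APIs.
```python
# Conceptual routing of one transformer layer across the two accelerators.
# Both kernel functions are hypothetical stand-ins that just pass data
# through; the real engine dispatches compiled NPU and Vulkan kernels.
import torch

def npu_attention(hidden: torch.Tensor) -> torch.Tensor:
    return hidden  # placeholder for an MLIR-AIE2 attention kernel

def vulkan_ffn(hidden: torch.Tensor) -> torch.Tensor:
    return hidden  # placeholder for a Vulkan FFN compute shader

def transformer_layer(hidden: torch.Tensor) -> torch.Tensor:
    hidden = hidden + npu_attention(hidden)  # attention -> NPU Phoenix
    hidden = hidden + vulkan_ffn(hidden)     # FFN -> Radeon 780M
    return hidden

print(transformer_layer(torch.zeros(1, 8, 4096)).shape)
```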
## 🦄 **About This Implementation**
This model demonstrates NPU+iGPU acceleration techniques, showing how consumer AMD Ryzen AI hardware can run a 27B-parameter language model with no CPU fallback.
**Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)
**Date**: July 10, 2025
**Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech)
**Platform**: [Unicorn Commander](https://unicorncommander.com)
## 📚 **Citation**
```bibtex
@software{unicorn_execution_engine_gemma_27b_2025,
  title={Gemma 3 27B NPU+iGPU Quantized: NPU+iGPU Large Language Model},
  author={Unicorn Commander},
  year={2025},
  url={https://huggingface.co/magicunicorn/gemma-3-27b-npu-quantized},
  note={Production NPU+iGPU quantized large language model}
}
```
## 🔗 **Related Resources**
- **Framework**: [Unicorn Execution Engine](https://github.com/Unicorn-Commander/Unicorn-Execution-Engine)
- **Company**: [Magic Unicorn Unconventional Technology & Stuff Inc](https://magicunicorn.tech)
- **Platform**: [Unicorn Commander](https://unicorncommander.com)
- **Documentation**: Complete guides in framework repository
## 📄 **License**
This model is released under the Apache 2.0 License, following the original Gemma 3 license terms.
---
*🦄 NPU+iGPU Large Language Model*
*⚡ Powered by Unicorn Execution Engine*
*🏢 Magic Unicorn Unconventional Technology & Stuff Inc*