richardyoung committed · verified
Commit 4f61d56 · Parent: 48beb07

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +31 -1
README.md CHANGED
@@ -52,7 +52,37 @@ This is an **ultra-compact 2-bit quantized version** of Kimi K2 Instruct, optimi
 
 ## 🎯 Quick Start
 
-### Installation
+## Hardware Requirements
+
+Kimi K2 is a massive 1T-parameter MoE model. Choose your quantization based on available unified memory:
+
+| Quantization | Model Size | Min RAM | Quality |
+|:------------:|:----------:|:-------:|:--------|
+| **2-bit** | ~84 GB | 96 GB | Acceptable - some quality loss |
+| **3-bit** | ~126 GB | 128 GB | Good - recommended minimum |
+| **4-bit** | ~168 GB | 192 GB | Very Good - best quality/size balance |
+| **5-bit** | ~210 GB | 256 GB | Excellent |
+| **6-bit** | ~252 GB | 288 GB | Near original |
+| **8-bit** | ~336 GB | 384 GB | Original quality |
+
+### Recommended Configurations
+
+| Mac Model | Max RAM | Recommended Quantization |
+|:----------|:-------:|:-------------------------|
+| Mac Studio M2 Ultra | 192 GB | 4-bit |
+| Mac Studio M3 Ultra | 512 GB | 8-bit |
+| Mac Pro M2 Ultra | 192 GB | 4-bit |
+| MacBook Pro M3 Max | 128 GB | 3-bit |
+| MacBook Pro M4 Max | 128 GB | 3-bit |
+
+### Performance Notes
+
+- **Inference Speed**: Expect ~5-15 tokens/sec depending on quantization and hardware
+- **First Token Latency**: 10-30 seconds while the model loads on first run
+- **Context Window**: Full 128K context supported
+- **Active Parameters**: Only ~32B parameters active per token (MoE architecture)
+
+## Installation
 
 ```bash
 pip install mlx-lm
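
The committed Quick Start stops at installing mlx-lm. As a usage note, here is a minimal generation sketch with mlx-lm's Python API; the repo id is a hypothetical placeholder for this model's actual Hugging Face path, and the prompt and token budget are illustrative.

```python
from mlx_lm import load, generate

# Hypothetical repo id -- substitute the actual path of this 2-bit quant.
MODEL_ID = "richardyoung/Kimi-K2-Instruct-2bit-mlx"

# load() fetches the weights (tens of GB for this model) plus the tokenizer.
model, tokenizer = load(MODEL_ID)

# Format the request with the model's chat template before generating.
messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Autoregressive decoding runs on the Apple GPU via MLX.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```

The same flow is available from the shell with `python -m mlx_lm.generate --model <repo-id> --prompt "..."` once the package is installed.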
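
On the Hardware Requirements table: the Model Size column scales linearly with bit width. As a back-of-envelope check (an editor's sketch, not part of the release), uniformly quantized weights take roughly total parameters × bits / 8 bytes; real MLX exports keep embeddings and other sensitive tensors at higher precision and add per-group scales, so published sizes for a given release can differ substantially from this naive figure.

```python
# Naive quantized-weight size: total parameters * bits per weight / 8 bytes.
# Assumes a uniform bit width across all tensors; mixed-precision layers and
# per-group quantization scales in real MLX exports shift the true number.

def naive_quant_size_gb(total_params: float, bits: int) -> float:
    """Rough weight-storage estimate in gigabytes."""
    return total_params * bits / 8 / 1e9

total_params = 1.0e12  # Kimi K2's publicly stated total parameter count (MoE)
for bits in (2, 3, 4, 5, 6, 8):
    print(f"{bits}-bit: ~{naive_quant_size_gb(total_params, bits):,.0f} GB")
```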