richardyoung committed · verified
Commit 4f61d56 · Parent: 48beb07

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +31 -1
README.md CHANGED
@@ -52,7 +52,37 @@ This is an **ultra-compact 2-bit quantized version** of Kimi K2 Instruct, optimi
 
 ## 🎯 Quick Start
 
-### Installation
+## Hardware Requirements
+
+Kimi K2 is a massive 1T-parameter MoE model. Choose your quantization based on available unified memory:
+
+| Quantization | Model Size | Min RAM | Quality |
+|:------------:|:----------:|:-------:|:--------|
+| **2-bit** | ~84 GB | 96 GB | Acceptable - some quality loss |
+| **3-bit** | ~126 GB | 128 GB | Good - recommended minimum |
+| **4-bit** | ~168 GB | 192 GB | Very Good - best quality/size balance |
+| **5-bit** | ~210 GB | 256 GB | Excellent |
+| **6-bit** | ~252 GB | 288 GB | Near original |
+| **8-bit** | ~336 GB | 384 GB | Original quality |
+
+### Recommended Configurations
+
+| Mac Model | Max RAM | Recommended Quantization |
+|:----------|:-------:|:-------------------------|
+| Mac Studio M2 Ultra | 192 GB | 4-bit |
+| Mac Studio M3 Ultra | 512 GB | 8-bit |
+| Mac Pro M2 Ultra | 192 GB | 4-bit |
+| MacBook Pro M3 Max | 128 GB | 3-bit |
+| MacBook Pro M4 Max | 128 GB | 3-bit |
+
+### Performance Notes
+
+- **Inference Speed**: Expect ~5-15 tokens/sec depending on quantization and hardware
+- **First Token Latency**: 10-30 seconds while the model loads on first run
+- **Context Window**: Full 128K context supported
+- **Active Parameters**: Only ~32B parameters active per token (MoE architecture)
+
+## Installation
 
 ```bash
 pip install mlx-lm
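
The committed Quick Start stops at installing mlx-lm. As a usage note, here is a minimal generation sketch with mlx-lm's Python API; the repo id is a hypothetical placeholder for this model's actual Hugging Face path, and the prompt and token budget are illustrative.

```python
from mlx_lm import load, generate

# Hypothetical repo id -- substitute the actual path of this 2-bit quant.
MODEL_ID = "richardyoung/Kimi-K2-Instruct-2bit-mlx"

# load() fetches the weights (tens of GB for this model) plus the tokenizer.
model, tokenizer = load(MODEL_ID)

# Format the request with the model's chat template before generating.
messages = [{"role": "user", "content": "Explain MoE routing in two sentences."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Autoregressive decoding runs on the Apple GPU via MLX.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
```

The same flow is available from the shell with `python -m mlx_lm.generate --model <repo-id> --prompt "..."` once the package is installed.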
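
On the Hardware Requirements table: the Model Size column scales linearly with bit width. As a back-of-envelope check (an editor's sketch, not part of the release), uniformly quantized weights take roughly total parameters × bits / 8 bytes; real MLX exports keep embeddings and other sensitive tensors at higher precision and add per-group scales, so published sizes for a given release can differ substantially from this naive figure.

```python
# Naive quantized-weight size: total parameters * bits per weight / 8 bytes.
# Assumes a uniform bit width across all tensors; mixed-precision layers and
# per-group quantization scales in real MLX exports shift the true number.

def naive_quant_size_gb(total_params: float, bits: int) -> float:
    """Rough weight-storage estimate in gigabytes."""
    return total_params * bits / 8 / 1e9

total_params = 1.0e12  # Kimi K2's publicly stated total parameter count (MoE)
for bits in (2, 3, 4, 5, 6, 8):
    print(f"{bits}-bit: ~{naive_quant_size_gb(total_params, bits):,.0f} GB")
```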