---
title: LLM KV Cache Calculator
emoji: 💻
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
short_description: Calculate KV cache memory requirements for LLMs
---
# KV Cache Calculator

Calculate KV cache memory requirements for transformer models.
## Credits

This implementation builds on the excellent work by [gaunernst](https://huggingface.co/spaces/gaunernst/kv-cache-calculator). Special thanks for the original implementation!
## Features

- **Multi-attention support**: MHA (Multi-Head Attention), GQA (Grouped-Query Attention), and MLA (Multi-head Latent Attention)
- **Multiple data types**: fp16/bf16, fp8, and fp4 quantization
- **Real-time calculation**: Instant memory estimates from the standard KV cache formula (see the sketch below)
- **Model analysis**: Detailed breakdown of the model configuration
- **Broad compatibility**: Works with transformer models hosted on the Hugging Face Hub
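
The estimate behind these features is the standard KV cache formula: two tensors (K and V) per layer, each of shape `[users, kv_heads, context, head_dim]`. Below is a minimal Python sketch of that formula, not the Space's actual code; the names (`kv_cache_bytes`, `num_kv_heads`, the dtype table) are illustrative, and MLA models cache a compressed latent instead, so their footprint follows a different formula.

```python
# Minimal sketch of the KV cache size formula for MHA/GQA models.
# Field names mirror common Hugging Face config fields, but they are
# assumptions here, not the Space's implementation. MLA models
# (e.g., DeepSeek-V2/V3) cache a compressed latent instead, so this
# formula does not apply to them directly.

BYTES_PER_ELEMENT = {"fp16/bf16": 2, "fp8": 1, "fp4": 0.5}

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, num_users: int,
                   dtype: str = "fp16/bf16") -> float:
    """Two tensors (K and V) per layer, each of shape
    [num_users, num_kv_heads, context_len, head_dim]."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * num_users * BYTES_PER_ELEMENT[dtype])

# Example: a hypothetical 32-layer GQA model with 8 KV heads and
# head_dim 128 at 32k context for one user in bf16:
# 2 * 32 * 8 * 128 * 32768 * 1 * 2 bytes ≈ 4.3 GB
print(f"{kv_cache_bytes(32, 8, 128, 32_768, 1) / 1e9:.1f} GB")
```

Note how GQA reduces the cache relative to MHA simply by shrinking `num_kv_heads`, and how fp8/fp4 halve or quarter it again via the bytes-per-element factor.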
## Usage

1. Enter your model ID (e.g., "Qwen/Qwen3-30B-A3B")
2. Set the context length and number of concurrent users
3. Choose the data type precision
4. Add a Hugging Face token if the model is gated
5. Click calculate to get the memory requirements (a programmatic sketch of these steps follows below)
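
The same steps can be approximated programmatically. This is a hedged sketch assuming the `transformers` `AutoConfig` API and common config field names; the Space's own `app.py` may read the configuration differently.

```python
# Sketch: fetch a model config from the Hub and apply the formula above.
# Assumes `pip install transformers`; not the Space's actual code.
from transformers import AutoConfig

model_id = "Qwen/Qwen3-30B-A3B"                # step 1
# For gated models, pass token="hf_..." below  # step 4
config = AutoConfig.from_pretrained(model_id)

num_layers = config.num_hidden_layers
# GQA configs expose num_key_value_heads; fall back to MHA otherwise.
num_kv_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
head_dim = getattr(config, "head_dim", None) or config.hidden_size // config.num_attention_heads

context_len, num_users, bytes_per_elt = 32_768, 1, 2   # steps 2-3 (bf16)
kv_bytes = 2 * num_layers * num_kv_heads * head_dim * context_len * num_users * bytes_per_elt
print(f"{kv_bytes / 1e9:.1f} GB")              # step 5
```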