---
title: LLM KV Cache Calculator
emoji: 💻
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
short_description: Calculate KV cache memory requirements for LLMs
---

# KV Cache Calculator

Calculate KV cache memory requirements for transformer models.

## Credits

This implementation is derived from and builds upon the excellent work by [gaunernst](https://huggingface.co/spaces/gaunernst/kv-cache-calculator). Special thanks for the original implementation!

## Features

- **Multi-attention support**: MHA (Multi-Head Attention), GQA (Grouped Query Attention), and MLA (Multi-head Latent Attention)
- **Multiple data types**: fp16/bf16, fp8, and fp4 quantization
- **Real-time calculation**: instant memory-requirement estimates
- **Model analysis**: detailed breakdown of the model configuration
- **Universal compatibility**: works with any HuggingFace transformer model

## Usage

1. Enter your model ID (e.g., "Qwen/Qwen3-30B-A3B")
2. Set the context length and number of users
3. Choose the data type precision
4. Add a HuggingFace token if needed for gated models
5. Click calculate to get the memory requirements (the math behind the estimate is sketched below)
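## How the estimate is computed

The standard KV-cache arithmetic is simple: with MHA/GQA, each token stores one key and one value vector per KV head, per layer; MLA instead caches a small shared latent per layer. The sketch below is a rough guide only; the function names and the `DTYPE_BYTES` mapping are illustrative, not the Space's actual API in `app.py`.

```python
# A minimal sketch of the standard KV-cache formulas (illustrative names,
# not the app's actual API).

DTYPE_BYTES = {"fp16/bf16": 2.0, "fp8": 1.0, "fp4": 0.5}  # bytes per element

def kv_cache_bytes_gqa(num_layers, num_kv_heads, head_dim,
                       context_len, num_users, dtype="fp16/bf16"):
    """MHA/GQA: one K and one V vector per KV head, per layer, per token.
    MHA is the special case num_kv_heads == num_attention_heads."""
    per_token = 2 * num_layers * num_kv_heads * head_dim  # 2 = K and V
    return per_token * context_len * num_users * DTYPE_BYTES[dtype]

def kv_cache_bytes_mla(num_layers, kv_lora_rank, qk_rope_head_dim,
                       context_len, num_users, dtype="fp16/bf16"):
    """MLA (DeepSeek-style): a compressed latent plus the decoupled RoPE key
    per layer, per token -- no per-head K/V tensors."""
    per_token = num_layers * (kv_lora_rank + qk_rope_head_dim)
    return per_token * context_len * num_users * DTYPE_BYTES[dtype]

# Example: a Llama-3-70B-like config (80 layers, 8 KV heads, head_dim 128)
# at 8192-token context for 4 users in fp16 comes out to exactly 10 GiB.
print(f"{kv_cache_bytes_gqa(80, 8, 128, 8192, 4) / 2**30:.1f} GiB")
```

Quantizing the cache scales this linearly: fp8 halves it and fp4 quarters it relative to fp16/bf16.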
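Step 1 of the usage flow corresponds to reading the model's published configuration. Below is a hedged sketch, reusing `kv_cache_bytes_gqa` from above and assuming the standard `config.json` fields that `transformers.AutoConfig` exposes; the Space may resolve these fields differently.

```python
# A hedged sketch of pulling the relevant fields from a model's config on the
# HuggingFace Hub; the Space may resolve these fields differently.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen3-30B-A3B")  # pass token=... for gated models
num_layers = cfg.num_hidden_layers
num_kv_heads = getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)
head_dim = getattr(cfg, "head_dim", None) or cfg.hidden_size // cfg.num_attention_heads

gib = kv_cache_bytes_gqa(num_layers, num_kv_heads, head_dim,
                         context_len=32768, num_users=1) / 2**30
print(f"{gib:.2f} GiB")
```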