---
title: LLM KV Cache Calculator
emoji: 💻
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
short_description: Calculate KV cache memory requirements for LLMs
---
# KV Cache Calculator

Calculate KV cache memory requirements for transformer models.
## Credits

This implementation builds on the excellent work by gaunernst. Special thanks for the original implementation!
## Features

- **Multi-attention support**: MHA (Multi-Head Attention), GQA (Grouped Query Attention), and MLA (Multi-head Latent Attention); see the memory formulas sketched after this list
- **Multiple data types**: fp16/bf16, fp8, and fp4 quantization
- **Real-time calculation**: instant memory estimates as you adjust inputs
- **Model analysis**: detailed breakdown of the model configuration
- **Universal compatibility**: works with any Hugging Face transformer model that publishes a standard config
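The memory arithmetic behind these attention modes is simple. Below is a minimal sketch of the formulas, assuming standard per-token K/V caching; the function names and dtype table are illustrative and may not match app.py exactly.

```python
# Minimal sketch of the KV cache arithmetic (not the app's exact code).
BYTES_PER_ELEMENT = {"fp16/bf16": 2.0, "fp8": 1.0, "fp4": 0.5}

def kv_cache_bytes_mha_gqa(num_layers, num_kv_heads, head_dim,
                           context_len, num_users, dtype="fp16/bf16"):
    # MHA and GQA cache full K and V tensors (the leading 2) for every
    # layer, KV head, and token. GQA simply has fewer KV heads than
    # query heads, which shrinks the cache proportionally.
    return (2 * num_layers * num_kv_heads * head_dim
            * context_len * num_users * BYTES_PER_ELEMENT[dtype])

def kv_cache_bytes_mla(num_layers, kv_lora_rank, qk_rope_head_dim,
                       context_len, num_users, dtype="fp16/bf16"):
    # MLA (as in DeepSeek-V2/V3) caches one compressed latent vector
    # plus a small decoupled RoPE key per token instead of full K/V,
    # so the per-head factor disappears entirely.
    return (num_layers * (kv_lora_rank + qk_rope_head_dim)
            * context_len * num_users * BYTES_PER_ELEMENT[dtype])

# Example: a 32-layer GQA model with 8 KV heads and head_dim 128,
# serving one user at a 32k context in fp16:
# 2 * 32 * 8 * 128 * 32768 * 1 * 2 bytes = 4.0 GiB
print(kv_cache_bytes_mha_gqa(32, 8, 128, 32_768, 1) / 1024**3)  # 4.0
```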
## Usage

1. Enter your model ID (e.g., "Qwen/Qwen3-30B-A3B"); the calculator reads the model config, as sketched below
2. Set the context length and number of concurrent users
3. Choose the data type precision
4. Add a Hugging Face token if needed for gated models
5. Click calculate to get the memory requirements
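Only the model's `config.json` is needed, never the weights. Below is a minimal sketch of that lookup with `huggingface_hub`, assuming the standard config field names; the exact fields app.py reads may differ.

```python
import json
from huggingface_hub import hf_hub_download

# Pass token="hf_..." here for gated models.
path = hf_hub_download("Qwen/Qwen3-30B-A3B", "config.json")
with open(path) as f:
    cfg = json.load(f)

num_layers = cfg["num_hidden_layers"]
# GQA models expose num_key_value_heads; pure MHA models may omit it.
num_kv_heads = cfg.get("num_key_value_heads", cfg["num_attention_heads"])
# head_dim is sometimes explicit, otherwise derived from hidden_size.
head_dim = cfg.get("head_dim", cfg["hidden_size"] // cfg["num_attention_heads"])

print(num_layers, num_kv_heads, head_dim)
```

These values feed straight into the formulas sketched under Features.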