# CDN Cache Optimizer – OpenEnv RL Environment

An RL environment simulating edge CDN cache management – the exact problem companies like Meta solve at planetary scale. An agent manages a cache of limited size, deciding which files to evict when new content arrives, balancing hit rate, bandwidth efficiency, and thrash avoidance.
## Motivation

Content Delivery Networks serve billions of files daily. Edge servers have limited storage, so they must constantly decide which cached files to keep and which to evict. Standard algorithms like LRU aren't optimal, especially when traffic has viral bursts (a file suddenly gets 50x more requests for 20 minutes, then drops back toward baseline).
A smarter agent can:
- Predict viral spikes from queue previews
- Avoid evicting high-frequency files
- Prevent cache thrashing (evicting then immediately re-requesting)
- Maximize bandwidth saved for users
## Environment Description

At each step, a file is requested from the network. If it's already in the cache, that's a cache hit (reward). If not, it's a cache miss, and the agent must decide whether to evict an existing file to make room.
### Traffic Model

- Steady files: consistent, cyclical demand
- Viral files: bell-curve spike in popularity, then fade back to baseline
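The viral pattern described above can be sketched as a Gaussian bump over a steady baseline. This is an illustrative model only; the parameters below (`peak_step`, `width`, `boost`) are made-up names and values, not the environment's actual constants from `env/cache.py`.

```python
import math

def viral_popularity(step: int, peak_step: int, width: float = 10.0,
                     baseline: float = 1.0, boost: float = 50.0) -> float:
    """Request weight for a viral file: a Gaussian spike over baseline demand.

    Far from the peak the weight decays back to `baseline`; at the peak the
    file is roughly (baseline + boost)x as hot as a steady file.
    """
    spike = boost * math.exp(-((step - peak_step) ** 2) / (2 * width ** 2))
    return baseline + spike
```

A steady file, by contrast, would keep a roughly constant weight with a mild daily cycle.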
## Action & Observation Space

### Observation Space

| Field | Type | Description |
|---|---|---|
| `step` | int | Current episode step |
| `cache_used_mb` | float | MB currently used |
| `cache_capacity_mb` | float | Total cache size |
| `cache_fill_ratio` | float | 0.0–1.0 fill level |
| `cached_files` | List[FileEntry] | All files in cache with metadata |
| `incoming_file_id` | str | File being requested |
| `incoming_file_size_mb` | float | Size of incoming file |
| `incoming_file_is_viral` | bool | Is this file currently viral? |
| `cache_hit` | bool | Is incoming file already cached? |
| `recent_hit_rate` | float | Rolling hit rate (last 20 steps) |
| `time_of_day` | float | Normalized 0.0–1.0 daily cycle |
| `queue_preview` | List[str] | Next 3 file IDs (prefetch hint) |
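For concreteness, an observation might look like the dictionary below. All values are made up; only the field names and types follow the table above.

```python
# Illustrative observation payload (values are invented for this example).
observation = {
    "step": 42,
    "cache_used_mb": 72.5,
    "cache_capacity_mb": 100.0,
    "cache_fill_ratio": 0.725,
    "cached_files": [
        {"file_id": "file_003", "size_mb": 12.0, "request_frequency": 7.0,
         "is_viral": False, "last_accessed": 39},
    ],
    "incoming_file_id": "file_017",
    "incoming_file_size_mb": 8.5,
    "incoming_file_is_viral": True,
    "cache_hit": False,
    "recent_hit_rate": 0.65,
    "time_of_day": 0.42,
    "queue_preview": ["file_017", "file_004", "file_009"],
}
```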
### FileEntry Fields

| Field | Type | Description |
|---|---|---|
| `file_id` | str | Unique identifier |
| `size_mb` | float | File size in MB |
| `request_frequency` | float | Requests since cached |
| `is_viral` | bool | Currently viral |
| `last_accessed` | int | Step number of last access |
### Action Space

| Field | Type | Description |
|---|---|---|
| `evict_file_id` | str \| null | File to evict (null = no eviction) |
### Reward Function

| Component | Range | Description |
|---|---|---|
| `cache_hit_bonus` | +1.0 to +1.5 | Hit reward (viral hits = +1.5) |
| `bandwidth_saved` | +0.0 to +0.2 | Reward for bandwidth efficiency |
| `eviction_penalty` | -0.0 to -0.5 | Penalty for evicting popular files |
| `thrash_penalty` | 0.0 or -0.5 | Penalty for evicting the same file twice |
| `wasted_capacity_penalty` | -0.0 to -0.3 | Penalty for leaving cache capacity unused |
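These components can be combined roughly as follows. The coefficients and thresholds below are illustrative guesses for the stated ranges; the authoritative reward lives in `env/cache.py` and may differ.

```python
from typing import Optional

def compute_reward(hit: bool, incoming_is_viral: bool, bytes_saved_mb: float,
                   evicted_freq: Optional[float], rethrash: bool,
                   fill_ratio: float) -> float:
    """Illustrative per-step reward combining the components in the table."""
    reward = 0.0
    if hit:
        reward += 1.5 if incoming_is_viral else 1.0       # cache_hit_bonus
        reward += min(0.2, bytes_saved_mb / 500.0)        # bandwidth_saved
    if evicted_freq is not None:
        reward -= min(0.5, 0.05 * evicted_freq)           # eviction_penalty
    if rethrash:
        reward -= 0.5                                     # thrash_penalty
    if fill_ratio < 0.5:                                  # wasted_capacity_penalty
        reward -= 0.3 * (0.5 - fill_ratio) / 0.5
    return reward
```

The shape matters more than the exact numbers: hits (especially viral ones) dominate, while evicting popular files and thrashing carve the reward back down.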
## Tasks

### Task 1: Steady Traffic Cache (Easy)

- Cache: 100MB | Files: 30 | Steps: 100
- No viral files – steady demand only
- Agent learns basic LRU-style eviction
- Target hit rate: ≥ 0.60 → score 1.0
- Baseline score: ~0.75
### Task 2: Mixed Traffic Cache (Medium)
- Cache: 80MB | Files: 50 | Steps: 150
- 20% viral files mixed with steady demand
- Agent must handle spikes and prioritize popular content
- Score: 70% hit rate + 30% bandwidth
- Baseline score: ~0.60
### Task 3: Constrained Cache with Viral Bursts (Hard)
- Cache: 50MB | Files: 80 | Steps: 200
- 35% viral files, tight capacity, large file sizes
- Agent must predict spikes, avoid thrashing
- Score: 50% hit rate + 25% bandwidth + 25% reward quality
- Baseline score: ~0.45
## Code Repository

Full source: https://github.com/umar-sharif821/cdn-cache-env

### Files Included

- `env/cache.py` – DriftCDNEnv environment implementation
- `server/app.py` – OpenEnv FastAPI server
- `training/train.py` – Fine-tuning script
- `training_results_finetuned.png` – Training results chart
- `baseline_drift.png` – Baseline comparison chart
## Setup & Usage

### Local Setup

```bash
git clone <repo>
cd cdn-cache-env
pip install -r requirements.txt
```

### Run API Server

```bash
uvicorn api.main:app --host 0.0.0.0 --port 7860
```

### Run Inference (Baseline Agent)

```bash
export API_BASE_URL="https://api.openai.com/v1"
export MODEL_NAME="gpt-4o-mini"
export HF_TOKEN="your_token_here"
python inference.py
```

### Docker

```bash
docker build -t cdn-cache-env .
docker run -p 7860:7860 \
  -e API_BASE_URL="https://api.openai.com/v1" \
  -e MODEL_NAME="gpt-4o-mini" \
  -e HF_TOKEN="your_token" \
  cdn-cache-env
```
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check (returns 200) |
| GET | `/tasks` | List all tasks |
| POST | `/reset` | Start an episode: `{"task_id": "task_easy", "seed": 42}` |
| POST | `/step` | Take an action: `{"evict_file_id": "file_001"}` or `{"evict_file_id": null}` |
| GET | `/state` | Full environment state |
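A minimal Python client for these endpoints, using only the standard library. The base URL assumes the local server started in the setup section; the request payloads follow the table above, and the response fields follow the observation schema.

```python
import json
from urllib import request

BASE = "http://localhost:7860"  # local server from "Run API Server"

def call(endpoint: str, payload: dict) -> dict:
    """POST a JSON payload to the environment server and decode the reply."""
    req = request.Request(
        f"{BASE}{endpoint}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    obs = call("/reset", {"task_id": "task_easy", "seed": 42})
    result = call("/step", {"evict_file_id": None})  # no eviction this step
    print(result)
```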
## Baseline Scores

Using the built-in `smart_policy` (non-LLM baseline):

| Task | Hit Rate | Score |
|---|---|---|
| Easy | ~0.72 | ~1.00 |
| Medium | ~0.61 | ~0.82 |
| Hard | ~0.48 | ~0.78 |
| **Overall** | | ~0.87 |
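The repo's `smart_policy` is not reproduced here, but its spirit can be sketched as a simple heuristic: never evict on a hit or when the incoming file fits, otherwise drop the least-recently-used non-viral file, falling back to plain LRU if everything cached is viral. This is an illustrative reimplementation, not the actual baseline code.

```python
from typing import Optional

def smart_evict(obs: dict) -> Optional[str]:
    """LRU-with-viral-protection heuristic over the observation schema above."""
    if obs["cache_hit"]:
        return None  # nothing to do on a hit
    free = obs["cache_capacity_mb"] - obs["cache_used_mb"]
    if obs["incoming_file_size_mb"] <= free:
        return None  # incoming file fits without eviction
    files = obs["cached_files"]
    # Prefer evicting non-viral files; fall back to all files if none exist.
    candidates = [f for f in files if not f["is_viral"]] or files
    victim = min(candidates, key=lambda f: f["last_accessed"])
    return victim["file_id"]
```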
## Log Format

`inference.py` emits structured JSON logs, one object per line:

```
{"type": "START", "task_id": "task_easy", ...}
{"type": "STEP", "step": 0, "action": {...}, "reward": 1.0, ...}
{"type": "END", "total_reward": 87.3, "final_hit_rate": 0.72, "score": 1.0}
```