Building a CDN Cache Optimizer with OpenEnv and RL
For this hackathon, I wanted to build something that felt close to a real infrastructure problem.
A lot of reinforcement learning demos are fun, but they often feel disconnected from systems that engineers actually run in production. I wanted my project to sit closer to that world: networking, latency, logs, cost, traffic spikes, and reliability.
So I built CDN Cache Optimizer.
It is an OpenEnv-compatible environment where an agent learns how to manage an edge CDN cache.
The project is live here:
- GitHub: https://github.com/umar-sharif821/cdn-cache-env-improvedone
- Hugging Face Space: https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone
The Problem
A CDN edge server has limited storage.
Every time a file is requested, the system has to make a decision:
Should this object stay in cache, should we skip caching it, or should we evict something else to make room?
If the file is already cached, the user gets a fast edge response.
If the file is not cached, the request goes back to origin.
That is slower and more expensive.
At small scale, this looks simple.
At internet scale, it becomes a hard optimization problem.
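The per-request decision above can be sketched in a few lines. This is an illustrative toy, not the project's actual implementation; the class and field names are made up for this sketch:

```python
# Minimal sketch of the decision an edge cache faces on every request.
# Names (EdgeCache, capacity_bytes) are illustrative, not the project's API.

class EdgeCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = {}  # object_id -> size_bytes

    def request(self, object_id, size):
        if object_id in self.store:
            return "edge_hit"  # fast: served directly from the edge
        # Miss: the request goes back to origin (slower, more expensive),
        # and we must decide whether this object earns a cache slot.
        if self.used + size <= self.capacity:
            self.store[object_id] = size
            self.used += size
        return "origin_fetch"
```

The hard part is hidden in the miss branch: when the cache is full, something has to be evicted, and choosing what to evict is the whole optimization problem.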
Why This Matters
CDNs serve images, videos, scripts, documents, and application assets to users around the world.
A good cache policy can:
- reduce user latency
- reduce origin load
- save bandwidth
- improve reliability
- avoid unnecessary cache churn
A poor cache policy can do the opposite:
- evict useful files
- cache large files that are rarely requested
- miss viral traffic bursts
- keep sending users back to origin even when the edge cache could have served them
This is why cache optimization is an interesting RL problem.
Why Not Just Use LRU?
LRU is simple:
Evict the least recently used file.
That is a strong baseline, and it works well in many cases.
But it has blind spots.
For example:
- A file may be old but about to become popular again.
- A file may have been requested once recently but not be worth storing.
- A viral object may deserve protection even if it has not been in cache for long.
- A large file may consume too much cache space compared to its value.
That is where an agent can do better.
The agent does not just ask:
What was used least recently?
It can ask:
What is most valuable to keep right now?
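The recency-only rule is easy to see in code. Here is a tiny LRU cache keyed on object id, with sizes ignored for simplicity; this is a sketch of the baseline idea, and the project's actual baseline may differ in detail:

```python
from collections import OrderedDict

# Toy LRU cache: eviction looks ONLY at recency, never at value,
# size, or popularity trends. Illustrative, not the repo's code.

class LRUCache:
    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()

    def request(self, object_id):
        if object_id in self.items:
            self.items.move_to_end(object_id)  # mark as most recently used
            return True   # hit
        self.items[object_id] = True
        if len(self.items) > self.max_items:
            self.items.popitem(last=False)     # evict least recently used
        return False      # miss
```

Everything the agent could exploit — object size, request frequency, burst patterns — is invisible to this policy.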
Project Goal
The goal of this project is not just to train a model.
The goal is to build a complete, benchmarkable environment around a realistic CDN caching problem.
That includes:
- an OpenEnv-style environment
- a baseline policy
- a fine-tuned agent policy
- a reward function grounded in latency and cost
- schema drift handling for CDN logs
- Colab reproducibility
- a live Hugging Face demo
- visual comparison between baseline and agent behavior
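To make the reward bullet concrete: a reward grounded in latency and cost could look like the sketch below. The constants and the function itself are assumptions for illustration, not the actual reward implemented in the repo:

```python
# Hedged sketch of a latency/cost reward for a cache decision.
# All constants are made-up placeholders, not the project's values.

EDGE_LATENCY_MS = 20       # assumed edge response time
ORIGIN_LATENCY_MS = 200    # assumed origin round trip
ORIGIN_COST_PER_MB = 0.01  # assumed bandwidth cost back to origin

def reward(hit, size_mb):
    latency_ms = EDGE_LATENCY_MS if hit else ORIGIN_LATENCY_MS
    cost = 0.0 if hit else size_mb * ORIGIN_COST_PER_MB
    # Lower latency and lower origin cost -> higher (less negative) reward.
    return -(latency_ms / 1000.0) - cost
```

A shape like this pushes the agent toward hits on exactly the objects where misses are most expensive: large, frequently requested files.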
Live Demo
The Hugging Face Space runs a Gradio app.
The UI lets the judge choose a CDN task and run a benchmark.
It compares:
- Baseline LRU
- Fine-tuned CDN agent
The output shows:
- total reward
- cache hit rate
- bandwidth saved
- hit-rate curve
- baseline-vs-agent comparison chart
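The cache hit rate reported above is just hits divided by total requests over a trace. The sketch below computes it with a deliberately dumb FIFO policy as a stand-in; the Space's benchmark uses its own traces and policies:

```python
# Toy hit-rate computation over a request trace.
# FIFO eviction keeps the sketch deterministic; it is NOT the
# baseline or agent policy the Space actually benchmarks.

def hit_rate(trace, cache_size):
    cached, hits = [], 0
    for obj in trace:
        if obj in cached:
            hits += 1
        else:
            cached.append(obj)
            if len(cached) > cache_size:
                cached.pop(0)  # evict oldest insertion (FIFO)
    return hits / len(trace)
```

Running both policies over the same trace and comparing their hit-rate curves is exactly the baseline-vs-agent comparison the demo charts.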
Space:
https://huggingface.co/spaces/umar-sharif821/cdn-cache-env-improvedone