Fix Quick Start: complete runnable example with embedding generation
README.md
````diff
@@ -34,25 +34,30 @@ Official leaderboard results on 8,400 queries:
 ### Installation
 
 ```bash
-pip install scikit-learn numpy joblib huggingface_hub
+pip install scikit-learn numpy joblib huggingface_hub sentence-transformers
 ```
 
-###
+### Complete Example
 
 ```python
 from huggingface_hub import snapshot_download
+from sentence_transformers import SentenceTransformer
 import sys
 
-# Download
+# 1. Download router
 path = snapshot_download("JiaqiXue/r2-router")
 sys.path.insert(0, path)
 
 from router import R2Router
 
-# Load pre-trained KNN checkpoints
+# 2. Load pre-trained KNN checkpoints
 router = R2Router.from_pretrained(path)
 
-#
+# 3. Embed your query with Qwen3-0.6B (1024-dim)
+embedder = SentenceTransformer("Qwen/Qwen3-0.6B")
+embedding = embedder.encode("What is the capital of France?")
+
+# 4. Route!
 result = router.route(embedding)
 print(f"Model: {result['model_full_name']}")
 print(f"Token Budget: {result['token_limit']}")
@@ -70,22 +75,17 @@ sys.path.insert(0, path)
 
 from router import R2Router
 
-# Train KNN from the provided sub_10 training data
+# Train KNN from the provided sub_10 training data (custom hyperparameters)
 router = R2Router.from_training_data(path, k=80)
-
-# Route a query
-result = router.route(embedding)
 ```
 
-###
-
-R2-Router uses [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) embeddings (1024-dim). You can generate them with:
+### Alternative: vLLM Embeddings (Faster for Batches)
 
 ```python
-from
-
-
-embedding =
+from vllm import LLM
+llm = LLM(model="Qwen/Qwen3-0.6B", runner="pooling")
+outputs = llm.embed(["What is the capital of France?"])
+embedding = outputs[0].outputs.embedding
 ```
 
 Or with vLLM for faster batch inference:
````