Fix Quick Start: complete runnable example with embedding generation
README.md
````diff
@@ -34,25 +34,30 @@ Official leaderboard results on 8,400 queries:
 ### Installation
 
 ```bash
-pip install scikit-learn numpy joblib huggingface_hub
+pip install scikit-learn numpy joblib huggingface_hub sentence-transformers
 ```
 
-###
+### Complete Example
 
 ```python
 from huggingface_hub import snapshot_download
+from sentence_transformers import SentenceTransformer
 import sys
 
-# Download
+# 1. Download router
 path = snapshot_download("JiaqiXue/r2-router")
 sys.path.insert(0, path)
 
 from router import R2Router
 
-# Load pre-trained KNN checkpoints
+# 2. Load pre-trained KNN checkpoints
 router = R2Router.from_pretrained(path)
 
-#
+# 3. Embed your query with Qwen3-0.6B (1024-dim)
+embedder = SentenceTransformer("Qwen/Qwen3-0.6B")
+embedding = embedder.encode("What is the capital of France?")
+
+# 4. Route!
 result = router.route(embedding)
 print(f"Model: {result['model_full_name']}")
 print(f"Token Budget: {result['token_limit']}")
@@ -70,22 +75,17 @@ sys.path.insert(0, path)
 
 from router import R2Router
 
-# Train KNN from the provided sub_10 training data
+# Train KNN from the provided sub_10 training data (custom hyperparameters)
 router = R2Router.from_training_data(path, k=80)
-
-# Route a query
-result = router.route(embedding)
 ```
 
-###
-
-R2-Router uses [Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) embeddings (1024-dim). You can generate them with:
+### Alternative: vLLM Embeddings (Faster for Batches)
 
 ```python
-from
-
-
-embedding =
+from vllm import LLM
+llm = LLM(model="Qwen/Qwen3-0.6B", runner="pooling")
+outputs = llm.embed(["What is the capital of France?"])
+embedding = outputs[0].outputs.embedding
 ```
 
 Or with vLLM for faster batch inference:
````