Karin routing LoRA — iter-3

LoRA adapter that fine-tunes mannix/llama3.1-8b-abliterated for tool routing in Karin, an on-device voice assistant running on an NVIDIA Jetson Orin Nano 8 GB. This is the production adapter, applied on top of the mannix abliteration via Ollama's ADAPTER directive.

Files

  • karin-lora.gguf — 41 MB GGUF of the LoRA adapter (~21 M parameters). Drop-in for Ollama (ADAPTER ./karin-lora.gguf in a Modelfile) or llama.cpp (--lora ./karin-lora.gguf). Built at iter-3 / run_0ac17bc7. A load-time sketch follows below.
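
For use outside Ollama, here is a minimal sketch of loading the adapter through the llama-cpp-python bindings; the base-model GGUF path is a placeholder for a local export of the mannix base, so adjust it to your quantization:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama3.1-8b-abliterated.Q4_K_M.gguf",  # placeholder: local GGUF of the base model
    lora_path="./karin-lora.gguf",                        # this repo's adapter
    n_ctx=3072,                                           # matches the training max_seq_length
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Set a reminder for my 3 pm meeting."}]
)
print(reply["choices"][0]["message"]["content"])
```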

Performance

On Karin's 135-case held-out tool-routing eval (see sft/eval_cases_novel.yaml):

| Configuration | Routing | Reply | Tool-output use |
|---|---|---|---|
| Base mannix (no LoRA) | ~57% | — | — |
| This LoRA alone (iter-3) | 71.1% | ~66% | — |
| This LoRA + Karin runtime layer (production default) | 93.3% | 91.9% | 59.2% |

The runtime layer (Phase-0 classifier patches, under-fire rescue, two-phase compose, L8 reply scrubs) lives in the Karin repo and contributes ~22 pp of the routing gains. See docs/routing-pipeline.md for the full pipeline breakdown.
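
For intuition, here is a minimal sketch of how a routing-accuracy number like these can be computed. The field names (utterance, expected_tool) are assumptions about the eval_cases_novel.yaml schema, and route() stands in for whatever call produces the model's tool choice:

```python
import yaml

def routing_accuracy(cases_path, route):
    """Fraction of eval cases where the model selects the expected tool.

    `route(utterance)` is a stand-in for the actual routing call and should
    return the name of the chosen tool (or None if no tool was called).
    """
    with open(cases_path) as f:
        cases = yaml.safe_load(f)  # assumed: a list of {utterance, expected_tool} entries

    hits = sum(1 for case in cases if route(case["utterance"]) == case["expected_tool"])
    return hits / len(cases)      # e.g. 126 correct out of 135 cases ≈ 93.3%
```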

Four subsequent training iterations (iter-4, 5, 6, 7) regressed on the same eval and were all rolled back. Iter-3 remains the production base. See docs/ for the per-iteration post-mortems.

Training

  • Base model (trained against): mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
  • Base model (deployed against): mannix/llama3.1-8b-abliterated:tools-q4_k_m (same weights, mannix re-applies the abliteration with a tools template)
  • Training data: 294 SFT rows from Karin's phrase library + 40 DPO pairs
  • Hyperparameters (anti-overfit, kept across every iteration; a minimal config sketch follows this list):
    • lora_r=8, lora_alpha=32, lora_dropout=0.1
    • sft_lr=1e-4, weight_decay=0.01
    • sft_epochs=2, max_seq_length=3072
    • Cosine LR + 10% eval split + early stopping (patience 3)
  • Notebook: sft/colab_sft.ipynb
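
A minimal sketch of how these hyperparameters map onto a peft + TRL configuration; the Colab notebook is the source of truth, dataset loading and the DPO stage are omitted, and the variable names below are illustrative:

```python
from peft import LoraConfig
from trl import SFTConfig

lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, task_type="CAUSAL_LM")

sft_config = SFTConfig(
    output_dir="karin-lora-sft",
    learning_rate=1e-4,
    weight_decay=0.01,
    num_train_epochs=2,
    max_seq_length=3072,
    lr_scheduler_type="cosine",       # cosine LR schedule
    eval_strategy="epoch",            # evaluate on the held-out 10% split
    load_best_model_at_end=True,      # pairs with early stopping (patience 3)
)

# trainer = SFTTrainer(
#     model="mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated",
#     args=sft_config,
#     train_dataset=train_ds,         # 294 SFT rows from the phrase library (not shown)
#     eval_dataset=eval_ds,           # 10% eval split (not shown)
#     peft_config=lora_config,
#     callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
# )
```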

Deployment

With Ollama already serving mannix/llama3.1-8b-abliterated:tools-q4_k_m on the Jetson:

# 1. Fetch the adapter
hf download kaminglui/karin-lora karin-lora.gguf --local-dir .

# 2. Wrap in a Modelfile on top of the mannix base
ollama show mannix/llama3.1-8b-abliterated:tools-q4_k_m --modelfile > Modelfile
echo 'ADAPTER ./karin-lora.gguf' >> Modelfile
ollama create karin-tuned -f Modelfile

# 3. Point Karin at it (in deploy/.env)
# KARIN_LLM_MODEL=karin-tuned:latest
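
Once karin-tuned exists, tool routing can be smoke-tested directly against Ollama's chat API. A minimal sketch follows; the weather schema is illustrative only, since Karin's real tool definitions live in the Karin repo:

```python
import json
import requests

# Illustrative schema for one of the 14 tools; not Karin's actual definition.
weather_tool = {
    "type": "function",
    "function": {
        "name": "weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "karin-tuned:latest",
        "messages": [{"role": "user", "content": "Do I need an umbrella today?"}],
        "tools": [weather_tool],
        "stream": False,
    },
    timeout=120,
)
print(json.dumps(resp.json()["message"].get("tool_calls"), indent=2))
```

If routing works, the response's message.tool_calls should contain a weather call rather than a plain-text reply.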

Scope & limitations

  • Trained on Karin's specific tool set (14 tools: weather, news, wiki, math, schedule_reminder, find_places, web_search, update_memory, get_time, get_alerts, get_digest, graph, circuit, convert). Routing accuracy outside this tool set is not measured.
  • English-only system prompt; the LoRA wasn't exposed to multilingual prompts during training.
  • Runtime quality numbers (93.3% / 91.9% / 59.2%) are measured against the full Karin runtime layer, not the LoRA in isolation. Without the classifier patches, under-fire rescue, and reply scrubs, the LoRA alone scores ~71% routing.

License & attribution

Built with Llama. This adapter is derivative of Meta Llama 3.1 8B Instruct and inherits the Llama 3.1 Community License. See NOTICE for attribution and the Acceptable Use Policy.

Citation

@software{karin_lora_iter3,
  author = {kaminglui},
  title  = {Karin routing LoRA — iter-3},
  year   = {2026},
  url    = {https://huggingface.co/kaminglui/karin-lora},
}