Karin routing LoRA — iter-3

LoRA adapter that fine-tunes mannix/llama3.1-8b-abliterated for tool routing in Karin, an on-device voice assistant running on an NVIDIA Jetson Orin Nano 8 GB. This is the production adapter, applied on top of the mannix abliteration via Ollama's ADAPTER directive.

Files

  • karin-lora.gguf — 41 MB GGUF of the LoRA adapter (~21 M parameters). Drop-in for Ollama (ADAPTER ./karin-lora.gguf in a Modelfile) or llama.cpp (--lora ./karin-lora.gguf). Built at iter-3 / run_0ac17bc7. A load-time sketch follows below.
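
For use outside Ollama, here is a minimal sketch of loading the adapter through the llama-cpp-python bindings; the base-model GGUF path is a placeholder for a local export of the mannix base, so adjust it to your quantization:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./llama3.1-8b-abliterated.Q4_K_M.gguf",  # placeholder: local GGUF of the base model
    lora_path="./karin-lora.gguf",                        # this repo's adapter
    n_ctx=3072,                                           # matches the training max_seq_length
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Set a reminder for my 3 pm meeting."}]
)
print(reply["choices"][0]["message"]["content"])
```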

Performance

On Karin's 135-case held-out tool-routing eval (see sft/eval_cases_novel.yaml):

| Configuration | Routing | Reply | Tool-output use |
|---|---|---|---|
| Base mannix (no LoRA) | ~57% | — | — |
| This LoRA alone (iter-3) | 71.1% | ~66% | — |
| This LoRA + Karin runtime layer (production default) | 93.3% | 91.9% | 59.2% |

The runtime layer (Phase-0 classifier patches, under-fire rescue, two-phase compose, L8 reply scrubs) lives in the Karin repo and contributes ~22 pp of the routing gains. See docs/routing-pipeline.md for the full pipeline breakdown.
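
For intuition, here is a minimal sketch of how a routing-accuracy number like these can be computed. The field names (utterance, expected_tool) are assumptions about the eval_cases_novel.yaml schema, and route() stands in for whatever call produces the model's tool choice:

```python
import yaml

def routing_accuracy(cases_path, route):
    """Fraction of eval cases where the model selects the expected tool.

    `route(utterance)` is a stand-in for the actual routing call and should
    return the name of the chosen tool (or None if no tool was called).
    """
    with open(cases_path) as f:
        cases = yaml.safe_load(f)  # assumed: a list of {utterance, expected_tool} entries

    hits = sum(1 for case in cases if route(case["utterance"]) == case["expected_tool"])
    return hits / len(cases)      # e.g. 126 correct out of 135 cases ≈ 93.3%
```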

Four subsequent training iterations (iter-4, 5, 6, 7) regressed on the same eval and were all rolled back. Iter-3 remains the production base. See docs/ for the per-iteration post-mortems.

Training

  • Base model (trained against): mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated
  • Base model (deployed against): mannix/llama3.1-8b-abliterated:tools-q4_k_m (same weights, mannix re-applies the abliteration with a tools template)
  • Training data: 294 SFT rows from Karin's phrase library + 40 DPO pairs
  • Hyperparameters (anti-overfit, kept across every iteration; a minimal config sketch follows this list):
    • lora_r=8, lora_alpha=32, lora_dropout=0.1
    • sft_lr=1e-4, weight_decay=0.01
    • sft_epochs=2, max_seq_length=3072
    • Cosine LR + 10% eval split + early stopping (patience 3)
  • Notebook: sft/colab_sft.ipynb
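
A minimal sketch of how these hyperparameters map onto a peft + TRL configuration; the Colab notebook is the source of truth, dataset loading and the DPO stage are omitted, and the variable names below are illustrative:

```python
from peft import LoraConfig
from trl import SFTConfig

lora_config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.1, task_type="CAUSAL_LM")

sft_config = SFTConfig(
    output_dir="karin-lora-sft",
    learning_rate=1e-4,
    weight_decay=0.01,
    num_train_epochs=2,
    max_seq_length=3072,
    lr_scheduler_type="cosine",       # cosine LR schedule
    eval_strategy="epoch",            # evaluate on the held-out 10% split
    load_best_model_at_end=True,      # pairs with early stopping (patience 3)
)

# trainer = SFTTrainer(
#     model="mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated",
#     args=sft_config,
#     train_dataset=train_ds,         # 294 SFT rows from the phrase library (not shown)
#     eval_dataset=eval_ds,           # 10% eval split (not shown)
#     peft_config=lora_config,
#     callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
# )
```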

Deployment

With Ollama already serving mannix/llama3.1-8b-abliterated:tools-q4_k_m on the Jetson:

# 1. Fetch the adapter
hf download kaminglui/karin-lora karin-lora.gguf --local-dir .

# 2. Wrap in a Modelfile on top of the mannix base
ollama show mannix/llama3.1-8b-abliterated:tools-q4_k_m --modelfile > Modelfile
echo 'ADAPTER ./karin-lora.gguf' >> Modelfile
ollama create karin-tuned -f Modelfile

# 3. Point Karin at it (in deploy/.env)
# KARIN_LLM_MODEL=karin-tuned:latest
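
Once karin-tuned exists, tool routing can be smoke-tested directly against Ollama's chat API. A minimal sketch follows; the weather schema is illustrative only, since Karin's real tool definitions live in the Karin repo:

```python
import json
import requests

# Illustrative schema for one of the 14 tools; not Karin's actual definition.
weather_tool = {
    "type": "function",
    "function": {
        "name": "weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "karin-tuned:latest",
        "messages": [{"role": "user", "content": "Do I need an umbrella today?"}],
        "tools": [weather_tool],
        "stream": False,
    },
    timeout=120,
)
print(json.dumps(resp.json()["message"].get("tool_calls"), indent=2))
```

If routing works, the response's message.tool_calls should contain a weather call rather than a plain-text reply.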

Scope & limitations

  • Trained on Karin's specific tool set (14 tools: weather, news, wiki, math, schedule_reminder, find_places, web_search, update_memory, get_time, get_alerts, get_digest, graph, circuit, convert). Routing accuracy outside this tool set is not measured.
  • English-only system prompt; the LoRA wasn't exposed to multilingual prompts during training.
  • Runtime quality numbers (93.3% / 91.9% / 59.2%) are measured against the full Karin runtime layer, not the LoRA in isolation. Without the classifier patches, under-fire rescue, and reply scrubs, the LoRA alone scores ~71% routing.

License & attribution

Built with Llama. This adapter is derivative of Meta Llama 3.1 8B Instruct and inherits the Llama 3.1 Community License. See NOTICE for attribution and the Acceptable Use Policy.

Citation

@software{karin_lora_iter3,
  author = {kaminglui},
  title  = {Karin routing LoRA — iter-3},
  year   = {2026},
  url    = {https://huggingface.co/kaminglui/karin-lora},
}