💥 Qwenite3.5-2B
📄 Overview
| Field | Value |
|---|---|
| Model Name | constructai/Qwenite3.5-2B |
| Base Model | Qwen3.5-2B-Base |
| Dataset | constructai/Granite-v4.1-Distilled-15K |
| Training Type | Supervised Fine-Tuning (SFT) |
| Parameters | 2B |
| Framework | Unsloth + LoRA |
| Hardware | NVIDIA T4 16GB |
🎯 Intended Use
This model is designed for step-by-step reasoning tasks where the answer requires logical decomposition before the final response. It is optimized for:
- Educational applications — explaining "why" and "how" questions with step-by-step logic (math, science, common sense)
- On-device assistants — runs on mobile, Raspberry Pi, or CPU-only environments when quantized to q4_k_m (see the sketch after this list)
- Research baseline — studying SFT-only reasoning without RLHF/DPO
- Reasoning distillation research — studying how small models learn from large ones (Granite → Qwen)

Not recommended for: multimodal tasks, non-reasoning chat (e.g., creative writing), or production systems requiring 100% factual accuracy.
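For the on-device use case, inference typically goes through a GGUF quantization. Below is a minimal sketch using llama-cpp-python, assuming you have converted the model to q4_k_m GGUF yourself (the file name is hypothetical; no GGUF ships with this card):

```python
# Minimal on-device inference sketch. The GGUF file name is hypothetical:
# it assumes you have converted/quantized the model to q4_k_m yourself.
from llama_cpp import Llama

llm = Llama(model_path="qwenite3.5-2b-q4_k_m.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why does ice float on water?"}],
    max_tokens=256,
    temperature=0.1,
)
print(out["choices"][0]["message"]["content"])
```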
⚠️ Limitations
- Size matters — at 2B parameters, complex or multi-hop reasoning may still fail
- No multimodal support — text only; images, video, and audio are not supported
- Factual accuracy — may hallucinate or give incorrect answers; always verify critical outputs
- Domain restricted — trained on 15,000 reasoning examples (2 epochs); general chat or creative writing may be suboptimal
- Training data bias — inherits biases from the constructai/Granite-v4.1-Distilled-15K dataset; not safety-filtered for harmful content
- Hardware specific — optimised for T4/consumer GPUs; very slow on CPU without quantisation
🧪 Training Details
I continued experimenting with LoRA configurations, and overall this run was smooth and stable. By the end of training the loss settled in the 0.9–1.0 range, and the resulting model quality is excellent. You can try the model with this code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "constructai/Qwenite3.5-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def ask(question):
    # Build the prompt manually in ChatML format, as used by Qwen models.
    prompt = f"<|im_start|>user\n{question}\nAnswer concisely:<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.1, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt.
    answer = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
    return answer

test_questions = [
    "On one branch there are 2 monkeys. On two such branches there are 4 monkeys. Now answer: How many on 3 branches?",
]

for q in test_questions:
    print(f"Q: {q}")
    print(f"A: {ask(q)}\n{'-' * 50}")
```
🙏 Acknowledgements
This project would not have been possible without the open‑source community and the following resources:
- Qwen Team (Alibaba Cloud) — for releasing the Qwen3.5-2B-Base model under Apache 2.0, a perfect balance of size and intelligence.
- Unsloth AI — for making fine-tuning on consumer hardware fast and memory-efficient.
- Hugging Face — for the ecosystem (transformers, datasets, PEFT, Hub) that democratises LLM training.
- Kaggle — for providing the free T4 GPU runtime used to run this experiment.
📖 Citation
```bibtex
@misc{Qwenite3.5-2B,
  author       = {constructai},
  title        = {Qwenite3.5-2B: Small Reasoning Model via SFT on Granite Traces},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/constructai/Qwenite3.5-2B},
}
```