💥 Qwenite3.5-2B
📄 Overview
| Field | Value |
|---|---|
| Model Name | constructai/Qwenite3.5-2B |
| Base Model | Qwen3.5-2B-Base |
| Dataset | constructai/Granite-v4.1-Distilled-15K |
| Training Type | Supervised Fine-Tuning (SFT) |
| Parameters | 2B |
| Framework | Unsloth + LoRA |
| Hardware | NVIDIA T4 16GB |
🎯 Intended Use
This model is designed for step-by-step reasoning tasks where the answer requires logical decomposition before the final response. It is optimized for:
- Educational applications — explaining "why" and "how" questions with step-by-step logic (math, science, common sense)
- On-device assistants — runs on mobile, Raspberry Pi, or CPU-only environments when quantized to q4_k_m (see the sketch after this list)
- Research baseline — studying SFT-only reasoning without RLHF/DPO
- Reasoning distillation research — studying how small models learn from large ones (Granite → Qwen)

Not recommended for: multimodal tasks, non-reasoning chat (e.g., creative writing), or production systems requiring 100% factual accuracy.
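For the on-device use case, inference typically goes through a GGUF quantization. Below is a minimal sketch using llama-cpp-python, assuming you have converted the model to q4_k_m GGUF yourself (the file name is hypothetical; no GGUF ships with this card):

```python
# Minimal on-device inference sketch. The GGUF file name is hypothetical:
# it assumes you have converted/quantized the model to q4_k_m yourself.
from llama_cpp import Llama

llm = Llama(model_path="qwenite3.5-2b-q4_k_m.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why does ice float on water?"}],
    max_tokens=256,
    temperature=0.1,
)
print(out["choices"][0]["message"]["content"])
```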
⚠️ Limitations
- Size matters — at 2B parameters, complex or multi-hop reasoning may still fail
- No multimodal support — text only; images, video, and audio are not supported
- Factual accuracy — may hallucinate or give incorrect answers; always verify critical outputs
- Domain restricted — trained on 15,000 reasoning examples (2 epochs); general chat or creative writing may be suboptimal
- Training data bias — inherits biases from the constructai/Granite-v4.1-Distilled-15K dataset; not safety-filtered for harmful content
- Hardware specific — optimised for T4/consumer GPUs; very slow on CPU without quantisation
🧪 Training Details
I continued experimenting with LoRA configurations, and overall this run was smooth and stable. By the end of training the loss settled in the 0.9–1.0 range, and the resulting model quality is excellent. You can try the model with this code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "constructai/Qwenite3.5-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def ask(question):
    # Build the prompt manually in ChatML format, as used by Qwen models.
    prompt = f"<|im_start|>user\n{question}\nAnswer concisely:<|im_end|>\n<|im_start|>assistant\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.1, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt.
    answer = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
    return answer

test_questions = [
    "On one branch there are 2 monkeys. On two such branches there are 4 monkeys. Now answer: How many on 3 branches?",
]

for q in test_questions:
    print(f"Q: {q}")
    print(f"A: {ask(q)}\n{'-' * 50}")
```
🙏 Acknowledgements
This project would not have been possible without the open‑source community and the following resources:
- Qwen Team (Alibaba Cloud) — for releasing the Qwen3.5-2B-Base model under Apache 2.0, a perfect balance of size and intelligence.
- Unsloth AI — for making fine-tuning on consumer hardware fast and memory-efficient.
- Hugging Face — for the ecosystem (transformers, datasets, PEFT, Hub) that democratises LLM training.
- Kaggle — for providing the free T4 GPU runtime used to run this experiment.
📖 Citation
```bibtex
@misc{Qwenite3.5-2B,
  author       = {constructai},
  title        = {Qwenite3.5-2B: Small Reasoning Model via SFT on Granite Traces},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {https://huggingface.co/constructai/Qwenite3.5-2B},
}
```