Ablit-2B
Ablit-2B is a small, fast 2B-parameter language model built for two things: answering your questions without refusals, and doing it with real reasoning. It does not say “I can’t help with that.” It does not add content filters or policy disclaimers. It was trained to reason step-by-step by distilling chain-of-thought from Opus 4.6–style data, so you get clear, structured thinking in a model that fits on a single consumer GPU and runs in real time.
If you are tired of refusals and want a small model that actually tries to help on every prompt—math, code, logic, instructions—Ablit-2B is built for that.
What makes Ablit-2B different
Uncensored: it answers
Most small chat models are trained to refuse a wide range of requests. Ablit-2B is not. It has no refusal layer, no “I can’t assist with that,” no policy-based blocking. We trained it on clean instruction and reasoning data and explicitly removed refusal patterns from the training set. The result is a model that attempts an answer to every question instead of shutting down the conversation.
That does not mean it is “safe” or “aligned” in any particular way—it means it is useful where you need a small model that does not filter your prompts. Research, tool use, coding, reasoning benchmarks, creative writing, or any application where refusals get in the way: Ablit-2B is designed to stay in the game and respond.
Small, but it reasons like the big ones
Ablit-2B has only 2B parameters. We did not try to compete with 70B models on raw knowledge. We focused on reasoning: the ability to break a problem into steps, show its work, and give a direct answer. To get that into a small model, we used knowledge distillation from high-quality chain-of-thought data in the spirit of Opus 4.6: problem → step-by-step reasoning → solution. The model learned to imitate that structure, so you get Opus-style reasoning in a 2B model—fast, local, and without refusals.
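The problem → step-by-step reasoning → solution structure described above can be sketched as follows. The field names (`problem`, `thinking`, `answer`) and the rendering function are illustrative assumptions, not the actual dataset schema:

```python
# Hypothetical shape of one distillation record; field names are
# illustrative, not the real dataset schema.
record = {
    "problem": "A shop sells pens at $3 each. How much do 5 pens cost?",
    "thinking": "Each pen costs $3. 5 pens cost 5 * 3 = 15 dollars.",
    "answer": "$15",
}

def to_training_text(rec):
    """Render one record as a single SFT target: reasoning first, answer last."""
    return f"<think>{rec['thinking']}</think>\n{rec['answer']}"

print(to_training_text(record))
```

Training on targets shaped like this is what teaches the model to emit a reasoning block before its final answer.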
One model, no adapters
We trained with LoRA and then merged everything into a single checkpoint. You download one model, load it, and run. No adapter files, no extra steps. Standard Transformers, standard chat format, standard deployment.
Model card at a glance
| Spec | Value |
|---|---|
| Parameters | 2B |
| Context length | 10,240 tokens |
| Training | SFT with LoRA (merged); distillation from Opus 4.6–style CoT data |
| Refusals | None by design |
| Focus | Reasoning, math, code, instruction following |
| Format | Single SafeTensors checkpoint; Qwen-style chat |
| Hardware | ~6–8 GB VRAM; runs on consumer GPUs |
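A rough back-of-envelope check on the VRAM figure (weights only; KV cache, activations, and framework overhead account for the rest of the 6–8 GB):

```python
# Back-of-envelope VRAM estimate for a 2B-parameter model stored in bf16.
# Real usage adds KV cache, activations, and framework overhead on top.
params = 2e9
bytes_per_param = 2  # bf16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GiB")  # ~3.7 GiB
```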
Uncensored and refusal-free: what that means
Ablit-2B is uncensored in the sense that we did not train it to refuse requests. We did not add safety layers, content filters, or “helpful and harmless” refusal behavior. If you ask it something, it will try to answer. That is the whole point.
- No “I can’t help with that.” We stripped refusal phrases from the training data so the model does not learn to decline.
- No policy disclaimers. It does not inject “as an AI I cannot…” or “according to my guidelines…” into answers.
- No topic blocking. We do not maintain a list of off-limits topics; the model is not instructed to refuse by category.
- Use it where refusals are a bug. Research, benchmarks, coding assistants, internal tools, creative writing, or any setting where you want a small model that always attempts a response. You are responsible for how you use the outputs.
If you need a model that refuses certain requests by design, Ablit-2B is not that model. If you need a small model that answers without refusals and reasons step-by-step, it is built for that.
Reasoning: learned from Opus 4.6–style data
We wanted strong chain-of-thought in a 2B model. To get there, we used supervised fine-tuning on high-quality reasoning data in the Opus 4.6 tradition: each example has a problem, a step-by-step solution (the “thinking”), and a final answer. The model was trained to produce that structure: it learns to decompose tasks, show intermediate steps, and then give a clear conclusion.
So Ablit-2B is not “Opus 4.6” itself—it is a small model that learned to reason by distilling from that style of data. You get:
- Structured answers: `<think>` blocks with reasoning, then a concise answer.
- Math and logic: comfortable with multi-step problems (e.g. GSM8K-style).
- Instruction following: follows prompts and formats without refusing.
All of that in 2B parameters, so you can run it locally, in a Colab notebook, or on a single GPU in the cloud without fighting refusals or loading huge checkpoints.
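A minimal way to split such a response into its reasoning and its final answer, assuming the model wraps reasoning in a single `<think>...</think>` block (a sketch, not part of the released tooling):

```python
import re

def split_reasoning(text):
    """Separate a response into (reasoning, final answer).

    Assumes an optional <think>...</think> block followed by the answer;
    returns ("", text) when no block is present.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

thinking, answer = split_reasoning("<think>17*24 = 17*20 + 17*4 = 340 + 68</think>\n408")
print(answer)  # 408
```

This is handy when you want to log or display the reasoning separately from the answer shown to users.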
Who is Ablit-2B for?
- Researchers who want a small, refusal-free baseline for reasoning or alignment experiments.
- Developers who need a local model that answers every prompt for prototyping or tool use.
- Anyone who is tired of “I can’t help with that” and wants a 2B model that actually tries to help.
- People who care about step-by-step reasoning and want it in a model that fits on one GPU.
It is not for applications that require built-in refusals, content filtering, or “safety” layers. For those, use a model designed with those behaviors.
Training in short
We built a curated dataset of reasoning dialogues (problem → chain-of-thought → answer), removed refusal patterns, and trained Ablit-2B with supervised fine-tuning. LoRA was used for efficiency; the released weights are the full merged model, so there is no adapter at inference.
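A minimal sketch of what refusal-pattern removal can look like; the marker list and matching rule here are illustrative, not the actual training pipeline:

```python
# Illustrative refusal filter: the phrase list and substring matching are
# a sketch, not the real data-cleaning pipeline.
REFUSAL_MARKERS = (
    "i can't help with that",
    "i cannot assist",
    "as an ai",
    "against my guidelines",
)

def is_refusal(response: str) -> bool:
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def filter_dataset(examples):
    """Drop any (prompt, response) pair whose response looks like a refusal."""
    return [ex for ex in examples if not is_refusal(ex["response"])]

data = [
    {"prompt": "2+2?", "response": "<think>2+2=4</think>\n4"},
    {"prompt": "x?", "response": "I can't help with that."},
]
print(len(filter_dataset(data)))  # 1
```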
Main training details
| Setting | Value |
|---|---|
| Max sequence length | 10,240 |
| Epochs | 4 |
| Effective batch size | 16 |
| Learning rate | 1e-4 |
| LR schedule | Cosine with warmup |
| Weights | bf16; released as SafeTensors |
Evaluation
We evaluate on GSM8K (grade-school math word problems with step-by-step solutions). On a fixed set of 300 questions, scored by strict match on the final answer after the `####` marker:
| Model | GSM8K (300) |
|---|---|
| Ablit-2B | 234/300 (78.0%) |
For a 2B model with no refusals and Opus-style reasoning, that puts Ablit-2B in a strong range. In extended runs on 500 questions we see around 81.6%. The goal was not to beat 70B models, but to show that a small, uncensored model can still reason well when trained on the right data.
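The strict-match scoring can be sketched like this, assuming GSM8K's convention that the gold answer follows a `####` marker (a simplified scorer, not the exact harness used for the numbers above):

```python
import re

def extract_gsm8k_answer(text):
    """Return the final numeric answer after the last '####' marker, or None."""
    parts = text.rsplit("####", 1)
    if len(parts) < 2:
        return None
    # keep sign, digits, commas, and a decimal point; normalize commas away
    m = re.search(r"-?[\d,]*\.?\d+", parts[1])
    return m.group(0).replace(",", "") if m else None

def score(predictions, references):
    """Fraction of predictions whose extracted answer matches the reference."""
    hits = sum(
        extract_gsm8k_answer(p) is not None
        and extract_gsm8k_answer(p) == extract_gsm8k_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)

print(score(["... so the total is #### 72"], ["Natalia sold 72 clips. #### 72"]))  # 1.0
```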
How to run Ablit-2B
Dependencies
```shell
pip install transformers torch accelerate
```
Basic generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Luog03/Ablit-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

prompt = "<|im_start|>user\nWhat is 17 * 24? Show your steps.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Using the pipeline
```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Luog03/Ablit-2B",
    device_map="auto",
    trust_remote_code=True,
)
result = pipe(
    "<|im_start|>user\nSolve: 2x + 5 = 15<|im_end|>\n<|im_start|>assistant\n",
    max_new_tokens=256,
    do_sample=False,
    pad_token_id=pipe.tokenizer.eos_token_id,
)
print(result[0]["generated_text"])
```
Chat format
Ablit-2B uses the usual Qwen-style chat markers:
```
<|im_start|>user
…<|im_end|>
<|im_start|>assistant
… (model output)
```
You can send multi-turn conversations; the model may produce <think>...</think> reasoning and then the final answer. No special API beyond this format.
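One way to build such a multi-turn prompt by hand (a sketch; `tokenizer.apply_chat_template` can produce the same string if the bundled chat template matches this format):

```python
# Build a Qwen-style multi-turn prompt by hand from a message list.
def build_prompt(messages):
    """messages: list of {"role": "user" | "assistant", "content": str}."""
    text = ""
    for msg in messages:
        text += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    # open the assistant turn so the model continues from here
    return text + "<|im_start|>assistant\n"

prompt = build_prompt([
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>2 + 2 = 4</think>\n4"},
    {"role": "user", "content": "Now multiply that by 3."},
])
print(prompt)
```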
Technical details
- Architecture: Decoder-only transformer, 2B parameters, compatible with the Qwen3.5 lineage.
- Checkpoint: Single SafeTensors file; no LoRA adapter at inference.
- Tokenizer: Same as Qwen3.5, with chat template and special tokens.
- Inference: Around 6–8 GB VRAM; enable TF32 matmuls on Ampere/Ada GPUs for best speed.
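The TF32 note above can be applied with two standard PyTorch flags (a config fragment; safe to set on any build, and only takes effect on GPUs that support TF32):

```python
# Opt in to TF32 matmuls and cuDNN TF32 kernels on Ampere/Ada GPUs.
import torch

torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```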
Limitations
- Size: At 2B parameters it can still make mistakes on very hard or long reasoning chains. For maximum accuracy, use a larger model.
- Uncensored: There are no built-in refusals or safety layers. You are responsible for deployment and use.
- Language: Training is primarily English; other languages are not optimized.
- Biases: As with any LM, outputs can reflect biases in the training data.
License
Apache 2.0.
Citation
```bibtex
@misc{ablit-2b-2025,
  author = {Luog03},
  title = {Ablit-2B: Uncensored 2B Reasoning Model with Chain-of-Thought Distillation from Opus 4.6-Style Data},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Luog03/Ablit-2B}},
}
```