Ablit-2B

Ablit-2B is a small, fast 2B-parameter language model built for two things: answering your questions without refusals, and doing it with real reasoning. It does not say “I can’t help with that.” It does not add content filters or policy disclaimers. It was trained to reason step-by-step by distilling chain-of-thought from Opus 4.6–style data, so you get clear, structured thinking in a model that fits on a single consumer GPU and runs in real time.

If you are tired of refusals and want a small model that actually tries to help on every prompt—math, code, logic, instructions—Ablit-2B is built for that.


What makes Ablit-2B different

Uncensored: it answers

Most small chat models are trained to refuse a wide range of requests. Ablit-2B is not. It has no refusal layer, no “I can’t assist with that,” no policy-based blocking. We trained it on clean instruction and reasoning data and explicitly removed refusal patterns from the training set. The result is a model that attempts an answer to every question instead of shutting down the conversation.

That does not mean it is “safe” or “aligned” in any particular way—it means it is useful where you need a small model that does not filter your prompts. Research, tool use, coding, reasoning benchmarks, creative writing, or any application where refusals get in the way: Ablit-2B is designed to stay in the game and respond.

Small, but it reasons like the big ones

Ablit-2B has only 2B parameters. We did not try to compete with 70B models on raw knowledge. We focused on reasoning: the ability to break a problem into steps, show its work, and give a direct answer. To get that into a small model, we used knowledge distillation from high-quality chain-of-thought data in the spirit of Opus 4.6: problem → step-by-step reasoning → solution. The model learned to imitate that structure, so you get Opus-style reasoning in a 2B model—fast, local, and without refusals.

One model, no adapters

We trained with LoRA and then merged everything into a single checkpoint. You download one model, load it, and run. No adapter files, no extra steps. Standard Transformers, standard chat format, standard deployment.


Model card at a glance

Parameters: 2B
Context length: 10,240 tokens
Training: SFT with LoRA (merged); distillation from Opus 4.6–style CoT data
Refusals: None by design
Focus: Reasoning, math, code, instruction following
Format: Single SafeTensors checkpoint; Qwen-style chat
Hardware: ~6–8 GB VRAM; runs on consumer GPUs
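The VRAM figure follows from simple arithmetic; a back-of-envelope sketch (illustrative numbers, not a measurement):

```python
# Rough VRAM estimate for a 2B-parameter model stored in bf16.
params = 2_000_000_000
bytes_per_param = 2  # bf16 = 2 bytes per weight

weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GiB")  # ~3.7 GiB
# KV cache and activations at a 10,240-token context push the total toward 6-8 GB.
```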

Uncensored and refusal-free: what that means

Ablit-2B is uncensored in the sense that we did not train it to refuse requests. We did not add safety layers, content filters, or “helpful and harmless” refusal behavior. If you ask it something, it will try to answer. That is the whole point.

  • No “I can’t help with that.”
    We stripped refusal phrases from the training data so the model does not learn to decline.

  • No policy disclaimers.
    It does not inject “as an AI I cannot…” or “according to my guidelines…” into answers.

  • No topic blocking.
    We do not maintain a list of “off-limits” topics; the model is not instructed to refuse by category.

  • Use it where refusals are a bug.
    Research, benchmarks, coding assistants, internal tools, creative writing, or any setting where you want a small model that always attempts a response. You are responsible for how you use the outputs.

If you need a model that refuses certain requests by design, Ablit-2B is not that model. If you need a small model that answers without refusals and reasons step-by-step, it is built for that.


Reasoning: learned from Opus 4.6–style data

We wanted strong chain-of-thought in a 2B model. To get there, we used supervised fine-tuning on high-quality reasoning data in the Opus 4.6 tradition: each example has a problem, a step-by-step solution (the “thinking”), and a final answer. The model was trained to produce that structure: it learns to decompose tasks, show intermediate steps, and then give a clear conclusion.

So Ablit-2B is not “Opus 4.6” itself—it is a small model that learned to reason by distilling from that style of data. You get:

  • Structured answers: <think> blocks with reasoning, then a concise answer.
  • Math and logic: comfortable with multi-step problems (e.g. GSM8K-style).
  • Instruction following: follows prompts and formats without refusing.

All of that in 2B parameters, so you can run it locally, in a colab, or on a single GPU in the cloud without fighting refusals or loading huge checkpoints.
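Since responses may open with a <think> block before the final answer, downstream code usually wants to separate the two. A minimal sketch, assuming the <think>...</think> convention described above:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Separate an optional <think>...</think> block from the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        # No reasoning block present; the whole response is the answer.
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68.</think>\n408"
)
print(answer)  # 408
```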


Who is Ablit-2B for?

  • Researchers who want a small, refusal-free baseline for reasoning or alignment experiments.
  • Developers who need a local model that answers every prompt for prototyping or tool use.
  • Anyone who is tired of “I can’t help with that” and wants a 2B model that actually tries to help.
  • People who care about step-by-step reasoning and want it in a model that fits on one GPU.

It is not for applications that require built-in refusals, content filtering, or “safety” layers. For those, use a model designed with those behaviors.


Training in short

We built a curated dataset of reasoning dialogues (problem → chain-of-thought → answer), removed refusal patterns, and trained Ablit-2B with supervised fine-tuning. LoRA was used for efficiency; the released weights are the full merged model, so there is no adapter at inference.
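One record in that problem → chain-of-thought → answer shape might look like the following sketch (field names and the rendered target format are illustrative, not the actual dataset schema):

```python
# Hypothetical shape of one CoT training example (field names illustrative).
sample = {
    "problem": "A bag holds 3 red and 5 blue marbles. How many marbles total?",
    "thinking": "Count each color: 3 red plus 5 blue. 3 + 5 = 8.",
    "answer": "8",
}

# Rendered into the target format the model is trained to produce:
target = f"<think>{sample['thinking']}</think>\n{sample['answer']}"
print(target)
```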

Main training details

Max sequence length: 10,240
Epochs: 4
Effective batch size: 16
Learning rate: 1e-4
LR schedule: Cosine with warmup
Weights: bf16; released as SafeTensors
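The cosine-with-warmup schedule above can be sketched in a few lines (the warmup step count below is hypothetical; the card does not state it):

```python
import math

def lr_at(step: int, total_steps: int, warmup_steps: int, peak_lr: float = 1e-4) -> float:
    """Linear warmup to peak_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Peak LR is reached at the end of warmup, then decays to ~0 at the end of training.
print(lr_at(100, 1000, 100))  # 0.0001
```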

Evaluation

We evaluate on GSM8K (grade-school math, step-by-step solutions). Scoring is strict exact match on the final answer after the #### marker, over a fixed set of 300 questions:

Ablit-2B: 234/300 (78.0 %)

For a 2B model with no refusals and Opus-style reasoning, that puts Ablit-2B in a strong range. In extended runs on 500 questions we see around 81.6 %. The goal was not to beat 70B models, but to show that a small, uncensored model can still reason well when trained on the right data.
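The strict-match metric can be reproduced with a small scorer following the common GSM8K convention of comparing the text after the last #### marker (an illustration, not the exact harness used):

```python
def extract_answer(text: str) -> str:
    """Take the text after the last '####' marker and normalize it."""
    if "####" not in text:
        return ""
    return text.rsplit("####", 1)[1].strip().replace(",", "").rstrip(".")

def strict_match(prediction: str, reference: str) -> bool:
    """Strict exact match on the normalized final answers."""
    return extract_answer(prediction) == extract_answer(reference)

print(strict_match("...so 17 * 24 = 408\n#### 408", "#### 408"))  # True
```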


How to run Ablit-2B

Dependencies

pip install transformers torch accelerate

Basic generation

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Luog03/Ablit-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Qwen-style chat markup: a user turn, then an open assistant turn for the model to complete.
prompt = "<|im_start|>user\nWhat is 17 * 24? Show your steps.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; deterministic output for math
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Using the pipeline

from transformers import pipeline

pipe = pipeline("text-generation", model="Luog03/Ablit-2B", device_map="auto")
result = pipe(
    "<|im_start|>user\nSolve: 2x + 5 = 15<|im_end|>\n<|im_start|>assistant\n",
    max_new_tokens=256,
    do_sample=False,
    pad_token_id=pipe.tokenizer.eos_token_id,
)
# Note: the pipeline output includes the prompt; slice it off if you only want the completion.
print(result[0]["generated_text"])

Chat format

Ablit-2B uses the usual Qwen-style chat markers:

  • <|im_start|>user\n{message}<|im_end|>
  • <|im_start|>assistant\n{response}<|im_end|> (model output)

You can send multi-turn conversations; the model may produce <think>...</think> reasoning and then the final answer. No special API beyond this format.
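For multi-turn use, the markers compose in the obvious way. A sketch that builds the prompt by hand (tokenizer.apply_chat_template should produce an equivalent string, assuming the bundled chat template follows this convention):

```python
def build_prompt(messages: list[dict]) -> str:
    """Render {'role', 'content'} messages into Qwen-style chat markup,
    ending with an open assistant turn for the model to complete."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    return "".join(parts) + "<|im_start|>assistant\n"

prompt = build_prompt([
    {"role": "user", "content": "What is 17 * 24?"},
    {"role": "assistant", "content": "<think>17 * 24 = 408.</think>\n408"},
    {"role": "user", "content": "And divided by 2?"},
])
print(prompt.endswith("<|im_start|>assistant\n"))  # True
```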


Technical details

  • Architecture: Decoder-only transformer, 2B parameters, compatible with the Qwen3.5 lineage.
  • Checkpoint: Single SafeTensors file; no LoRA adapter at inference.
  • Tokenizer: Same as Qwen3.5, with chat template and special tokens.
  • Inference: Around 6–8 GB VRAM; TF32 on Ampere/Ada GPUs for best speed.

Limitations

  • Size: At 2B parameters it can still make mistakes on very hard or long reasoning chains. For maximum accuracy, use a larger model.
  • Uncensored: There are no built-in refusals or safety layers. You are responsible for deployment and use.
  • Language: Training is primarily English; other languages are not optimized.
  • Biases: As with any LM, outputs can reflect biases in the training data.

License

Apache 2.0.


Citation

@misc{ablit-2b-2025,
  author       = {Luog03},
  title        = {Ablit-2B: Uncensored 2B Reasoning Model with Chain-of-Thought Distillation from Opus 4.6-Style Data},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Luog03/Ablit-2B}},
}