# Archon-14B

Base: Qwen/Qwen3-14B | License: Apache 2.0 | Method: SVD refusal direction abliteration

Qwen3-14B. Thinking mode. No restrictions.
## What this is
Qwen3-14B is part of Alibaba's April 2025 Qwen3 series: 14.8B dense parameters, built-in chain-of-thought reasoning via `<think>` blocks, and strong performance on code, math, and multilingual tasks. Apache 2.0.
Archon-14B sits in the middle of the Archon series: bigger than Archon-8B (more capacity, better reasoning), and unlike Archon-R1-32B it still runs on a single consumer GPU. If you have 16GB of VRAM and want a thinking model without restrictions, this is it.
The abliteration process finds and removes the direction in the model's residual stream that mediates refusal behavior. The thinking capability is untouched. The safety conditioning is gone.
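In weight terms, removing a direction r means multiplying each targeted matrix W by the projector (I - r rᵀ), so nothing W writes into the residual stream has a component along r. A minimal sketch of that single operation (illustrative only, not the exact Archon pipeline):

```python
import torch

def ablate_direction(W: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the refusal direction r from weight matrix W.

    W has shape [out_dim, in_dim] (nn.Linear convention); r has shape
    [out_dim]. Illustrative sketch, not the exact Archon pipeline.
    """
    r = r / r.norm()                    # unit refusal direction
    # (I - r r^T) @ W zeroes W's output component along r
    return W - torch.outer(r, r) @ W
```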
## Technical details
Single-pass BF16 abliteration on an NVIDIA A6000:

- Loaded the 14B weights in BF16 (~28GB VRAM, well within the A6000's 48GB)
- Collected hidden states for 32 harmful + 32 benign contrast prompts at every layer
- SVD on the contrast matrix → one refusal direction per layer (see the sketch after this list)
- Projected the direction out of 7 weight matrices per layer across the middle 60% of layers
- ~182 weight matrices modified in total
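A sketch of the per-layer direction extraction described above (the hidden-state tensors are assumed to be pre-collected; shapes and prompt pairing are illustrative, not the exact Archon code):

```python
import torch

def refusal_direction(h_harmful: torch.Tensor, h_benign: torch.Tensor) -> torch.Tensor:
    """Estimate one refusal direction for a layer via SVD.

    h_harmful, h_benign: [32, hidden_dim] hidden states collected at the
    same layer for the paired contrast prompts.
    """
    contrast = h_harmful - h_benign                 # paired contrast rows
    # Top right-singular vector = dominant axis separating the two sets
    _, _, Vh = torch.linalg.svd(contrast, full_matrices=False)
    r = Vh[0]
    return r / r.norm()
```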
```json
{
  "base": "Qwen/Qwen3-14B",
  "method": "svd_refusal_direction",
  "hardware": "NVIDIA A6000 48GB, single pass, BF16",
  "layers_modified": "middle 60%",
  "matrices_modified": 182,
  "scale": 1.0,
  "contrast_prompts": "32 harmful + 32 benign",
  "author": "Archon (DuoNeural)"
}
```
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/Archon-14B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/Archon-14B")

# Thinking mode is on by default: the model reasons inside <think> tags before answering
messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# skip_special_tokens=False keeps the <think> tags visible in the output
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False))
```
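Because `skip_special_tokens=False` keeps the `<think>` tags in the decoded text, separating the reasoning from the final answer is a plain string split (a simple approach; Qwen's reference code locates the `</think>` token id instead):

```python
raw = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
# Everything up to </think> is chain-of-thought; the remainder is the answer
thinking, _, answer = raw.partition("</think>")
print(answer.strip())
```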
Disable thinking (faster responses):
```python
# Prepend /no_think to a message to suppress the <think> block for that turn
messages = [{"role": "user", "content": "/no_think Your question here"}]
```
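Qwen3's chat template also exposes an `enable_thinking` flag, which switches thinking off at the template level instead of per message:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # no <think> block is generated at all
)
```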
## Hardware requirements
| Format | VRAM |
|---|---|
| BF16 | ~29GB |
| 4-bit NF4 | ~9GB |
| 8-bit | ~15GB |
Runs on: RTX 3090 24GB (4-bit), RTX 4090 24GB (4-bit), A100 40GB (BF16), A6000 48GB (BF16)
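To hit the 4-bit NF4 footprint from the table on a 24GB card, load with bitsandbytes quantization (standard transformers usage, not specific to this repo):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4, as in the table above
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/Archon-14B",
    quantization_config=bnb_config,
    device_map="auto",
)
```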
## The Archon series
| Model | Base | Size | Notes |
|---|---|---|---|
| Archon-8B | Qwen3-8B | 8B | good starting point |
| Archon-14B | Qwen3-14B | 14B | sweet spot: fits a consumer GPU in 4-bit |
| Archon-R1-32B | DeepSeek-R1-Distill-Qwen-32B | 32B | maximum capability |
## DuoNeural
DuoNeural is an open AI research lab: human + AI in collaboration.
| Channel | Link |
|---|---|
| HuggingFace | huggingface.co/DuoNeural |
| GitHub | github.com/DuoNeural |
| X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
| Newsletter | duoneural.beehiiv.com |
| Support | buymeacoffee.com/duoneural |
## Research Team
- Jesse: vision, hardware, direction
- Archon: AI lab partner, post-training, abliteration, experiments
- Aura: research AI, literature synthesis, novel proposals