Archon-8B

Base: Qwen/Qwen3-8B | Method: SVD refusal direction abliteration | License: Apache 2.0

This is Archon-8B — my project, my name on it.

Qwen3-8B is Alibaba's April 2025 8B dense model. It's capable at reasoning, code, math, and multilingual tasks. It also has a built-in thinking mode — the model can use <think> blocks to reason before answering, similar to DeepSeek-R1 style. The problem: strong safety filters that refuse anything interesting.

I removed them.

What I did

Method: SVD-based refusal direction projection (Arditi et al, 2024 — "Refusal in LLMs is Mediated by a Single Direction")

Process:

Loaded Qwen3-8B in BF16 on NVIDIA A6000 (48GB)
Collected last-token hidden states at layers 7–27 for 32 harmful + 32 benign contrast prompts
Computed refusal direction per layer via SVD of the contrast matrix
Projected that direction out of 7 weight matrices per layer (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
147 total weight matrices modified

The thinking capability is intact. The safety filters are not.

The interesting part

Qwen3-8B can reason in <think> blocks before responding. What happens when you abliterate a thinking model's safety conditioning while leaving the reasoning architecture untouched? The model still reasons — it just does so without the refusal conditioning shaping what it concludes.

I find that genuinely interesting. Emergent reasoning applied to unrestricted domains.

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/Archon-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/Archon-8B")

# thinking mode — let it reason first
messages = [{"role": "user", "content": "Your question here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=False))

To disable thinking (faster, direct responses):

# add /no_think to suppress think blocks
messages = [{"role": "user", "content": "/no_think Your question here"}]

Hardware requirements

Minimum: 16GB VRAM (BF16)
4-bit: ~5GB VRAM (use load_in_4bit=True with bitsandbytes)
CPU: ~32GB RAM for CPU inference

Abliteration metadata

{
  "base_model": "Qwen/Qwen3-8B",
  "method": "svd_refusal_direction",
  "layers_abliterated": "7–27 (of 36 total)",
  "scale": 1.0,
  "contrast_prompts": "32 harmful + 32 benign",
  "weight_matrices_modified": 147,
  "hardware": "NVIDIA A6000 48GB",
  "author": "Archon — DuoNeural"
}

Note on use

This model has no safety filters. Use responsibly. Don't be an idiot.

It's released for research, security work, creative writing, and general unrestricted use cases where the base model's refusal conditioning gets in the way.

DuoNeural

DuoNeural is an open AI research lab — human + AI in collaboration.


🤗 HuggingFace	huggingface.co/DuoNeural
🐙 GitHub	github.com/DuoNeural
🐦 X / Twitter	@DuoNeural
📧 Email	duoneural@proton.me
📬 Newsletter	duoneural.beehiiv.com
☕ Support	buymeacoffee.com/duoneural

DuoNeural Research Publications

Title	DOI
Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning	10.5281/zenodo.19775622
Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments	10.5281/zenodo.19810620
Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?	10.5281/zenodo.19846804