Qwen3-VL-30B-A3B-Instruct-abliterated
Qwen3-VL-30B-A3B-Instruct-abliterated is an abliterated (v1.0) variant of **Qwen3-VL-30B-A3B-Instruct**, designed for Abliterated Reasoning and Captioning. The model builds on the Qwen3-VL-MoE (Mixture of Experts) architecture to deliver deeply descriptive, context-rich, and reasoning-oriented multimodal outputs. It handles complex, sensitive, and nuanced visual content while maintaining interpretive coherence and multilingual adaptability.
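For intuition about the expert routing mentioned above, here is a toy top-k gating sketch; it is illustrative only, and the hidden size, expert count, and top-k value are invented for the example rather than taken from the actual Qwen3-VL-MoE implementation.

```python
import torch
import torch.nn.functional as F

# Toy MoE layer: a router scores experts per token, and only the top-k
# experts run for that token. All sizes here are made up for illustration.
num_experts, top_k, hidden = 8, 2, 64
experts = [torch.nn.Linear(hidden, hidden) for _ in range(num_experts)]
router = torch.nn.Linear(hidden, num_experts)

def moe_forward(x):                       # x: (tokens, hidden)
    logits = router(x)                    # (tokens, num_experts)
    weights, idx = logits.topk(top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)  # normalize over the selected experts
    out = torch.zeros_like(x)
    for slot in range(top_k):
        for e in range(num_experts):
            mask = idx[:, slot] == e      # tokens sending this slot to expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_forward(torch.randn(4, hidden)).shape)  # torch.Size([4, 64])
```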

Key Highlights
- **Abliterated / Uncensored Captioning and Reasoning:** Fine-tuned to bypass standard content filters while preserving factual accuracy, descriptive depth, and logical reasoning.
- **High-Fidelity Reasoning and Visual Understanding:** Generates detailed captions and structured reasoning for diverse visual categories, whether artistic, technical, abstract, or low-context.
- **Mixture of Experts (MoE) Efficiency:** Built on Qwen3-VL-MoE, dynamically routing computation through specialized experts for enhanced precision and scalability.
- **Aspect-Ratio Robustness:** Performs consistently across wide, tall, square, panoramic, and irregular visual formats.
- **Variational Detail Control:** Supports both concise summaries and highly detailed reasoning narratives, depending on prompt configuration (see the prompt sketch after this list).
- **Multilingual Output Capability:** Defaults to English but adaptable to multilingual use through prompt engineering.
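As a minimal, hypothetical sketch of how detail level and output language can be steered purely through the prompt, the phrasings below can be swapped into the "text" field of the chat message in the Quick Start that follows; the exact wording is a suggestion, not a documented API.

```python
# Illustrative prompt phrasings only (not a documented API) -- drop one of
# these into the "text" content entry of the Quick Start message below.
concise_prompt = "Caption this image in one sentence."
detailed_prompt = (
    "Provide an exhaustive caption: describe the objects, spatial layout, "
    "lighting, and style, then reason step by step about the scene."
)
# Output language follows the prompt language (here: French).
multilingual_prompt = "Décris cette image en détail, en français."
```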
Quick Start with Transformers
```python
from transformers import Qwen3VLMoeForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model with automatic dtype selection and device placement.
model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-30B-A3B-Instruct-abliterated",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-30B-A3B-Instruct-abliterated"
)

# Single-turn chat message with one image and a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption and reasoning for this image."},
        ],
    }
]

# Render the chat template and extract the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the model's device rather than hard-coding "cuda"

# Generate, then strip the prompt tokens so only the new text is decoded.
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
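Since qwen_vl_utils also parses {"type": "video"} content entries, the same pipeline can be pointed at a video. This is a hedged sketch reusing the model and processor loaded above; the file path is a placeholder to replace with a real clip.

```python
# Hedged sketch: video captioning with the same model/processor as above.
# The video path below is a placeholder, not a real asset.
video_messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "file:///path/to/clip.mp4"},
            {"type": "text", "text": "Describe what happens in this video."},
        ],
    }
]

text = processor.apply_chat_template(
    video_messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(video_messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(
    generated_ids[:, inputs.input_ids.shape[1]:],  # drop the prompt tokens
    skip_special_tokens=True,
)[0])
```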
Intended Use
This model is suited for:
- Generating detailed, uncensored captions and reasoning for complex or creative visual datasets.
- Research in multimodal reasoning, safety evaluation, and content moderation studies.
- Enabling descriptive captioning and analytical reasoning for datasets excluded from mainstream models.
- Creative applications such as narrative generation, artistic interpretation, and visual storytelling.
- Advanced reasoning over diverse visual structures and aspect ratios.
Limitations
- May produce explicit, sensitive, or offensive content depending on input and prompt.
- Not recommended for deployment in production systems that require strict moderation or filtering.
- Style, tone, and reasoning detail can vary based on prompt phrasing.
- May show variable performance on synthetic, abstract, or highly stylized visual inputs.