Qwen3-VL-8B-Instruct-abliterated
Qwen3-VL-8B-Instruct-abliterated is an abliterated (v1.0) variant of Qwen3-VL-8B-Instruct, designed for abliterated reasoning and captioning. The model is fine-tuned to produce detailed, descriptive, reasoning-focused outputs across a wide range of visual and multimodal contexts, including complex, sensitive, or nuanced content. It supports varied image resolutions and aspect ratios while keeping its interpretations coherent and its descriptions accurate.

Key Highlights
- Abliterated / Uncensored Captioning: Fine-tuned to bypass conventional content filters while preserving factual, descriptive, and reasoning-rich outputs.
- High-Fidelity Reasoning and Descriptions: Generates in-depth captions and reasoning for general, artistic, technical, abstract, and low-context images.
- Robust Across Aspect Ratios: Performs consistently on wide, tall, square, panoramic, and irregular image dimensions.
- Variational Detail Control: Capable of generating outputs ranging from concise summaries to intricate, multi-level descriptive reasoning.
- Foundation on Qwen3-VL-8B-Instruct Architecture: Built upon Qwen3-VL-8B-Instruct’s multimodal reasoning, comprehension, and instruction-following framework.
- Multilingual Output Capability: Primarily outputs in English, but adaptable to multiple languages via prompt engineering.
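Detail level and output language are both steered through the prompt text. The sketch below shows one way to build the chat message for each case. The `DETAIL_INSTRUCTIONS` presets and the `build_messages` helper are hypothetical names introduced here for illustration and are not part of the model or of any library; the exact wording that works best is an assumption and may need tuning.

```python
from typing import Any

# Hypothetical presets: the model card does not define these names or wordings.
DETAIL_INSTRUCTIONS = {
    "brief": "Describe this image in one or two sentences.",
    "standard": "Provide a detailed caption for this image.",
    "full": "Provide a detailed caption and step-by-step reasoning for this image.",
}


def build_messages(image: str, detail: str = "standard", language: str = "English") -> list[dict[str, Any]]:
    """Build a single-turn chat message asking for a caption.

    `image` is a URL or local path; `detail` selects a preset above;
    `language` appends an explicit output-language request when not English.
    """
    if detail not in DETAIL_INSTRUCTIONS:
        raise ValueError(f"unknown detail level: {detail!r}")
    prompt = DETAIL_INSTRUCTIONS[detail]
    if language != "English":
        prompt += f" Respond in {language}."
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


messages = build_messages(
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    detail="brief",
    language="French",
)
print(messages[0]["content"][1]["text"])
```

The returned list has the same shape as the `messages` variable used in the Quick Start, so it can be passed to `processor.apply_chat_template` unchanged.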
Quick Start with Transformers
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model and its processor; device_map="auto" places the weights on the available device(s).
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Instruct-abliterated",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3-VL-8B-Instruct-abliterated")

# One user turn: an image plus a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption and reasoning for this image."},
        ],
    }
]

# Render the chat template to a prompt string and extract the vision inputs.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map instead of hard-coding "cuda"

# Generate, then drop the echoed prompt tokens so only the new tokens are decoded.
# Raise max_new_tokens if long captions are cut off.
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(output_text)
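The trimming step is needed because `generate` on a decoder-only model returns each prompt followed by the newly generated tokens. The sketch below isolates that slice on plain Python lists so it can be checked without loading the model. `trim_prompt_tokens` is a hypothetical helper name, and the token IDs are made-up values standing in for real tensors.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the echoed prompt from the front of each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Toy batch: each generated row starts with its (equal-length, padded) prompt.
prompt_batch = [[101, 7, 8], [101, 9, 10]]
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 10, 44, 45]]

new_tokens = trim_prompt_tokens(prompt_batch, generated_batch)
print(new_tokens)  # [[42, 43], [44, 45]]
```

Without this slice, the decoded output would begin with the rendered chat prompt rather than the caption.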
Intended Use
This model is suited for:
- Generating detailed, unfiltered captions and reasoning for general-purpose and artistic datasets.
- Research in content moderation, red-teaming, and generative safety analysis.
- Enabling descriptive captioning and reasoning for datasets typically excluded from mainstream models.
- Creative and exploratory applications such as storytelling, visual interpretation, and multimodal reasoning.
- Captioning and reasoning for non-standard, stylized, or abstract visual content.
Limitations
- May generate explicit, sensitive, or offensive content depending on the prompt and input image.
- Not suitable for production environments that require strict content filtering or moderation.
- Output tone, style, and reasoning depth can vary depending on phrasing and visual complexity.
- May show variability in performance on synthetic or highly abstract visuals.
Model tree for prithivMLmods/Qwen3-VL-8B-Instruct-abliterated
Base model: Qwen/Qwen3-VL-8B-Instruct