ReCAP-32B

ReCAP-32B is a vision-language model fine-tuned from Qwen/Qwen3-VL-32B-Thinking, designed to enable robust CAPTCHA solving within native GUI agents while preserving general GUI interaction capabilities.

This model is introduced in “CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training”.

🚀 Overview

ReCAP-32B extends a general-purpose GUI agent with CAPTCHA-solving ability by learning from structured reasoning-action trajectories.

It operates end-to-end:

Input: raw screenshots
Output: reasoning + executable GUI actions (click, type, drag)
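
To make the output side concrete, the three action kinds listed above (click, type, drag) could be represented as typed records in downstream evaluation code. The sketch below is illustrative only: this card does not specify the model's actual output format, so the JSON layout, the field names (`action`, `x`, `y`, `text`, `x1`, `y1`, `x2`, `y2`), and the helper `parse_actions` are all assumptions.

```python
import json
from dataclasses import dataclass
from typing import List, Union

# NOTE: hypothetical schema. The model card does not document the real
# action format emitted by ReCAP-32B; field names here are assumptions.


@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Drag:
    x1: int
    y1: int
    x2: int
    y2: int


Action = Union[Click, TypeText, Drag]

# Maps the assumed "action" tag to the record type that holds its fields.
_BUILDERS = {"click": Click, "type": TypeText, "drag": Drag}


def parse_actions(raw: str) -> List[Action]:
    """Parse a JSON list of action dicts (assumed layout) into typed records."""
    actions: List[Action] = []
    for item in json.loads(raw):
        fields = dict(item)  # copy so the caller's data is not mutated
        kind = fields.pop("action")
        if kind not in _BUILDERS:
            raise ValueError(f"unsupported action: {kind!r}")
        actions.append(_BUILDERS[kind](**fields))
    return actions


example = '[{"action": "click", "x": 10, "y": 20}, {"action": "type", "text": "hello"}]'
parsed = parse_actions(example)
```

Keeping actions as typed records rather than raw strings lets an evaluation harness reject malformed or unsupported actions before anything is executed.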

✨ Key Features

Unified agent: Handles both CAPTCHA and general GUI tasks
Reasoning-action modeling: Learns both decisions and execution
Self-correction: Improves robustness by learning from failures
Efficient interaction: Generates multiple actions per step

🧠 Capabilities

Supports diverse CAPTCHA types:

Text / OCR
Icon selection & matching
Image grid reasoning
Slider / drag tasks
Multi-step interaction challenges

Core skills:

Visual understanding
Spatial reasoning
Continuous control
Multi-step planning

📊 Performance

~81.0% success rate on a synthetic CAPTCHA benchmark
Strong improvements on interaction-heavy tasks (e.g., slider, image grid)
Maintains strong performance on general GUI benchmarks

🔒 Ethical Considerations

This model is released for research purposes only.
It is intended to study and improve the robustness of human-verification systems, not to bypass them.

Model size: 33B params
Tensor type: BF16 (Safetensors)

ReCAP-Agent/ReCAP-32B is part of the ReCAP Agent collection. ReCAP is a framework for training and evaluating CAPTCHA-capable GUI agents using dynamic tasks, benchmarks, and unified evaluation.