Pathology-CoT: Learning Visual Chain-of-Thought Agent from Expert Whole Slide Image Diagnosis Behavior
Abstract
A framework records and utilizes expert navigation behavior in whole-slide imaging to build an agentic system for pathology diagnosis, achieving high precision and recall in metastasis detection.
Diagnosing a whole-slide image is an interactive, multi-stage process involving changes in magnification and movement between fields. Although recent pathology foundation models are strong, practical agentic systems that decide what field to examine next, adjust magnification, and deliver explainable diagnoses are still lacking. The blocker is data: scalable, clinically aligned supervision of expert viewing behavior that is tacit and experience-based, not written in textbooks or online, and therefore absent from large language model training. We introduce the AI Session Recorder, which works with standard WSI viewers to unobtrusively record routine navigation and convert the viewer logs into standardized behavioral commands (inspect or peek at discrete magnifications) and bounding boxes. A lightweight human-in-the-loop review turns AI-drafted rationales into the Pathology-CoT dataset, a form of paired "where to look" and "why it matters" supervision produced at roughly one-sixth the labeling time. Using this behavioral data, we build Pathologist-o3, a two-stage agent that first proposes regions of interest and then performs behavior-guided reasoning. On gastrointestinal lymph-node metastasis detection, it achieved 84.5% precision, 100.0% recall, and 75.4% accuracy, exceeding the state-of-the-art OpenAI o3 model and generalizing across backbones. To our knowledge, this constitutes one of the first behavior-grounded agentic systems in pathology. By turning everyday viewer logs into scalable, expert-validated supervision, our framework makes agentic pathology practical and establishes a path to human-aligned, upgradeable clinical AI.
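To make the recorded supervision concrete, the sketch below shows how a raw viewer-log event might be converted into one standardized behavioral command with a bounding box. The paper does not publish the AI Session Recorder's schema, so every name, field, and threshold here (ViewerEvent, BehavioralCommand, the dwell-time cutoff separating inspect from peek, the discrete magnification levels) is an illustrative assumption, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ViewerEvent:
    """One raw navigation event from a standard WSI viewer log (assumed schema)."""
    timestamp: float      # seconds since session start
    x: int                # viewport center, level-0 pixel coordinates
    y: int
    magnification: float  # effective objective power, e.g. 4.3x
    dwell: float          # seconds spent at this field

@dataclass
class BehavioralCommand:
    """A standardized command: inspect (sustained look) or peek (brief glance)."""
    action: str                      # "inspect" | "peek"
    magnification: int               # snapped to a discrete level
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) at level 0

DISCRETE_MAGS = (2, 5, 10, 20, 40)   # assumed discrete magnification levels
INSPECT_DWELL_SEC = 2.0              # assumed inspect-vs-peek dwell threshold

def to_command(ev: ViewerEvent, viewport_px: int = 1024) -> BehavioralCommand:
    # Snap the continuous zoom level to the nearest discrete magnification.
    mag = min(DISCRETE_MAGS, key=lambda m: abs(m - ev.magnification))
    # The field of view in level-0 pixels shrinks as magnification grows.
    half = int(viewport_px * (40 / mag) / 2)
    bbox = (ev.x - half, ev.y - half, ev.x + half, ev.y + half)
    action = "inspect" if ev.dwell >= INSPECT_DWELL_SEC else "peek"
    return BehavioralCommand(action, mag, bbox)
```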
Community
Diagnosing a whole-slide image is an interactive, multi-stage process of changing magnification and moving between fields. Although recent pathology foundation models have demonstrated strong performance, practical agentic systems that decide what field to examine next, adjust magnification, and deliver explainable diagnoses are still lacking. This limitation is largely a data bottleneck: scalable, clinically aligned supervision of expert viewing behavior is tacit and experience-based, not documented in textbooks or on the internet, and therefore absent from LLM training. Here we introduce Pathology-CoT, a framework designed to address this challenge through three key components. First, the AI Session Recorder integrates seamlessly with standard whole-slide image (WSI) viewers to unobtrusively record routine navigation and convert the viewer logs into standardized behavioral commands (inspect/peek at discrete magnifications) and bounding boxes. Second, a lightweight human-in-the-loop review turns AI-drafted rationales for these behavioral commands into the Pathology-CoT dataset, a form of paired “where to look” and “why it matters” supervision, enabling roughly six-fold faster labeling than manually constructing such a chain-of-thought dataset. Third, using this behavioral data, we build Pathologist-o3, a two-stage agent that first proposes important regions of interest and then performs behavior-guided reasoning. On the gastrointestinal lymph-node metastasis detection task, our method achieved 100.0% recall on the internal validation dataset from Stanford Medicine and 97.6% recall on an independent external validation dataset from Sweden, exceeding the state-of-the-art OpenAI o3 model and generalizing across backbones. To our knowledge, Pathology-CoT constitutes one of the first behavior-grounded agentic systems in pathology. By turning everyday viewer logs into scalable, expert-validated supervision, our framework makes agentic pathology practical and establishes a path to human-aligned, upgradeable clinical AI.
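Below is a minimal sketch of the two-stage design described above, assuming a generic WSI reader and a vision-language-model interface. The methods propose_rois, reason_over_roi, and summarize, along with read_region_at and read_region, are hypothetical stand-ins for the agent's actual components, which the post does not detail.

```python
def diagnose_slide(wsi, vlm):
    # Stage 1: propose candidate regions of interest from a low-power overview.
    thumbnail = wsi.read_region_at(magnification=2)
    rois = vlm.propose_rois(thumbnail)  # assumed to return (bbox, suggested_mag) pairs

    # Stage 2: behavior-guided reasoning, revisiting each proposed region
    # at higher magnification in the spirit of expert inspect/peek commands.
    findings = []
    for bbox, mag in rois:
        patch = wsi.read_region(bbox, magnification=mag)
        # Earlier findings are passed as context so reasoning accumulates
        # across fields, mirroring a pathologist's multi-field workflow.
        findings.append(vlm.reason_over_roi(patch, context=findings))

    # Aggregate per-region rationales into a slide-level, explainable call.
    return vlm.summarize(findings)
```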
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- Think Twice to See More: Iterative Visual Reasoning in Medical VLMs (2025)
- Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning (2025)
- X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning (2025)
- PathMR: Multimodal Visual Reasoning for Interpretable Pathology Diagnosis (2025)
- Zoom-In to Sort AI-Generated Images Out (2025)
- MedCLM: Learning to Localize and Reason via a CoT-Curriculum in Medical Vision-Language Models (2025)
- Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping (2025)