Fleming-R1-32B

GitHub • 📑 Paper


📖 Model Overview

Fleming-R1 is a reasoning model for medical scenarios that can perform step-by-step analysis of complex problems and produce reliable answers. The model follows a training paradigm of "chain-of-thought cold start" plus large-scale reinforcement learning. On multiple medical benchmarks, the 7B version achieves SOTA among models of similar size; the 32B version performs close to the much larger GPT-OSS-120B and shows stronger results on Chinese tasks.

Model Features:

  • Reasoning-oriented data strategy Combines public medical datasets with knowledge graphs to improve coverage of rare diseases, medications, and multi-hop reasoning chains;
  • Chain-of-thought cold start Uses high-quality reasoning traces distilled from teacher models to guide the model in learning basic reasoning patterns;
  • Two-stage reinforcement learning Employs adaptive hard-negative mining to strengthen the model’s reasoning when facing difficult problems.

📦 Releases

📊 Performance

Main Benchmark Results

[Figure: benchmark results]

Reasoning Ability Comparison

On the MedXpertQA benchmark, which evaluates medical reasoning ability, Fleming-R1 surpasses models of similar and even larger sizes, and is on par with certain closed-source models.

[Figure: model size comparison on MedXpertQA]

🔧 Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "UbiquantAI/Fleming-R1-32B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
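# note: device_map="auto" requires the `accelerate` package and places model
# shards across the available GPUs (or CPU) automatically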

# prepare the model input
prompt = "What should I do if I suddenly develop a fever?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
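# `text` is now the prompt wrapped in the model's chat template, ending with
# the assistant-turn opener so generation starts directly on the reply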
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# generate the response; the generous max_new_tokens budget leaves room for long reasoning traces
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # find the last occurrence of token id 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token in the output: treat everything as the final answer
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

⚠️ Safety Statement

This project is for research and non-clinical reference only; it must not be used for actual diagnosis or treatment decisions.
The generated reasoning traces are an auditable intermediate process and do not constitute medical advice.
In medical scenarios, results must be reviewed and approved by qualified professionals, and all applicable laws, regulations, and privacy compliance requirements in your region must be followed.

📚 Citation

@misc{flemingr1,
      title={Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning}, 
      author={Chi Liu and Derek Li and Yan Shu and Robin Chen and Derek Duan and Teng Fang and Bryan Dai},
      year={2025},
      eprint={2509.15279},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.15279}, 
}