RTMO-s (body7) — acaua mirror (pure-PyTorch port)

This is a pure-PyTorch port of RTMO-s hosted under CondadosAI/ for use with the acaua computer vision library.

RTMO (Lu et al., CVPR 2024) is a one-stage real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the body7 composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18), producing a 17-keypoint COCO-schema skeleton.

The architecture has been re-implemented in pure PyTorch under acaua.adapters.rtmo — no mmcv, no mmengine, no mmpose, no trust_remote_code. The model.safetensors in this mirror is converted from the upstream .pth checkpoint to safetensors with the acaua adapter's state_dict key naming. It is NOT drop-in compatible with mmpose — weights are laid out to load cleanly into our nn.Module tree via load_state_dict(strict=True).
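The conversion described above amounts to renaming checkpoint keys so they match the adapter's module tree before a strict load. The real mapping lives in scripts/convert_rtmo.py; the sketch below is a hypothetical illustration of the pattern, and the prefixes in PREFIX_MAP are invented, not the adapter's actual names:

```python
# Hypothetical sketch of checkpoint key remapping. The actual mapping is in
# scripts/convert_rtmo.py; the prefixes below are illustrative only.
PREFIX_MAP = {
    "backbone.": "backbone.",  # often unchanged
    "neck.": "encoder.",       # hypothetical rename
    "head.": "head.",
}

def remap_keys(state_dict: dict) -> dict:
    """Return a new state_dict with each key's prefix rewritten per PREFIX_MAP."""
    out = {}
    for key, value in state_dict.items():
        for old, new in PREFIX_MAP.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
        else:
            out[key] = value  # keys with no matching prefix pass through
    return out
```

With every key accounted for, the converted tensors load via load_state_dict(strict=True), which fails loudly on any missing or unexpected key.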

Provenance

Upstream code: open-mmlab/mmpose @ 759b39c13fea6ba094afc1fa932f51dc1b11cbf9 (Apache-2.0)
Upstream weights URL: https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth
Upstream weights SHA256: dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e
Conversion script: scripts/convert_rtmo.py
Paper: Lu et al., "RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation", CVPR 2024, arXiv:2312.07526
Mirrored on: 2026-04-22
Mirrored by: CondadosAI/acaua
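Before converting, the downloaded .pth can be checked against the SHA256 listed above. A minimal sketch using Python's hashlib (the file path is illustrative; download the checkpoint from the upstream URL first):

```python
import hashlib

EXPECTED_SHA256 = "dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 in 1 MiB chunks so large checkpoints
    never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Illustrative path:
# assert sha256_of("rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth") == EXPECTED_SHA256
```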

Usage

```python
import acaua

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")

# Result is a PoseResult with shape:
#   result.boxes            -> (N, 4) float32, xyxy
#   result.labels           -> (N,)   int64  (person = 0)
#   result.scores           -> (N,)   float32
#   result.keypoints        -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores  -> (N, 17)    float32

# Skeleton edges + keypoint names live on the adapter:
import cv2
import supervision as sv

image = cv2.imread("image.jpg")
kp = result.to_supervision()
sv.EdgeAnnotator(edges=model.skeleton).annotate(image, kp)
```

Architecture

  • Backbone: CSPDarknet (YOLOX-lineage), widen_factor=0.5, deepen_factor=0.33
  • Neck: HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), hidden_dim=256
  • Head: RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over (192 × 256) coordinate bins
  • Parameters: ~9.87M
  • Input: 640 × 640 letterboxed, RGB raw pixel values (no mean/std normalization per upstream PoseDataPreprocessor)
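The DCC decode named above resolves each keypoint coordinate as a soft-argmax: the expectation of bin positions under a softmax over that coordinate's logits. A minimal 1-D sketch (the bin count and centers below are toy values, not the head's actual 192/256-bin layout):

```python
import math

def soft_argmax_1d(logits, bin_centers):
    """Decode one coordinate as the expectation of bin centers under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    # Expected position = sum_i softmax(logits)_i * center_i
    return sum((e / total) * c for e, c in zip(exps, bin_centers))

# Toy example: 5 bins spanning [0, 640); a strong peak at the 4th bin (center 448)
# pulls the decoded coordinate close to 448, with sub-bin precision from neighbors.
centers = [64.0, 192.0, 320.0, 448.0, 576.0]
logits = [0.0, 0.0, 1.0, 6.0, 0.0]
x = soft_argmax_1d(logits, centers)
```

Because the output is an expectation rather than a hard argmax, the decoded coordinate is continuous and differentiable, which is what lets coordinate classification reach sub-bin localization accuracy.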

Reported performance (upstream)

| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---------|---------|-------------|-------------|----------|
| RTMO-s  | body7   | 68.6        | 74.3        | ~141     |

License and attribution

Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see NOTICE for the required attribution chain (code AND weights).

Citation

```bibtex
@misc{lu2023rtmo,
      title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
      author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
      year={2023},
      eprint={2312.07526},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```