# RTMO-s (body7) — acaua mirror (pure-PyTorch port)
This is a pure-PyTorch port of RTMO-s hosted under CondadosAI/ for use with the acaua computer vision library.
RTMO (Lu et al., CVPR 2024) is a one-stage real-time multi-person pose estimator that integrates coordinate classification into a YOLO-style architecture. This variant was trained on the body7 composite dataset (COCO + AI Challenger + CrowdPose + MPII + sub-JHMDB + Halpe + PoseTrack18), producing a 17-keypoint COCO-schema skeleton.
The architecture has been re-implemented in pure PyTorch under acaua.adapters.rtmo — no mmcv, no mmengine, no mmpose, no trust_remote_code. The model.safetensors in this mirror is converted from the upstream .pth checkpoint to safetensors with the acaua adapter's state_dict key naming. It is NOT drop-in compatible with mmpose — weights are laid out to load cleanly into our nn.Module tree via load_state_dict(strict=True).
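As an illustration of what the conversion step produces, here is a minimal sketch of a `.pth`-to-adapter state_dict remap. The specific key handling (dropping EMA shadow weights, keeping `backbone`/`neck`/`head` prefixes) is an assumption for illustration; the real mapping lives in `scripts/convert_rtmo.py`.

```python
# Illustrative sketch of the checkpoint conversion. The key remapping rules
# below are ASSUMPTIONS; the authoritative logic is scripts/convert_rtmo.py.
def remap_state_dict(upstream: dict) -> dict:
    """Map mmpose checkpoint keys onto the acaua adapter's module tree."""
    remapped = {}
    for key, value in upstream.items():
        if key.startswith("ema_"):  # assumption: drop EMA shadow weights
            continue
        # assumption: backbone/neck/head prefixes carry over unchanged
        remapped[key] = value
    return remapped

# mmengine checkpoints wrap the weights under a "state_dict" entry:
checkpoint = {
    "state_dict": {
        "backbone.stem.conv.weight": "w",
        "ema_backbone.stem.conv.weight": "w",
    }
}
state = remap_state_dict(checkpoint["state_dict"])
print(sorted(state))  # ['backbone.stem.conv.weight']
```

The remapped dict is then written out with safetensors so it loads via `load_state_dict(strict=True)` without any mmpose code on the import path.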
## Provenance

| Field | Value |
|---|---|
| Upstream code | open-mmlab/mmpose @ 759b39c13fea6ba094afc1fa932f51dc1b11cbf9 (Apache-2.0) |
| Upstream weights URL | https://download.openmmlab.com/mmpose/v1/projects/rtmo/rtmo-s_8xb32-600e_body7-640x640-dac2bf74_20231211.pth |
| Upstream weights SHA256 | `dac2bf749bbfb51e69ca577ca0327dff4433e3be9a56b782f0b7ef94fb45247e` |
| Conversion script | scripts/convert_rtmo.py |
| Paper | Lu et al., "RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation", CVPR 2024, arXiv:2312.07526 |
| Mirrored on | 2026-04-22 |
| Mirrored by | CondadosAI/acaua |
## Usage

```python
import acaua

model = acaua.Model.from_pretrained("CondadosAI/rtmo_s_body7")
result = model.predict("image.jpg")

# result is a PoseResult:
#   result.boxes           -> (N, 4) float32, xyxy
#   result.labels          -> (N,) int64 (person = 0)
#   result.scores          -> (N,) float32
#   result.keypoints       -> (N, 17, 2) float32, xy in image pixels
#   result.keypoint_scores -> (N, 17) float32

# Skeleton edges + keypoint names live on the adapter:
import supervision as sv

kp = result.to_supervision()
sv.EdgeAnnotator(edges=model.skeleton).annotate(image, kp)
```
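The documented shapes compose directly downstream of `predict`. A self-contained NumPy sketch (dummy arrays standing in for a real `PoseResult`, so no model is needed) of thresholding keypoints by confidence before drawing:

```python
import numpy as np

# Dummy arrays with the PoseResult shapes documented above (NOT real output).
rng = np.random.default_rng(0)
N = 3                                                  # detected people
keypoints = rng.uniform(0, 640, (N, 17, 2)).astype(np.float32)
keypoint_scores = rng.uniform(0, 1, (N, 17)).astype(np.float32)

KPT_THR = 0.5                                          # confidence cutoff
visible = keypoint_scores >= KPT_THR                   # (N, 17) bool mask
xy_confident = keypoints[visible]                      # (M, 2) kept points
print(visible.shape, xy_confident.shape)
```

Boolean-mask indexing flattens the person/keypoint axes, which is convenient for drawing all confident points in one pass; keep the `(N, 17)` mask around if you need per-person grouping.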
## Architecture

- Backbone: CSPDarknet (YOLOX lineage), widen_factor=0.5, deepen_factor=0.33
- Neck: HybridEncoder (RT-DETR–style transformer encoder + FPN/PAN fusion), hidden_dim=256
- Head: RTMOHead with per-level YOLO-style box + visibility predictions and a Dynamic Coordinate Classifier (DCC) decoded via softmax expectation over 192 × 256 coordinate bins
- Parameters: ~9.87M
- Input: 640 × 640 letterboxed, RGB raw pixel values (no mean/std normalization, per upstream PoseDataPreprocessor)
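The coordinate-classification decoding named above can be sketched in a few lines: each keypoint coordinate is recovered as the softmax expectation over a 1-D set of bins. Bin count and bin placement here are illustrative only, not the exact DCC configuration (which predicts bins dynamically per instance).

```python
import torch

def decode_axis(logits: torch.Tensor, bin_centers: torch.Tensor) -> torch.Tensor:
    """Softmax-expectation decoding: logits (..., num_bins) -> coordinates (...)."""
    probs = logits.softmax(dim=-1)          # per-keypoint distribution over bins
    return (probs * bin_centers).sum(dim=-1)  # expected bin position, in pixels

num_bins = 192                               # illustrative bin count
bin_centers = torch.linspace(0.0, 640.0, num_bins)  # bin positions across width
logits = torch.zeros(17, num_bins)           # uniform logits for demonstration
x = decode_axis(logits, bin_centers)         # (17,); uniform -> mean of centers
print(x.shape, float(x[0]))
```

Because the expectation interpolates between bin centers, this decoding yields sub-bin (sub-pixel) precision, which is the main motivation for coordinate classification over direct regression.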
## Reported performance (upstream)
| Variant | Dataset | COCO val AP | COCO val AR | V100 FPS |
|---|---|---|---|---|
| RTMO-s | body7 | 68.6 | 74.3 | ~141 |
## License and attribution
Redistributed under Apache-2.0, consistent with the upstream code and weights declarations. The acaua adapter is itself a derivative work of the upstream PyTorch implementation — see NOTICE for the required attribution chain (code AND weights).
## Citation

```bibtex
@misc{lu2023rtmo,
  title={{RTMO}: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation},
  author={Peng Lu and Tao Jiang and Yining Li and Xiangtai Li and Kai Chen and Wenming Yang},
  year={2023},
  eprint={2312.07526},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```