# Owl IDM v0-tiny

Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.
## Model Description

This model predicts player controls from visual observations:

- **Input:** Sequence of RGB frames (256x256)
- **Output:**
  - WASD key predictions (4 binary outputs)
  - Mouse movement (dx, dy in pixels)
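To make the output shapes concrete, here is a small sketch of decoding one frame's predictions into human-readable controls. The key ordering W, A, S, D and the `decode_frame` helper are assumptions for illustration, not part of the library:

```python
import torch

# Assumed key ordering for the 4 binary outputs
KEYS = ["W", "A", "S", "D"]

def decode_frame(wasd: torch.Tensor, mouse: torch.Tensor) -> dict:
    """wasd: [4] bool tensor, mouse: [2] float tensor (dx, dy)."""
    pressed = [key for key, on in zip(KEYS, wasd.tolist()) if on]
    dx, dy = mouse.tolist()
    return {"keys": pressed, "mouse": (dx, dy)}

frame_keys = torch.tensor([True, False, False, True])  # W and D held
frame_mouse = torch.tensor([3.5, -1.0])                # dx, dy in pixels
print(decode_frame(frame_keys, frame_mouse))
# {'keys': ['W', 'D'], 'mouse': (3.5, -1.0)}
```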
## Architecture

- **Backbone:** Spatial Conv3D encoder → Temporal Transformer
- **Window size:** 8 frames
- **Model size:** 70M parameters
- **Inference speed:** ~1500 FPS on an H100 GPU
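Since the model operates on 8-frame windows, a longer clip has to be split into windows before (or inside) inference. The sketch below shows one way to do that with non-overlapping windows; the actual windowing strategy and stride used by the pipeline are assumptions here:

```python
import torch

WINDOW = 8  # model's temporal window size, per the architecture above

def make_windows(video: torch.Tensor) -> torch.Tensor:
    """video: [T, C, H, W] -> [num_windows, WINDOW, C, H, W], non-overlapping."""
    t = video.shape[0] - video.shape[0] % WINDOW  # drop trailing remainder
    return video[:t].reshape(-1, WINDOW, *video.shape[1:])

clip = torch.rand(20, 3, 256, 256) * 2 - 1  # 20 frames in [-1, 1]
windows = make_windows(clip)
print(windows.shape)  # torch.Size([2, 8, 3, 256, 256])
```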
## Training

- **Dataset:** FPS gameplay recordings
- **Preprocessing:**
  - Frames scaled to [-1, 1]
  - Log1p scaling applied to mouse deltas
- **Loss:** BCE for WASD + Huber loss for mouse movement
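A signed log1p transform is a common way to compress large mouse deltas into a range that is easier to regress; whether the training code uses exactly this form is an assumption based on the "Log1p scaling" setting above:

```python
import torch

# Sketch of signed log1p scaling for mouse deltas (assumed form, not
# necessarily the exact transform used in training).
def log1p_scale(delta: torch.Tensor) -> torch.Tensor:
    return torch.sign(delta) * torch.log1p(delta.abs())

def log1p_unscale(scaled: torch.Tensor) -> torch.Tensor:
    # Inverse transform, e.g. to recover pixel deltas from predictions
    return torch.sign(scaled) * torch.expm1(scaled.abs())

raw = torch.tensor([-120.0, -5.0, 0.0, 5.0, 120.0])  # pixel deltas
scaled = log1p_scale(raw)
assert torch.allclose(log1p_unscale(scaled), raw, atol=1e-3)
```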
## Usage

### Installation

```shell
# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git

# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"
```
### Inference

```python
import torch

from owl_idms import InferencePipeline

# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda",
)

# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # Example: random frames

# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds:  [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float   - dx, dy mouse movements
```

Note that `torch.rand` (uniform on [0, 1)) is used so that `* 2 - 1` actually maps the example frames into [-1, 1]; `torch.randn` would produce unbounded Gaussian values.
## Model Files

- `config.yml`: Training configuration
- `model.pt`: Model checkpoint (EMA weights)
- `inference.py`: Inference pipeline (download from repo)
## Citation

```bibtex
@software{owl_idm_2024,
  title  = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}
```
## License

MIT License