# Owl IDM v0-tiny

Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.
## Model Description

This model predicts player controls from visual observations:

- **Input:** Sequence of RGB frames (256x256)
- **Output:**
  - WASD key predictions (4 binary outputs)
  - Mouse movement (dx, dy in pixels)
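To make the output shapes concrete, here is a small sketch of decoding one frame's predictions into human-readable controls. The key ordering W, A, S, D and the `decode_frame` helper are assumptions for illustration, not part of the library:

```python
import torch

# Assumed key ordering for the 4 binary outputs
KEYS = ["W", "A", "S", "D"]

def decode_frame(wasd: torch.Tensor, mouse: torch.Tensor) -> dict:
    """wasd: [4] bool tensor, mouse: [2] float tensor (dx, dy)."""
    pressed = [key for key, on in zip(KEYS, wasd.tolist()) if on]
    dx, dy = mouse.tolist()
    return {"keys": pressed, "mouse": (dx, dy)}

frame_keys = torch.tensor([True, False, False, True])  # W and D held
frame_mouse = torch.tensor([3.5, -1.0])                # dx, dy in pixels
print(decode_frame(frame_keys, frame_mouse))
# {'keys': ['W', 'D'], 'mouse': (3.5, -1.0)}
```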
## Architecture

- **Backbone:** Spatial Conv3D encoder → Temporal Transformer
- **Window size:** 8 frames
- **Model size:** 70M parameters
- **Inference speed:** ~1500 FPS on an H100 GPU
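Since the model operates on 8-frame windows, a longer clip has to be split into windows before (or inside) inference. The sketch below shows one way to do that with non-overlapping windows; the actual windowing strategy and stride used by the pipeline are assumptions here:

```python
import torch

WINDOW = 8  # model's temporal window size, per the architecture above

def make_windows(video: torch.Tensor) -> torch.Tensor:
    """video: [T, C, H, W] -> [num_windows, WINDOW, C, H, W], non-overlapping."""
    t = video.shape[0] - video.shape[0] % WINDOW  # drop trailing remainder
    return video[:t].reshape(-1, WINDOW, *video.shape[1:])

clip = torch.rand(20, 3, 256, 256) * 2 - 1  # 20 frames in [-1, 1]
windows = make_windows(clip)
print(windows.shape)  # torch.Size([2, 8, 3, 256, 256])
```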
## Training

- **Dataset:** FPS gameplay recordings
- **Preprocessing:**
  - Frames scaled to [-1, 1]
  - Log1p scaling applied to mouse deltas
- **Loss:** BCE for WASD + Huber loss for mouse movement
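A signed log1p transform is a common way to compress large mouse deltas into a range that is easier to regress; whether the training code uses exactly this form is an assumption based on the "Log1p scaling" setting above:

```python
import torch

# Sketch of signed log1p scaling for mouse deltas (assumed form, not
# necessarily the exact transform used in training).
def log1p_scale(delta: torch.Tensor) -> torch.Tensor:
    return torch.sign(delta) * torch.log1p(delta.abs())

def log1p_unscale(scaled: torch.Tensor) -> torch.Tensor:
    # Inverse transform, e.g. to recover pixel deltas from predictions
    return torch.sign(scaled) * torch.expm1(scaled.abs())

raw = torch.tensor([-120.0, -5.0, 0.0, 5.0, 120.0])  # pixel deltas
scaled = log1p_scale(raw)
assert torch.allclose(log1p_unscale(scaled), raw, atol=1e-3)
```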
## Usage

### Installation

```shell
# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git

# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"
```
### Inference

```python
import torch

from owl_idms import InferencePipeline

# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda",
)

# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # Example: random frames

# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds:  [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float   - dx, dy mouse movements
```

Note that `torch.rand` (uniform on [0, 1)) is used so that `* 2 - 1` actually maps the example frames into [-1, 1]; `torch.randn` would produce unbounded Gaussian values.
## Model Files

- `config.yml`: Training configuration
- `model.pt`: Model checkpoint (EMA weights)
- `inference.py`: Inference pipeline (download from repo)
## Citation

```bibtex
@software{owl_idm_2024,
  title  = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}
```
## License

MIT License