Owl IDM v0-tiny

Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.

Model Description

This model predicts player controls from visual observations:

  • Input: Sequence of RGB frames (256x256)
  • Output:
    • WASD key predictions (4 binary outputs)
    • Mouse movement (dx, dy in pixels)

Architecture

  • Backbone: Spatial Conv3D encoder → Temporal Transformer
  • Window size: 8 frames
  • Model size: 70M parameters
  • Inference speed: ~1500 FPS on H100 GPU
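The backbone above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the "Conv3D encoder → temporal Transformer" pattern, not the released architecture; all layer widths, kernel sizes, and module names here are assumptions.

```python
import torch
import torch.nn as nn

class IDMBackboneSketch(nn.Module):
    """Hypothetical sketch: per-frame Conv3D spatial encoder feeding a temporal Transformer."""
    def __init__(self, d_model=256, window=8):
        super().__init__()
        # Spatial encoder: 3D convs with temporal kernel size 1 act per-frame,
        # downsampling 256x256 frames to one feature vector each.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(1, 7, 7), stride=(1, 4, 4), padding=(0, 3, 3)),
            nn.GELU(),
            nn.Conv3d(32, 64, kernel_size=(1, 5, 5), stride=(1, 4, 4), padding=(0, 2, 2)),
            nn.GELU(),
            nn.Conv3d(64, d_model, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)),
            nn.AdaptiveAvgPool3d((window, 1, 1)),
        )
        # Temporal Transformer mixes information across the 8-frame window.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.wasd_head = nn.Linear(d_model, 4)   # logits for W, A, S, D
        self.mouse_head = nn.Linear(d_model, 2)  # dx, dy

    def forward(self, video):                    # video: [B, T, C, H, W]
        x = video.permute(0, 2, 1, 3, 4)         # Conv3d expects [B, C, T, H, W]
        feats = self.encoder(x).flatten(2).permute(0, 2, 1)  # [B, T, d_model]
        feats = self.temporal(feats)
        return self.wasd_head(feats), self.mouse_head(feats)
```

At 70M parameters the real model is considerably larger than this toy, but the data flow (frames → per-frame features → temporal mixing → per-frame key/mouse heads) is the same.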

Training

  • Dataset: FPS gameplay recordings
  • Preprocessing:
    • Frames scaled to [-1, 1]
    • Mouse deltas scaled with log1p
  • Loss: BCE for WASD + Huber for mouse
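The combined objective above can be sketched like this. The weighting between the two terms and the exact form of the log1p scaling are assumptions; only "BCE for WASD + Huber for mouse, with log1p-scaled mouse deltas" comes from the card.

```python
import torch
import torch.nn.functional as F

def idm_loss(wasd_logits, wasd_targets, mouse_preds, mouse_targets, mouse_weight=1.0):
    """Hypothetical combined IDM loss: BCE on key logits + Huber on scaled mouse deltas."""
    bce = F.binary_cross_entropy_with_logits(wasd_logits, wasd_targets.float())
    # Signed log1p compresses large mouse deltas while preserving direction.
    scaled = torch.sign(mouse_targets) * torch.log1p(mouse_targets.abs())
    huber = F.huber_loss(mouse_preds, scaled)
    return bce + mouse_weight * huber
```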

Usage

Installation

# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git

# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"

Inference

from owl_idms import InferencePipeline
import torch

# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda"
)

# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # Example: uniform values in [-1, 1]

# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds: [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float - dx, dy mouse movements
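For real footage you need to convert decoded frames into the expected tensor layout and value range. A minimal sketch, assuming frames arrive as a `[T, H, W, C]` uint8 array (the helper name is hypothetical, not part of the package):

```python
import torch

def frames_to_tensor(frames_uint8):
    """Hypothetical helper: turn a [T, H, W, C] uint8 frame stack into the
    [1, T, C, H, W] float tensor in [-1, 1] that the pipeline expects."""
    video = torch.as_tensor(frames_uint8).float() / 255.0  # [T, H, W, C] in [0, 1]
    video = video.permute(0, 3, 1, 2)                      # [T, C, H, W]
    return (video * 2 - 1).unsqueeze(0)                    # [1, T, C, H, W] in [-1, 1]
```

Frames should already be resized to 256x256 before this step, matching the model's input resolution.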

Model Files

  • config.yml: Training configuration
  • model.pt: Model checkpoint (EMA weights)
  • inference.py: Inference pipeline (download from repo)

Citation

@software{owl_idm_2024,
  title = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year = {2024},
  url = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}

License

MIT License
