---
tags:
- inverse-dynamics-model
- gameplay
- computer-vision
- fps-games
library_name: owl-idm
---

# Owl IDM v0-tiny

Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.

## Model Description

This model predicts player controls from visual observations:

- **Input**: Sequence of RGB frames (256x256)
- **Output**:
  - WASD key predictions (4 binary outputs)
  - Mouse movement (dx, dy in pixels)

## Architecture

- **Backbone**: Spatial Conv3D encoder → Temporal Transformer
- **Window size**: 8 frames
- **Model size**: 70M parameters
- **Inference speed**: ~1500 FPS on an H100 GPU

## Training

- **Dataset**: FPS gameplay recordings
- **Preprocessing**:
  - Frames scaled to [-1, 1]
  - Log1p scaling applied to mouse deltas
- **Loss**: BCE for WASD + Huber for mouse

## Usage

### Installation

```bash
# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git

# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"
```

### Inference

```python
import torch

from owl_idms import InferencePipeline

# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda",
)

# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # Example: uniform noise in [-1, 1]

# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds: [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float - dx, dy mouse movements
```

## Model Files

- `config.yml`: Training configuration
- `model.pt`: Model checkpoint (EMA weights)
- `inference.py`: Inference pipeline (download from repo)

## Citation

```bibtex
@software{owl_idm_2024,
  title  = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}
```

## License

MIT License
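## Appendix: Preprocessing Sketch

The model card states that frames are scaled to [-1, 1] and mouse deltas get log1p scaling during training. A minimal sketch of that preprocessing in plain PyTorch follows; the function names (`preprocess_frames`, `scale_mouse`, `unscale_mouse`) and the signed-log1p formulation are illustrative assumptions, not part of the owl-idm API, and whether the model's mouse outputs need unscaling depends on the released checkpoint.

```python
import torch


def preprocess_frames(frames_uint8: torch.Tensor) -> torch.Tensor:
    """Map uint8 RGB frames in [0, 255] to the [-1, 1] range the model expects."""
    return frames_uint8.float() / 127.5 - 1.0


def scale_mouse(deltas: torch.Tensor) -> torch.Tensor:
    """Signed log1p compression of raw pixel mouse deltas (assumed training transform)."""
    return torch.sign(deltas) * torch.log1p(deltas.abs())


def unscale_mouse(scaled: torch.Tensor) -> torch.Tensor:
    """Invert the signed log1p scaling to recover pixel deltas."""
    return torch.sign(scaled) * torch.expm1(scaled.abs())
```

The signed form keeps the transform symmetric for leftward/upward (negative) movements, and `expm1` exactly inverts `log1p`, so round-tripping a delta through `scale_mouse` and `unscale_mouse` returns the original value.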