---
tags:
- inverse-dynamics-model
- gameplay
- computer-vision
- fps-games
library_name: owl-idm
---
# Owl IDM v0-tiny
Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.
## Model Description
This model predicts player controls from visual observations:
- **Input**: Sequence of RGB frames (256x256)
- **Output**:
- WASD key predictions (4 binary outputs)
- Mouse movement (dx, dy in pixels)
## Architecture
- **Backbone**: Spatial Conv3D encoder → Temporal Transformer
- **Window size**: 8 frames
- **Model size**: 70M parameters
- **Inference speed**: ~1,500 FPS on an H100 GPU
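The backbone described above can be sketched roughly as follows. This is an illustrative assumption of the design, not the actual implementation: module names, channel widths, layer counts, and head layouts are all guesses chosen so the shapes work out.

```python
import torch
import torch.nn as nn

class TinyIDMSketch(nn.Module):
    """Hypothetical sketch: Conv3D spatial encoder -> temporal transformer -> heads."""

    def __init__(self, d_model=256):
        super().__init__()
        # Spatial encoder: per-frame Conv3D (temporal kernel 1) downsampling 256x256 -> 1x1
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(1, 7, 7), stride=(1, 4, 4), padding=(0, 3, 3)),
            nn.GELU(),
            nn.Conv3d(32, 64, kernel_size=(1, 5, 5), stride=(1, 4, 4), padding=(0, 2, 2)),
            nn.GELU(),
            nn.Conv3d(64, d_model, kernel_size=(1, 16, 16)),  # collapse remaining 16x16
        )
        # Temporal transformer mixes information across the frame window
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=4)
        self.wasd_head = nn.Linear(d_model, 4)   # logits for W, A, S, D
        self.mouse_head = nn.Linear(d_model, 2)  # dx, dy

    def forward(self, video):
        # video: [B, T, C, H, W]; Conv3d expects [B, C, T, H, W]
        x = self.encoder(video.transpose(1, 2))  # [B, d_model, T, 1, 1]
        x = x.flatten(2).transpose(1, 2)         # [B, T, d_model]
        x = self.temporal(x)
        return self.wasd_head(x), self.mouse_head(x)

model = TinyIDMSketch()
wasd_logits, mouse = model(torch.randn(1, 8, 3, 256, 256))  # 8-frame window
```

Keeping the temporal kernel size at 1 in the encoder means frames are embedded independently, leaving all cross-frame reasoning to the transformer.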
## Training
- **Dataset**: FPS gameplay recordings
- **Preprocessing**:
- Frames scaled to [-1, 1]
  - Mouse deltas compressed with log1p scaling
- **Loss**: BCE for WASD + Huber for mouse
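The preprocessing and loss above can be sketched as follows. The exact constants and the sign-preserving form of the log1p scaling are assumptions; the helper names are hypothetical, not part of the package.

```python
import torch
import torch.nn.functional as F

def preprocess_frames(frames_uint8):
    """Scale uint8 RGB frames from [0, 255] to [-1, 1]."""
    return frames_uint8.float() / 127.5 - 1.0

def scale_mouse(deltas):
    """Sign-preserving log1p compression of raw pixel deltas (assumed form)."""
    return torch.sign(deltas) * torch.log1p(deltas.abs())

def idm_loss(wasd_logits, wasd_targets, mouse_preds, mouse_targets):
    """BCE on key-press logits plus Huber loss on (scaled) mouse movement."""
    bce = F.binary_cross_entropy_with_logits(wasd_logits, wasd_targets)
    huber = F.huber_loss(mouse_preds, scale_mouse(mouse_targets))
    return bce + huber
```

Log1p compression keeps small mouse movements well resolved while preventing rare large flicks from dominating the Huber loss.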
## Usage
### Installation
```bash
# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git
# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"
```
### Inference
```python
from owl_idms import InferencePipeline
import torch
# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda",
)
# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # example: uniform noise in [-1, 1]
# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds: [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float - dx, dy mouse movements
```
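The boolean WASD predictions can be decoded into human-readable key sets per frame. This small convenience helper is illustrative and not part of the package:

```python
KEYS = ("W", "A", "S", "D")

def decode_wasd(wasd_preds):
    """Map [batch, frames, 4] boolean predictions to per-frame sets of pressed keys.

    Accepts either a torch tensor or nested Python lists.
    """
    clips = wasd_preds.tolist() if hasattr(wasd_preds, "tolist") else wasd_preds
    return [
        [{key for key, pressed in zip(KEYS, frame) if pressed} for frame in clip]
        for clip in clips
    ]

# decode_wasd(wasd_preds)[0][0] is the set of keys held during the first frame,
# e.g. {"W", "D"} when the player is moving forward and strafing right.
```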
## Model Files
- `config.yml`: Training configuration
- `model.pt`: Model checkpoint (EMA weights)
- `inference.py`: Inference pipeline (download from repo)
## Citation
```bibtex
@software{owl_idm_2024,
  title  = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}
```
## License
MIT License