---
tags:
- inverse-dynamics-model
- gameplay
- computer-vision
- fps-games
library_name: owl-idm
---

# Owl IDM v0-tiny

Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.

## Model Description

This model predicts player controls from visual observations:
- **Input**: Sequence of RGB frames (256x256)
- **Output**:
  - WASD key predictions (4 binary outputs)
  - Mouse movement (dx, dy in pixels)

## Architecture

- **Backbone**: Spatial Conv3D encoder → Temporal Transformer
- **Window size**: 8 frames
- **Model size**: 70M parameters
- **Inference speed**: ~1500 FPS on an H100 GPU
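
The card gives a window size of 8 frames but does not say how windows are laid out over a longer clip. As an illustration only, the sketch below assumes one causal window per frame (each window ends at the frame being predicted) and repeats the first frame at the start of the clip. The helper name `window_indices` and the clamping strategy are hypothetical, not taken from the `owl-idm` code.

```python
WINDOW_SIZE = 8  # window size stated in the model card


def window_indices(num_frames, window=WINDOW_SIZE):
    """Return, for each frame t, the indices of a `window`-frame clip ending at t.

    Assumption: indices that would fall before the start of the video are
    clamped to 0, i.e. the first frame is repeated as padding.
    """
    if num_frames <= 0:
        return []
    return [
        [max(0, t - window + 1 + k) for k in range(window)]
        for t in range(num_frames)
    ]


windows = window_indices(128)
print(len(windows))   # one window per frame
print(windows[0])     # padded window at the start of the clip
print(windows[10])    # full window covering frames 3..10
```

A centered or non-overlapping layout would be equally plausible; the packaged pipeline handles windowing internally, so this is only meant to show what an 8-frame context implies for a 128-frame input.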

## Training

- **Dataset**: FPS gameplay recordings
- **Preprocessing**:
  - Frames scaled to [-1, 1]
  - Log1p scaling applied to mouse movement
- **Loss**: BCE for WASD + Huber for mouse
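
The preprocessing and loss terms listed above can be sketched with scalar formulas. This is a minimal illustration under several assumptions that the card does not confirm: frames start as 8-bit values, the log1p scaling is the sign-preserving form `sign(x) * log1p(|x|)` (mouse deltas can be negative), the Huber threshold is 1.0, and the two loss terms are summed with equal weight. All function names here are hypothetical.

```python
import math


def scale_pixel(value_u8):
    """Map an 8-bit pixel value in [0, 255] to the range [-1, 1]."""
    return value_u8 / 127.5 - 1.0


def signed_log1p(delta):
    """Compress a mouse delta while keeping its sign (assumed form)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def signed_expm1(scaled):
    """Invert signed_log1p to recover the delta in pixels."""
    return math.copysign(math.expm1(abs(scaled)), scaled)


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one key probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic near zero, linear beyond `delta` (assumed 1.0)."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def frame_loss(key_probs, key_targets, mouse_pred, mouse_target):
    """Mean BCE over the 4 keys plus mean Huber over (dx, dy), equally weighted."""
    key_term = sum(bce(p, t) for p, t in zip(key_probs, key_targets)) / len(key_probs)
    mouse_term = sum(huber(p, t) for p, t in zip(mouse_pred, mouse_target)) / len(mouse_pred)
    return key_term + mouse_term


# Example: targets for one frame with W held and the mouse moving left and down.
target_mouse = [signed_log1p(-40.0), signed_log1p(6.0)]
loss = frame_loss([0.9, 0.1, 0.2, 0.1], [1, 0, 0, 0], [-3.5, 1.8], target_mouse)
```

The log1p compression keeps large flicks from dominating the regression target, and the Huber loss limits the influence of the remaining outliers; the actual weighting and thresholds are defined in `config.yml`.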

## Usage

### Installation

```bash
# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git

# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"
```

### Inference

```python
from owl_idms import InferencePipeline
import torch

# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda"
)

# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
# torch.rand samples uniformly from [0, 1), so * 2 - 1 maps it to [-1, 1)
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # Random example clip

# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds: [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float - dx, dy mouse movements
```
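
To turn the per-frame outputs into something readable, the predictions can be decoded into key names and mouse deltas. The sketch below works on plain nested lists so it runs without a GPU; the key order W, A, S, D comes from the comment in the inference example, while the helper names `pressed_keys` and `decode_clip` are hypothetical and not part of the `owl-idm` package.

```python
KEY_NAMES = ("W", "A", "S", "D")  # order given in the inference example


def pressed_keys(frame_flags):
    """Join the names of the keys whose flag is truthy, e.g. [1, 0, 0, 1] -> "WD"."""
    return "".join(name for name, on in zip(KEY_NAMES, frame_flags) if on)


def decode_clip(wasd_frames, mouse_frames):
    """Pair each frame's key flags with its (dx, dy) mouse prediction."""
    return [
        {"keys": pressed_keys(flags), "dx": float(dx), "dy": float(dy)}
        for flags, (dx, dy) in zip(wasd_frames, mouse_frames)
    ]


# Stand-in data shaped like one batch element: [frames, 4] and [frames, 2].
wasd_frames = [
    [True, False, False, False],
    [True, False, False, True],
    [False, False, False, False],
]
mouse_frames = [[1.5, 0.0], [-3.0, 2.0], [0.0, 0.0]]

events = decode_clip(wasd_frames, mouse_frames)
for frame_index, event in enumerate(events):
    print(frame_index, event)
```

With real outputs, `wasd_preds[0].tolist()` and `mouse_preds[0].tolist()` would give nested lists of this shape. The card does not state whether `mouse_preds` are returned in pixels or in log1p-scaled units, so check the pipeline before interpreting the magnitudes.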

## Model Files

- `config.yml`: Training configuration
- `model.pt`: Model checkpoint (EMA weights)
- `inference.py`: Inference pipeline (download from repo)

## Citation

```bibtex
@software{owl_idm_2024,
  title = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year = {2024},
  url = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}
```

## License

MIT License