---
tags:
- inverse-dynamics-model
- gameplay
- computer-vision
- fps-games
library_name: owl-idm
---

# Owl IDM v0-tiny

Inverse Dynamics Model (IDM) trained to predict keyboard (WASD) and mouse inputs from gameplay video frames.

## Model Description

This model predicts player controls from visual observations:

- **Input**: Sequence of RGB frames (256x256)
- **Output**:
  - WASD key predictions (4 binary outputs)
  - Mouse movement (dx, dy in pixels)

## Architecture

- **Backbone**: Spatial Conv3D encoder → Temporal Transformer
- **Window size**: 8 frames
- **Model size**: 70M parameters
- **Inference speed**: ~1500 FPS on an H100 GPU

## Training

- **Dataset**: FPS gameplay recordings
- **Preprocessing**:
  - Frames scaled to [-1, 1]
  - Log1p scaling applied to mouse deltas
- **Loss**: BCE for WASD + Huber for mouse

## Usage

### Installation

```bash
# Install the package directly from GitHub
pip install git+https://github.com/overworld/owl-idm-3.git

# Or with inference dependencies
pip install "owl-idm[inference] @ git+https://github.com/overworld/owl-idm-3.git"
```

### Inference

```python
import torch

from owl_idms import InferencePipeline

# Load from Hugging Face Hub
pipeline = InferencePipeline.from_pretrained(
    "Overworld/owl-idm-v0-tiny",
    device="cuda",
)

# Prepare video: [batch, frames, channels, height, width] in range [-1, 1]
video = torch.rand(1, 128, 3, 256, 256) * 2 - 1  # Example: uniform noise in [-1, 1]

# Run inference
wasd_preds, mouse_preds = pipeline(video)
# wasd_preds: [1, 128, 4] boolean - W, A, S, D key states
# mouse_preds: [1, 128, 2] float - dx, dy mouse movements
```

## Model Files

- `config.yml`: Training configuration
- `model.pt`: Model checkpoint (EMA weights)
- `inference.py`: Inference pipeline (download from repo)

## Citation

```bibtex
@software{owl_idm_2024,
  title  = {Owl IDM: Inverse Dynamics Models for Gameplay},
  author = {Your Name},
  year   = {2024},
  url    = {https://huggingface.co/Overworld/owl-idm-v0-tiny}
}
```

## License

MIT License
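## Appendix: Preprocessing Sketch

The model card states that frames are scaled to [-1, 1] and mouse deltas get log1p scaling during training. A minimal sketch of that preprocessing in plain PyTorch follows; the function names (`preprocess_frames`, `scale_mouse`, `unscale_mouse`) and the signed-log1p formulation are illustrative assumptions, not part of the owl-idm API, and whether the model's mouse outputs need unscaling depends on the released checkpoint.

```python
import torch


def preprocess_frames(frames_uint8: torch.Tensor) -> torch.Tensor:
    """Map uint8 RGB frames in [0, 255] to the [-1, 1] range the model expects."""
    return frames_uint8.float() / 127.5 - 1.0


def scale_mouse(deltas: torch.Tensor) -> torch.Tensor:
    """Signed log1p compression of raw pixel mouse deltas (assumed training transform)."""
    return torch.sign(deltas) * torch.log1p(deltas.abs())


def unscale_mouse(scaled: torch.Tensor) -> torch.Tensor:
    """Invert the signed log1p scaling to recover pixel deltas."""
    return torch.sign(scaled) * torch.expm1(scaled.abs())
```

The signed form keeps the transform symmetric for leftward/upward (negative) movements, and `expm1` exactly inverts `log1p`, so round-tripping a delta through `scale_mouse` and `unscale_mouse` returns the original value.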