---
license: mit
tags:
- jepa
- world-model
- reinforcement-learning
- audio
- pytorch
library_name: pytorch
---

# silent — JEPA world model that plays predator by listening

A 13M-parameter Joint Embedding Predictive Architecture (JEPA) trained to predict next-step audio embeddings on a custom predator-prey environment. The predator senses the world through four cardioid microphones (N/E/S/W) on its body and chooses thrust + sonar ping actions to hunt the player.

- **Live demo**: https://sotoalt.dev/experiments/silent.html
- **Code**: https://github.com/SotoAlt/silent
- **Research journal**: https://github.com/SotoAlt/silent/blob/main/docs/JOURNAL.md

## Architecture

- ViT-Tiny encoder (4-channel input, trained from scratch, ~6M params)
- Linear action encoder (frameskip x 3 -> 192)
- 6-layer AR causal transformer predictor with AdaLN-zero conditioning
- 192 -> 2048 -> 192 projector MLP with BatchNorm
- SIGReg regularizer on projected embeddings
- Jointly-trained state head MLP (192 -> 256 -> 256 -> 8) at lambda=10

Total: ~13M params. Runs at ~10 Hz on a single shared vCPU.

## Files

| File                        | Purpose                                       |
|-----------------------------|-----------------------------------------------|
| `silent_v1_3e_ep030.pt`     | Shipping checkpoint -- joint DexWM, lambda=10 |
| `3e_ep030_head_uniform.pt`  | Post-hoc state head for planner CEM cost      |

## Quick start

```bash
pip install torch torchvision timm einops fastapi uvicorn websockets \
    librosa pymunk h5py pygame scipy

# Clone the code
git clone https://github.com/SotoAlt/silent.git
cd silent

# Download checkpoints into the repo so the relative paths below resolve
huggingface-cli download sotoalt/silent --local-dir checkpoints/

# Run the inference server
python -m world_model.infer_silent_env \
    --jepa-ckpt checkpoints/silent_v1_3e_ep030.pt \
    --jepa-head checkpoints/3e_ep030_head_uniform.pt \
    --host 0.0.0.0 --port 8801

# Open http://localhost:8801/ in a browser. WASD to move, space to voice.
```

## Training

The full pipeline (data generation, pure-LeWM smoke test, preflight v2 probe, joint DexWM validation gate, full 100-epoch run, post-hoc head, ship audit) is documented in the [research journal](https://github.com/SotoAlt/silent/blob/main/docs/JOURNAL.md) and the [README](https://github.com/SotoAlt/silent#training-from-scratch).

## Related work

- LeWM (Maes, Le Lidec, Scieur, LeCun, Balestriero, 2026) - `arxiv 2603.19312`
- DexWM - `arxiv 2512.13644` (the joint state-head technique)
- V-JEPA 2-AC (FAIR, 2026)

## License

MIT
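
## Appendix: layer-shape sketch

To make the layer widths in the Architecture list concrete, here is a minimal PyTorch sketch of the projector MLP (192 -> 2048 -> 192 with BatchNorm) and the state head MLP (192 -> 256 -> 256 -> 8). It is not taken from the repository. The GELU activations, the position of the BatchNorm layer, the choice to feed the state head the 192-d embedding directly, and the names `make_projector`, `make_state_head`, and `count_params` are all assumptions made for illustration; only the layer widths come from this card.

```python
import torch
import torch.nn as nn

EMBED_DIM = 192  # embedding width listed under Architecture
STATE_DIM = 8    # output width of the state head


def make_projector() -> nn.Sequential:
    """192 -> 2048 -> 192 MLP with BatchNorm (activation and norm placement assumed)."""
    return nn.Sequential(
        nn.Linear(EMBED_DIM, 2048),
        nn.BatchNorm1d(2048),
        nn.GELU(),
        nn.Linear(2048, EMBED_DIM),
    )


def make_state_head() -> nn.Sequential:
    """192 -> 256 -> 256 -> 8 MLP (activation assumed)."""
    return nn.Sequential(
        nn.Linear(EMBED_DIM, 256),
        nn.GELU(),
        nn.Linear(256, 256),
        nn.GELU(),
        nn.Linear(256, STATE_DIM),
    )


def count_params(module: nn.Module) -> int:
    """Number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# eval() makes BatchNorm use its running statistics, so any batch size works.
projector = make_projector().eval()
state_head = make_state_head().eval()

z = torch.randn(4, EMBED_DIM)  # a batch of 4 stand-in embeddings
with torch.no_grad():
    projected = projector(z)
    state = state_head(z)

print(tuple(projected.shape), tuple(state.shape))
print(count_params(projector), count_params(state_head))
```

Under these assumptions the two modules together account for under 1M parameters, which is consistent with the encoder and predictor making up most of the ~13M total.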