metadata
license: mit
tags:
- jepa
- world-model
- reinforcement-learning
- audio
- pytorch
library_name: pytorch
silent — JEPA world model that plays predator by listening
A 13M-parameter Joint Embedding Predictive Architecture (JEPA) trained to predict next-step audio embeddings on a custom predator-prey environment. The predator senses the world through four cardioid microphones (N/E/S/W) on its body and chooses thrust + sonar ping actions to hunt the player.
- Live demo: https://sotoalt.dev/experiments/silent.html
- Code: https://github.com/SotoAlt/silent
- Research journal: https://github.com/SotoAlt/silent/blob/main/docs/JOURNAL.md
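To make the sensing setup concrete, here is a minimal sketch of how four cardioid microphones could weight a sound arriving from a given bearing. It uses the textbook ideal cardioid pattern, 0.5 * (1 + cos(theta)). The environment's actual microphone model is not described in this card, so the angle convention and the names `MIC_BEARINGS_DEG`, `cardioid_gain`, and `mic_gains` are illustrative assumptions.

```python
import math

# Assumed facing direction of each microphone in degrees
# (0 = east, counter-clockwise). The N/E/S/W labels come from the
# model card; the angle convention is an illustrative assumption.
MIC_BEARINGS_DEG = {"N": 90.0, "E": 0.0, "S": 270.0, "W": 180.0}


def cardioid_gain(mic_deg: float, source_deg: float) -> float:
    """Ideal cardioid: 1.0 on-axis, 0.5 at 90 degrees, 0.0 directly behind."""
    theta = math.radians(source_deg - mic_deg)
    return 0.5 * (1.0 + math.cos(theta))


def mic_gains(source_deg: float) -> dict[str, float]:
    """Gain seen by every microphone for a source at the given bearing."""
    return {
        name: cardioid_gain(mic_deg, source_deg)
        for name, mic_deg in MIC_BEARINGS_DEG.items()
    }


# A source to the north-east is heard equally by N and E, weakly by S and W.
gains = mic_gains(45.0)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
```

The relative levels across the four channels are what carry direction information, which is presumably why the encoder takes a 4-channel input.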
Architecture
- ViT-Tiny encoder (4-channel input, trained from scratch, ~6M params)
- Linear action encoder (frameskip x 3 -> 192)
- 6-layer AR causal transformer predictor with AdaLN-zero conditioning
- 192 -> 2048 -> 192 projector MLP with BatchNorm
- SIGReg regularizer on projected embeddings
- Jointly-trained state head MLP (192 -> 256 -> 256 -> 8) at lambda=10
Total: ~13M params. Runs at ~10 Hz on a single shared vCPU.
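As a rough sanity check on the sizes listed above, the fully specified MLP components can be counted by hand. The sketch below assumes every linear layer has a bias, that the projector's BatchNorm sits on the 2048-wide hidden layer, and that the frameskip is 4. None of these is stated in this card, so treat `FRAMESKIP` and the layer layout as hypothetical.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def batchnorm_params(n_features: int) -> int:
    """Learnable scale and shift; running statistics are buffers, not parameters."""
    return 2 * n_features


EMBED_DIM = 192
FRAMESKIP = 4  # hypothetical value; the model card does not state it

# Linear action encoder: (frameskip x 3) -> 192
action_encoder = linear_params(FRAMESKIP * 3, EMBED_DIM)

# Projector MLP: 192 -> 2048 -> 192, BatchNorm assumed on the hidden layer
projector = (
    linear_params(EMBED_DIM, 2048)
    + batchnorm_params(2048)
    + linear_params(2048, EMBED_DIM)
)

# State head MLP: 192 -> 256 -> 256 -> 8
state_head = (
    linear_params(EMBED_DIM, 256)
    + linear_params(256, 256)
    + linear_params(256, 8)
)

small_parts = action_encoder + projector + state_head
print(f"action encoder: {action_encoder:,}")
print(f"projector:      {projector:,}")
print(f"state head:     {state_head:,}")
print(f"subtotal:       {small_parts:,}")
```

Under these assumptions the three components come to under 1M parameters, so most of the ~13M total would sit in the encoder and the predictor.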
Files
| File | Purpose |
|---|---|
| silent_v1_3e_ep030.pt | Shipping checkpoint -- joint DexWM, lambda=10 |
| 3e_ep030_head_uniform.pt | Post-hoc state head for planner CEM cost |
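For readers unfamiliar with the cross-entropy method (CEM) mentioned in the table, the sketch below shows the generic loop: sample action sequences, score them, refit a Gaussian to the best ones. It is not the repository's planner. `cem_plan`, `toy_cost`, and `TARGET` are hypothetical names, the quadratic cost merely stands in for whatever cost the state head provides, and three values per step is an assumption based on the "frameskip x 3" action input.

```python
import random
import statistics


def cem_plan(cost_fn, horizon, action_dim, iters=30, pop=100, n_elite=10, seed=0):
    """Return the mean (flattened) action sequence after CEM refinement."""
    rng = random.Random(seed)
    n = horizon * action_dim
    mean, std = [0.0] * n, [1.0] * n
    for _ in range(iters):
        # 1. Sample candidate action sequences from the current Gaussian.
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)
        ]
        # 2. Keep the lowest-cost candidates.
        elites = sorted(samples, key=cost_fn)[:n_elite]
        # 3. Refit the Gaussian to the elites (small floor keeps std positive).
        columns = list(zip(*elites))
        mean = [statistics.fmean(c) for c in columns]
        std = [statistics.pstdev(c) + 1e-6 for c in columns]
    return mean


HORIZON, ACTION_DIM = 4, 3  # illustrative sizes, not taken from the repo
TARGET = [1.0, -1.0, 0.5] * HORIZON  # toy optimum for the stand-in cost


def toy_cost(plan):
    """Squared distance to TARGET; a placeholder for a learned cost."""
    return sum((a - t) ** 2 for a, t in zip(plan, TARGET))


start_cost = toy_cost([0.0] * (HORIZON * ACTION_DIM))
plan = cem_plan(toy_cost, HORIZON, ACTION_DIM)
print(f"cost before: {start_cost:.3f}, after: {toy_cost(plan):.3f}")
```

In a world-model planner the cost function would typically roll candidate actions through the predictor and score the predicted states, which is the role the table assigns to the post-hoc state head.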
Quick start
pip install torch torchvision timm einops fastapi uvicorn websockets \
    librosa pymunk h5py pygame scipy
# Clone the code
git clone https://github.com/SotoAlt/silent.git
cd silent
# Download checkpoints into silent/checkpoints/ so the relative paths below resolve
huggingface-cli download sotoalt/silent --local-dir checkpoints/
# Run the inference server
python -m world_model.infer_silent_env \
    --jepa-ckpt checkpoints/silent_v1_3e_ep030.pt \
    --jepa-head checkpoints/3e_ep030_head_uniform.pt \
    --host 0.0.0.0 --port 8801
# Open http://localhost:8801/ in a browser. WASD to move, space to voice.
Training
The full pipeline (data generation, pure-LeWM smoke test, preflight v2 probe, joint DexWM validation gate, full 100-epoch run, post-hoc head, ship audit) is documented in the research journal and the README.
Related work
- LeWM (Maes, Le Lidec, Scieur, LeCun, Balestriero, 2026) - arXiv 2603.19312
- DexWM - arXiv 2512.13644 (the joint state-head technique)
- V-JEPA 2-AC (FAIR, 2026)
License
MIT