V-JEPA2 (MLX)
Collection
Apple MLX fp16 ports of Meta V-JEPA2 ViT-L: video embeddings, JEPA predictor, SSv2 classifier. MIT.
How to use mlx-community/V-JEPA2-AC-vitg with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir V-JEPA2-AC-vitg mlx-community/V-JEPA2-AC-vitg
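The same download can be done from Python. The following is a minimal sketch that uses `snapshot_download` from `huggingface_hub`; the wrapper name `download_model` is hypothetical and only mirrors the repository id and local directory used in the CLI command above.

```python
def download_model(repo_id="mlx-community/V-JEPA2-AC-vitg",
                   local_dir="V-JEPA2-AC-vitg"):
    """Fetch every file in the Hub repo into local_dir and return that path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model()` returns the path of the local folder holding the downloaded files. The checkpoint is large, so the first call may take a while.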
Apple MLX fp16 port of Meta's V-JEPA2-AC (vjepa2-ac-vitg): a ViT-g
video encoder plus an action-conditioned predictor. Given encoder context
tokens and per-frame 7-DoF robot poses (action/state), the predictor outputs
future latent states. It is the world model used for robot planning. MIT.
from vjepa2_mlx.utils.weights import build_ac_encoder, build_ac_predictor
enc = build_ac_encoder() # ViT-g encoder (hidden 1408, 40 layers)
pred = build_ac_predictor() # AC predictor (frame-causal, 3D-RoPE)
# tokens = enc(video); future = pred(tokens, actions, states)
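To make the commented call above more concrete, the sketch below computes the array shapes one might expect to pass around. It does not call `vjepa2_mlx`, and `expected_shapes` is a hypothetical helper. The hidden width (1408) and the 7-DoF pose size come from this card. The 256×256 input resolution, the patch size of 16, and the layout of one pose vector per encoded time step are assumptions, so check them against the actual port before relying on them.

```python
def expected_shapes(batch, steps, height=256, width=256,
                    patch=16, hidden=1408, dof=7):
    """Return illustrative shapes for encoder tokens and pose inputs.

    Assumes square patches of size `patch` and one pose per time step.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by the patch size")
    # Spatial tokens produced for each encoded time step.
    per_step = (height // patch) * (width // patch)
    return {
        "tokens": (batch, steps * per_step, hidden),  # encoder context tokens
        "actions": (batch, steps, dof),               # assumed per-step actions
        "states": (batch, steps, dof),                # assumed per-step poses
    }


shapes = expected_shapes(batch=1, steps=8)
print(shapes["tokens"])  # (1, 2048, 1408) under the assumptions above
```

Whether the predictor expects one action per step or one per transition between steps is not stated here, so treat the `actions` shape as a placeholder.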
MIT (© Meta Platforms). Converted from vjepa2-ac-vitg.pt.