TRIBE v2 MLX
This repository is an MLX conversion of the trainable brain encoder weights from the original facebook/tribev2 model.
It is a derivative conversion, not a new fine-tune. No additional training was performed. The tensors were converted from the released TRIBE v2 PyTorch checkpoint into an MLX .npz bundle for local Apple Silicon inference.
Original Project
- Original Hugging Face model: facebook/tribev2
- Original GitHub repository: facebookresearch/tribev2
- Paper page: A foundation model of vision, audition, and language for in-silico neuroscience
- Original authors: Stéphane d'Ascoli, Jérémy Rapin, Yohann Benchetrit, Teon Brookes, Katelyn Begany, Joséphine Raugel, Hubert Banville, Jean-Rémi King
What Is Included
- tribev2_mlx_float32.npz: MLX-compatible float32 weights for the released TRIBE v2 brain encoder.
- config.json: small architecture metadata used by the local MLX runner.
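Before wiring the bundle into other code, you can check what it contains. The minimal sketch below uses only NumPy to list every array in the .npz with its shape and dtype. The helper name `summarize_npz` is hypothetical and not part of the tribev2-mlx tooling. The tensor names it prints depend on how the conversion named them, and this card does not document that. If you prefer to stay in MLX, `mlx.core.load` also reads an .npz into a dict of arrays.

```python
import numpy as np


def summarize_npz(path):
    """Print name, shape, and dtype of every array in an .npz; return the total element count."""
    total = 0
    with np.load(path) as bundle:  # NpzFile reads each array lazily on key access
        for name in sorted(bundle.files):
            arr = bundle[name]
            print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")
            total += arr.size
    return total
```

For example, `summarize_npz("tribev2_mlx_float32.npz")` prints one line per tensor and returns the parameter count.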
This repo does not include the frozen upstream feature extractor weights. TRIBE v2 uses feature tensors from:
- LLaMA 3.2 for text
- Wav2Vec-BERT for audio
- V-JEPA2 for video
The MLX bundle expects precomputed TRIBE-compatible feature tensors and predicts fMRI-like BOLD responses on the fsaverage5 cortical mesh.
Local Code
The conversion and inference code lives in a local workspace repo, tribev2-mlx, which has not been upstreamed. This HF repo hosts only the converted weights, which were generated with that code.
Key local commands:
tribev2-mlx-convert \
--checkpoint /path/to/facebook-tribev2/best.ckpt \
--out-dir weights \
--dtype float32
tribev2-mlx-infer \
--weights weights/tribev2_mlx_float32.npz \
--features features.npz \
--out preds.npz
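Because no retraining is involved, the conversion step amounts to the following: walk the checkpoint's state dict, move each tensor to NumPy, cast it to the target dtype, and save everything into one archive. The sketch below illustrates that idea. It is not the code behind `tribev2-mlx-convert`. The helper name, the `strip_prefix` option, and the assumption that the checkpoint keeps its tensors under a `state_dict` key are all hypothetical. The real converter may also rename or transpose tensors to match the MLX module layout.

```python
import numpy as np


def state_dict_to_npz(state_dict, out_path, dtype=np.float32, strip_prefix=""):
    """Cast every entry of state_dict to dtype and save all of them in a single .npz.

    Accepts NumPy arrays, array-likes, or PyTorch tensors. Tensors are detected
    by duck typing, so torch is not imported here.
    """
    arrays = {}
    for name, value in state_dict.items():
        if hasattr(value, "detach"):
            # torch.Tensor: upcast to float32 on CPU so bfloat16 tensors convert cleanly
            value = value.detach().float().cpu().numpy()
        if strip_prefix and name.startswith(strip_prefix):
            name = name[len(strip_prefix):]  # e.g. drop a wrapper prefix such as "model."
        arrays[name] = np.asarray(value).astype(dtype)
    np.savez(out_path, **arrays)
    return sorted(arrays)
```

With PyTorch installed, usage might look like `ckpt = torch.load("best.ckpt", map_location="cpu")` followed by `state_dict_to_npz(ckpt["state_dict"], "weights/example.npz")`. Recent PyTorch versions may require `weights_only=False` to load a full training checkpoint you trust. Every entry, including any integer buffers, is cast to `dtype`.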
Input Format
The MLX runner expects a .npz with any subset of:
- text: (B, 2, 3072, T) or (B, 6144, T)
- audio: (B, 2, 1024, T) or (B, 2048, T)
- video: (B, 2, 1408, T) or (B, 2816, T)
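The runner reads any subset of `text`, `audio`, and `video` from one .npz, so it can help to validate shapes before saving. The sketch below checks each array against the two accepted layouts listed above and writes the arrays as float32. The helper names and the float32 choice are assumptions; float32 simply matches the released weights. Real inputs must come from the LLaMA 3.2, Wav2Vec-BERT, and V-JEPA2 feature extractors.

```python
import numpy as np

# Per-modality channel width C, taken from the input shapes listed above.
# Each modality accepts (B, 2, C, T) or the flattened (B, 2 * C, T).
FEATURE_DIMS = {"text": 3072, "audio": 1024, "video": 1408}


def check_feature(name, arr):
    """Raise ValueError unless arr has one of the two accepted layouts for this modality."""
    if name not in FEATURE_DIMS:
        raise ValueError(f"unknown modality {name!r}; expected one of {sorted(FEATURE_DIMS)}")
    c = FEATURE_DIMS[name]
    grouped = arr.ndim == 4 and arr.shape[1:3] == (2, c)
    flat = arr.ndim == 3 and arr.shape[1] == 2 * c
    if not (grouped or flat):
        raise ValueError(f"{name}: got shape {arr.shape}, expected (B, 2, {c}, T) or (B, {2 * c}, T)")


def write_features(path, **features):
    """Validate any subset of text/audio/video arrays and save them as float32 in one .npz."""
    arrays = {name: np.asarray(arr, dtype=np.float32) for name, arr in features.items()}
    for name, arr in arrays.items():
        check_feature(name, arr)  # fail before anything is written
    np.savez(path, **arrays)
    return sorted(arrays)
```

For a synthetic smoke test, you could call `write_features("features.npz", text=np.random.default_rng(0).standard_normal((1, 2, 3072, 100)))`. Here T = 100 is an arbitrary placeholder and the values are random, not real features.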
The default output is shaped:
preds: (B, 20484, 100)
where 20484 is the fsaverage5 cortical vertex count used by TRIBE v2.
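fsaverage5 has 10,242 vertices per hemisphere, so the 20484 axis can be split in half for per-hemisphere analysis or plotting. The sketch below assumes the left hemisphere comes first. That is a common convention, but this card does not state it, so check it against the upstream TRIBE v2 code before relying on it. `split_hemispheres` is a hypothetical helper.

```python
import numpy as np

N_VERTS_PER_HEMI = 10242  # fsaverage5: 10,242 vertices per hemisphere, 20,484 in total


def split_hemispheres(preds):
    """Split (B, 20484, T) predictions into left and right (B, 10242, T) arrays.

    ASSUMPTION: vertices are ordered left hemisphere first, then right.
    """
    preds = np.asarray(preds)
    if preds.ndim != 3 or preds.shape[1] != 2 * N_VERTS_PER_HEMI:
        raise ValueError(f"expected (B, {2 * N_VERTS_PER_HEMI}, T), got {preds.shape}")
    return preds[:, :N_VERTS_PER_HEMI], preds[:, N_VERTS_PER_HEMI:]
```

For example: `with np.load("preds.npz") as z: left, right = split_hemispheres(z["preds"])`.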
Verification
Parity was checked against the original PyTorch TRIBE v2 brain encoder on identical synthetic feature tensors:
- PyTorch output shape: (1, 20484, 100)
- MLX output shape: (1, 20484, 100)
- max absolute difference: 1.84029e-06
- mean absolute difference: 1.33767e-07
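The parity figures above are element-wise comparisons of two output arrays. The minimal sketch below shows that computation and assumes both outputs have been saved as arrays of the same shape. The function name and the `atol` threshold of 1e-5 are illustrative choices, not the criterion used for this card.

```python
import numpy as np


def parity_report(reference, candidate, atol=1e-5):
    """Compare two outputs: check shapes, then report max and mean absolute difference."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    diff = np.abs(reference - candidate)
    return {
        "shape": reference.shape,
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "within_atol": bool(diff.max() <= atol),  # illustrative threshold only
    }
```

Upcasting both inputs to float64 first keeps the subtraction itself from adding float32 rounding error to the reported differences.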
Limitations
- This predicts TRIBE v2's fMRI/BOLD response targets. It does not directly predict dopamine, liking, retention, or subjective preference.
- Full text+audio+video extraction requires access to the upstream feature extractors, including the gated meta-llama/Llama-3.2-3B model.
- CPU feature extraction with V-JEPA2 is slow; the MLX brain encoder is fast once features are available.
Citation
@article{dAscoli2026TribeV2,
title={A foundation model of vision, audition, and language for in-silico neuroscience},
author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
year={2026}
}
License
The original TRIBE v2 code and weights are released under CC BY-NC 4.0. This converted MLX bundle follows the same non-commercial license terms.