---
license: cc-by-nc-4.0
---
# TRIBE v2
**A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience**
📄 [Paper](https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/) | ▶️ [Demo](https://aidemos.atmeta.com/tribev2/) | 🤗 [Weights](https://huggingface.co/facebook/tribev2)
TRIBE v2 is a deep multimodal brain encoding model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines state-of-the-art feature extractors ([**LLaMA 3.2**](https://huggingface.co/meta-llama/Llama-3.2-3B) for text, [**V-JEPA2**](https://huggingface.co/facebook/vjepa2-vitg-fpc64-256) for video, and [**Wav2Vec-BERT**](https://huggingface.co/facebook/w2v-bert-2.0) for audio) into a unified Transformer architecture that maps multimodal representations onto the cortical surface.
## Quick start
Load a pretrained model from HuggingFace and predict brain responses to a video:
```python
from tribev2 import TribeModel
model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")
df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)
print(preds.shape) # (n_timesteps, n_vertices)
```
Predictions are for the "average" subject (see paper for details) and live on the **fsaverage5** cortical mesh (~20k vertices). You can also pass `text_path` or `audio_path` to `model.get_events_dataframe`; text is automatically converted to speech and transcribed to obtain word-level timings.
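Because `preds` has shape `(n_timesteps, n_vertices)`, simple post-processing needs only NumPy. The sketch below splits such an array into hemispheres and computes a time-averaged map. It assumes that `preds` is (or can be converted to) a NumPy array and that vertices are ordered left hemisphere first, then right, with 10,242 vertices each (the standard fsaverage5 resolution). That ordering is an assumption rather than something documented here, so check it against the demo notebook. `split_hemispheres` is a hypothetical helper, and the random array only stands in for real model output.

```python
import numpy as np

# fsaverage5 has 10,242 vertices per hemisphere (20,484 in total).
N_VERTICES_PER_HEMI = 10_242


def split_hemispheres(preds: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split (n_timesteps, n_vertices) predictions into (left, right).

    ASSUMPTION: vertices are stored left hemisphere first, then right.
    """
    preds = np.asarray(preds)
    expected = 2 * N_VERTICES_PER_HEMI
    if preds.ndim != 2 or preds.shape[1] != expected:
        raise ValueError(f"expected (n_timesteps, {expected}), got {preds.shape}")
    return preds[:, :N_VERTICES_PER_HEMI], preds[:, N_VERTICES_PER_HEMI:]


# Stand-in for the output of `model.predict`: 30 timesteps of random values.
rng = np.random.default_rng(0)
fake_preds = rng.standard_normal((30, 2 * N_VERTICES_PER_HEMI))

left, right = split_hemispheres(fake_preds)
mean_map = fake_preds.mean(axis=0)  # time-averaged response per vertex
print(left.shape, right.shape, mean_map.shape)
```

The per-hemisphere arrays are the shape most surface-plotting tools expect, which is why the split is kept separate from any averaging.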
For a full walkthrough with brain visualizations, see the [Colab demo notebook](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb).
## Installation
**Basic** (inference only):
```bash
pip install -e .
```
**With brain visualization**:
```bash
pip install -e ".[plotting]"
```
**With training dependencies** (PyTorch Lightning, W&B, etc.):
```bash
pip install -e ".[training]"
```
## Training a model from scratch
### 1. Set environment variables
Configure data/output paths and Slurm partition (or edit `tribev2/grids/defaults.py` directly):
```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_partition"
```
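Before launching a long run, it can help to confirm that these variables are actually set. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not part of `tribev2`, and how `tribev2/grids/defaults.py` itself reads these variables is not shown here. Whether `SLURM_PARTITION` is needed for a purely local run is also an assumption you may want to relax.

```python
import os
from collections.abc import Mapping

# The three variables named in this section.
REQUIRED_VARS = ("DATAPATH", "SAVEPATH", "SLURM_PARTITION")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping, so the check is reproducible anywhere.
example_env = {"DATAPATH": "/path/to/studies", "SAVEPATH": "/path/to/output"}
missing = missing_env_vars(example_env)
print("missing:", missing)
```

Calling `missing_env_vars()` with no argument checks the real process environment instead of the example mapping.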
### 2. Authenticate with HuggingFace
The text encoder requires access to the gated [LLaMA 3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) model:
```bash
huggingface-cli login
```
Create a `read` [access token](https://huggingface.co/settings/tokens) and paste it when prompted.
### 3. Run training
**Local test run:**
```bash
python -m tribev2.grids.test_run
```
**Grid search on Slurm:**
```bash
python -m tribev2.grids.run_cortical
python -m tribev2.grids.run_subcortical
```
## Project structure
```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer-based multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and helpers for inference from text/audio/video
├── eventstransforms.py  # Custom event transforms (word extraction, chunking, …)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization (PyVista & Nilearn backends)
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```
## Contributing to open science
If you use this software, please share your results with the broader research community using the following citation:
```bibtex
@article{dAscoli2026TribeV2,
title={A foundation model of vision, audition, and language for in-silico neuroscience},
author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
year={2026}
}
```
## License
This project is licensed under CC-BY-NC-4.0. See [LICENSE](LICENSE) for details.
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.