<div align="center">

# TRIBE v2

**A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience**

[Open in Colab](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb)
[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
[Python](https://www.python.org/downloads/)

📄 [Paper](https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/) | ▶️ [Demo](https://aidemos.atmeta.com/tribev2/) | 🤗 [Weights](https://huggingface.co/facebook/tribev2)

</div>

TRIBE v2 is a deep multimodal brain encoding model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines three state-of-the-art feature extractors ([**LLaMA 3.2**](https://huggingface.co/meta-llama/Llama-3.2-3B) for text, [**V-JEPA2**](https://huggingface.co/facebook/vjepa2-vitg-fpc64-256) for video, and [**Wav2Vec-BERT**](https://huggingface.co/facebook/w2v-bert-2.0) for audio) into a unified Transformer architecture that maps multimodal representations onto the cortical surface.
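
As a schematic illustration of the fusion idea only (not the actual TRIBE v2 architecture; the sequence length and the random linear read-out below are made up for the example), the per-modality features can be pictured as time-aligned matrices that are concatenated and projected onto cortical vertices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical time-aligned features from the three extractors
# (n_timesteps x feature_dim); sizes are the published hidden widths,
# but their use here is purely illustrative.
n_timesteps = 100
text_feats = rng.standard_normal((n_timesteps, 3072))   # LLaMA 3.2-3B hidden size
video_feats = rng.standard_normal((n_timesteps, 1408))  # V-JEPA2 ViT-g hidden size
audio_feats = rng.standard_normal((n_timesteps, 1024))  # Wav2Vec-BERT 2.0 hidden size

# Concatenate the modalities along the feature axis...
fused = np.concatenate([text_feats, video_feats, audio_feats], axis=1)

# ...and map them onto the cortical surface with a (here random) linear read-out.
n_vertices = 20484  # fsaverage5: 10,242 vertices per hemisphere
readout = 0.01 * rng.standard_normal((fused.shape[1], n_vertices))
preds = fused @ readout
print(preds.shape)  # (100, 20484)
```

In the real model, a shared Transformer replaces this random read-out; the sketch only shows the shapes involved.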

## Quick start

Load a pretrained model from Hugging Face and predict brain responses to a video:

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```

Predictions are for the "average" subject (see paper for details) and live on the **fsaverage5** cortical mesh (~20k vertices). You can also pass `text_path` or `audio_path` to `model.get_events_dataframe`; text is automatically converted to speech and transcribed to obtain word-level timings.
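
If you have recorded fMRI data for the same stimulus, encoding-model predictions are commonly scored with a per-vertex Pearson correlation between predicted and measured time courses. A minimal NumPy sketch (the arrays below are synthetic stand-ins for `preds` and real data; `pearson_per_vertex` is not part of the TRIBE v2 API):

```python
import numpy as np

def pearson_per_vertex(pred: np.ndarray, meas: np.ndarray) -> np.ndarray:
    """Pearson r between matching columns of two (n_timesteps, n_vertices) arrays."""
    pred = pred - pred.mean(axis=0)
    meas = meas - meas.mean(axis=0)
    num = (pred * meas).sum(axis=0)
    den = np.sqrt((pred ** 2).sum(axis=0) * (meas ** 2).sum(axis=0))
    return num / den

rng = np.random.default_rng(0)
measured = rng.standard_normal((200, 50))              # fake recorded responses
noisy_pred = measured + 0.5 * rng.standard_normal((200, 50))  # fake predictions

r = pearson_per_vertex(noisy_pred, measured)
print(r.shape)  # one correlation value per vertex: (50,)
```

The resulting vector can then be projected back onto the cortical mesh to map where the model predicts well.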

For a full walkthrough with brain visualizations, see the [Colab demo notebook](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb).

## Installation

**Basic** (inference only):
```bash
pip install -e .
```

**With brain visualization**:
```bash
pip install -e ".[plotting]"
```

**With training dependencies** (PyTorch Lightning, W&B, etc.):
```bash
pip install -e ".[training]"
```

## Training a model from scratch

### 1. Set environment variables

Configure data/output paths and Slurm partition (or edit `tribev2/grids/defaults.py` directly):

```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_partition"
```
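
How `tribev2/grids/defaults.py` actually consumes these variables is defined in that file; as a rough sketch of the usual pattern (the fallback values here are placeholders, not the repo's real defaults), the configuration reads the environment with fallbacks:

```python
import os
from pathlib import Path

# Illustrative only: the real defaults live in tribev2/grids/defaults.py.
DATAPATH = Path(os.environ.get("DATAPATH", "/path/to/studies"))
SAVEPATH = Path(os.environ.get("SAVEPATH", "/path/to/output"))
SLURM_PARTITION = os.environ.get("SLURM_PARTITION", "")

print(DATAPATH, SAVEPATH, SLURM_PARTITION or "<unset>")
```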

### 2. Authenticate with Hugging Face

The text encoder requires access to the gated [LLaMA 3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) model:

```bash
huggingface-cli login
```

Create a `read` [access token](https://huggingface.co/settings/tokens) and paste it when prompted.

### 3. Run training

**Local test run:**
```bash
python -m tribev2.grids.test_run
```

**Grid search on Slurm:**
```bash
python -m tribev2.grids.run_cortical
python -m tribev2.grids.run_subcortical
```

## Project structure

```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer-based multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and helpers for inference from text/audio/video
├── eventstransforms.py  # Custom event transforms (word extraction, chunking, …)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization (PyVista & Nilearn backends)
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```

## Contributing to open science

If you use this software, please share your results with the broader research community and cite:

```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```

## License

This project is licensed under CC-BY-NC-4.0. See [LICENSE](LICENSE) for details.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.