<div align="center">

# TRIBE v2

**A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb)
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)

📄 [Paper](https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/) | ▶️ [Demo](https://aidemos.atmeta.com/tribev2/) | 🤗 [Weights](https://huggingface.co/facebook/tribev2)

</div>

TRIBE v2 is a deep multimodal brain encoding model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines state-of-the-art feature extractors, [**LLaMA 3.2**](https://huggingface.co/meta-llama/Llama-3.2-3B) (text), [**V-JEPA 2**](https://huggingface.co/facebook/vjepa2-vitg-fpc64-256) (video), and [**Wav2Vec-BERT**](https://huggingface.co/facebook/w2v-bert-2.0) (audio), into a unified Transformer architecture that maps multimodal representations onto the cortical surface.

## Quick start

Load a pretrained model from HuggingFace and predict brain responses to a video:

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```

Predictions are for the "average" subject (see the paper for details) and live on the **fsaverage5** cortical mesh (~20k vertices). You can also pass `text_path` or `audio_path` to `model.get_events_dataframe`; text is automatically converted to speech and transcribed to obtain word-level timings.
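To score predictions against measured fMRI responses, a standard metric in brain encoding is the Pearson correlation computed independently at each vertex. The helper below is not part of `tribev2`; it is a minimal NumPy sketch under the assumption that you have a measured array `target` aligned with `preds` (same `(n_timesteps, n_vertices)` shape), and `vertexwise_pearson` is a name introduced here for illustration.

```python
import numpy as np

def vertexwise_pearson(preds: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Pearson r between predicted and measured responses, per vertex.

    Both arrays have shape (n_timesteps, n_vertices), matching the
    output of `model.predict`. Returns an array of shape (n_vertices,).
    """
    p = preds - preds.mean(axis=0)
    t = target - target.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return (p * t).sum(axis=0) / denom

# Toy check: vertex 0 perfectly correlated, vertex 1 perfectly anti-correlated.
preds = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]])
target = np.array([[10.0, -6.0], [20.0, -4.0], [30.0, -2.0]])
print(vertexwise_pearson(preds, target))  # -> [ 1. -1.]
```

In practice `target` would be real fMRI data projected onto the same fsaverage5 mesh, and the resulting per-vertex scores are what brain-score maps visualize.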
For a full walkthrough with brain visualizations, see the [Colab demo notebook](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb).

## Installation

**Basic** (inference only):
```bash
pip install -e .
```

**With brain visualization**:
```bash
pip install -e ".[plotting]"
```

**With training dependencies** (PyTorch Lightning, W&B, etc.):
```bash
pip install -e ".[training]"
```
## Training a model from scratch

### 1. Set environment variables

Configure data/output paths and the Slurm partition (or edit `tribev2/grids/defaults.py` directly):

```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_partition"
```
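As a rough illustration of what these variables control, configuration code of this kind typically reads each variable with a fallback default. This is a hypothetical sketch, not the actual contents of `tribev2/grids/defaults.py`; only the variable names match the exports above.

```python
import os

# Hypothetical sketch of environment-driven configuration defaults:
# each setting falls back to a placeholder when the variable is unset.
DATAPATH = os.environ.get("DATAPATH", "/path/to/studies")
SAVEPATH = os.environ.get("SAVEPATH", "/path/to/output")
SLURM_PARTITION = os.environ.get("SLURM_PARTITION", "")

print(DATAPATH, SAVEPATH, SLURM_PARTITION)
```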
### 2. Authenticate with HuggingFace

The text encoder requires access to the gated [LLaMA 3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) model:

```bash
huggingface-cli login
```

Create a `read` [access token](https://huggingface.co/settings/tokens) and paste it when prompted.
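In non-interactive environments such as Slurm jobs or CI, where the login prompt is unavailable, the token can instead be supplied via the `HF_TOKEN` environment variable, which `huggingface_hub` picks up automatically:

```shell
# Non-interactive alternative to `huggingface-cli login`:
# set the token in the job environment instead of pasting it at a prompt.
export HF_TOKEN="<your read token>"
```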
### 3. Run training

**Local test run:**
```bash
python -m tribev2.grids.test_run
```

**Grid search on Slurm:**
```bash
python -m tribev2.grids.run_cortical
python -m tribev2.grids.run_subcortical
```

## Project structure

```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer-based multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and helpers for inference from text/audio/video
├── eventstransforms.py  # Custom event transforms (word extraction, chunking, …)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization (PyVista & Nilearn backends)
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```

## Contributing to open science

If you use this software, please share your results with the broader research community using the following citation:

```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```

## License

This project is licensed under CC-BY-NC-4.0. See [LICENSE](LICENSE) for details.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.