<div align="center">

# TRIBE v2

**A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb)
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/)

📄 [Paper](https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/) | ▶️ [Demo](https://aidemos.atmeta.com/tribev2/) | 🤗 [Weights](https://huggingface.co/facebook/tribev2)

</div>

TRIBE v2 is a deep multimodal brain encoding model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines state-of-the-art feature extractors, [**LLaMA 3.2**](https://huggingface.co/meta-llama/Llama-3.2-3B) (text), [**V-JEPA 2**](https://huggingface.co/facebook/vjepa2-vitg-fpc64-256) (video), and [**Wav2Vec-BERT**](https://huggingface.co/facebook/w2v-bert-2.0) (audio), into a unified Transformer architecture that maps multimodal representations onto the cortical surface.

## Quick start

Load a pretrained model from HuggingFace and predict brain responses to a video:

```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```

Predictions are for the "average" subject (see the paper for details) and live on the **fsaverage5** cortical mesh (~20k vertices). You can also pass `text_path` or `audio_path` to `model.get_events_dataframe`; text is automatically converted to speech and transcribed to obtain word-level timings.
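To score predictions against measured fMRI responses, a standard metric in brain encoding is the Pearson correlation computed independently at each vertex. The helper below is not part of `tribev2`; it is a minimal NumPy sketch under the assumption that you have a measured array `target` aligned with `preds` (same `(n_timesteps, n_vertices)` shape), and `vertexwise_pearson` is a name introduced here for illustration.

```python
import numpy as np

def vertexwise_pearson(preds: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Pearson r between predicted and measured responses, per vertex.

    Both arrays have shape (n_timesteps, n_vertices), matching the
    output of `model.predict`. Returns an array of shape (n_vertices,).
    """
    p = preds - preds.mean(axis=0)
    t = target - target.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return (p * t).sum(axis=0) / denom

# Toy check: vertex 0 perfectly correlated, vertex 1 perfectly anti-correlated.
preds = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]])
target = np.array([[10.0, -6.0], [20.0, -4.0], [30.0, -2.0]])
print(vertexwise_pearson(preds, target))  # -> [ 1. -1.]
```

In practice `target` would be real fMRI data projected onto the same fsaverage5 mesh, and the resulting per-vertex scores are what brain-score maps visualize.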
For a full walkthrough with brain visualizations, see the [Colab demo notebook](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb).

## Installation

**Basic** (inference only):
```bash
pip install -e .
```

**With brain visualization**:
```bash
pip install -e ".[plotting]"
```

**With training dependencies** (PyTorch Lightning, W&B, etc.):
```bash
pip install -e ".[training]"
```
## Training a model from scratch

### 1. Set environment variables

Configure data/output paths and the Slurm partition (or edit `tribev2/grids/defaults.py` directly):

```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_partition"
```
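As a rough illustration of what these variables control, configuration code of this kind typically reads each variable with a fallback default. This is a hypothetical sketch, not the actual contents of `tribev2/grids/defaults.py`; only the variable names match the exports above.

```python
import os

# Hypothetical sketch of environment-driven configuration defaults:
# each setting falls back to a placeholder when the variable is unset.
DATAPATH = os.environ.get("DATAPATH", "/path/to/studies")
SAVEPATH = os.environ.get("SAVEPATH", "/path/to/output")
SLURM_PARTITION = os.environ.get("SLURM_PARTITION", "")

print(DATAPATH, SAVEPATH, SLURM_PARTITION)
```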
### 2. Authenticate with HuggingFace

The text encoder requires access to the gated [LLaMA 3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) model:

```bash
huggingface-cli login
```

Create a `read` [access token](https://huggingface.co/settings/tokens) and paste it when prompted.
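In non-interactive environments such as Slurm jobs or CI, where the login prompt is unavailable, the token can instead be supplied via the `HF_TOKEN` environment variable, which `huggingface_hub` picks up automatically:

```shell
# Non-interactive alternative to `huggingface-cli login`:
# set the token in the job environment instead of pasting it at a prompt.
export HF_TOKEN="<your read token>"
```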
### 3. Run training

**Local test run:**
```bash
python -m tribev2.grids.test_run
```

**Grid search on Slurm:**
```bash
python -m tribev2.grids.run_cortical
python -m tribev2.grids.run_subcortical
```

## Project structure

```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer-based multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and helpers for inference from text/audio/video
├── eventstransforms.py  # Custom event transforms (word extraction, chunking, …)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization (PyVista & Nilearn backends)
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```

## Contributing to open science

If you use this software, please share your results with the broader research community using the following citation:

```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```

## License

This project is licensed under CC-BY-NC-4.0. See [LICENSE](LICENSE) for details.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.