---
license: cc-by-nc-4.0
---
<div align="center">


# TRIBE v2


**A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience**
|
|
[Open in Colab](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb)
[License: CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
[Python](https://www.python.org/downloads/)
|
|
📄 [Paper](https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/) | ▶️ [Demo](https://aidemos.atmeta.com/tribev2/) | 🤗 [Weights](https://huggingface.co/facebook/tribev2)
|
|
</div>
|
|
TRIBE v2 is a deep multimodal brain encoding model that predicts fMRI brain responses to naturalistic stimuli (video, audio, text). It combines state-of-the-art feature extractors ([**LLaMA 3.2**](https://huggingface.co/meta-llama/Llama-3.2-3B) for text, [**V-JEPA2**](https://huggingface.co/facebook/vjepa2-vitg-fpc64-256) for video, and [**Wav2Vec-BERT**](https://huggingface.co/facebook/w2v-bert-2.0) for audio) into a unified Transformer architecture that maps multimodal representations onto the cortical surface.
|
|
## Quick start


Load a pretrained model from HuggingFace and predict brain responses to a video:


```python
from tribev2 import TribeModel

model = TribeModel.from_pretrained("facebook/tribev2", cache_folder="./cache")

df = model.get_events_dataframe(video_path="path/to/video.mp4")
preds, segments = model.predict(events=df)
print(preds.shape)  # (n_timesteps, n_vertices)
```


Predictions are for the "average" subject (see paper for details) and live on the **fsaverage5** cortical mesh (~20k vertices). You can also pass `text_path` or `audio_path` to `model.get_events_dataframe`; text is automatically converted to speech and transcribed to obtain word-level timings.
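The fsaverage5 mesh has 10,242 vertices per hemisphere (20,484 in total). Assuming the vertex axis concatenates the left hemisphere followed by the right (the hemisphere ordering is an assumption here, not stated above), the two surfaces can be separated with a plain NumPy slice:

```python
import numpy as np

# fsaverage5 resolution: 10,242 vertices per hemisphere, 20,484 in total.
N_VERTS_PER_HEMI = 10242

# Stand-in for model output: 100 timesteps x 20,484 vertices.
preds = np.zeros((100, 2 * N_VERTS_PER_HEMI))

# Assuming the vertex axis is ordered left hemisphere, then right:
left = preds[:, :N_VERTS_PER_HEMI]
right = preds[:, N_VERTS_PER_HEMI:]
print(left.shape, right.shape)  # (100, 10242) (100, 10242)
```

Per-hemisphere arrays in this shape can then be passed to standard surface-plotting tools.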
|
|
For a full walkthrough with brain visualizations, see the [Colab demo notebook](https://colab.research.google.com/github/facebookresearch/tribev2/blob/main/tribe_demo.ipynb).
|
|
## Installation


**Basic** (inference only):
```bash
pip install -e .
```


**With brain visualization**:
```bash
pip install -e ".[plotting]"
```


**With training dependencies** (PyTorch Lightning, W&B, etc.):
```bash
pip install -e ".[training]"
```
|
|
## Training a model from scratch


### 1. Set environment variables


Configure data/output paths and the Slurm partition (or edit `tribev2/grids/defaults.py` directly):


```bash
export DATAPATH="/path/to/studies"
export SAVEPATH="/path/to/output"
export SLURM_PARTITION="your_partition"
```
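For orientation, a config module consuming these variables would typically read them with fallbacks for when they are unset; a minimal sketch (the fallback values below are illustrative placeholders, not the actual defaults in `tribev2/grids/defaults.py`):

```python
import os

# Illustrative fallbacks; the real defaults live in tribev2/grids/defaults.py.
DATAPATH = os.environ.get("DATAPATH", "/path/to/studies")
SAVEPATH = os.environ.get("SAVEPATH", "/path/to/output")
SLURM_PARTITION = os.environ.get("SLURM_PARTITION", "")
```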
|
|
### 2. Authenticate with HuggingFace


The text encoder requires access to the gated [LLaMA 3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) model:


```bash
huggingface-cli login
```


Create a `read` [access token](https://huggingface.co/settings/tokens) and paste it when prompted.
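On headless or batch nodes (e.g. inside Slurm jobs) where an interactive prompt is inconvenient, `huggingface_hub` also picks the token up from the standard `HF_TOKEN` environment variable, so a non-interactive alternative is:

```shell
# Non-interactive login: huggingface_hub reads HF_TOKEN automatically.
# Replace the placeholder with your own read token; keep it out of version control.
export HF_TOKEN="hf_your_token_here"
```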
|
|
### 3. Run training


**Local test run:**
```bash
python -m tribev2.grids.test_run
```


**Grid search on Slurm:**
```bash
python -m tribev2.grids.run_cortical
python -m tribev2.grids.run_subcortical
```
|
|
## Project structure


```
tribev2/
├── main.py              # Experiment pipeline: Data, TribeExperiment
├── model.py             # FmriEncoder: Transformer-based multimodal→fMRI model
├── pl_module.py         # PyTorch Lightning training module
├── demo_utils.py        # TribeModel and helpers for inference from text/audio/video
├── eventstransforms.py  # Custom event transforms (word extraction, chunking, …)
├── utils.py             # Multi-study loading, splitting, subject weighting
├── utils_fmri.py        # Surface projection (MNI / fsaverage) and ROI analysis
├── grids/
│   ├── defaults.py      # Full default experiment configuration
│   └── test_run.py      # Quick local test entry point
├── plotting/            # Brain visualization (PyVista & Nilearn backends)
└── studies/             # Dataset definitions (Algonauts2025, Lahner2024, …)
```
|
|
## Contributing to open science
|
|
If you use this software, please share your results with the broader research community and cite:
|
|
```bibtex
@article{dAscoli2026TribeV2,
  title={A foundation model of vision, audition, and language for in-silico neuroscience},
  author={d'Ascoli, St{\'e}phane and Rapin, J{\'e}r{\'e}my and Benchetrit, Yohann and Brookes, Teon and Begany, Katelyn and Raugel, Jos{\'e}phine and Banville, Hubert and King, Jean-R{\'e}mi},
  year={2026}
}
```
|
|
## License


This project is licensed under CC-BY-NC-4.0. See [LICENSE](LICENSE) for details.
|
|
## Contributing


See [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.