NeuroVista: Multimodal Brain-Response Prediction System

A production-grade, compact multimodal brain-response prediction model inspired by Meta's TRIBE v2 and related neural encoding research. NeuroVista predicts human brain (fMRI BOLD-like) responses to text, images, audio, and video stimuli while providing interpretable explanations, region-level summaries, cautious Q&A, and 3D brain visualizations.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                     NeuroVista Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│  Modality Encoders (frozen pretrained + LoRA adapters)          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  Text    │  │  Image   │  │  Audio   │  │  Video   │         │
│  │ OPT-1.3B │  │CLIP ViT-B│  │Whisper   │  │CLIP+Temp │         │
│  │ ~2.6GB   │  │ ~570MB   │  │ ~280MB   │  │ ~100MB   │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
│       └─────────────┴──────┬──────┴─────────────┘               │
│                            │                                    │
│                            ▼                                    │
│          ┌───────────────────────────────────┐                  │
│          │ Cross-Modal Fusion (RoPE)         │                  │
│          │ 2 layers, 8 heads, 512 dim        │                  │
│          │ ~20MB                             │                  │
│          └─────────────────┬─────────────────┘                  │
│                            ▼                                    │
│          ┌───────────────────────────────────┐                  │
│          │ Brain Decoder (temporal trans)    │                  │
│          │ 4 layers, 8 heads, 512 dim        │                  │
│          │ Subject-conditioned + population  │                  │
│          │ ~100MB                            │                  │
│          └─────────────────┬─────────────────┘                  │
│                            ▼                                    │
│          ┌───────────────────────────────────┐                  │
│          │ Interpretation Heads              │                  │
│          │ ROI, Network, Modality, Unc.      │                  │
│          │ ~50MB                             │                  │
│          └───────────────────────────────────┘                  │
└─────────────────────────────────────────────────────────────────┘
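The fusion block uses RoPE (rotary position embeddings). The card does not say how positions are assigned across modalities, so the sketch below illustrates only the rotation itself. It is written in NumPy and assumes inputs of shape (sequence length, even feature dimension) and the common frequency base of 10000. The function name `rope` is hypothetical and is not taken from the NeuroVista codebase.

```python
import numpy as np


def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim).

    Channel pairs (2i, 2i + 1) are rotated by angle position * base**(-2i / dim).
    """
    _, dim = x.shape
    if dim % 2 != 0:
        raise ValueError("RoPE requires an even feature dimension")
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)      # (dim / 2,)
    angles = positions[:, None] * inv_freq[None, :]       # (seq_len, dim / 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin             # 2D rotation, first coordinate
    out[:, 1::2] = x_even * sin + x_odd * cos             # 2D rotation, second coordinate
    return out


# Usage: the query-key score depends only on the position difference (3-1 == 7-5).
rng = np.random.default_rng(0)
q, k = rng.standard_normal((1, 8)), rng.standard_normal((1, 8))
score_a = (rope(q, np.array([3.0])) @ rope(k, np.array([1.0])).T).item()
score_b = (rope(q, np.array([7.0])) @ rope(k, np.array([5.0])).T).item()
print(np.isclose(score_a, score_b))  # True
```

Each channel pair is rotated by an angle proportional to its position. As a result, the dot product between a rotated query and a rotated key depends only on their relative offset. For cross-modal fusion, one plausible choice (not confirmed by the card) is to use stimulus time in seconds as the position. Tokens from different modalities at the same moment would then share the same rotation.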

Total Deployable Size: ~3.5-5 GB (fits 6-10 GB target with headroom)
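The headline figure can be checked against the per-component sizes shown in the diagram. The sketch below simply sums those approximate values; the dictionary keys are illustrative names, not identifiers from the codebase. It also shows that ~2.6 GB for OPT-1.3B matches 1.3 billion parameters stored at 2 bytes each (16-bit weights), which is presumably how that figure was derived.

```python
# Approximate component sizes copied from the architecture diagram, in GB.
COMPONENT_SIZES_GB = {
    "text_encoder": 2.6,         # OPT-1.3B
    "image_encoder": 0.57,       # CLIP ViT-B
    "audio_encoder": 0.28,       # Whisper
    "video_encoder": 0.10,       # CLIP + temporal module
    "fusion": 0.02,
    "brain_decoder": 0.10,
    "interpretation_heads": 0.05,
}


def weights_size_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes); 2 bytes/param = fp16/bf16."""
    return n_params * bytes_per_param / 1e9


total_gb = sum(COMPONENT_SIZES_GB.values())
print(f"listed components: {total_gb:.2f} GB")                 # 3.72 GB
print(f"OPT-1.3B at 16-bit: {weights_size_gb(1.3e9):.1f} GB")  # 2.6 GB
print(f"headroom vs 6 GB budget: {6.0 - total_gb:.2f} GB")     # 2.28 GB
```

The listed components sum to about 3.7 GB, which is at the low end of the stated 3.5-5 GB range. The card does not say what accounts for the upper end.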

Features

Prediction Outputs

Brain Activity Map: 1000-parcel whole-brain predictions
Temporal Dynamics: Per-TR predictions over stimulus window
Subject Conditioning: Subject-specific or population-average modes
Uncertainty Estimates: Epistemic uncertainty (via MC dropout) and aleatoric uncertainty
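MC dropout keeps dropout active at inference time and runs several stochastic forward passes. The spread across those passes serves as an estimate of epistemic (model) uncertainty. Aleatoric uncertainty is usually modeled separately, for example with a head that predicts a per-parcel variance, and this sketch does not cover it. The toy linear readout, the feature size, and the name `mc_dropout_predict` are assumptions for illustration; they do not describe NeuroVista's actual decoder.

```python
import numpy as np


def mc_dropout_predict(features, weights, p_drop=0.1, n_samples=30, seed=0):
    """Monte Carlo dropout over a linear readout.

    features: (n_trs, d) stimulus features; weights: (d, n_parcels) readout.
    Returns the per-TR, per-parcel mean prediction and the standard deviation
    across stochastic passes (an epistemic-uncertainty estimate).
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        keep = rng.random(features.shape) >= p_drop        # keep each unit with prob 1 - p
        dropped = features * keep / (1.0 - p_drop)         # inverted-dropout rescaling
        samples.append(dropped @ weights)
    samples = np.stack(samples)                            # (n_samples, n_trs, n_parcels)
    return samples.mean(axis=0), samples.std(axis=0)


rng = np.random.default_rng(1)
features = rng.standard_normal((20, 512))          # 20 TRs x 512-dim fused features
readout = rng.standard_normal((512, 1000)) * 0.01  # hypothetical readout to 1000 parcels
mean, std = mc_dropout_predict(features, readout)
print(mean.shape, std.shape)  # (20, 1000) (20, 1000)
```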

Interpretation Outputs

ROI Activation Ranking: Top positive/negative regions
Network Attribution: 7 functional network engagement scores
Modality Contributions: Which input stream drove the prediction
Confidence Scoring: Low/moderate/high confidence indicators
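The ranking, network, and confidence outputs can be derived from a parcel-level prediction with a few array operations. This sketch makes two assumptions the card does not confirm: that the seven networks follow the widely used Yeo et al. (2011) seven-network labels, and that each parcel carries exactly one network index. The confidence thresholds are hypothetical, uncalibrated placeholders.

```python
import numpy as np

# Assumed network labels (Yeo 7-network convention); not confirmed by the card.
NETWORKS = ["Visual", "Somatomotor", "Dorsal Attention", "Ventral Attention",
            "Limbic", "Frontoparietal", "Default"]


def rank_parcels(activation: np.ndarray, k: int = 5):
    """Indices of the k most positive and the k most negative parcels."""
    order = np.argsort(activation)  # ascending
    return order[::-1][:k], order[:k]


def network_scores(activation: np.ndarray, parcel_network: np.ndarray) -> dict:
    """Mean predicted activation per network; parcel_network holds indices 0-6."""
    n = len(NETWORKS)
    sums = np.bincount(parcel_network, weights=activation, minlength=n)
    counts = np.bincount(parcel_network, minlength=n)
    means = np.divide(sums, counts, out=np.zeros(n), where=counts > 0)  # empty networks -> 0
    return dict(zip(NETWORKS, means.tolist()))


def confidence_label(mean_std: float, high_below: float = 0.1, low_above: float = 0.3) -> str:
    """Map mean predictive std to a label (hypothetical thresholds)."""
    if mean_std < high_below:
        return "high"
    if mean_std <= low_above:
        return "moderate"
    return "low"


activation = np.array([0.5, -1.0, 2.0, 0.0])  # toy 4-parcel prediction
parcel_network = np.array([0, 0, 6, 6])       # parcels 0-1 Visual, 2-3 Default
top, bottom = rank_parcels(activation, k=2)
print(top.tolist(), bottom.tolist())                         # [2, 0] [1, 3]
print(network_scores(activation, parcel_network)["Default"]) # 1.0
print(confidence_label(0.2))                                 # moderate
```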

Cautious Q&A System

Answers questions using only predicted activation maps and atlas knowledge:

"Which regions are most activated?"
"Are these regions associated with language, vision, or emotion?"
"How certain is this prediction?"
"What would a population-level response look like?"

All answers include calibrated uncertainty language and explicit caveats. The system never claims to read minds or make diagnoses.
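One simple way to guarantee that every answer carries uncertainty language and a caveat is to build answers from fixed templates keyed by the confidence label. The template wording, region names, and function name below are hypothetical; the card does not describe how the Q&A system actually generates text.

```python
HEDGES = {
    "high": "is predicted, with high confidence, to evoke",
    "moderate": "is predicted, with moderate confidence, to evoke",
    "low": "may evoke, although this prediction has low confidence,",
}
CAVEAT = (
    "These are model predictions of fMRI-like responses, not measurements of "
    "any person's brain; they do not reveal thoughts or emotions and are not "
    "a clinical diagnosis."
)


def answer_most_activated(region_names: list[str], confidence: str) -> str:
    """Template answer to 'Which regions are most activated?'."""
    if confidence not in HEDGES:
        raise ValueError(f"unknown confidence level: {confidence!r}")
    regions = ", ".join(region_names)
    return f"This stimulus {HEDGES[confidence]} the strongest responses in {regions}. {CAVEAT}"


print(answer_most_activated(
    ["left superior temporal cortex", "left inferior frontal gyrus"], "moderate"))
```

With fixed templates, the hedge and the caveat cannot be omitted, at the cost of less fluent answers.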

3D Visualization

Cortical surface heatmap overlays (nilearn-based)
Hemisphere comparison views (lateral, medial, dorsal, ventral)
Parcel-level activation bars with network coloring
Uncertainty overlay plots
Modality contribution pie charts
Comprehensive interpretation reports
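Before any surface or volume rendering, parcel-level predictions have to be written back into atlas space. Below is a minimal NumPy sketch. It assumes an integer label image in which 0 is background and labels 1..N index the N parcels in order, which is a common atlas convention; the card does not name its atlas, and `parcels_to_volume` is a hypothetical helper.

```python
import numpy as np


def parcels_to_volume(parcel_values, label_img: np.ndarray) -> np.ndarray:
    """Paint per-parcel values into an atlas label volume.

    label_img: integer array; 0 = background, k = parcel k (1-based).
    parcel_values: length-N sequence; parcel_values[k - 1] is the value for label k.
    """
    values = np.asarray(parcel_values, dtype=float)
    if label_img.min() < 0 or label_img.max() > values.size:
        raise ValueError("label image contains IDs outside 0..N")
    lut = np.concatenate([[0.0], values])  # lookup table: index 0 -> background
    return lut[label_img]


labels = np.array([[[0, 1], [2, 2]]])  # toy 1x2x2 "atlas" with two parcels
print(parcels_to_volume([0.5, -1.0], labels).tolist())  # [[[0.0, 0.5], [-1.0, -1.0]]]
```

The resulting array could be wrapped with the atlas affine in a `nibabel.Nifti1Image` and passed to `nilearn.plotting.plot_stat_map`. Alternatively, it could be sampled onto a cortical mesh with `nilearn.surface.vol_to_surf`. The card does not state whether NeuroVista's visualizer follows this route.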

Installation

pip install -r requirements.txt

Quick Start

Training

python -m neurovista.scripts.train --config configs/base_config.yaml --data_dir ./data --dry_run
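Brain-encoding models of this kind are commonly evaluated by the Pearson correlation between predicted and measured time series in each parcel. The card does not state NeuroVista's training loss or validation metric, so treat the helper below as an illustrative metric, not as the project's code.

```python
import numpy as np


def per_parcel_pearson(pred: np.ndarray, true: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pearson r per parcel for arrays of shape (n_trs, n_parcels)."""
    pred_c = pred - pred.mean(axis=0)  # center each parcel's time series
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / (den + eps)           # eps guards against zero-variance parcels


rng = np.random.default_rng(0)
true = rng.standard_normal((100, 3))                 # 100 TRs, 3 parcels (toy data)
pred = true + 0.5 * rng.standard_normal((100, 3))    # noisy "predictions"
r = per_parcel_pearson(pred, true)
print(r.round(2), r.mean().round(2))
```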

Inference

python -m neurovista.scripts.infer --model_dir ./model_export --text "A natural scene" --image scene.jpg --output_dir ./output
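Per-TR predictions require time-varying stimuli such as audio and video to be resampled onto the fMRI acquisition grid. Below is a minimal sketch that assumes frame-level features with onset times in seconds and uses simple boxcar averaging within each TR. It ignores the hemodynamic delay, which an encoding model must account for elsewhere (for example, with an explicit lag or by letting the temporal decoder learn it). The TR value and the function name are hypothetical.

```python
import numpy as np


def bin_features_to_trs(features, frame_times, tr, n_trs):
    """Average frame-level features within each TR window.

    features: (n_frames, d); frame_times: (n_frames,) seconds from stimulus onset.
    Returns (n_trs, d); TRs that contain no frames are left as zeros.
    """
    tr_index = np.floor(frame_times / tr).astype(int)
    valid = (tr_index >= 0) & (tr_index < n_trs)      # drop frames outside the window
    out = np.zeros((n_trs, features.shape[1]))
    np.add.at(out, tr_index[valid], features[valid])  # unbuffered sum per TR
    counts = np.bincount(tr_index[valid], minlength=n_trs)
    nonzero = counts > 0
    out[nonzero] /= counts[nonzero][:, None]          # sum -> mean
    return out


features = np.arange(8, dtype=float).reshape(4, 2)  # 4 frames, 2-dim features
times = np.array([0.0, 0.5, 1.2, 2.6])              # frame onsets in seconds
print(bin_features_to_trs(features, times, tr=1.0, n_trs=3).tolist())
# [[1.0, 2.0], [4.0, 5.0], [6.0, 7.0]]
```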

Demo

python neurovista/demo.py

Scientific Framing & Safety

What This Model Estimates

Likely cortical response patterns from stimulus features
Probable functional systems engagement
Sensory, language, attention, memory, and affect-related correlates
Uncertainty-aware predictions with explicit confidence levels

What This Model Does NOT Claim

Exact thoughts or mental content
Exact emotions (only correlates with uncertainty labels)
Clinical diagnoses
Guaranteed responses for any individual, including so-called "normal" ones

References

TRIBE (Meta/Facebook), arXiv:2507.22229: Algonauts 2025 brain encoding
VIBE, arXiv:2507.17958: video-input brain encoder
BraInCoRL, arXiv:2505.15813: in-context brain prediction

License

MIT License — Research use only. Not for clinical diagnosis.
