NeuroVista: Multimodal Brain-Response Prediction System

A production-grade, compact multimodal brain-response prediction model inspired by Meta's TRIBE v2 and related neural encoding research. NeuroVista predicts human brain (fMRI BOLD-like) responses to text, images, audio, and video stimuli while providing interpretable explanations, region-level summaries, cautious Q&A, and 3D brain visualizations.

Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│                     NeuroVista Architecture                      │
├──────────────────────────────────────────────────────────────────┤
│  Modality Encoders (frozen pretrained + LoRA adapters)           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐          │
│  │  Text    │  │  Image   │  │  Audio   │  │  Video   │          │
│  │ OPT-1.3B │  │CLIP ViT-B│  │ Whisper  │  │CLIP+Temp │          │
│  │ ~2.6GB   │  │ ~570MB   │  │ ~280MB   │  │ ~100MB   │          │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘          │
│       └─────────────┼─────────────┘             │                │
│                     │                           │                │
│                     ▼                           ▼                │
│              ┌────────────────────────────────────┐              │
│              │  Cross-Modal Fusion (RoPE)         │              │
│              │  2 layers, 8 heads, 512 dim        │              │
│              │  ~20MB                             │              │
│              └─────────────┬──────────────────────┘              │
│                            ▼                                     │
│              ┌────────────────────────────────────┐              │
│              │  Brain Decoder (temporal trans)    │              │
│              │  4 layers, 8 heads, 512 dim        │              │
│              │  Subject-conditioned + population  │              │
│              │  ~100MB                            │              │
│              └─────────────┬──────────────────────┘              │
│                            ▼                                     │
│              ┌────────────────────────────────────┐              │
│              │  Interpretation Heads              │              │
│              │  ROI, Network, Modality, Unc.      │              │
│              │  ~50MB                             │              │
│              └────────────────────────────────────┘              │
└──────────────────────────────────────────────────────────────────┘

Total Deployable Size: ~3.5-5 GB (comfortably below the 6-10 GB deployment target)
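
As a quick check on the size budget, the per-component figures from the diagram can be summed directly. This is only a sketch of the arithmetic: the dictionary keys are illustrative labels rather than names from the codebase, the sizes are the approximate values quoted above, and 1 GB is treated as 1000 MB.

```python
# Approximate component sizes in MB, copied from the architecture diagram.
# The keys are illustrative labels, not identifiers from the NeuroVista code.
COMPONENT_SIZES_MB = {
    "text_encoder_opt_1.3b": 2600,
    "image_encoder_clip_vit_b": 570,
    "audio_encoder_whisper": 280,
    "video_encoder_clip_temporal": 100,
    "cross_modal_fusion": 20,
    "brain_decoder": 100,
    "interpretation_heads": 50,
}


def total_size_gb(sizes_mb):
    """Sum component sizes given in MB and convert to GB (1 GB = 1000 MB)."""
    return sum(sizes_mb.values()) / 1000.0


def fits_budget(sizes_mb, budget_gb):
    """Return True if the summed size is within the given budget in GB."""
    return total_size_gb(sizes_mb) <= budget_gb


if __name__ == "__main__":
    print(f"Estimated total: {total_size_gb(COMPONENT_SIZES_MB):.2f} GB")
    print("Fits 6 GB budget:", fits_budget(COMPONENT_SIZES_MB, 6.0))
```

With the figures above the sum comes to about 3.72 GB, which sits inside the stated ~3.5-5 GB range.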

Features

Prediction Outputs

  • Brain Activity Map: 1000-parcel whole-brain predictions
  • Temporal Dynamics: Per-TR predictions over stimulus window
  • Subject Conditioning: Subject-specific or population-average modes
  • Uncertainty Estimates: Epistemic + aleatoric uncertainty via MC dropout
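
The uncertainty split can be illustrated with a toy example. MC dropout on its own mainly captures epistemic (model) uncertainty, so the sketch below assumes that aleatoric (noise) uncertainty comes from a separately predicted variance, which is one common arrangement and not necessarily what NeuroVista does. The single linear head, the `log_var` parameter, and all function names are hypothetical.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, log_var, drop_p, rng):
    """One forward pass of a toy linear head with dropout kept active."""
    keep = 1.0 - drop_p
    # Inverted dropout: zero each feature with probability drop_p, rescale the rest.
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    mean = sum(w * x for w, x in zip(weights, dropped))
    # Return the predicted response and the (assumed) predicted noise variance.
    return mean, math.exp(log_var)


def mc_dropout_uncertainty(features, weights, log_var,
                           drop_p=0.1, n_samples=50, seed=0):
    """Run several stochastic passes and split the uncertainty into two parts."""
    rng = random.Random(seed)
    passes = [stochastic_forward(features, weights, log_var, drop_p, rng)
              for _ in range(n_samples)]
    means = [m for m, _ in passes]
    variances = [v for _, v in passes]
    return {
        "prediction": statistics.fmean(means),
        "epistemic": statistics.pvariance(means),   # spread across dropout masks
        "aleatoric": statistics.fmean(variances),   # average predicted noise
    }


if __name__ == "__main__":
    result = mc_dropout_uncertainty([1.0, 2.0, 3.0], [0.5, -0.2, 0.1],
                                    log_var=math.log(0.04), drop_p=0.2)
    print(result)
```

In a real decoder the same computation would run per parcel and per TR; the scalar version here is only meant to show where each term comes from.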

Interpretation Outputs

  • ROI Activation Ranking: Top positive/negative regions
  • Network Attribution: 7 functional network engagement scores
  • Modality Contributions: Which input stream drove the prediction
  • Confidence Scoring: Low/moderate/high confidence indicators
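
A minimal sketch of how these summaries could be derived from a vector of parcel predictions follows. It assumes each parcel carries a network label (the names used in the usage note are placeholders), that network engagement is a plain mean over member parcels, and that confidence is bucketed with fixed thresholds. None of these choices is confirmed by the project, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def rank_parcels(activations, top_k=5):
    """Return (top positive, top negative) parcels as (index, value) pairs."""
    ascending = sorted(range(len(activations)), key=lambda i: activations[i])
    descending = ascending[::-1]
    positive = [(i, activations[i]) for i in descending[:top_k] if activations[i] > 0]
    negative = [(i, activations[i]) for i in ascending[:top_k] if activations[i] < 0]
    return positive, negative


def network_scores(activations, parcel_networks):
    """Mean predicted activation per functional network."""
    grouped = defaultdict(list)
    for value, network in zip(activations, parcel_networks):
        grouped[network].append(value)
    return {network: fmean(values) for network, values in grouped.items()}


def confidence_label(uncertainty, low=0.1, high=0.3):
    """Map a scalar uncertainty to a coarse label; the thresholds are placeholders."""
    if uncertainty < low:
        return "high"
    if uncertainty < high:
        return "moderate"
    return "low"


if __name__ == "__main__":
    acts = [0.5, -1.2, 0.0, 2.0, -0.3, 0.9]
    nets = ["Visual", "Default", "Visual", "Visual", "Limbic", "Default"]
    print(rank_parcels(acts, top_k=2))
    print(network_scores(acts, nets))
    print(confidence_label(0.2))
```

For the six toy parcels in the usage block, the two strongest positive parcels are indices 3 and 5 and the two strongest negative parcels are indices 1 and 4.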

Cautious Q&A System

Answers questions using only predicted activation maps and atlas knowledge:

  • "Which regions are most activated?"
  • "Are these regions associated with language, vision, or emotion?"
  • "How certain is this prediction?"
  • "What would a population-level response look like?"

All answers include calibrated uncertainty language and explicit caveats. The system never claims to read minds or to diagnose conditions.
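
One way such hedged answers could be produced is with templates keyed on the confidence label. The wording, the `HEDGES` table, and the `answer_most_activated` function are illustrative assumptions, not the project's actual Q&A implementation.

```python
# Hypothetical hedging phrases, one per confidence label.
HEDGES = {
    "high": "The prediction suggests that",
    "moderate": "The prediction tentatively suggests that",
    "low": "The prediction is highly uncertain, but it may hint that",
}

# Caveat appended to every answer, in line with the safety framing.
CAVEAT = ("This is a model estimate of likely response patterns, "
          "not a reading of thoughts or a clinical finding.")


def answer_most_activated(top_regions, confidence):
    """Answer 'Which regions are most activated?' with hedged wording.

    top_regions: list of (region_name, predicted_value) pairs.
    confidence: one of 'low', 'moderate', 'high'.
    """
    if confidence not in HEDGES:
        raise ValueError(f"unknown confidence label: {confidence!r}")
    if not top_regions:
        return f"No region stands out in the predicted map. {CAVEAT}"
    names = ", ".join(name for name, _ in top_regions)
    return (f"{HEDGES[confidence]} the strongest predicted responses "
            f"are in: {names}. {CAVEAT}")


if __name__ == "__main__":
    print(answer_most_activated([("V1", 1.2), ("FFA", 0.8)], "moderate"))
```

Keeping the caveat in a single constant makes it harder for any answer path to omit it.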

3D Visualization

  • Cortical surface heatmap overlays (nilearn-based)
  • Hemisphere comparison views (lateral, medial, dorsal, ventral)
  • Parcel-level activation bars with network coloring
  • Uncertainty overlay plots
  • Modality contribution pie charts
  • Comprehensive interpretation reports

Installation

pip install -r requirements.txt

Quick Start

Training

python -m neurovista.scripts.train --config configs/base_config.yaml --data_dir ./data --dry_run

Inference

python -m neurovista.scripts.infer --model_dir ./model_export --text "A natural scene" --image scene.jpg --output_dir ./output

Demo

python neurovista/demo.py

Scientific Framing & Safety

What This Model Estimates

  • Likely cortical response patterns from stimulus features
  • Probable functional systems engagement
  • Sensory, language, attention, memory, and affect-related correlates
  • Uncertainty-aware predictions with explicit confidence levels

What This Model Does NOT Claim

  • Exact thoughts or mental content
  • Exact emotions (only correlates with uncertainty labels)
  • Clinical diagnoses
  • Guaranteed behavior of "normal" individuals

References

  • TRIBE (Meta/Facebook): arXiv:2507.22229 - Algonauts 2025 brain encoding
  • VIBE: arXiv:2507.17958 - Video-input brain encoder
  • BraInCoRL: arXiv:2505.15813 - In-context brain prediction

License

MIT License - Research use only. Not for clinical diagnosis.
