# 🧠 Multimodal Brain Encoder

A brain encoding model, trained on real fMRI recordings, that predicts voxel-level brain activity from multimodal inputs (images, text, audio).

## Architecture

| Component | Details |
|-----------|---------|
| Feature Extractor | CLIP ViT-L/14 (openai/clip-vit-large-patch14) |
| Feature Layers | CLS tokens from layers 6, 12, 18, and 24, concatenated (4096-dim) |
| Brain Encoder | Deep network: 4096 → 2048 → 2048 → 1024 → N_voxels |
| ROI Heads | 5 functional network-specific attention heads |
| Ridge Baseline | sklearn RidgeCV (Algonauts 2023 recipe) |
| Q&A System | Grounded LLM interpreter (Qwen2.5-72B) |

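The 4096-dim input comes from concatenating the CLS token of four intermediate ViT-L/14 layers (hidden size 1024, so 4 × 1024 = 4096). Here is a minimal sketch of that extraction with the `transformers` library; the repo's own `extract_clip_features` helper (referenced in Usage below and in app.py) may preprocess or index layers differently:

```python
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14").eval()

def extract_clip_features(image):
    """Concatenate CLS tokens from layers 6, 12, 18, 24 -> [1, 4096]."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        out = vision(**inputs, output_hidden_states=True)
    # hidden_states[0] is the patch embedding; hidden_states[i] is the output of block i
    cls_tokens = [out.hidden_states[i][:, 0] for i in (6, 12, 18, 24)]
    return torch.cat(cls_tokens, dim=-1)
```
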
## Training Data

- **Dataset**: [Natural Scenes Dataset (NSD)](https://huggingface.co/datasets/pscotti/naturalscenesdataset)
- **Subject**: subj01 (7T fMRI)
- **Training samples**: 2,000 images with paired fMRI responses (a loading sketch follows this list)
- **Validation samples**: 200 images
- **Voxels**: ~47,236 (nsdgeneral mask)

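A minimal loading sketch for the split above, assuming precomputed CLIP features and NSD betas saved as NumPy arrays; the file names and on-disk layout here are assumptions, not this repo's actual preprocessing:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical arrays: features [N, 4096], betas [N, n_voxels] (nsdgeneral mask)
features = torch.from_numpy(np.load("clip_features_subj01.npy")).float()
betas = torch.from_numpy(np.load("betas_subj01_nsdgeneral.npy")).float()

train_ds = TensorDataset(features[:2000], betas[:2000])        # 2,000 training images
val_ds = TensorDataset(features[2000:2200], betas[2000:2200])  # 200 validation images

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)
```
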
## Brain Regions (24 ROIs)

| Network | Regions | Function |
|---------|---------|----------|
| Early Visual | V1v, V1d, V2v, V2d, V3v, V3d, hV4 | Basic visual processing |
| Body Selective | EBA, FBA-1, FBA-2, mTL-bodies | Body/person perception |
| Face Selective | OFA, FFA-1, FFA-2, mTL-faces, aTL-faces | Face recognition |
| Place Selective | OPA, PPA, RSC | Scene/navigation |
| Word Selective | OWFA, VWFA-1, VWFA-2, mfs-words, mTL-words | Reading/text |

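The table above maps directly to a lookup that ROI analysis can use. Below is a sketch of per-ROI summaries over the model's voxel predictions; the `voxel_roi_labels` array (one ROI name per voxel) is an assumed format, not the repo's actual metadata:

```python
import numpy as np

# The 24 ROIs grouped by functional network, as listed in the table above
ROI_NETWORKS = {
    "Early Visual": ["V1v", "V1d", "V2v", "V2d", "V3v", "V3d", "hV4"],
    "Body Selective": ["EBA", "FBA-1", "FBA-2", "mTL-bodies"],
    "Face Selective": ["OFA", "FFA-1", "FFA-2", "mTL-faces", "aTL-faces"],
    "Place Selective": ["OPA", "PPA", "RSC"],
    "Word Selective": ["OWFA", "VWFA-1", "VWFA-2", "mfs-words", "mTL-words"],
}

def roi_means(predictions: np.ndarray, voxel_roi_labels: np.ndarray) -> dict:
    """Mean predicted activation per ROI for one image's voxel predictions."""
    return {
        roi: float(predictions[voxel_roi_labels == roi].mean())
        for rois in ROI_NETWORKS.values()
        for roi in rois
        if (voxel_roi_labels == roi).any()
    }
```
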
## How It Works

1. **Input** → CLIP ViT-L/14 multi-layer features (4096-dim)
2. **Brain Encoder** → Predicted fMRI voxel activations (~47k voxels); a sketch of this stage follows the list
3. **ROI Analysis** → Per-region activation summaries with uncertainty
4. **LLM Q&A** → Grounded interpretation (only references model outputs)

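An illustrative sketch of stage 2 using the layer widths from the Architecture table. The repo's real `BrainEncoder` (including the ROI attention heads) ships with the code; the activation function and any normalization or dropout here are assumptions:

```python
import torch
import torch.nn as nn

class BrainEncoder(nn.Module):
    """Illustrative 4096 -> 2048 -> 2048 -> 1024 -> N_voxels encoder."""

    def __init__(self, in_dim=4096, n_voxels=47236):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 2048), nn.GELU(),  # GELU is an assumption
            nn.Linear(2048, 2048), nn.GELU(),
            nn.Linear(2048, 1024), nn.GELU(),
            nn.Linear(1024, n_voxels),           # one output per voxel
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)  # [batch, n_voxels]
```
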
## References

- Allen et al. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. *Nature Neuroscience*.
- Gifford et al. (2023). The Algonauts Project 2023 Challenge: How the human brain makes sense of natural scenes.
- Radford et al. (2021). Learning Transferable Visual Models From Natural Language Supervision (CLIP). *ICML*.
- Adeli & Zelinsky (2025). Transformer Brain Encoders. arXiv:2505.17329.

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the trained checkpoint from the Hub
model_path = hf_hub_download(repo_id="ryu34/multimodal-brain-encoder", filename="best_model.pt")
checkpoint = torch.load(model_path, map_location="cpu", weights_only=False)

# Extract CLIP features for your input (4096-dim multi-layer)
# features = extract_clip_features(image)  # See app.py for the full pipeline

# Rebuild the encoder and predict brain activity
# (import BrainEncoder from the repo code; see app.py)
model = BrainEncoder(**checkpoint['config'])
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # inference mode: disable dropout etc.

with torch.no_grad():
    predictions = model(features)  # [1, n_voxels]
```