stat214-lab3-ridge-models

Per-voxel ridge regression weight matrices for the Stat 214 (Spring 2026) final project at UC Berkeley. These models predict whole-brain BOLD signal from spoken-story transcripts on the Huth Lab fMRI dataset (Subjects 2 and 3).

Each .pkl file is a Python dict with:

| Key | Type | Description |
| --- | --- | --- |
| weights | np.ndarray (D, V) float64 | Per-voxel ridge weights |
| alphas | np.ndarray (V,) | Per-voxel selected ridge alpha |
| alpha_grid | np.ndarray (30,) | Candidate alpha grid (logspace(-1, 6, 30)) |
| train_mean_X / train_std_X | | Feature-axis z-score stats needed to apply weights to new X |
| train_mean_Y / train_std_Y | | Voxel-axis z-score stats needed to invert prediction back to BOLD |
| train_stories / test_stories | list[str] | Provenance |

D is the feature dimension after 4-lag concatenation (base embedding dimension × 4), and V is the number of voxels per subject (94,251 for Subject 2; 95,556 for Subject 3).
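To make the relationship between the base embedding dimension and D concrete, here is a minimal sketch of lag concatenation. It assumes delays of 1 to 4 time steps with zero padding at the start; the delays actually used to build these features are not stated here, and `make_lagged` is a hypothetical helper, not part of this repository.

```python
import numpy as np

def make_lagged(X, n_lags=4):
    """Concatenate n_lags delayed copies of X (T, d) into a (T, d * n_lags) array.

    Assumption: delays of 1..n_lags time steps, zero-padded at the start.
    """
    T, d = X.shape
    lagged = np.zeros((T, d * n_lags), dtype=X.dtype)
    for k in range(1, n_lags + 1):
        # Row t of block k holds the features from time t - k.
        n_rows = max(T - k, 0)
        lagged[k:, (k - 1) * d : k * d] = X[:n_rows]
    return lagged

X = np.arange(12, dtype=float).reshape(6, 2)  # toy input: T=6, d=2
X_lagged = make_lagged(X)
print(X_lagged.shape)  # (6, 8)
```

With a 768-dimensional BERT embedding, the same operation yields D = 768 × 4 = 3,072.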

Files

| File family | Embedding | Notes |
| --- | --- | --- |
| ridge_bow_subject{2,3}.pkl | Bag-of-Words | D=24,368 (6,092 vocab × 4 lags) |
| ridge_word2vec_subject{2,3}.pkl | Word2Vec (word2vec-google-news-300) | D=1,200 (300 × 4) |
| ridge_glove_subject{2,3}.pkl | GloVe (glove-wiki-gigaword-300) | D=1,200 |
| ridge_bert_pretrained_subject{2,3}.pkl | bert-base-uncased layer-12 | D=3,072 (768 × 4) |
| ridge_bert_lora_lora_r{4,8}_maxlen{128,256}_subject{2,3}.pkl | LoRA fine-tuned BERT (4 configs) | D=3,072 |
| ridge_bert_var_layer{4,8,11,12}_val_then_test_subject{2,3}.pkl | BERT layer-{4,8,11,12} (val-then-test mode) | D=3,072 |
| ridge_bert_var_{concat,avg}_4_8_12_val_then_test_subject{2,3}.pkl | BERT 3-layer concat / avg | D=9,216 / 3,072 |
| ridge_bert_var_layer8_full_train_subject{2,3}.pkl | BERT layer-8 (winner, full 86-story train) | D=3,072 |
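The brace patterns above can be expanded into concrete file names, which is handy when looping over models. The sketch below assumes that every combination implied by the braces exists in the repository; `ridge_filenames` is a hypothetical helper introduced only for illustration.

```python
from itertools import product

SUBJECTS = (2, 3)

def ridge_filenames():
    """Expand the brace patterns from the Files list into concrete .pkl names."""
    names = []
    for s in SUBJECTS:
        # Single-file families: one .pkl per embedding and subject.
        for emb in ("bow", "word2vec", "glove", "bert_pretrained"):
            names.append(f"ridge_{emb}_subject{s}.pkl")
        # LoRA fine-tuned BERT: rank x max length gives 4 configs.
        for r, maxlen in product((4, 8), (128, 256)):
            names.append(f"ridge_bert_lora_lora_r{r}_maxlen{maxlen}_subject{s}.pkl")
        # Single-layer BERT variants in val-then-test mode.
        for layer in (4, 8, 11, 12):
            names.append(f"ridge_bert_var_layer{layer}_val_then_test_subject{s}.pkl")
        # Three-layer combinations (concatenated or averaged).
        for mode in ("concat", "avg"):
            names.append(f"ridge_bert_var_{mode}_4_8_12_val_then_test_subject{s}.pkl")
        # Layer-8 model trained on the full training set.
        names.append(f"ridge_bert_var_layer8_full_train_subject{s}.pkl")
    return names

filenames = ridge_filenames()
print(len(filenames))  # 30 (15 per subject)
```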

Loading

import pickle
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="RheaTinghe/stat214-lab3-ridge-models",
    filename="ridge_bert_var_layer8_full_train_subject2.pkl",
)
with open(path, "rb") as f:
    model = pickle.load(f)

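import numpy as np

# X_test: raw (not yet z-scored) lagged features, shape (T, D).
# Standardize with the training statistics, predict in z-scored space,
# then map the predictions back to BOLD units.
X_test_z = (X_test - model["train_mean_X"]) / model["train_std_X"]
Y_pred_z = X_test_z @ model["weights"]  # shape (T, V)
Y_pred = Y_pred_z * model["train_std_Y"] + model["train_mean_Y"]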
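The prediction steps above can be wrapped in a function with a shape check, and predictions can then be scored per voxel. This is a minimal sketch: it assumes the stored mean/std arrays broadcast against (T, D) and (T, V) arrays, uses a small random stand-in for a loaded model dict, and scores with Pearson correlation, which is a common choice for encoding models but is not confirmed as the metric used in this project. `predict_bold` and `voxelwise_corr` are hypothetical helper names.

```python
import numpy as np

def predict_bold(model, X):
    """Apply the stored ridge weights to raw lagged features X of shape (T, D)."""
    W = model["weights"]
    if X.shape[1] != W.shape[0]:
        raise ValueError(f"X has {X.shape[1]} features, weights expect {W.shape[0]}")
    X_z = (X - model["train_mean_X"]) / model["train_std_X"]
    return (X_z @ W) * model["train_std_Y"] + model["train_mean_Y"]

def voxelwise_corr(Y_true, Y_pred):
    """Pearson correlation between observed and predicted time courses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

# Toy stand-in for a loaded model dict (real files have far larger D and V).
rng = np.random.default_rng(0)
T, D, V = 50, 8, 5
toy_model = {
    "weights": rng.normal(size=(D, V)),
    "train_mean_X": np.zeros(D), "train_std_X": np.ones(D),
    "train_mean_Y": np.zeros(V), "train_std_Y": np.ones(V),
}
X_new = rng.normal(size=(T, D))
Y_hat = predict_bold(toy_model, X_new)
Y_obs = Y_hat + 0.1 * rng.normal(size=(T, V))  # simulated noisy observations
r = voxelwise_corr(Y_obs, Y_hat)
print(r.shape)  # (5,)
```

With a real model, pass the dict returned by `pickle.load` in place of `toy_model` and held-out BOLD responses in place of `Y_obs`.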

Citation

@misc{stat214lab3,
  author = {Galloro, Drew and Wang, Ruihang and Khothsombath, Benjamin and Zhang, Rhea},
  title  = {Stat 214 Lab 3: voxel-wise encoding of spoken stories},
  year   = {2026},
  note   = {UC Berkeley Spring 2026},
}