---
title: Cognitive Proxy
emoji: 🧠
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: cc-by-4.0
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
---

# Brain Coordinates for Language Models

[License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
[Hugging Face Space](https://huggingface.co/spaces/ai-nthusiast/cognitive-proxy)
[arXiv](https://arxiv.org)

**MEG Phase-Locking as a Steering Geometry for LLMs**

**Author**: Sandro Andric

## Overview

We propose using human brain activity not as a score to optimize, but as a **coordinate system** for reading and steering model states. From MEG recordings of 21 subjects listening to naturalistic speech, we construct a brain atlas of Phase-Locking Value (PLV) patterns for 2,113 words and train lightweight adapters that project frozen LLM hidden states into this space.

**Key Results**:
- **Function-Content Axis**: The dominant axis (61% of variance) separates syntactic binding from semantic access
- **Cross-Architecture Transfer**: GPT-2 (d=1.59) and TinyLlama (d=1.40), both p < 10^-22
- **Bidirectional Steering**: Generation can be controlled in both directions along brain-derived axes (p < 0.0001)
- **Scale-Dependent Structure**: The agency axis transfers only to the larger model (d=-0.82)
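
For readers new to the metric: the Phase-Locking Value between two signals is the magnitude of the mean unit phasor of their phase difference, PLV = |(1/N) Σ exp(i(φ₁(t) − φ₂(t)))|. It is 1 when the phase lag is constant and approaches 0 when the lag is spread evenly around the circle. The sketch below computes it from two already-extracted phase series using only the standard library. It is illustrative and not taken from this repository: the name `phase_locking_value` is hypothetical, and phase extraction from raw MEG (typically band-pass filtering followed by a Hilbert transform) is assumed to have happened upstream.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Return the PLV of two equal-length phase series (radians).

    Hypothetical helper for illustration; not part of this repository.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and equal in length")
    # Average the unit phasors of the sample-wise phase differences.
    mean_phasor = sum(
        cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)
    ) / len(phases_a)
    return abs(mean_phasor)


# One full cycle sampled at 100 evenly spaced points.
t = [2 * math.pi * k / 100 for k in range(100)]

# A constant phase lag means perfect locking (PLV close to 1).
locked = phase_locking_value(t, [x + 0.5 for x in t])

# A phase difference that sweeps uniformly around the circle gives PLV close to 0.
drifting = phase_locking_value(t, [0.0] * len(t))
```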

---

## 1. Installation

Requires Python 3.9+ and PyTorch.

```bash
# Install dependencies
pip install torch transformers scikit-learn pandas scipy numpy streamlit plotly sentencepiece

# Ensure local modules are importable
export PYTHONPATH="$PYTHONPATH:$(pwd)/src"
```

## 2. Reproduction Pipeline

To reproduce the scientific results from scratch, execute the following steps in order.

### Step 1: Build the Cognitive Atlas
Constructs the "Brain Dictionary" from the MEG-MASC dataset.
* **Input**: MEG-MASC BIDS data (configured in `DATA_ROOT`).
* **Output**: `results/final_atlas_256.pkl` (and `_vocab.pkl`).

```bash
python experiments/build_clustered_atlas.py
```
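
The internal layout of `results/final_atlas_256.pkl` is not described here, so a cautious first step after the build is to load the file and check its top-level type and size. The helper below is a hypothetical convenience, not a script from this repository, and it assumes nothing about the atlas beyond it being a pickle. Unpickling may require the libraries used to create the file to be importable, and `pickle.load` can execute arbitrary code, so only open files you produced yourself.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle and return (type name, length or None). Hypothetical helper."""
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


if __name__ == "__main__":
    # Output path named in Step 1; do nothing if the atlas has not been built yet.
    atlas_path = Path("results/final_atlas_256.pkl")
    if atlas_path.exists():
        print(summarize_pickle(atlas_path))
```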

### Step 2: Interpret the Axis (Phase 10.1)
Analyzes the semantics of the discovered brain clusters.
* **Output**: Correlation stats showing Cluster A = Function, Cluster B = Content.

```bash
python experiments/analyze_axis_correlations.py \
  --pos-cluster Cluster_2 \
  --neg-cluster Cluster_3
```

### Step 3: Train the Adapter
Trains the MLP mapping `GPT-2 Hidden -> Brain PLV`.
* **Input**: GPT-2 Tokenizer + Atlas.
* **Output**: `results/gpt2_adapter.pt`.

```bash
python experiments/train_gpt2_adapter.py
```

### Step 4: Validate the Alignment (Phase 10.2)
Runs a t-test on held-out words.
* **Metric**: Cohen's d > 0.9 expected for Function vs. Concrete.

```bash
python experiments/validate_adapter_stats.py \
  --pos-cluster Cluster_2 \
  --neg-cluster Cluster_3
```
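
The validation metric can be illustrated with a few lines of standard-library Python. The sketch below computes Cohen's d with a pooled standard deviation and Welch's t statistic for two groups of scalar scores (for example, projections of held-out function words and concrete words onto the axis). Whether `validate_adapter_stats.py` uses exactly these variants is an assumption; the function names and the toy numbers are invented for illustration and are not results.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances); the p-value lookup is omitted."""
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


# Toy scores, invented for this example only.
function_scores = [2.0, 4.0, 6.0]
concrete_scores = [1.0, 3.0, 5.0]
d = cohens_d(function_scores, concrete_scores)  # 0.5: means differ by 1, pooled SD is 2
t = welch_t(function_scores, concrete_scores)
```

A d above 0.9, the threshold quoted for this step, would mean the two group means sit almost one pooled standard deviation apart.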

### Step 5: Systematic Steering (Phase 10.3)
Generates text under "Neuro-Steering" conditions to measure causal effect.

```bash
python experiments/evaluate_steering_batch.py \
  --pos-cluster Cluster_2 \
  --neg-cluster Cluster_3 \
  --alpha 50.0
```
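
One common way to implement this kind of steering is to add a scaled unit vector to a hidden state, `h' = h + alpha * u`, where `u` points from the negative cluster toward the positive one; a negative `alpha` steers the opposite way. The sketch below shows that arithmetic on plain Python lists. It is an assumption about the general technique, not a description of `evaluate_steering_batch.py`: the difference-of-centroids axis, the function names, and the toy vectors are all hypothetical, and in a real model the update would be applied to a layer's activations during generation.

```python
import math


def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def unit_axis(pos_vectors, neg_vectors):
    """Unit vector from the negative centroid toward the positive centroid."""
    diff = [p - n for p, n in zip(centroid(pos_vectors), centroid(neg_vectors))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("centroids coincide; axis is undefined")
    return [x / norm for x in diff]


def steer(hidden, axis, alpha):
    """Shift a hidden state by alpha along the axis (alpha < 0 steers the other way)."""
    return [h + alpha * a for h, a in zip(hidden, axis)]


# Toy 2-D vectors, invented for illustration.
pos = [[3.0, 0.0], [5.0, 0.0]]   # stand-in for the --pos-cluster side
neg = [[-1.0, 0.0], [1.0, 0.0]]  # stand-in for the --neg-cluster side
axis = unit_axis(pos, neg)                     # [1.0, 0.0]
steered = steer([0.5, 2.0], axis, alpha=50.0)  # [50.5, 2.0]
```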

---

## 3. Interactive Demo (Cognitive Proxy)

**Try it online**: [huggingface.co/spaces/ai-nthusiast/cognitive-proxy](https://huggingface.co/spaces/ai-nthusiast/cognitive-proxy)

Or run locally:

```bash
streamlit run src/ui/app_tinyllama_minimal.py
```

Features:
- **Compare**: See three generation variants side-by-side (semantic, baseline, syntactic)
- **Inspect**: Analyze text projection onto brain coordinate space with PLV visualization
- **Steer**: Manually control generation along the Function-Content axis

---

## 4. Directory Structure

* `src/`: Core libraries (`models`, `data`, `ui`).
* `experiments/`: Scientific scripts (Training, Validation).
* `results/`: Trained models (`.pt`) and atlases (`.pkl`).
* `artifacts/`: Project history, papers, and walkthroughs.