# Implementation Tracker: Glass-Box Dashboard
**Project:** PhD Study - Making Architecture Transparent for Code Generation
**Timeline:** 8 weeks (November 2025 - December 2025)
**Status:** Week 1 - In Progress
**Last Updated:** 2025-11-01
---
## Overview
This document tracks progress through the 8-week implementation plan outlined in the PhD Study Specification. Each week has specific deliverables, acceptance criteria, and links to relevant code/files.
---
## Week 1-2: Core Model Instrumentation
**Goal:** Implement PyTorch hooks, tokenizer instrumentation, zarr storage, and a minimal API endpoint.
**Status:** 🟡 In Progress
### Tasks
#### 1.1 PyTorch Hooks for Attention & Residuals
- [ ] Add forward hooks to capture attention tensors `A[L,H,T,T]`
- [ ] Capture residual norms `||x_l||` per layer
- [ ] Capture logits, logprobs, and entropy per token
- [ ] Record timing per layer (latency profiling)
- [ ] Optional: FFN activations for future SAE integration
**Files:** `/backend/model_service.py`, `/backend/instrumentation.py` (new)
**Acceptance Criteria:**
- Attention tensors stored with shape (num_layers, num_heads, seq_len, seq_len)
- Residual-norm array with shape (num_layers, seq_len)
- Per-token metadata includes logprob, entropy, timing
- Per-layer hook overhead < 10ms on average
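**Sketch:** A minimal pass at the hook wiring, assuming a Llama-style module layout (`model.model.layers`) and batch size 1; the attention tensors themselves are simplest to take from a Hugging Face forward pass with `output_attentions=True` and stack into `A[L,H,T,T]`, rather than re-deriving them inside the hook.

```python
import time
import torch

def instrument(model):
    """Attach hooks that record residual norms ||x_l|| and per-layer latency.

    Assumes a Llama-style layout (model.model.layers) and batch size 1;
    adjust the module path for other architectures.
    """
    record = {"residual_norms": [], "layer_ms": [], "_t0": 0.0}

    def pre_hook(module, args):
        record["_t0"] = time.perf_counter()

    def post_hook(module, args, output):
        hidden = output[0] if isinstance(output, tuple) else output  # [1, T, D]
        record["residual_norms"].append(hidden.norm(dim=-1).squeeze(0).cpu())
        record["layer_ms"].append((time.perf_counter() - record["_t0"]) * 1000)

    handles = []
    for layer in model.model.layers:
        handles.append(layer.register_forward_pre_hook(pre_hook))
        handles.append(layer.register_forward_hook(post_hook))
    return record, handles  # call handle.remove() on each to de-instrument
```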
**Notes:**
---
#### 1.2 Tokenizer Instrumentation
- [ ] Capture BPE/SentencePiece subword splits
- [ ] Record byte length per token
- [ ] Store token IDs and text
- [ ] Identify multi-split identifiers (≥3 subwords)
**Files:** `/backend/tokenizer_utils.py` (new)
**Acceptance Criteria:**
- Each token has a `bpe: [subword1, subword2, ...]` field
- Byte length calculated correctly (matches `len(token.encode('utf-8'))`)
- Multi-split identifiers flagged with `multi_split: true`
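**Sketch:** One way to build the per-token records, assuming a Hugging Face *fast* tokenizer (required for `return_offsets_mapping`); the identifier regex is a deliberate simplification of real lexing.

```python
import re

IDENTIFIER_RE = re.compile(r"[A-Za-z_]\w*")

def token_records(tokenizer, text, min_pieces=3):
    """Per-token metadata: text, IDs, byte length, and multi_split flags."""
    enc = tokenizer(text, return_offsets_mapping=True, add_special_tokens=False)
    records = []
    for tok_id, (start, end) in zip(enc["input_ids"], enc["offset_mapping"]):
        piece = text[start:end]
        records.append({
            "id": tok_id,
            "text": piece,
            "start": start,
            "end": end,
            "byte_len": len(piece.encode("utf-8")),  # matches the criterion
            "multi_split": False,
        })
    # Flag identifiers whose character span covers >= min_pieces subwords.
    for m in IDENTIFIER_RE.finditer(text):
        covering = [r for r in records
                    if r["start"] < m.end() and r["end"] > m.start()]
        if len(covering) >= min_pieces:
            for r in covering:
                r["multi_split"] = True
    return records
```

Grouping these records by enclosing word yields the `bpe: [...]` list the criteria call for.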
**Notes:**
---
#### 1.3 Zarr/Memmap Storage Layer
- [ ] Implement zarr writer with chunking strategy `(layer, head)`
- [ ] Create directory structure: `runs/{run_id}/tensors/`
- [ ] Store attention, residuals, logits as zarr arrays
- [ ] Implement lazy loading for frontend access
**Files:** `/backend/storage.py` (new), `/backend/zarr_utils.py` (new)
**Acceptance Criteria:**
- Zarr arrays created with correct chunking
- File size reasonable (< 500MB for a 512-token generation with 32 layers)
- Load time < 50ms for a single layer/head slice
- Compression ratio > 3x (use Blosc)
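**Sketch:** The writer against the zarr v2 API (`DirectoryStore`, Blosc via numcodecs); one `(layer, head)` slab per chunk is what keeps single-head reads cheap.

```python
import numpy as np
import zarr
from numcodecs import Blosc

def write_run_tensors(run_id, attention, residual_norms, logits):
    """Write tensors under runs/{run_id}/tensors/ with (layer, head) chunking.

    attention: [L, H, T, T]; residual_norms: [L, T]; logits: [T, V].
    """
    store = zarr.DirectoryStore(f"runs/{run_id}/tensors")
    root = zarr.group(store=store, overwrite=True)
    compressor = Blosc(cname="zstd", clevel=5, shuffle=Blosc.BITSHUFFLE)
    T = attention.shape[-1]
    root.create_dataset("attention", data=attention.astype(np.float16),
                        chunks=(1, 1, T, T), compressor=compressor)
    root.create_dataset("residual_norms", data=residual_norms.astype(np.float32),
                        compressor=compressor)
    root.create_dataset("logits", data=logits.astype(np.float16),
                        chunks=(min(T, 128), logits.shape[-1]), compressor=compressor)
```

Reads stay lazy on the consumer side: `zarr.open(f"runs/{run_id}/tensors")["attention"][l, h]` touches a single chunk, which is what the < 50ms slice-load criterion rides on.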
**Notes:**
---
#### 1.4 Minimal API Endpoint `/analyze/study`
- [ ] Create POST endpoint accepting prompt + generation params
- [ ] Generate Run ID (format: `R{date}-{time}-{hash}`)
- [ ] Implement deterministic generation (fixed seed)
- [ ] Return minimal data contract JSON
- [ ] Store telemetry (JSONL format)
**Files:** `/backend/model_service.py`
**API Contract:**
```json
POST /analyze/study
{
  "prompt": "def factorial(n):",
  "max_tokens": 50,
  "seed": 42,
  "temperature": 0.0,
  "instrumentation": ["attention", "residuals", "tokenizer"]
}

Response:
{
  "run_id": "R2025-11-01-1430-a7f3",
  "tokens": [...],  // minimal data contract
  "tensor_path": "runs/R2025-11-01-1430-a7f3/tensors/",
  "telemetry_path": "runs/R2025-11-01-1430-a7f3/telemetry.jsonl"
}
```
**Acceptance Criteria:**
- Endpoint returns in < 5s for a 50-token generation
- Run ID is unique per run; generation is reproducible given the same seed
- Telemetry JSONL created with `run.start` and `run.end` events
- Tensors stored in zarr format
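**Sketch:** Two pieces the criteria pin down, in isolation from the serving framework: Run ID generation (timestamp for uniqueness, prompt+seed hash so identical requests are recognizably related) and JSONL telemetry. The hash recipe here is an assumption; the spec only fixes the `R{date}-{time}-{hash}` shape.

```python
import hashlib
import json
import time
from datetime import datetime

def make_run_id(prompt: str, seed: int) -> str:
    """R{date}-{time}-{hash}, e.g. R2025-11-01-1430-a7f3."""
    digest = hashlib.sha1(f"{prompt}|{seed}".encode("utf-8")).hexdigest()[:4]
    return datetime.now().strftime("R%Y-%m-%d-%H%M") + f"-{digest}"

def log_event(telemetry_path: str, event: str, **fields) -> None:
    """Append one telemetry event ('run.start', 'run.end', ...) as JSONL."""
    with open(telemetry_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"event": event, "ts": time.time(), **fields}) + "\n")
```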
**Notes:**
---
#### 1.5 Attention Rollout & Head Ranking
- [ ] Implement attention rollout algorithm (Abnar & Zuidema-style)
- [ ] Rank heads by rollout contribution (top-k = 20)
- [ ] Store head rankings in Run ID metadata
**Files:** `/backend/attention_analysis.py` (new)
**Acceptance Criteria:**
- Rollout matrix computed efficiently (< 100ms for 512 tokens)
- Top-20 heads identified by max rollout weight
- Rankings stored in `runs/{run_id}/metadata.json`
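**Sketch:** The rollout computation itself: average heads, fold in the residual connection as 0.5·I, renormalize rows, and compose across layers.

```python
import torch

def attention_rollout(attn: torch.Tensor) -> torch.Tensor:
    """attn: [L, H, T, T], rows summing to 1; returns [T, T] accumulated
    input-to-output influence."""
    T = attn.shape[-1]
    eye = torch.eye(T, dtype=attn.dtype)
    rollout = eye.clone()
    for layer in attn:                       # [H, T, T]
        a = layer.mean(dim=0)                # average over heads
        a = 0.5 * a + 0.5 * eye              # account for the residual stream
        a = a / a.sum(dim=-1, keepdim=True)  # keep rows stochastic
        rollout = a @ rollout                # compose layer by layer
    return rollout
```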
**Notes:**
---
### Week 1-2 Acceptance Criteria (Overall)
- [ ] All 5 tasks completed
- [ ] Latency < 250ms for ≤512 tokens (measured end-to-end)
- [ ] Zarr storage working correctly (can reload tensors)
- [ ] API endpoint functional (manual test via curl/Postman)
- [ ] Run ID reproducibility verified (same seed → same output)
### Blockers
- **None yet**
### Decisions Made
- **2025-11-01:** Using zarr instead of HDF5 for better chunking and parallel access.
---
## Week 3: Attention Visualization
**Goal:** Build an interactive attention heatmap, head grid, and rollout toggle.
**Status:** 🔴 Not Started
### Tasks
#### 3.1 Frontend: Attention Heatmap (WebGL)
- [ ] Create `/components/study/AttentionVisualization.tsx`
- [ ] Implement WebGL-based heatmap for performance
- [ ] Add hover tooltips showing exact attention weights
- [ ] Support aggregated (all heads) and per-head views
**Files:** `/components/study/AttentionVisualization.tsx`
**Acceptance Criteria:**
- Renders a 512×512 heatmap in < 100ms
- Hover shows source token, target token, weight
- Toggle between aggregated and per-head views
**Notes:**
---
#### 3.2 Frontend: Head Grid (Layer × Head Matrix)
- [ ] Display Layer × Head grid with mini-sparklines
- [ ] Show mean attention to token classes (identifiers, operators, etc.)
- [ ] Click head → overlay on main heatmap
**Files:** `/components/study/HeadGrid.tsx`
**Acceptance Criteria:**
- Grid renders 32×32 cells in < 50ms
- Sparklines show attention distribution
- Click interaction works smoothly
**Notes:**
---
#### 3.3 Attention Rollout Toggle
- [ ] Add toggle button: Raw Attention vs. Rollout
- [ ] Fetch rollout data from backend
- [ ] Update heatmap dynamically
**Files:** `/components/study/AttentionVisualization.tsx`
**Acceptance Criteria:**
- Toggle switches view in < 100ms
- Rollout data fetched lazily (not on initial load)
**Notes:**
---
#### 3.4 Interactions: Brush & Pin
- [ ] Implement brush selection on context tokens
- [ ] Highlight downstream tokens impacted by the selection
- [ ] Add "pin" button to save a source→target pair for ablation
**Files:** `/components/study/AttentionVisualization.tsx`
**Acceptance Criteria:**
- Brush selection responsive (< 50ms)
- Pinned pairs visible in sidebar
- Pin data passed to Ablation pane
**Notes:**
---
#### 3.5 Disclaimer & Warnings
- [ ] Add text: "Attention is descriptive; causal claims require ablation"
- [ ] Warn if temperature > 1.2 or top-k sampling is active
**Files:** `/components/study/AttentionVisualization.tsx`
**Acceptance Criteria:**
- Disclaimer visible at top of pane
- Warnings shown contextually
**Notes:**
---
### Week 3 Acceptance Criteria (Overall)
- [ ] Attention visualization fully functional
- [ ] Interactive latency < 150ms for all operations
- [ ] Cross-links to Ablation pane working
- [ ] Manual test with Code Llama 7B (50-token generation)
### Blockers
### Decisions Made
---
## Week 4: Token Size & Confidence Visualization
**Goal:** Build the token chip bar, entropy sparkline, and risk hotspot flags.
**Status:** 🔴 Not Started
### Tasks
#### 4.1 Frontend: Token Chip Bar
- [ ] Create `/components/study/TokenConfidenceView.tsx`
- [ ] Render tokens as chips: width = byte length, opacity = confidence
- [ ] Add click handler to show tokenization + top-k alternatives
**Files:** `/components/study/TokenConfidenceView.tsx`
**Acceptance Criteria:**
- Chips render correctly with variable widths
- Opacity maps to confidence (exp(logprob), or a normalized 1 - entropy)
- Click shows detailed panel
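**Sketch:** The two confidence mappings named in the criteria, side by side. Raw entropy is in nats and can exceed 1, so the `1 - entropy` form needs a normalizing cap; the `max_entropy` value below is an assumption to be tuned in the pilot.

```python
import math

def chip_opacity(logprob: float, entropy: float, max_entropy: float = 4.0) -> dict:
    """Candidate confidence-to-opacity mappings for a token chip."""
    return {
        "from_logprob": math.exp(logprob),                      # P(token), already in [0, 1]
        "from_entropy": max(0.0, 1.0 - entropy / max_entropy),  # clipped linear ramp
    }
```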
**Notes:**
---
#### 4.2 Frontend: Entropy Sparkline
- [ ] Add sparkline above/below token bar showing entropy per token
- [ ] Highlight peaks (entropy ≥ τ_H, initially 1.5 nats)
- [ ] Add calibration toggle (show thresholds for keywords/identifiers/operators)
**Files:** `/components/study/TokenConfidenceView.tsx`
**Acceptance Criteria:**
- Sparkline renders in < 50ms
- Peaks clearly visible
- Threshold adjustable via slider
**Notes:**
---
#### 4.3 Risk Hotspot Flags
- [ ] Flag identifiers split into ≥3 subwords AND showing an entropy peak
- [ ] Display flag icon on token chips
- [ ] Compute bug-risk AUC (requires ground-truth bug locations)
**Files:** `/components/study/TokenConfidenceView.tsx`, `/backend/risk_analysis.py` (new)
**Acceptance Criteria:**
- Flags appear on relevant tokens
- AUC metric computed (requires pilot data)
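**Sketch:** The AUC computation once pilot runs supply token-level bug labels; scoring by summing the two flag conditions is an assumption, not the spec's rule.

```python
from sklearn.metrics import roc_auc_score

def bug_risk_auc(tokens, bug_labels, tau_h=1.5, min_pieces=3):
    """tokens: dicts with 'entropy' and 'bpe' (subword list);
    bug_labels: 0/1 ground truth per token."""
    scores = [int(t["entropy"] >= tau_h) + int(len(t["bpe"]) >= min_pieces)
              for t in tokens]
    return roc_auc_score(bug_labels, scores)
```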
**Notes:**
---
#### 4.4 Top-k Alternatives Panel
- [ ] Show top-k alternatives with probabilities on token click
- [ ] Display attention snippet (which context tokens justified each alternative)
**Files:** `/components/study/TokenConfidenceView.tsx`
**Acceptance Criteria:**
- Panel shows at least the top-3 alternatives
- Attention snippet links to Attention visualization
**Notes:**
---
#### 4.5 Cost/Latency Estimator
- [ ] Add widget showing cumulative decoding time
- [ ] Estimate API cost (tokens × price per token)
**Files:** `/components/study/TokenConfidenceView.tsx`
**Acceptance Criteria:**
- Time displayed in ms
- Cost displayed in USD (or N/A for local models)
**Notes:**
---
### Week 4 Acceptance Criteria (Overall)
- [ ] Token Size & Confidence view functional
- [ ] Risk hotspots flagged correctly
- [ ] Interactive latency < 150ms
- [ ] Manual test with Code Llama 7B
### Blockers
### Decisions Made
---
## Week 5: Ablation Visualization
**Goal:** Build interactive ablation controls with head toggles, layer bypass, and a diff viewer.
**Status:** 🔴 Not Started
### Tasks
#### 5.1 Backend: Ablation Engine
- [ ] Implement head masking (zero out or uniform attention)
- [ ] Implement layer bypass (skip layer, pass residual through)
- [ ] Support token constraints (force/ban specific tokens)
- [ ] Add surrogate regressor for predicted Δlog-prob
**Files:** `/backend/ablation_engine.py` (new)
**Acceptance Criteria:**
- Ablation runs in < 3s for a single head mask
- Surrogate predictor accuracy > 70% (trained on 100 samples)
- Queue system for background ablation execution
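**Sketch:** The zero-out variant of head masking, done by blanking a head's slice of the attention output before `o_proj` (assumes Llama-style blocks where head outputs are concatenated along the last dim). The uniform-attention variant needs a patch at the softmax instead.

```python
def mask_heads(model, heads_to_mask):
    """Zero selected heads' contributions. heads_to_mask: [(layer, head), ...]."""
    head_dim = model.config.hidden_size // model.config.num_attention_heads
    by_layer = {}
    for l, h in heads_to_mask:
        by_layer.setdefault(l, []).append(h)

    handles = []
    for l, heads in by_layer.items():
        o_proj = model.model.layers[l].self_attn.o_proj

        def pre_hook(module, args, heads=heads):
            (x,) = args  # [B, T, num_heads * head_dim]
            x = x.clone()
            for h in heads:
                x[..., h * head_dim:(h + 1) * head_dim] = 0.0
            return (x,)

        handles.append(o_proj.register_forward_pre_hook(pre_hook))
    return handles  # handle.remove() restores the baseline model
```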
**Notes:**
---
#### 5.2 Frontend: Head Toggle Matrix
- [ ] Create `/components/study/AblationView.tsx`
- [ ] Display Layer × Head matrix with checkboxes
- [ ] Show only the top-20 heads (from Week 1-2 ranking)
**Files:** `/components/study/AblationView.tsx`
**Acceptance Criteria:**
- Matrix renders in < 50ms
- Checkboxes responsive
- Selected heads highlighted
**Notes:**
---
#### 5.3 Frontend: Diff Viewer
- [ ] Show unified diff between baseline and ablated output
- [ ] Highlight changed tokens (color-coded: added/removed/modified)
- [ ] Display code-aware metrics (tests passed, AST parse, lints)
**Files:** `/components/study/AblationView.tsx`
**Acceptance Criteria:**
- Diff renders clearly
- Metrics displayed prominently
- Color-coding accessible (colorblind-friendly)
**Notes:**
---
#### 5.4 Frontend: Per-Token Delta Heat
- [ ] Show Δlog-prob and Δentropy per token
- [ ] Display as small multiples for the most-impactful heads
**Files:** `/components/study/AblationView.tsx`
**Acceptance Criteria:**
- Delta heat visible
- Most-impactful heads identified (Δlog-prob ≥ τ_Δ)
**Notes:**
---
#### 5.5 Integration with Attention View
- [ ] Accept pinned source→target pairs from the Attention view
- [ ] Auto-suggest heads to ablate based on attention weights
**Files:** `/components/study/AblationView.tsx`
**Acceptance Criteria:**
- Pinned pairs appear in Ablation pane
- Suggested heads shown with explanation
**Notes:**
---
### Week 5 Acceptance Criteria (Overall)
- [ ] Ablation view functional
- [ ] Head masking works correctly (verified with a manual test)
- [ ] Diff viewer shows meaningful changes
- [ ] Code-aware metrics computed (AST, tests, lints)
### Blockers
### Decisions Made
---
## Week 6: Pipeline Visualization
**Goal:** Build a swimlane timeline with residual-z, entropy shift, and layer signals.
**Status:** 🔴 Not Started
### Tasks
#### 6.1 Backend: Layer-Level Signals
- [ ] Compute residual-norm z-scores
- [ ] Compute entropy shift (pre- vs. post-layer)
- [ ] Compute attention-flow saturation
- [ ] Optional: router load for MoE models
**Files:** `/backend/pipeline_analysis.py` (new)
**Acceptance Criteria:**
- Signals computed in < 50ms
- Residual-z outliers flagged (> 2σ)
- Entropy shifts tracked per layer
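**Sketch:** The two cheap signals from stored arrays. Treating "entropy shift" as the per-layer difference of logit-lens entropies is an assumption about the spec's definition.

```python
import numpy as np

def layer_signals(residual_norms, layer_entropies, z_thresh=2.0):
    """residual_norms: [L, T]; layer_entropies: [L, T] from projecting each
    layer's hidden state through the unembedding (logit lens)."""
    mu = residual_norms.mean(axis=0, keepdims=True)
    sd = residual_norms.std(axis=0, keepdims=True) + 1e-8
    residual_z = (residual_norms - mu) / sd            # z-score across layers
    outliers = np.argwhere(np.abs(residual_z) > z_thresh)
    entropy_shift = np.diff(layer_entropies, axis=0)   # post minus pre, [L-1, T]
    return {"residual_z": residual_z, "outliers": outliers,
            "entropy_shift": entropy_shift}
```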
**Notes:**
---
#### 6.2 Frontend: Swimlane Timeline
- [ ] Create `/components/study/PipelineView.tsx`
- [ ] Display lanes: Tokenizer → Embeddings → Layers → Logits → Sampler → Tests
- [ ] Rectangle length = time per stage
- [ ] Color intensity = uncertainty (entropy)
**Files:** `/components/study/PipelineView.tsx`
**Acceptance Criteria:**
- Swimlane renders in < 100ms
- Hover shows per-stage stats
- Timeline scrubber works smoothly
**Notes:**
---
#### 6.3 Layer Signal Overlays
- [ ] Add overlays for residual-z, entropy shift, attention saturation
- [ ] Toggle visibility of each signal
- [ ] Highlight bottlenecks (top-q percentile of latency/residual-z)
**Files:** `/components/study/PipelineView.tsx`
**Acceptance Criteria:**
- Overlays don't clutter the visualization
- Bottlenecks clearly marked
- Toggles responsive
**Notes:**
---
#### 6.4 Layer Bypass Interaction
- [ ] Add controls to bypass ≤2 layers
- [ ] Show predicted impact (via surrogate)
- [ ] Execute queued ablation
**Files:** `/components/study/PipelineView.tsx`
**Acceptance Criteria:**
- Bypass controls accessible
- Predicted impact shown before execution
- Ablation queued in background
**Notes:**
---
#### 6.5 Cross-Links to Other Views
- [ ] Click token → highlight in Attention and Token Confidence views
- [ ] Integrated telemetry (track hover/click events)
**Files:** `/components/study/PipelineView.tsx`
**Acceptance Criteria:**
- Cross-highlighting works
- Telemetry logged
**Notes:**
---
### Week 6 Acceptance Criteria (Overall)
- [ ] Pipeline view functional
- [ ] Layer signals computed correctly
- [ ] Interactive latency < 150ms
- [ ] Manual test with Code Llama 7B
### Blockers
### Decisions Made
---
## Week 7: Pilot Study (n=3)
**Goal:** Run a pilot with 3 participants; tune thresholds; validate latency; gather feedback.
**Status:** 🔴 Not Started
### Tasks
#### 7.1 Recruit Pilot Participants
- [ ] Identify 3 software engineers (varied experience levels)
- [ ] Schedule 90-minute sessions
**Acceptance Criteria:**
- 3 participants confirmed
- Availability scheduled
**Notes:**
---
#### 7.2 Prepare Study Materials
- [ ] Task T1: Code completion (sanitize_sql_like)
- [ ] Task T2: Bug fix (reverse_string)
- [ ] Pre-survey (demographics, LLM familiarity)
- [ ] Post-task mini-survey (SCS, Trust, NASA-TLX)
- [ ] Interview questions
**Files:** `/docs/pilot-study-materials.md` (new)
**Acceptance Criteria:**
- Materials ready to distribute
- Survey forms created (Google Forms or similar)
**Notes:**
---
#### 7.3 Run Pilot Sessions
- [ ] Session 1: Participant P01
- [ ] Session 2: Participant P02
- [ ] Session 3: Participant P03
**Acceptance Criteria:**
- All 3 sessions completed
- Telemetry logged
- Surveys completed
**Notes:**
---
#### 7.4 Analyze Pilot Data & Tune Thresholds
- [ ] Compute latency statistics (mean, p95)
- [ ] Tune τ_H (entropy threshold) for ~90% specificity
- [ ] Tune τ_Δ (log-prob delta) for ablation sensitivity
- [ ] Tune τ_z (residual-norm outlier threshold)
**Files:** `/docs/pilot-analysis.md` (new)
**Acceptance Criteria:**
- Thresholds tuned based on pilot data
- Latency < 250ms (if not, optimize)
- Survey completion rate ≥ 90%
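**Sketch:** Specificity tuning reduces to a quantile of the negative class: setting τ_H at the 90th percentile of non-hotspot entropies leaves ~90% of them unflagged.

```python
import numpy as np

def tune_tau_h(entropies, is_hotspot, target_specificity=0.90):
    """entropies: per-token entropies from pilot runs (np.ndarray);
    is_hotspot: boolean ground-truth labels from pilot annotations."""
    negatives = entropies[~is_hotspot]
    return float(np.quantile(negatives, target_specificity))
```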
**Notes:**
---
#### 7.5 Iterate on UX
- [ ] Add tooltips/warnings based on pilot feedback
- [ ] Fix any UX issues (confusing interactions, unclear labels)
- [ ] Update documentation
**Acceptance Criteria:**
- At least 2 UX improvements implemented
- Pilot participants' feedback documented
**Notes:**
---
### Week 7 Acceptance Criteria (Overall)
- [ ] Pilot study completed successfully
- [ ] Thresholds tuned
- [ ] Latency validated (< 250ms)
- [ ] UX improvements identified and implemented
### Blockers
### Decisions Made
---
## Week 8: Main Study Preparation
**Goal:** Finalize study tooling, prepare the OSF pre-registration, and set up participant recruitment.
**Status:** 🔴 Not Started
### Tasks
#### 8.1 Survey Integration
- [ ] Integrate SUS, NASA-TLX, and SCS scales into the dashboard
- [ ] Add pre-survey and post-task mini-surveys
- [ ] Export survey data to CSV
**Files:** `/components/study/SurveyModal.tsx` (new)
**Acceptance Criteria:**
- Surveys embedded in dashboard
- Data exported correctly
**Notes:**
---
#### 8.2 Latin Square Counterbalancing
- [ ] Implement Latin square assignment for task order
- [ ] Randomize condition order (Baseline vs. Dashboard)
**Files:** `/lib/study-randomization.ts` (new)
**Acceptance Criteria:**
- Counterbalancing correct (verified manually)
- Participants assigned randomized IDs (P01-P24)
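**Sketch:** The assignment logic destined for `/lib/study-randomization.ts`, drafted here in Python like the rest of this tracker's examples; with two tasks and two conditions the square is 2×2.

```python
def latin_square(n):
    """Cyclic n x n Latin square: row i is [i, i+1, ..., i+n-1] mod n."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def assign(index, tasks=("T1", "T2"), conditions=("Baseline", "Dashboard")):
    """Deterministic counterbalanced assignment for participant `index` (0-based)."""
    row = latin_square(len(tasks))[index % len(tasks)]
    task_order = [tasks[k] for k in row]
    cond_order = list(conditions) if index % 2 == 0 else list(reversed(conditions))
    return {"pid": f"P{index + 1:02d}", "tasks": task_order, "conditions": cond_order}
```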
**Notes:**
---
#### 8.3 OSF Pre-Registration
- [ ] Complete OSF template (Appendix D from spec)
- [ ] Upload task stimuli, exclusion criteria
- [ ] Submit pre-registration
**Files:** `/docs/osf-preregistration.md` (copy of Appendix D)
**Acceptance Criteria:**
- Pre-registration submitted before the main study
- DOI obtained
**Notes:**
---
#### 8.4 Export Artifact Bundle
- [ ] Create script to package Run ID, tensors, telemetry
- [ ] Generate `run_pack_P01.zip` for each participant
- [ ] Test import into OSF
**Files:** `/scripts/export_artifact.py` (new)
**Acceptance Criteria:**
- Export script functional
- Bundle includes all necessary files
- Bundle < 100MB per participant
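**Sketch:** The packaging script, assuming bundles follow the `runs/{run_id}/` layout used elsewhere in this tracker.

```python
import zipfile
from pathlib import Path

def export_run_pack(participant_id, run_ids, out_dir="exports"):
    """Bundle each run's tensors, telemetry, and metadata into run_pack_{pid}.zip."""
    out = Path(out_dir) / f"run_pack_{participant_id}.zip"
    out.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for run_id in run_ids:
            run_dir = Path("runs") / run_id
            for path in run_dir.rglob("*"):
                if path.is_file():
                    zf.write(path, path.relative_to("runs"))
    return out
```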
**Notes:**
---
#### 8.5 Participant Recruitment
- [ ] Prepare recruitment email
- [ ] Post to developer communities (Reddit, Hacker News, university mailing lists)
- [ ] Target n=18-24 participants
**Acceptance Criteria:**
- Recruitment materials ready
- At least 10 participants confirmed
**Notes:**
---
### Week 8 Acceptance Criteria (Overall)
- [ ] Study tooling finalized
- [ ] OSF pre-registration submitted
- [ ] Participant recruitment underway
- [ ] Ready to begin main study (Weeks 9-10)
### Blockers
### Decisions Made
---
## Progress Summary
| Week | Status | Completion Date | Notes |
|------|--------|-----------------|-------|
| Week 1-2: Instrumentation | 🟡 In Progress | - | Started 2025-11-01 |
| Week 3: Attention Viz | 🔴 Not Started | - | - |
| Week 4: Token Confidence Viz | 🔴 Not Started | - | - |
| Week 5: Ablation Viz | 🔴 Not Started | - | - |
| Week 6: Pipeline Viz | 🔴 Not Started | - | - |
| Week 7: Pilot Study | 🔴 Not Started | - | - |
| Week 8: Main Study Prep | 🔴 Not Started | - | - |

**Legend:**
- 🟢 Completed
- 🟡 In Progress
- 🔴 Not Started
- 🔵 Blocked
---
## Global Blockers
*None currently*
---
## Key Metrics (Target vs. Actual)
| Metric | Target | Actual | Status |
|--------|--------|--------|--------|
| Initial render latency (≤512 tokens) | < 250ms | - | - |
| Interactive update latency | < 150ms | - | - |
| Zarr file size (512 tokens, 32 layers) | < 500MB | - | - |
| Zarr load time (single layer/head) | < 50ms | - | - |
| Attention rollout computation | < 100ms | - | - |
| Ablation execution time | < 3s | - | - |
---
## Notes & Decisions Log
### 2025-11-01
- **Decision:** Using zarr instead of HDF5 for tensor storage due to better chunking and parallel access.
- **Decision:** Targeting top-k=20 heads for the ablation UI (performance constraint).
- **Note:** Started Week 1-2 instrumentation tasks.
---
**End of Implementation Tracker**