cdpearlman committed

Commit 11aaea3 · 1 Parent(s): c629c1f

ContextKit LLM memory update
.context/data/decisions.md ADDED
@@ -0,0 +1,28 @@
+ # Decision Record
+
+ <!-- Append-only. Record significant decisions with reasoning. -->
+ <!-- Format:
+ ## [Decision title]
+ **Date**: YYYY-MM-DD
+ **Context**: What prompted this decision
+ **Options considered**: What alternatives were evaluated
+ **Decision**: What was chosen
+ **Reasoning**: Why
+ **Revisit if**: Conditions that would warrant reconsidering
+ -->
+
+ ## Educational depth: conceptual over mathematical
+ **Date**: 2026-03-02
+ **Context**: Target audience includes people without full college-level CS/math education
+ **Options considered**: (1) Full mathematical rigor, (2) Conceptual understanding with simplified math, (3) No math at all
+ **Decision**: Conceptual understanding with simplified math — skip complex derivations, focus on motivation and intuition
+ **Reasoning**: The goal is building correct mental models, not producing textbook-ready proofs. Accurate simplification serves the audience better than intimidating formalism.
+ **Revisit if**: Audience shifts to researchers or grad students who need full rigor
+
+ ## Chatbot backend: OpenRouter
+ **Date**: 2026-03-02
+ **Context**: Chatbot needed an LLM backend; previously used Gemini
+ **Options considered**: Gemini API, OpenRouter (multi-model), direct OpenAI
+ **Decision**: OpenRouter — provides access to multiple models through a single API
+ **Reasoning**: Flexibility to switch underlying models without code changes; single API key
+ **Revisit if**: OpenRouter pricing becomes prohibitive or a specific model provider offers significantly better educational responses
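A note on what the single API buys in practice: OpenRouter's chat endpoint is OpenAI-compatible, so switching the underlying model is a one-string change. A minimal request-building sketch — hedged: the helper and model IDs are illustrative, not the project's `openrouter_client.py` API, and no network call is made:

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_request(model: str, user_message: str, api_key: str) -> tuple[dict, dict]:
    """Build (headers, payload) for an OpenRouter chat completion call.

    The payload shape is identical regardless of `model`, which is the
    flexibility the decision above is buying: swapping models needs no
    code changes, only a different model string.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        # Illustrative model IDs: "openai/gpt-4o-mini", "google/gemini-flash-1.5"
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return headers, payload
```

Posting `payload` with `headers` to `OPENROUTER_URL` (e.g. via `requests.post`) returns an OpenAI-style completion object.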
.context/data/lessons.md ADDED
@@ -0,0 +1,22 @@
+ # Lessons Learned
+
+ <!-- Append-only. Record what the team learned the hard way. -->
+ <!-- Format:
+ ## YYYY-MM-DD — [Brief title]
+ **What happened**: What went wrong or what was discovered
+ **Root cause**: Why it happened
+ **Fix**: What was done about it
+ **Rule going forward**: What to do (or avoid) in the future
+ -->
+
+ ## 2026-03-02 — Dead code accumulation during refactors
+ **What happened**: Large component changes left hundreds of lines of orphaned code from deprecated or deleted components
+ **Root cause**: Refactors focused only on building the new thing without cleaning up what the old thing left behind
+ **Fix**: Manual cleanup after discovering the bloat
+ **Rule going forward**: Every refactor must include a dead code sweep. This is a first-class concern, not an afterthought.
+
+ ## 2026-03-02 — Tunnel vision on implementation details
+ **What happened**: Going deep on implementation rabbit holes produced outputs that weren't actually useful for the educational goal
+ **Root cause**: Losing sight of the "is this useful for teaching?" question while focused on technical correctness
+ **Fix**: Stepped back and re-evaluated against the educational mission
+ **Rule going forward**: Sanity check every significant change: (1) Does this help someone understand transformers? (2) Is this accurate enough for correct intuition? (3) Am I in a rabbit hole?
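The dead-code sweep above can be partially mechanized. A rough, stdlib-only sketch (a hypothetical helper, not a tool this project ships) that flags top-level functions defined but never referenced in the same source, as candidates for review:

```python
import ast


def unreferenced_functions(source: str) -> list[str]:
    """Return names of top-level functions never referenced in `source`.

    Crude by design: it misses dynamic uses (getattr, re-exports), so
    the result is a review list for a dead-code sweep, not a delete list.
    """
    tree = ast.parse(source)
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return sorted(defined - used)
```

Run against a module's text, anything it reports is worth a manual look before deletion.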
.context/data/sessions.md ADDED
@@ -0,0 +1,10 @@
+ # Session Log
+
+ <!-- Append-only. Add a new entry after each substantive work session. -->
+
+ ## 2026-03-02 — Bootstrap
+ **Area**: Project setup
+ **Work done**: Ran ContextKit bootstrap interview, generated memory system
+ **Decisions made**: Established educational philosophy (conceptual understanding over math rigor), dead code cleanup as mandatory during refactors, agent behavior (push back on bad ideas, no yes-man behavior)
+ **Memory created**: architecture.md, conventions.md, education.md, testing.md, sessions.md, decisions.md, lessons.md
+ **Open threads**: None — ready for feature work
.context/modules/architecture.md ADDED
@@ -0,0 +1,62 @@
+ # Architecture
+
+ ## System Overview
+
+ A Plotly Dash single-page application that visualizes transformer LLM internals and enables interactive experimentation. Users select a model, enter a prompt, and explore a five-stage pipeline: Tokenization → Embedding → Attention → MLP → Output.
+
+ ## Component Map
+
+ ```
+ app.py                          # Entry point. Dash layout + all callbacks (~1450 lines)
+ ├── components/
+ │   ├── sidebar.py              # Collapsible left panel: glossary, attention/block/norm dropdowns
+ │   ├── model_selector.py       # Model dropdown + prompt textarea + generation settings
+ │   ├── pipeline.py             # 5-stage expandable pipeline with flow indicators
+ │   ├── investigation_panel.py  # Tabs: Ablation and Token Attribution
+ │   ├── ablation_panel.py       # Head selection, run ablation, original vs ablated comparison
+ │   ├── chatbot.py              # Floating chat icon + window + RAG-aware conversation
+ │   └── glossary.py             # Modal with transformer terms and video links
+ ├── utils/
+ │   ├── model_patterns.py       # Model loading, forward pass, head ablation, bertviz, logit lens
+ │   ├── model_config.py         # Model family definitions, module templates, auto-selections
+ │   ├── head_detection.py       # Categorize heads (Previous Token, Induction, etc.)
+ │   ├── beam_search.py          # Beam search with optional ablation hooks
+ │   ├── token_attribution.py    # Integrated Gradients and simple gradient attribution
+ │   ├── ablation_metrics.py     # KL divergence, sequence scoring, token probability deltas
+ │   ├── openrouter_client.py    # OpenRouter API client for chat + embeddings
+ │   ├── rag_utils.py            # RAG: load/chunk rag_docs/, embed, retrieve
+ │   └── head_categories.json    # Static head category definitions
+ ├── assets/
+ │   ├── style.css               # Custom styling (Bootstrap-compatible)
+ │   └── chat_resize.js          # Client-side chat window resize
+ ├── rag_docs/                   # ~30 markdown files: chatbot knowledge base
+ ├── tests/                      # pytest suite (~12 test files)
+ └── scripts/
+     └── analyze_heads.py        # One-off analysis script
+ ```
+
+ ## Data Flow
+
+ 1. **User selects model** → `model_patterns.load_model()` downloads/caches HF model
+ 2. **User enters prompt** → Forward pass captures activations at each pipeline stage
+ 3. **Pipeline renders** → Each stage shows its visualization (tokens, embeddings, attention maps, MLP, logits)
+ 4. **Beam search** → `beam_search.perform_beam_search()` generates continuations with top-k display
+ 5. **Experiments** → Ablation disables selected heads and re-runs; Attribution computes token importance via gradients
+ 6. **Chatbot** → User question → RAG retrieval from `rag_docs/` → OpenRouter API → streamed response
+
+ ## State Management
48
+
49
+ Dash `dcc.Store` components hold session state: activations, patterns, beam results, ablation state. No server-side session persistence — everything is per-page-load.
50
+
51
+ ## Deployment
52
+
53
+ - Dockerfile targeting Hugging Face Spaces (port 7860)
54
+ - `.env` for `OPENROUTER_API_KEY` (not committed)
55
+ - Models cached locally on first load
56
+
57
+ ## Key Boundaries
58
+
59
+ - **components/** only builds Dash layout — no ML logic
60
+ - **utils/** handles all computation — no Dash imports
61
+ - **app.py** is the glue: callbacks wire components to utils
62
+ - **rag_docs/** is the chatbot's knowledge base — edit these to change what the chatbot knows
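Step 4 of the data flow rests on beam search. A toy, framework-free version of the core loop — a sketch of the algorithm only; the real `beam_search.perform_beam_search` operates on model logits and supports ablation hooks:

```python
import math
from typing import Callable


def beam_search(
    step_fn: Callable[[list[str]], dict[str, float]],
    start: list[str],
    beam_width: int = 2,
    steps: int = 3,
) -> list[tuple[list[str], float]]:
    """Keep the `beam_width` best continuations by cumulative log-prob.

    `step_fn` maps a partial token sequence to next-token probabilities;
    here it stands in for a forward pass over the model.
    """
    beams = [(start, 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((seq + [token], score + math.log(prob)))
        # Prune: keep only the top `beam_width` scoring sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams
```

With a toy `step_fn` that always returns `{"a": 0.6, "b": 0.4}`, the top beam repeats `"a"`; the returned scores are what a top-k display would show.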
.context/modules/conventions.md ADDED
@@ -0,0 +1,70 @@
+ # Conventions
+
+ ## Python Style
+
+ Google Python Style Guide conventions:
+ - `snake_case` for functions, methods, variables, modules
+ - `PascalCase` for classes
+ - `ALL_CAPS` for constants
+ - 4-space indentation, 80-char line length
+ - Type hints on public APIs
+ - Docstrings on public functions/classes (`Args:`, `Returns:`, `Raises:`)
+ - f-strings for formatting
+ - Group imports: stdlib → third-party → local
+ - Run `pylint` to catch bugs and style issues
+ - No mutable default arguments (`[]`, `{}`)
+ - Use implicit false (`if not my_list:`) and `if foo is None:` for None checks
+
+ ## Code Hygiene
+
+ - **Dead code cleanup is mandatory during refactors.** Every refactor must include a sweep for orphaned code from deprecated or deleted components. This is a recurring problem — treat it as a first-class concern.
+ - Prefer small, surgical edits over broad rewrites.
+ - Reuse existing files before creating new modules.
+ - Remove or simplify unnecessary code only when it reduces complexity.
+ - Add concise comments explaining intent only where the change is non-obvious.
+ - Do not reformat unrelated code or alter indentation styles.
+
+ ## Naming & Organization
+
+ - Components go in `components/` — UI layout only, no ML logic
+ - Utilities go in `utils/` — computation only, no Dash imports
+ - Tests go in `tests/` with `test_` prefix matching the module they test
+ - RAG knowledge goes in `rag_docs/` as markdown files
+
+ ## Commit Messages
+
+ Format: `<type>(<scope>): <description>`
+
+ Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
+
+ Examples:
+ - `feat(ablation): Add multi-layer head selection`
+ - `fix(pipeline): Correct embedding stage token count`
+ - `refactor(utils): Remove unused activation caching code`
+
+ ## Error Handling
+
+ - Use built-in exception classes
+ - No bare `except:` clauses
+ - User-facing errors should be clear and non-technical
+
+ ## Dependencies
+
+ - Avoid adding new dependencies unless strictly needed
+ - Document any new dependency in requirements.txt with minimum version
+
+ ## Dash-Specific
+
+ - Callbacks must remain responsive — avoid heavy synchronous work without feedback indicators
+ - Use `dcc.Store` for session state; no server-side persistence
+
+ ## Quality Gates
+
+ Before marking any task complete:
+ - All tests pass
+ - Code coverage >80% for new code
+ - Follows style guide
+ - Public functions/methods have docstrings
+ - Type hints on public APIs
+ - No linting errors
+ - No security vulnerabilities (no hardcoded secrets, input validation present)
.context/modules/education.md ADDED
@@ -0,0 +1,40 @@
+ # Educational Philosophy
+
+ ## Audience
+
+ Primary: ML students and AI enthusiasts — people who are curious but may not have a full college-level CS/math background.
+
+ Secondary: Educators looking for interactive tools to teach transformer concepts.
+
+ ## Core Principles
+
+ ### Conceptual Understanding Over Mathematical Rigor
+ It is acceptable to skip complex math (e.g., full derivations of scaled dot-product attention) as long as the motivation and intuition are clearly communicated. The goal is "I understand what this does and why" not "I can derive this from scratch."
+
+ ### Action-Oriented Learning
+ Every architectural explanation should be paired with an interactive element. Don't just tell — let users poke at things and see what happens.
+
+ ### Progressive Disclosure
+ - **Surface level**: Clean interface, minimal jargon, tooltips for technical terms
+ - **Mid level**: In-situ descriptions followed by interactive examples
+ - **Deep level**: Glossary entries, video links, chatbot for open-ended questions
+
+ ### Framing
+ - Speak to curiosity: "What happens if...?" and "How does this work?"
+ - Tone is enthusiastic and accessible but concise — no walls of text
+ - Frame experiments as hypothesis testing: "What if I disable this head?"
+
+ ## Sanity Check Rule
+
+ Before shipping any educational content or visualization change, ask:
+ 1. Does this actually help someone understand transformers better?
+ 2. Is this accurate enough to build correct intuition (even if simplified)?
+ 3. Am I going down a rabbit hole, or is this genuinely useful?
+
+ Tunnel vision leads to bad outputs. Step back regularly.
+
+ ## Visual Consistency
+
+ - Attention head indices, layer numbers, and token highlights must be consistent across all panels
+ - Use consistent color language for different components (attention vs MLP)
+ - Support both light and dark modes with high contrast for data visualizations
.context/modules/product.md ADDED
@@ -0,0 +1,48 @@
+ # Product Definition
+
+ ## Vision
+
+ Demystify the inner workings of Transformer-based LLMs for students and curious individuals. Combine interactive visualizations with hands-on experimentation to transform abstract architectural concepts into tangible, observable phenomena.
+
+ ## Core Value Proposition
+
+ - **Visual Learning**: Translate complex matrix operations and data flows into clear, interactive representations (attention maps, logit lens)
+ - **Interactive Experimentation**: Go beyond observation — let users manipulate the model (ablation, activation patching) and immediately see consequences
+ - **Educational Scaffolding**: Support varying expertise levels with layered content, from tooltips to deep-dive glossaries to AI-guided chat
+
+ ## Key Features
+
+ - **Sequential Data Flow Visualization**: Step-by-step data transformation through model layers
+ - **Component Breakdown**: Detailed inspection views for self-attention (heads, weights) and MLPs
+ - **Interactive Experiments**:
+   - Ablation studies: selectively disable heads/layers to observe output impact
+   - Activation steering: modify activation values in real-time
+   - Prompt comparison: compare internal activations from different inputs side-by-side
+ - **Integrated Education**:
+   - Contextual tooltips for immediate clarity
+   - Glossary panel with in-depth definitions and video links
+   - AI chatbot with RAG-powered knowledge base (30 docs covering transformer concepts, usage, experiments, troubleshooting, interpretability)
+   - Step-by-step guided experiments for beginners
+
+ ## Brand & Voice
+
+ - **Tone**: Enthusiastic and accessible yet concise. Encouraging to learners while remaining direct and functional.
+ - **Framing**: Speak to curiosity — "How does this work?" and "What happens if...?"
+ - Avoid excessive jargon or long analogies. Prioritize clarity.
+
+ ## Visual Identity
+
+ - **Aesthetic**: Clean & modern. High whitespace, legible typography, clear visual hierarchy.
+ - **Modes**: Light and dark, both with high contrast for data visualizations.
+ - **Color Palette**: Consistent color language for different model components (e.g., specific colors for attention vs MLP) to aid mental mapping.
+
+ ## UI Patterns
+
+ - **Progressive Disclosure**: Tooltips for brief context, in-situ descriptions paired with interactive examples, glossary/chatbot for depth.
+ - **Sandbox Explorer**: Comprehensive control panel for free-form exploration (toggles, sliders, ablation switches).
+ - **Comparison View**: Integrated into the sandbox so users see modification impact relative to original state.
+ - **On-Demand Depth**: Keep the primary interface simple with clear paths to dive deeper.
+
+ ## User Experience
+
+ The interface centers on exploration and clarity. Users start by selecting a model and inputting text. The dashboard unfolds the model's processing pipeline, letting users zoom into specific components. Experimentation modes are clearly distinguished: hypothesize ("What if I turn off this head?") and test. Educational resources are omnipresent but non-intrusive — available on-demand.
.context/modules/testing.md ADDED
@@ -0,0 +1,59 @@
+ # Testing
+
+ ## Approach
+
+ Test-Driven Development (TDD) for all backend/utils logic:
+ 1. Write failing tests that define expected behavior
+ 2. Implement minimum code to pass
+ 3. Refactor with confidence
+
+ ## What to Test
+
+ - All `utils/` modules — these contain the core logic
+ - Model configuration and pattern matching
+ - Ablation metrics and scoring
+ - Head detection and categorization
+ - Beam search behavior
+ - Token attribution computations
+ - OpenRouter client (mock external calls)
+
+ ## What NOT to Test
+
+ - UI/frontend layout changes (components/ files)
+ - Trivial additions and documentation
+ - CSS and JavaScript assets
+
+ ## Framework & Conventions
+
+ - **Framework**: pytest
+ - **Location**: `tests/` directory
+ - **Naming**: `test_<module_name>.py` matching the module in `utils/`
+ - **Fixtures**: Defined in `conftest.py` for shared test state
+ - **Mocking**: Mock external dependencies (API calls, model loading when appropriate)
+ - **Coverage target**: >80% for new code
+
+ ## Running Tests
+
+ ```bash
+ pytest                                # Run all tests
+ pytest tests/test_<name>.py           # Run specific test file
+ pytest --cov=utils --cov-report=html  # With coverage
+ ```
+
+ ## When Tests Are Required
+
+ - New utility functions or modules
+ - Bug fixes (write a test that reproduces the bug first)
+ - Changes to computation logic
+ - Refactors that touch testable behavior
+
+ ## When Tests Are Optional
+
+ - Pure UI/layout changes
+ - Documentation updates
+ - Configuration changes
+
+ ## After Every Change
+
+ - Run `pytest` to verify all tests still pass
+ - If tests fail, iterate on debugging until fixed — don't move on with broken tests
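The "mock external calls" convention might look like this in practice. Module and function names here are illustrative, not the real `openrouter_client` API; stdlib `unittest.mock` keeps the sketch runnable with or without pytest:

```python
from unittest.mock import Mock


def answer_question(client, question: str) -> str:
    """Toy stand-in for chatbot logic that delegates to an API client."""
    if not question.strip():
        return "Please ask a question."
    return client.chat(question)


def test_answer_question_delegates_to_client():
    # Mock the external call instead of hitting the OpenRouter API.
    client = Mock()
    client.chat.return_value = "Attention weighs token interactions."
    assert answer_question(client, "What is attention?").startswith("Attention")
    client.chat.assert_called_once_with("What is attention?")


def test_empty_question_short_circuits():
    client = Mock()
    assert answer_question(client, "   ") == "Please ask a question."
    client.chat.assert_not_called()


test_answer_question_delegates_to_client()
test_empty_question_short_circuits()
```

Under pytest the two `test_` functions would be collected automatically; the direct calls at the bottom just make the sketch self-verifying.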
.cursor/rules/AGENTS.md ADDED
@@ -0,0 +1,68 @@
+ ---
+ description:
+ globs:
+   - "**/*.py"
+   - "**/*.md"
+   - "app.py"
+   - "components/**/*.py"
+   - "utils/**/*.py"
+ alwaysApply: true
+ ---
+
+ # Transformer Explanation Dashboard
+ Interactive Dash app for exploring LLM internals through visualization and experimentation. Built with Python, Dash, PyTorch, HF Transformers, Bertviz, pyvene.
+
+ ## Critical Rules
+ - **Dead code cleanup is mandatory during refactors.** Sweep for orphaned code from deprecated/deleted components every time.
+ - **Conceptual understanding over math rigor.** Skip complex derivations; focus on motivation and intuition that builds correct mental models.
+ - **Sanity check every change:** (1) Does this help someone learn about transformers? (2) Is the simplification accurate enough? (3) Am I in a rabbit hole?
+ - **TDD for backend logic.** Write failing tests first for anything in `utils/`. Skip tests for UI-only changes. Run `pytest` after every change; iterate until green.
+ - **Don't run the app.** Describe manual verification steps; the user will test themselves.
+ - **Surgical edits over rewrites.** Reuse existing files. Only create new modules when existing ones can't be extended.
+ - **No new dependencies** unless strictly necessary.
+ - **Push back on bad ideas.** Think through problems fully. Don't be a yes-man — challenge flawed approaches.
+ - **Components = layout only** (no ML logic). **Utils = computation only** (no Dash imports). **app.py = glue.**
+ - **Dash callbacks must stay responsive.** No heavy sync work in callbacks without feedback indicators.
+ - Don't reformat unrelated code or alter indentation styles.
+ - Check for zombie processes before debugging server errors.
+
+ ## Module Map
+ | Module | Path | Load when |
+ |--------|------|-----------|
+ | Product | `.context/modules/product.md` | Understanding vision, features, brand voice, visual identity, UX patterns |
+ | Architecture | `.context/modules/architecture.md` | Understanding system structure, data flow, or component boundaries |
+ | Conventions | `.context/modules/conventions.md` | Writing or reviewing code style, naming, commits, dead code cleanup |
+ | Education | `.context/modules/education.md` | Creating or editing educational content, visualizations, explanations |
+ | Testing | `.context/modules/testing.md` | Writing tests, running pytest, TDD workflow |
+
+ ## Data Files
+ | File | Path | Purpose |
+ |------|------|---------|
+ | Sessions | `.context/data/sessions.md` | Running work log (append-only) |
+ | Decisions | `.context/data/decisions.md` | Decision records with reasoning (append-only) |
+ | Lessons | `.context/data/lessons.md` | Hard-won knowledge and past mistakes (append-only) |
+
+ ## Memory Maintenance
+
+ Always look for opportunities to update the memory system:
+ - **New patterns**: "We've been doing X consistently — should I add it to conventions?"
+ - **Decisions made**: "We decided Y — should I record this in decisions.md?"
+ - **Mistakes caught**: "This went wrong because Z — should I add it to lessons.md?"
+ - **Scope changes**: "The project now includes W — should I create a new module?"
+
+ **Before any memory update**:
+ 1. State which file(s) would change and what the change would be
+ 2. Wait for approval
+ 3. Never update memory mid-task without mentioning it
+
+ **Rules**:
+ - Data files are append-only — add entries, never remove or overwrite past entries
+ - Modules can be edited but changes should be targeted, not full rewrites
+ - After substantive work sessions, append a summary to `.context/data/sessions.md`
+
+ ## Preferences
+ - Don't ask permission for changes that fall within an approved plan — just execute
+ - Commit normal changes to main; feature branches for major components/refactors. Never merge branches.
+ - Keep `todo.md` and `plans.md` current before/after changes. Tasks should be atomic.
+ - When in doubt, research options and make a minimal reasonable choice, noting it in `todo.md`
+ - Explain manual tests clearly — what to look for, expected behavior, where to check
.cursor/rules/minimal_changes.mdc DELETED
@@ -1,56 +0,0 @@
- ---
- description:
- globs:
-   - "**/*.py"
-   - "**/*.md"
-   - "app.py"
-   - "components/**/*.py"
-   - "utils/**/*.py"
- alwaysApply: true
- ---
-
- # Minimal Change Rules
-
- - Testing & verification:
-   - For substantial code changes (new files, new functionality), write tests first in `tests/` that describe expected behavior.
-   - Skip tests for UI/frontend changes, trivial additions, and documentation.
-   - After implementing changes, run `pytest` to verify all tests pass.
-   - If tests fail, iterate on debugging until fixed.
-
- - Plan first:
-   - Update `todo.md` with the smallest next actions tied to `plans.md`.
-   - Keep tasks atomic and check them off as you go.
-   - Use the `conductor` folder to learn about the project. Maintain this folder after every change to the code in order to keep running memory (only make changes if necessary).
-
- - Keep edits minimal:
-   - Prefer small, surgical changes over refactors.
-   - Reuse existing files: `app.py`, `components/sidebar.py`, `components/main_panel.py`, `utils/*`.
-   - Remove or simplify clearly unnecessary code only when it reduces complexity.
-
- - Comments & style:
-   - Add concise comments explaining intent where the change is non-obvious.
-   - Do not reformat unrelated code or alter indentation styles.
-
- - Git workflow:
-   - After each coherent set of changes:
-     - git commit -am "[short, concise, and helpful message about what was done]"
-   - Between features:
-     - git push
-     - git checkout -b feature/<short-name>
-   - Never merge branches.
-
- - Ongoing planning:
-   - Keep `todo.md` current before/after each change.
-   - Update `plans.md` if scope/ideas evolve.
-
- - Research as needed:
-   - If details are unclear (e.g., detection thresholds), research your options and make a minimal reasonable choice and note it in `todo.md`.
-
- # Guardrails
-
- - Avoid adding new dependencies unless strictly needed.
- - Avoid creating new modules/components unless existing ones cannot be cleanly extended.
- - Ensure Dash callbacks remain responsive; avoid heavy sync work in callbacks without feedback indicators.
-
- # Debugging
- - Sometimes a zombie process can cause errors. Check for zombie processes and kill them if necessary.
README.md CHANGED
@@ -80,11 +80,4 @@ The project is structured around a central Dash application with modular compone
  * `head_detection.py`: Attention head categorization logic.
  * `beam_search.py`: Beam search implementation.
  * `tests/`: Comprehensive test suite ensuring stability.
- * `conductor/`: Detailed project documentation and product guidelines.
-
- ## Documentation
-
- Additional project documentation is available in the `conductor/` directory:
- * [Product Definition](conductor/product.md)
- * [Tech Stack](conductor/tech-stack.md)
- * [Workflow](conductor/workflow.md)
+ * `.context/`: Project memory modules (architecture, conventions, education, product, testing) and data files (sessions, decisions, lessons).
conductor/code_styleguides/python.md DELETED
@@ -1,37 +0,0 @@
- # Google Python Style Guide Summary
-
- This document summarizes key rules and best practices from the Google Python Style Guide.
-
- ## 1. Python Language Rules
- - **Linting:** Run `pylint` on your code to catch bugs and style issues.
- - **Imports:** Use `import x` for packages/modules. Use `from x import y` only when `y` is a submodule.
- - **Exceptions:** Use built-in exception classes. Do not use bare `except:` clauses.
- - **Global State:** Avoid mutable global state. Module-level constants are okay and should be `ALL_CAPS_WITH_UNDERSCORES`.
- - **Comprehensions:** Use for simple cases. Avoid for complex logic where a full loop is more readable.
- - **Default Argument Values:** Do not use mutable objects (like `[]` or `{}`) as default values.
- - **True/False Evaluations:** Use implicit false (e.g., `if not my_list:`). Use `if foo is None:` to check for `None`.
- - **Type Annotations:** Strongly encouraged for all public APIs.
-
- ## 2. Python Style Rules
- - **Line Length:** Maximum 80 characters.
- - **Indentation:** 4 spaces per indentation level. Never use tabs.
- - **Blank Lines:** Two blank lines between top-level definitions (classes, functions). One blank line between method definitions.
- - **Whitespace:** Avoid extraneous whitespace. Surround binary operators with single spaces.
- - **Docstrings:** Use `"""triple double quotes"""`. Every public module, function, class, and method must have a docstring.
-   - **Format:** Start with a one-line summary. Include `Args:`, `Returns:`, and `Raises:` sections.
- - **Strings:** Use f-strings for formatting. Be consistent with single (`'`) or double (`"`) quotes.
- - **`TODO` Comments:** Use `TODO(username): Fix this.` format.
- - **Imports Formatting:** Imports should be on separate lines and grouped: standard library, third-party, and your own application's imports.
-
- ## 3. Naming
- - **General:** `snake_case` for modules, functions, methods, and variables.
- - **Classes:** `PascalCase`.
- - **Constants:** `ALL_CAPS_WITH_UNDERSCORES`.
- - **Internal Use:** Use a single leading underscore (`_internal_variable`) for internal module/class members.
-
- ## 4. Main
- - All executable files should have a `main()` function that contains the main logic, called from a `if __name__ == '__main__':` block.
-
- **BE CONSISTENT.** When editing code, match the existing style.
-
- *Source: [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)*
conductor/index.md DELETED
@@ -1,14 +0,0 @@
- # Project Context
-
- ## Definition
- - [Product Definition](./product.md)
- - [Product Guidelines](./product-guidelines.md)
- - [Tech Stack](./tech-stack.md)
-
- ## Workflow
- - [Workflow](./workflow.md)
- - [Code Style Guides](./code_styleguides/)
-
- ## Management
- - [Tracks Registry](./tracks.md)
- - [Tracks Directory](./tracks/)
conductor/product-guidelines.md DELETED
@@ -1,23 +0,0 @@
- # Product Guidelines
-
- ## Brand & Voice
- - **Tone:** Enthusiastic and Accessible yet Concise. The voice should be encouraging to learners while remaining direct and functional. Avoid excessive jargon or overly long analogies; prioritize clarity and "get-to-the-point" descriptions.
- - **Audience Engagement:** Speak directly to the user's curiosity. Frame technical explanations as answers to "How does this work?" or "What happens if...?"
-
- ## Visual Identity
- - **Aesthetic:** Clean & Modern. Prioritize high whitespace, legible typography, and a clear visual hierarchy (inspired by Material Design or Notion).
- - **Mode:** Support both Light and Dark modes, ensuring high contrast for data visualizations.
- - **Color Palette:** Use a consistent color language for different model components (e.g., specific colors for Attention vs. MLP layers) to aid mental mapping.
-
- ## User Interface & Experience
- - **Terminology & Disclosure:** Use a combination of Progressive Disclosure and In-Situ Definitions.
-   - **Tooltips:** Use tooltips for most technical terms to provide immediate, brief context without cluttering the UI.
-   - **In-Situ Descriptions:** Provide short, clear descriptions immediately followed by the relevant interactive example to solidify the concept through action.
- - **Experimentation Layout:** Sandbox Explorer.
-   - Provide a comprehensive control panel for free-form exploration (toggles, sliders, ablation switches).
-   - **Comparison View:** Integrate comparison elements into the sandbox so users can see the impact of their modifications relative to the original state.
-
- ## Design Principles
- - **Action-Oriented Learning:** Every architectural explanation should be paired with an interactive element.
- - **Visual Consistency:** Ensure that attention head indices, layer numbers, and token highlights are consistent across all panels and visualization types.
- - **On-Demand Depth:** Keep the primary interface simple, but provide clear paths (like the Glossary or tooltips) for users who want to dive deeper into the technicalities.
conductor/product.md DELETED
@@ -1,32 +0,0 @@
- # Product Definition
-
- ## Initial Concept
- A tool for capturing activations from transformer models and visualizing attention patterns using bertviz and an interactive Dash web application.
-
- ## Vision
- To demystify the inner workings of Transformer-based Large Language Models (LLMs) for students and curious individuals. By combining interactive visualizations with hands-on experimentation capabilities, the tool transforms abstract architectural concepts into tangible, observable phenomena, fostering a deep, intuitive understanding of how these powerful models process information.
-
- ## Target Audience
- - **Primary:** Machine Learning Students and AI enthusiasts.
- - **Secondary:** Any individual seeking a practical, interactive way to learn about Transformer architectures and mechanistic interpretability.
-
- ## Core Value Proposition
- - **Visual Learning:** Translates complex matrix operations and data flows into clear, interactive visual representations (Attention Maps, Logit Lens).
- - **Interactive Experimentation:** Goes beyond static observation by allowing users to manipulate the model (Ablation, Activation Patching) and immediately see the consequences.
- - **Educational Scaffolding:** Supports users of varying expertise levels with layered educational content, from simple tooltips to deep-dive glossaries and future AI-guided tutorials.
-
- ## Key Features
- - **Sequential Data Flow Visualization:** Illustrates how data transforms step-by-step through the model's layers.
- - **Component Breakdown:** Detailed inspection views for key components like Self-Attention (heads, weights) and MLPs.
- - **Interactive Experiments:**
- - **Ablation Studies:** selectively disable heads or layers to observe their impact on output.
- - **Activation Steering:** modify activation values in real-time.
- - **Prompt Comparison:** compare internal activations resulting from two different input prompts side-by-side.
- - **Integrated Education:**
- - Contextual tooltips for immediate clarity.
- - Dedicated "Glossary" panel for in-depth definitions.
- - AI chatbot with RAG-powered knowledge base (30 documents covering transformer concepts, dashboard usage, guided experiments, result interpretation, troubleshooting, and mechanistic interpretability research).
- - Step-by-step guided experiments that walk beginners through the dashboard's features.
-
- ## User Experience
- The interface centers on exploration and clarity. Users start by selecting a model and inputting text. The dashboard then unfolds the model's processing pipeline, allowing users to "zoom in" on specific components. Experimentation modes are clearly distinguished, enabling users to hypothesize ("What if I turn off this head?") and test. Educational resources are omnipresent but non-intrusive, available on-demand to explain the *what* and *why* of what is being visualized.
conductor/setup_state.json DELETED
@@ -1 +0,0 @@
- {"last_successful_step": "3.3_initial_track_generated"}
 
 
conductor/tech-stack.md DELETED
@@ -1,15 +0,0 @@
- # Tech Stack
-
- ## Core Technologies
- - **Programming Language:** Python
- - **Deep Learning Framework:** PyTorch & Hugging Face Transformers
-
- ## Frontend & Visualization
- - **Web Framework:** Plotly Dash
- - **Data Visualization:** Plotly
- - **Attention Visualization:** Bertviz
- - **Styling:** Custom CSS (Bootstrap-compatible)
-
- ## Interpretability & Research Tools
- - **Activation Capture:** PyVene
- - **Model Analysis:** Custom utilities for ablation and logit lens analysis
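[Editor's note] The two analysis utilities named above can be sketched in miniature. The toy NumPy example below is purely illustrative — the shapes, seed, and function names are invented here, and the project's real utilities operate on live PyTorch activations captured via PyVene:

```python
import numpy as np

# Toy dimensions, invented for illustration only.
rng = np.random.default_rng(0)
d_model, vocab, n_heads, d_head = 8, 12, 2, 4

W_U = rng.normal(size=(d_model, vocab))   # stand-in for the unembedding matrix
resid = rng.normal(size=(3, d_model))     # residual stream for 3 tokens

def logit_lens(hidden, W_U):
    """Project an intermediate hidden state directly onto vocabulary logits."""
    return hidden @ W_U

def ablate_head(head_out, head_idx):
    """Zero out one head's slice of a concatenated attention output."""
    out = head_out.copy()
    out[:, head_idx * d_head:(head_idx + 1) * d_head] = 0.0
    return out

logits = logit_lens(resid, W_U)                              # one row of logits per token
ablated = ablate_head(rng.normal(size=(3, n_heads * d_head)), 1)
print(logits.shape, ablated.shape)  # (3, 12) (3, 8)
```

The same two operations — reading intermediate states through the unembedding, and zeroing a component's contribution — are what the dashboard's logit-lens and ablation views expose interactively.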
conductor/tracks.md DELETED
@@ -1,3 +0,0 @@
- # Project Tracks
-
- This file tracks all major tracks for the project. Each track has its own detailed plan in its respective folder.
conductor/workflow.md DELETED
@@ -1,333 +0,0 @@
- # Project Workflow
-
- ## Guiding Principles
-
- 1. **The Plan is the Source of Truth:** All work must be tracked in `plan.md`
- 2. **The Tech Stack is Deliberate:** Changes to the tech stack must be documented in `tech-stack.md` *before* implementation
- 3. **Test-Driven Development:** Write unit tests before implementing functionality
- 4. **High Code Coverage:** Aim for >80% code coverage for all modules
- 5. **User Experience First:** Every decision should prioritize user experience
- 6. **Non-Interactive & CI-Aware:** Prefer non-interactive commands. Use `CI=true` for watch-mode tools (tests, linters) to ensure single execution.
-
- ## Task Workflow
-
- All tasks follow a strict lifecycle:
-
- ### Standard Task Workflow
-
- 1. **Select Task:** Choose the next available task from `plan.md` in sequential order
-
- 2. **Mark In Progress:** Before beginning work, edit `plan.md` and change the task from `[ ]` to `[~]`
-
- 3. **Write Failing Tests (Red Phase):**
- - Create a new test file for the feature or bug fix.
- - Write one or more unit tests that clearly define the expected behavior and acceptance criteria for the task.
- - **CRITICAL:** Run the tests and confirm that they fail as expected. This is the "Red" phase of TDD. Do not proceed until you have failing tests.
-
- 4. **Implement to Pass Tests (Green Phase):**
- - Write the minimum amount of application code necessary to make the failing tests pass.
- - Run the test suite again and confirm that all tests now pass. This is the "Green" phase.
-
- 5. **Refactor (Optional but Recommended):**
- - With the safety of passing tests, refactor the implementation code and the test code to improve clarity, remove duplication, and enhance performance without changing the external behavior.
- - Rerun tests to ensure they still pass after refactoring.
-
- 6. **Verify Coverage:** Run coverage reports using the project's chosen tools. For example, in a Python project, this might look like:
- ```bash
- pytest --cov=app --cov-report=html
- ```
- Target: >80% coverage for new code. The specific tools and commands will vary by language and framework.
-
- 7. **Document Deviations:** If implementation differs from tech stack:
- - **STOP** implementation
- - Update `tech-stack.md` with new design
- - Add dated note explaining the change
- - Resume implementation
-
- 8. **Commit Code Changes:**
- - Stage all code changes related to the task.
- - Propose a clear, concise commit message, e.g., `feat(ui): Create basic HTML structure for calculator`.
- - Perform the commit.
-
- 9. **Attach Task Summary with Git Notes:**
- - **Step 9.1: Get Commit Hash:** Obtain the hash of the *just-completed commit* (`git log -1 --format="%H"`).
- - **Step 9.2: Draft Note Content:** Create a detailed summary for the completed task. This should include the task name, a summary of changes, a list of all created/modified files, and the core "why" for the change.
- - **Step 9.3: Attach Note:** Use the `git notes` command to attach the summary to the commit.
- ```bash
- # The note content from the previous step is passed via the -m flag.
- git notes add -m "<note content>" <commit_hash>
- ```
-
- 10. **Get and Record Task Commit SHA:**
- - **Step 10.1: Update Plan:** Read `plan.md`, find the line for the completed task, update its status from `[~]` to `[x]`, and append the first 7 characters of the *just-completed* commit's hash.
- - **Step 10.2: Write Plan:** Write the updated content back to `plan.md`.
-
- 11. **Commit Plan Update:**
- - **Action:** Stage the modified `plan.md` file.
- - **Action:** Commit this change with a descriptive message (e.g., `conductor(plan): Mark task 'Create user model' as complete`).
-
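[Editor's note] The plan edit in step 10 is mechanical enough to sketch. The helper below is a hypothetical illustration — the function name and checkbox format are assumed from the conventions above, and the SHA would come from `git log -1 --format="%h"`:

```python
def mark_done(plan_text: str, task_name: str, sha: str) -> str:
    """Flip a task from in-progress ([~]) to done ([x]) and append the
    first 7 characters of its commit SHA."""
    old = f"[~] {task_name}"
    new = f"[x] {task_name} ({sha[:7]})"
    return plan_text.replace(old, new)

plan_line = "- [~] Create user model"
print(mark_done(plan_line, "Create user model", "a1b2c3d4e5f6"))
# - [x] Create user model (a1b2c3d)
```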
- ### Phase Completion Verification and Checkpointing Protocol
-
- **Trigger:** This protocol is executed immediately after a task is completed that also concludes a phase in `plan.md`.
-
- 1. **Announce Protocol Start:** Inform the user that the phase is complete and the verification and checkpointing protocol has begun.
-
- 2. **Ensure Test Coverage for Phase Changes:**
- - **Step 2.1: Determine Phase Scope:** To identify the files changed in this phase, you must first find the starting point. Read `plan.md` to find the Git commit SHA of the *previous* phase's checkpoint. If no previous checkpoint exists, the scope is all changes since the first commit.
- - **Step 2.2: List Changed Files:** Execute `git diff --name-only <previous_checkpoint_sha> HEAD` to get a precise list of all files modified during this phase.
- - **Step 2.3: Verify and Create Tests:** For each file in the list:
- - **CRITICAL:** First, check its extension. Exclude non-code files (e.g., `.json`, `.md`, `.yaml`).
- - For each remaining code file, verify a corresponding test file exists.
- - If a test file is missing, you **must** create one. Before writing the test, **first, analyze other test files in the repository to determine the correct naming convention and testing style.** The new tests **must** validate the functionality described in this phase's tasks (`plan.md`).
-
- 3. **Execute Automated Tests with Proactive Debugging:**
- - Before execution, you **must** announce the exact shell command you will use to run the tests.
- - **Example Announcement:** "I will now run the automated test suite to verify the phase. **Command:** `CI=true npm test`"
- - Execute the announced command.
- - If tests fail, you **must** inform the user and begin debugging. You may attempt to propose a fix a **maximum of two times**. If the tests still fail after your second proposed fix, you **must stop**, report the persistent failure, and ask the user for guidance.
-
- 4. **Propose a Detailed, Actionable Manual Verification Plan:**
- - **CRITICAL:** To generate the plan, first analyze `product.md`, `product-guidelines.md`, and `plan.md` to determine the user-facing goals of the completed phase.
- - You **must** generate a step-by-step plan that walks the user through the verification process, including any necessary commands and specific, expected outcomes.
- - The plan you present to the user **must** follow this format:
-
- **For a Frontend Change:**
- ```
- The automated tests have passed. For manual verification, please follow these steps:
-
- **Manual Verification Steps:**
- 1. **Start the development server with the command:** `npm run dev`
- 2. **Open your browser to:** `http://localhost:3000`
- 3. **Confirm that you see:** The new user profile page, with the user's name and email displayed correctly.
- ```
-
- **For a Backend Change:**
- ```
- The automated tests have passed. For manual verification, please follow these steps:
-
- **Manual Verification Steps:**
- 1. **Ensure the server is running.**
- 2. **Execute the following command in your terminal:** `curl -X POST http://localhost:8080/api/v1/users -d '{"name": "test"}'`
- 3. **Confirm that you receive:** A JSON response with a status of `201 Created`.
- ```
-
- 5. **Await Explicit User Feedback:**
- - After presenting the detailed plan, ask the user for confirmation: "**Does this meet your expectations? Please confirm with yes or provide feedback on what needs to be changed.**"
- - **PAUSE** and await the user's response. Do not proceed without an explicit yes or confirmation.
-
- 6. **Create Checkpoint Commit:**
- - Stage all changes. If no changes occurred in this step, proceed with an empty commit.
- - Perform the commit with a clear and concise message (e.g., `conductor(checkpoint): Checkpoint end of Phase X`).
-
- 7. **Attach Auditable Verification Report using Git Notes:**
- - **Step 7.1: Draft Note Content:** Create a detailed verification report including the automated test command, the manual verification steps, and the user's confirmation.
- - **Step 7.2: Attach Note:** Use the `git notes` command and the full commit hash from the previous step to attach the full report to the checkpoint commit.
-
- 8. **Get and Record Phase Checkpoint SHA:**
- - **Step 8.1: Get Commit Hash:** Obtain the hash of the *just-created checkpoint commit* (`git log -1 --format="%H"`).
- - **Step 8.2: Update Plan:** Read `plan.md`, find the heading for the completed phase, and append the first 7 characters of the commit hash in the format `[checkpoint: <sha>]`.
- - **Step 8.3: Write Plan:** Write the updated content back to `plan.md`.
-
- 9. **Commit Plan Update:**
- - **Action:** Stage the modified `plan.md` file.
- - **Action:** Commit this change with a descriptive message following the format `conductor(plan): Mark phase '<PHASE NAME>' as complete`.
-
- 10. **Announce Completion:** Inform the user that the phase is complete and the checkpoint has been created, with the detailed verification report attached as a git note.
-
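[Editor's note] Step 2 of the protocol (filter changed files, then check for missing tests) can be sketched as follows. The conventions here are hypothetical — Python code files with tests named `tests/test_<module>.py` — and would be adapted to the repository's actual layout:

```python
from pathlib import Path

def files_missing_tests(changed_files, existing_tests):
    """Return changed code files that lack a corresponding test file."""
    missing = []
    for name in changed_files:
        path = Path(name)
        if path.suffix != ".py":
            continue  # exclude non-code files (.json, .md, .yaml, ...)
        if f"tests/test_{path.stem}.py" not in existing_tests:
            missing.append(name)
    return missing

changed = ["app/model.py", "README.md", "app/views.py", "config.yaml"]
print(files_missing_tests(changed, ["tests/test_model.py"]))
# ['app/views.py']
```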
- ### Quality Gates
-
- Before marking any task complete, verify:
-
- - [ ] All tests pass
- - [ ] Code coverage meets requirements (>80%)
- - [ ] Code follows project's code style guidelines (as defined in `code_styleguides/`)
- - [ ] All public functions/methods are documented (e.g., docstrings, JSDoc, GoDoc)
- - [ ] Type safety is enforced (e.g., type hints, TypeScript types, Go types)
- - [ ] No linting or static analysis errors (using the project's configured tools)
- - [ ] Works correctly on mobile (if applicable)
- - [ ] Documentation updated if needed
- - [ ] No security vulnerabilities introduced
-
- ## Development Commands
-
- **AI AGENT INSTRUCTION: This section should be adapted to the project's specific language, framework, and build tools.**
-
- ### Setup
- ```bash
- # Example: Commands to set up the development environment (e.g., install dependencies, configure database)
- # e.g., for a Node.js project: npm install
- # e.g., for a Go project: go mod tidy
- ```
-
- ### Daily Development
- ```bash
- # Example: Commands for common daily tasks (e.g., start dev server, run tests, lint, format)
- # e.g., for a Node.js project: npm run dev, npm test, npm run lint
- # e.g., for a Go project: go run main.go, go test ./..., go fmt ./...
- ```
-
- ### Before Committing
- ```bash
- # Example: Commands to run all pre-commit checks (e.g., format, lint, type check, run tests)
- # e.g., for a Node.js project: npm run check
- # e.g., for a Go project: make check (if a Makefile exists)
- ```
-
- ## Testing Requirements
-
- ### Unit Testing
- - Every module must have corresponding tests.
- - Use appropriate test setup/teardown mechanisms (e.g., fixtures, beforeEach/afterEach).
- - Mock external dependencies.
- - Test both success and failure cases.
-
- ### Integration Testing
- - Test complete user flows
- - Verify database transactions
- - Test authentication and authorization
- - Check form submissions
-
- ### Mobile Testing
- - Test on actual iPhone when possible
- - Use Safari developer tools
- - Test touch interactions
- - Verify responsive layouts
- - Check performance on 3G/4G
-
- ## Code Review Process
-
- ### Self-Review Checklist
- Before requesting review:
-
- 1. **Functionality**
- - Feature works as specified
- - Edge cases handled
- - Error messages are user-friendly
-
- 2. **Code Quality**
- - Follows style guide
- - DRY principle applied
- - Clear variable/function names
- - Appropriate comments
-
- 3. **Testing**
- - Unit tests comprehensive
- - Integration tests pass
- - Coverage adequate (>80%)
-
- 4. **Security**
- - No hardcoded secrets
- - Input validation present
- - SQL injection prevented
- - XSS protection in place
-
- 5. **Performance**
- - Database queries optimized
- - Images optimized
- - Caching implemented where needed
-
- 6. **Mobile Experience**
- - Touch targets adequate (44x44px)
- - Text readable without zooming
- - Performance acceptable on mobile
- - Interactions feel native
-
- ## Commit Guidelines
-
- ### Message Format
- ```
- <type>(<scope>): <description>
-
- [optional body]
-
- [optional footer]
- ```
-
- ### Types
- - `feat`: New feature
- - `fix`: Bug fix
- - `docs`: Documentation only
- - `style`: Formatting, missing semicolons, etc.
- - `refactor`: Code change that neither fixes a bug nor adds a feature
- - `test`: Adding missing tests
- - `chore`: Maintenance tasks
-
- ### Examples
- ```bash
- git commit -m "feat(auth): Add remember me functionality"
- git commit -m "fix(posts): Correct excerpt generation for short posts"
- git commit -m "test(comments): Add tests for emoji reaction limits"
- git commit -m "style(mobile): Improve button touch targets"
- ```
-
- ## Definition of Done
-
- A task is complete when:
-
- 1. All code implemented to specification
- 2. Unit tests written and passing
- 3. Code coverage meets project requirements
- 4. Documentation complete (if applicable)
- 5. Code passes all configured linting and static analysis checks
- 6. Works beautifully on mobile (if applicable)
- 7. Implementation notes added to `plan.md`
- 8. Changes committed with proper message
- 9. Git note with task summary attached to the commit
-
- ## Emergency Procedures
-
- ### Critical Bug in Production
- 1. Create hotfix branch from main
- 2. Write failing test for bug
- 3. Implement minimal fix
- 4. Test thoroughly including mobile
- 5. Deploy immediately
- 6. Document in `plan.md`
-
- ### Data Loss
- 1. Stop all write operations
- 2. Restore from latest backup
- 3. Verify data integrity
- 4. Document incident
- 5. Update backup procedures
-
- ### Security Breach
- 1. Rotate all secrets immediately
- 2. Review access logs
- 3. Patch vulnerability
- 4. Notify affected users (if any)
- 5. Document and update security procedures
-
- ## Deployment Workflow
-
- ### Pre-Deployment Checklist
- - [ ] All tests passing
- - [ ] Coverage >80%
- - [ ] No linting errors
- - [ ] Mobile testing complete
- - [ ] Environment variables configured
- - [ ] Database migrations ready
- - [ ] Backup created
-
- ### Deployment Steps
- 1. Merge feature branch to main
- 2. Tag release with version
- 3. Push to deployment service
- 4. Run database migrations
- 5. Verify deployment
- 6. Test critical paths
- 7. Monitor for errors
-
- ### Post-Deployment
- 1. Monitor analytics
- 2. Check error logs
- 3. Gather user feedback
- 4. Plan next iteration
-
- ## Continuous Improvement
-
- - Review workflow weekly
- - Update based on pain points
- - Document lessons learned
- - Optimize for user happiness
- - Keep things simple and maintainable