riazmo / Design-System-Automation
riazmo Claude Opus 4.6 committed on Feb 23

Commit f0ceb42 (1 parent: 6b43e51)

docs: update all docs for v3.2 + add Part 2 component generation research

- Rewrite Medium article for v3.2: color classifier, naming authority chain,
DTCG compliance, 8-source extraction, component generation teaser
- Fix AURORA prompt contradiction in llm_agents.py: align SYSTEM_PROMPT
(advisory only) with PROMPT_TEMPLATE (optional naming_map)
- Update LinkedIn post, image guide, and context doc for v3.2
- Add PART2_COMPONENT_GENERATION.md: 30+ tool research, custom plugin
decision, MVP scope (5 components, 86 variants), architecture plan
- Update CLAUDE.md Phase 5 with research findings and decision

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (7)

CLAUDE.md +30 -9
PART2_COMPONENT_GENERATION.md +418 -0
agents/llm_agents.py +2 -2
docs/CONTEXT.md +159 -766
docs/IMAGE_GUIDE_EPISODE_6.md +196 -127
docs/LINKEDIN_POST_EPISODE_6.md +104 -86
docs/MEDIUM_ARTICLE_EPISODE_6.md +438 -287

CLAUDE.md CHANGED

@@ -1123,18 +1123,34 @@ PHASE 4: EXTRACTION IMPROVEMENTS (NOT STARTED)
   4c. ❌ Rule engine: shadow elevation analysis
 ```
-### PHASE 5: COMPONENT GENERATION (FUTURE — NOT STARTED)
-Based on strategic research (Feb 2026), the next major feature is automated component generation in Figma:
 ```
 PHASE 5: FIGMA COMPONENT GENERATION
   5a. Component Definition Schema (JSON defining anatomy + token bindings + variants)
-  5b. Token-to-Component binding engine
-  5c. Figma Plugin: createComponent() + combineAsVariants() + setBoundVariable()
-  5d. MVP Components: Button (60 variants), TextInput (8), Card (2), Toast (4), Checkbox+Radio (12)
-  5e. Variable Collections: Primitives, Semantic, Spacing, Radius, Typography
 PHASE 6: ECOSYSTEM INTEGRATION
   6a. Style Dictionary v4 compatible output (50+ platform formats for free)
   6b. Tokens Studio compatible JSON import
@@ -1151,6 +1167,8 @@ PHASE 7: MCP INTEGRATION
 **"Lighthouse for Design Systems"** — We are NOT a token management platform (Tokens Studio), NOT a documentation platform (Zeroheight), NOT an extraction tool (Dembrandt). We are the **automated audit + bootstrap tool** that sits upstream of all of those.
 **Unique differentiators no competitor has:**
 - Type scale ratio detection + standard scale matching
 - Spacing grid detection (GCD-based, base-8 alignment scoring)
@@ -1158,12 +1176,15 @@ PHASE 7: MCP INTEGRATION
 - Holistic design system quality score (0-100)
 - Visual spec page auto-generated in Figma
 - Benchmark comparison against established design systems
 **Key competitors to watch:**
-- Dembrandt (1,300★) — does extraction better, but no analysis
-- Tokens Studio (264K users) — does Figma management better, but no extraction
 - Knapsack ($10M funding) — building ingestion engine, biggest strategic threat
-- html.to.design — captures layouts but not tokens/variables
 ---

   4c. ❌ Rule engine: shadow elevation analysis
 ```
+### PHASE 5: COMPONENT GENERATION (NEXT — RESEARCH COMPLETE)
+**Full context**: See `PART2_COMPONENT_GENERATION.md` for detailed research, API checks, and architecture.
+**Research finding (Feb 2026)**: 30+ tools evaluated. No production tool takes DTCG JSON -> Figma Components. This is a genuine market gap.
+**Decision**: Custom Figma Plugin (Option A) — extend existing `code.js` with component generation.
 ```
 PHASE 5: FIGMA COMPONENT GENERATION
   5a. Component Definition Schema (JSON defining anatomy + token bindings + variants)
+  5b. Token-to-Component binding engine (resolveTokenValue, bindTokenToVariable)
+  5c. Variable Collection builder (primitives, semantic, spacing, radius, shadow, typography)
+  5d. MVP Components:
+      - Button: 4 variants x 3 sizes x 5 states = 60 variants (2-3 days)
+      - TextInput: 4 states x 2 sizes = 8 variants (1-2 days)
+      - Card: 2 configurations (1 day)
+      - Toast: 4 types success/error/warn/info (1 day)
+      - Checkbox+Radio: ~12 variants (1-2 days)
+  5e. Post-MVP: Toggle (4), Select (multi-state), Modal (3 sizes), Table (template)
+  Estimated: ~1400 lines new plugin code, 8-12 days total
+```
+**Figma Plugin API confirmed**: createComponent(), combineAsVariants(), setBoundVariable(),
+setBoundVariableForPaint(), addComponentProperty(), setReactionsAsync() — ALL supported.
+```
 PHASE 6: ECOSYSTEM INTEGRATION
   6a. Style Dictionary v4 compatible output (50+ platform formats for free)
   6b. Tokens Studio compatible JSON import
 **"Lighthouse for Design Systems"** — We are NOT a token management platform (Tokens Studio), NOT a documentation platform (Zeroheight), NOT an extraction tool (Dembrandt). We are the **automated audit + bootstrap tool** that sits upstream of all of those.
+**With Phase 5**: We become the ONLY tool that goes from URL -> complete Figma design system WITH components. Fully automated. Nobody else does this end-to-end.
 **Unique differentiators no competitor has:**
 - Type scale ratio detection + standard scale matching
 - Spacing grid detection (GCD-based, base-8 alignment scoring)
 - Holistic design system quality score (0-100)
 - Visual spec page auto-generated in Figma
 - Benchmark comparison against established design systems
+- (Phase 5) Automated component generation from extracted tokens
 **Key competitors to watch:**
+- Dembrandt (1,300 stars) — does extraction better, but no analysis, no components
+- Tokens Studio (1M+ installs) — manages tokens, no extraction, no component generation
 - Knapsack ($10M funding) — building ingestion engine, biggest strategic threat
+- Figr Identity — generates components but from brand config, not extracted tokens
+- html.to.design — captures layouts but not tokens/variables/components
+- story.to.design — Storybook->Figma components, but needs full code pipeline
 ---

PART2_COMPONENT_GENERATION.md ADDED

	@@ -0,0 +1,418 @@

+# Design System Extractor — Part 2: Component Generation
+## Session Context
+**Prerequisite**: Part 1 (Token Extraction + Analysis) is COMPLETE at v3.2
+- Phases 1-3 DONE: Normalizer, Stage 2 agents, Export all working
+- 113 tests passing, W3C DTCG v1 compliant output
+- GitHub: https://github.com/hiriazmo/design-system-extractor-v3
+- Project: `/Users/yahya/design-system-extractor-v3/`
+**This session**: Build automated component generation from extracted tokens into Figma.
+---
+## THE GAP: Nobody Does This
+Exhaustive research of 30+ tools (Feb 2026) confirms:
+**No production tool takes DTCG JSON and outputs Figma Components.**
+```
+YOUR EXTRACTOR                    THE GAP                     FIGMA
++--------------+    +----------------------------+    +------------------+
+| DTCG JSON    |--->|  ??? Nothing does this     |--->| Button component |
+| with tokens  |    |  tokens -> components      |    | with 60 variants |
++--------------+    +----------------------------+    +------------------+
+```
+### What Exists (and What It Can't Do)
+| Category | Best Tool | What It Does | Creates Components? |
+|----------|-----------|-------------|-------------------|
+| Token Importers | Tokens Studio (1M+ installs) | JSON -> Figma Variables | NO - variables only |
+| AI Design | Figma Make | Prompt -> prototype | NO - not token-driven |
+| MCP Bridges | Figma Console MCP (543 stars) | AI writes to Figma | YES but non-deterministic |
+| Code-to-Figma | story.to.design | Storybook -> Figma components | YES but needs full Storybook |
+| Generators | Figr Identity | Brand config -> components | YES but can't consume YOUR tokens |
+| Commercial | Knapsack ($10M), Supernova | Token management | NO - manages, doesn't create |
+| DEAD | Specify.app (shutting down), Backlight.dev (shut down June 2025) | - | - |
+### Key Findings Per Category
+**Token Importers** (7+ tools evaluated): Tokens Studio, TokensBrucke, Styleframe, DTCG Token Manager, GitFig, Supa Design Tokens, Design System Automator — ALL create Figma Variables from JSON, NONE create components.
+**MCP Bridges** (5 tools): Figma Console MCP (Southleft), claude-talk-to-figma-mcp, cursor-talk-to-figma-mcp (Grab), figma-mcp-write-server, Figma-MCP-Write-Bridge — ALL have full write access, but component creation is AI-interpreted (non-deterministic, varies per run).
+**Code-to-Figma**: story.to.design is the standout — creates REAL Figma components with proper variants from Storybook. But requires a full coded component library + running Storybook instance as intermediary.
+**figma-json2component** (GitHub): Experimental proof-of-concept that generates components from custom JSON schema. Not DTCG, not production quality, but validates the concept IS possible.
+---
+## FOUR APPROACHES — RANKED
+### Option A: Custom Figma Plugin (RECOMMENDED)
+```
+DTCG JSON -> Your Plugin reads JSON -> Creates Variables -> Generates Components -> Done
+```
+- **Effort**: 4-8 weeks (~1400 lines of plugin code for 5 MVP components)
+- **Quality**: Highest — fully deterministic, consistent every run
+- **Advantage**: We already have a working plugin (code.js) that imports tokens
+- **Risk**: Low — Figma Plugin API supports everything needed
+### Option B: Pipeline — shadcn + Storybook + story.to.design
+```
+DTCG JSON -> Style Dictionary -> CSS vars -> shadcn themed -> Storybook -> story.to.design -> Figma
+```
+- **Effort**: 2-3 days setup, then 15-30 min per extraction
+- **Quality**: High — battle-tested shadcn components
+- **Dependency**: story.to.design (commercial, paid)
+- **Risk**: Medium — many moving parts
+### Option C: MCP + Claude AI Chain
+```
+DTCG JSON -> Claude reads tokens -> Figma Console MCP -> AI creates components -> Figma
+```
+- **Effort**: 2-3 weeks
+- **Quality**: Medium — non-deterministic
+- **Risk**: High — AI output varies per run
+### Option D: Figr Identity + Manual Token Swap
+```
+Figr Identity generates base system -> Manually swap tokens -> Adjust
+```
+- **Effort**: 1-2 days
+- **Quality**: Medium — not YOUR tokens
+- **Risk**: Medium — manual alignment needed
+**Decision: Option A (Custom Plugin)** — we already have 80% of the infrastructure, it's deterministic, no external dependencies, and fills a genuine market gap.
+---
+## FIGMA PLUGIN API: FULL CAPABILITY CHECK
+Every feature needed for component generation is supported:
+| Requirement | API Method | Status |
+|------------|-----------|--------|
+| Create components | `figma.createComponent()` | Supported |
+| Variant sets (60 variants) | `figma.combineAsVariants()` | Supported |
+| Auto-layout with padding | `layoutMode`, `paddingTop/Right/Bottom/Left`, `itemSpacing` | Supported |
+| Text labels | `figma.createText()` + `loadFontAsync()` | Supported |
+| Icon slot (optional) | `addComponentProperty("ShowIcon", "BOOLEAN", true)` | Supported |
+| Instance swap (icons) | `addComponentProperty("Icon", "INSTANCE_SWAP", id)` | Supported |
+| Border radius from tokens | `setBoundVariable('topLeftRadius', radiusVar)` | Supported |
+| Colors from tokens | `setBoundVariableForPaint()` -> binds to variables | Supported |
+| Shadows from tokens | `setBoundVariableForEffect()` | Supported (has spread bug, workaround exists) |
+| Hover/press interactions | `node.setReactionsAsync()` with `ON_HOVER`/`ON_PRESS` | Supported |
+| Expose text property | `addComponentProperty("Label", "TEXT", "Button")` | Supported |
+| Disabled opacity | `node.opacity = 0.5` | Supported |
+---
+## MVP SCOPE: 5 Components, ~86 Variants
+| Component | Variants | Automatable? | Effort |
+|-----------|---------|-------------|--------|
+| **Button** | 4 variants x 3 sizes x 5 states = 60 | Fully | 2-3 days |
+| **Text Input** | 4 states x 2 sizes = 8 | Fully | 1-2 days |
+| **Card** | 2 configurations | Semi | 1 day |
+| **Toast/Notification** | 4 types (success/error/warn/info) | Fully | 1 day |
+| **Checkbox + Radio** | ~12 variants | Fully | 1-2 days |
+| **Total** | **~86 variants** | | **8-12 days** |
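
A minimal sketch, not part of this commit, that checks the arithmetic behind the table: each component's variant count is the product of its axis sizes, and the five MVP rows sum to about 86. `countVariants` and `mvpCounts` are hypothetical names used only for illustration; the Checkbox + Radio figure is the approximate "~12" from the table.

```javascript
// Hypothetical helper: the variant count is the product of the axis sizes.
function countVariants(axes) {
  return Object.values(axes).reduce((n, values) => n * values.length, 1);
}

// Button axes as listed in the component definition schema.
const buttonAxes = {
  Variant: ["Primary", "Secondary", "Outline", "Ghost"],
  Size: ["Small", "Medium", "Large"],
  State: ["Default", "Hover", "Pressed", "Focused", "Disabled"],
};

// Remaining counts are copied from the MVP table (Checkbox + Radio is "~12").
const mvpCounts = {
  Button: countVariants(buttonAxes),
  TextInput: 4 * 2,
  Card: 2,
  Toast: 4,
  CheckboxRadio: 12,
};

const total = Object.values(mvpCounts).reduce((a, b) => a + b, 0);
console.log(mvpCounts.Button, total); // 60 86
```
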
+### Post-MVP Components
+| Component | Variants | Automatable? | Effort |
+|-----------|---------|-------------|--------|
+| Toggle/Switch | on/off x enabled/disabled = 4 | Fully | 0.5 day |
+| Select/Dropdown | Multiple states | Semi | 1-2 days |
+| Modal/Dialog | 3 sizes | Semi | 1 day |
+| Table | Header + data rows | Template-based | 2 days |
+---
+## TOKEN-TO-COMPONENT MAPPING
+How extracted tokens bind to component properties:
+### Button Example
+```
+Token                    -> Figma Property
+-------------------------------------------------
+color.brand.primary      -> Fill (default state)
+color.brand.600          -> Fill (hover state)
+color.brand.700          -> Fill (pressed state)
+color.text.inverse       -> Text color
+color.neutral.200        -> Fill (secondary variant)
+color.neutral.300        -> Fill (secondary hover)
+radius.md                -> Corner radius (all corners)
+shadow.sm                -> Drop shadow (elevated variant)
+spacing.3                -> Padding horizontal (16px)
+spacing.2                -> Padding vertical (8px)
+font.body.md             -> Text style (label)
+```
+### Variable Collections Needed
+```
+1. Primitives     -> Raw color palette (blue.50 through blue.900, etc.)
+2. Semantic       -> Role-based aliases (brand.primary -> blue.500)
+3. Spacing        -> 4px grid (spacing.1=4, spacing.2=8, spacing.3=16...)
+4. Radius         -> none/sm/md/lg/xl/full
+5. Shadow         -> xs/sm/md/lg/xl elevation levels
+6. Typography     -> Font families, sizes, weights, line-heights
+```
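
A minimal sketch, not part of this commit, of how a DTCG tree might be flattened into a list of entries that a collection builder could iterate over. `flattenTokens` is a hypothetical helper, and the use of `/` as the group separator in variable names is an assumption about how the plugin would name Figma variables. The sketch also assumes a group-level `$type` is inherited by its children, as the DTCG format allows.

```javascript
// Walk a DTCG tree and collect every leaf token (an object that has "$value").
// Assumption: "/" separates groups in the generated variable name.
function flattenTokens(node, path = [], inheritedType = undefined, out = []) {
  const type = node.$type ?? inheritedType; // a group may set a default $type
  if (Object.prototype.hasOwnProperty.call(node, "$value")) {
    out.push({ name: path.join("/"), path: path.join("."), type, value: node.$value });
    return out; // do not descend into composite values such as shadows
  }
  for (const [key, child] of Object.entries(node)) {
    // Skip metadata keys ($description, $extensions, ...) and non-object values.
    if (key.startsWith("$") || child === null || typeof child !== "object") continue;
    flattenTokens(child, [...path, key], type, out);
  }
  return out;
}

// Illustrative input only; the values mirror the sample output format.
const tokens = {
  color: { brand: { primary: { $type: "color", $value: "#005aa3" } } },
  radius: { $type: "dimension", md: { $value: "8px" }, lg: { $value: "16px" } },
};

const flat = flattenTokens(tokens);
console.log(flat.map((t) => `${t.name} (${t.type})`));
// [ 'color/brand/primary (color)', 'radius/md (dimension)', 'radius/lg (dimension)' ]
```

Keeping both the slash-separated `name` and the dotted `path` lets the same entry serve as a variable name and as a lookup key for component definitions.
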
+---
+## COMPONENT DEFINITION SCHEMA (Proposed)
+Each component needs a JSON definition describing its anatomy, token bindings, and variant matrix:
+```json
+{
+  "component": "Button",
+  "anatomy": {
+    "root": {
+      "type": "frame",
+      "layout": "horizontal",
+      "padding": { "h": "spacing.3", "v": "spacing.2" },
+      "radius": "radius.md",
+      "fill": "color.brand.primary",
+      "gap": "spacing.2"
+    },
+    "icon_slot": {
+      "type": "instance_swap",
+      "size": 16,
+      "visible": false,
+      "property": "ShowIcon"
+    },
+    "label": {
+      "type": "text",
+      "style": "font.body.md",
+      "color": "color.text.inverse",
+      "content": "Button",
+      "property": "Label"
+    }
+  },
+  "variants": {
+    "Variant": ["Primary", "Secondary", "Outline", "Ghost"],
+    "Size": ["Small", "Medium", "Large"],
+    "State": ["Default", "Hover", "Pressed", "Focused", "Disabled"]
+  },
+  "variant_overrides": {
+    "Variant=Secondary": {
+      "root.fill": "color.neutral.200",
+      "label.color": "color.text.primary"
+    },
+    "Variant=Outline": {
+      "root.fill": "transparent",
+      "root.stroke": "color.border.primary",
+      "root.strokeWeight": 1,
+      "label.color": "color.brand.primary"
+    },
+    "Variant=Ghost": {
+      "root.fill": "transparent",
+      "label.color": "color.brand.primary"
+    },
+    "State=Hover": {
+      "root.fill": "color.brand.600"
+    },
+    "State=Pressed": {
+      "root.fill": "color.brand.700"
+    },
+    "State=Disabled": {
+      "root.opacity": 0.5
+    },
+    "Size=Small": {
+      "root.padding.h": "spacing.2",
+      "root.padding.v": "spacing.1",
+      "label.style": "font.body.sm"
+    },
+    "Size=Large": {
+      "root.padding.h": "spacing.4",
+      "root.padding.v": "spacing.3",
+      "label.style": "font.body.lg"
+    }
+  }
+}
+```
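
A minimal sketch, not part of this commit, of how `variant_overrides` could be applied to the base `anatomy` for one variant combination. `setPath` and `resolveAnatomy` are hypothetical helpers. The precedence rule (overrides applied in the order the axes are declared, so `State` wins over `Variant` on the same path) is an assumption; the schema above does not specify one.

```javascript
// Set a dotted path such as "root.padding.h" on a nested object.
function setPath(target, dottedPath, value) {
  const keys = dottedPath.split(".");
  let node = target;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

// Resolve the anatomy for one combination, e.g. { Variant, Size, State }.
// Assumption: later axes overwrite earlier ones when they touch the same path.
function resolveAnatomy(definition, combination) {
  const anatomy = JSON.parse(JSON.stringify(definition.anatomy)); // deep copy
  const allOverrides = definition.variant_overrides ?? {};
  for (const axis of Object.keys(definition.variants)) {
    const overrides = allOverrides[`${axis}=${combination[axis]}`] ?? {};
    for (const [path, value] of Object.entries(overrides)) {
      setPath(anatomy, path, value);
    }
  }
  return anatomy;
}

// Trimmed copy of the Button definition, for illustration only.
const definition = {
  anatomy: {
    root: { padding: { h: "spacing.3", v: "spacing.2" }, fill: "color.brand.primary" },
    label: { style: "font.body.md", color: "color.text.inverse" },
  },
  variants: {
    Variant: ["Primary", "Secondary"],
    Size: ["Small", "Medium"],
    State: ["Default", "Hover"],
  },
  variant_overrides: {
    "Variant=Secondary": { "root.fill": "color.neutral.200", "label.color": "color.text.primary" },
    "State=Hover": { "root.fill": "color.brand.600" },
    "Size=Small": { "root.padding.h": "spacing.2", "root.padding.v": "spacing.1" },
  },
};

const resolved = resolveAnatomy(definition, { Variant: "Secondary", Size: "Small", State: "Hover" });
console.log(resolved.root.fill);      // color.brand.600
console.log(resolved.root.padding.h); // spacing.2
```

Under this precedence, Secondary + Hover resolves to `color.brand.600`, whereas the token mapping table lists `color.neutral.300` for secondary hover. A flat per-axis override map cannot express that case, so the schema may need combined keys (for example a hypothetical `"Variant=Secondary,State=Hover"`) or a different precedence rule.
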
+### Component Generation Pattern (Plugin Code)
+Every component follows the same pipeline:
+```
+1. Read tokens from DTCG JSON
+2. Create Variable Collections (if not exist)
+3. For each variant combination:
+   a. Create frame with auto-layout
+   b. Add child nodes (icon slot, label, etc.)
+   c. Apply token bindings via setBoundVariable()
+   d. Apply variant-specific overrides
+4. combineAsVariants() -> component set
+5. Add component properties (Label text, ShowIcon boolean)
+```
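
A minimal sketch, not part of this commit, of step 3's "for each variant combination" loop as a cartesian product over the variant axes. `buildVariantMatrix` is named in the planned shared utilities, but this body is only an assumed implementation; `variantName` is a hypothetical helper. Figma derives variant properties from component names written as `Prop=Value, Prop=Value`, so naming each component this way before `combineAsVariants()` should yield the expected property panel.

```javascript
// Cartesian product of the variant axes: one object per combination.
function buildVariantMatrix(axes) {
  let combos = [{}];
  for (const [axis, values] of Object.entries(axes)) {
    combos = combos.flatMap((combo) =>
      values.map((value) => ({ ...combo, [axis]: value }))
    );
  }
  return combos;
}

// Component name in the "Prop=Value, Prop=Value" form used for variants.
function variantName(combo) {
  return Object.entries(combo)
    .map(([axis, value]) => `${axis}=${value}`)
    .join(", ");
}

const matrix = buildVariantMatrix({
  Variant: ["Primary", "Secondary", "Outline", "Ghost"],
  Size: ["Small", "Medium", "Large"],
  State: ["Default", "Hover", "Pressed", "Focused", "Disabled"],
});

console.log(matrix.length);          // 60
console.log(variantName(matrix[0])); // Variant=Primary, Size=Small, State=Default
```
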
+---
+## ARCHITECTURE FOR PLUGIN EXTENSION
+Current plugin (`code.js`) already does:
+- Parse DTCG JSON (isDTCGFormat detection)
+- Create paint styles from colors
+- Create text styles from typography
+- Create effect styles from shadows
+- Create variable collections
+What needs to be ADDED:
+```
+code.js (existing ~1200 lines)
+  |
+  +-- componentGenerator.js (NEW ~1400 lines)
+  |     |-- generateButton()      ~250 lines
+  |     |-- generateTextInput()   ~200 lines
+  |     |-- generateCard()        ~150 lines
+  |     |-- generateToast()       ~150 lines
+  |     |-- generateCheckbox()    ~200 lines
+  |     |-- generateRadio()       ~150 lines
+  |     +-- shared utilities      ~300 lines
+  |          |-- createAutoLayoutFrame()
+  |          |-- bindTokenToVariable()
+  |          |-- buildVariantMatrix()
+  |          |-- resolveTokenValue()
+  |
+  +-- componentDefinitions.json (NEW ~500 lines)
+        |-- Button definition
+        |-- TextInput definition
+        |-- Card definition
+        |-- Toast definition
+        +-- Checkbox/Radio definition
+```
+### Implementation Order
+```
+Week 1-2: Infrastructure
+  - Variable collection builder (primitives, semantic, spacing, radius, shadow)
+  - Token resolver (DTCG path -> Figma variable reference)
+  - Auto-layout frame builder with token bindings
+  - Variant matrix generator
+Week 3-4: MVP Components
+  - Button (60 variants) — most complex, validates the full pipeline
+  - TextInput (8 variants) — validates form patterns
+  - Toast (4 variants) — validates feedback patterns
+Week 5-6: Remaining MVP + Polish
+  - Card (2 configs) — validates layout composition
+  - Checkbox + Radio (12 variants) — validates toggle patterns
+  - Error handling, edge cases, testing
+Week 7-8: Post-MVP (if time)
+  - Toggle/Switch, Select, Modal
+  - Documentation
+```
+---
+## EXISTING FILES TO KNOW ABOUT
+| File | Purpose | Lines |
+|------|---------|-------|
+| `app.py` | Main Gradio app, token extraction orchestration | ~5000 |
+| `agents/llm_agents.py` | AURORA, ATLAS, SENTINEL, NEXUS LLM agents | ~1200 |
+| `agents/normalizer.py` | Token normalization (colors, radius, shadows) | ~950 |
+| `core/color_classifier.py` | Rule-based color classification (PRIMARY authority) | ~815 |
+| `core/color_utils.py` | Color math (hex/RGB/HSL, contrast, ramps) | ~400 |
+| `core/rule_engine.py` | Type scale, WCAG, spacing grid analysis | ~1100 |
+| `output_json/figma-plugin-extracted/figma-design-token-creator 5/src/code.js` | **Figma plugin — EXTEND THIS** | ~1200 |
+| `output_json/figma-plugin-extracted/figma-design-token-creator 5/src/ui.html` | Plugin UI | ~500 |
+### DTCG Output Format (What the Plugin Receives)
+```json
+{
+  "color": {
+    "brand": {
+      "primary": {
+        "$type": "color",
+        "$value": "#005aa3",
+        "$description": "[classifier] brand: primary_action",
+        "$extensions": {
+          "com.design-system-extractor": {
+            "frequency": 47,
+            "confidence": "high",
+            "category": "brand",
+            "evidence": ["background-color on <a>", "background-color on <button>"]
+          }
+        }
+      }
+    }
+  },
+  "radius": {
+    "md": { "$type": "dimension", "$value": "8px" },
+    "lg": { "$type": "dimension", "$value": "16px" },
+    "full": { "$type": "dimension", "$value": "9999px" }
+  },
+  "shadow": {
+    "sm": {
+      "$type": "shadow",
+      "$value": {
+        "offsetX": "0px",
+        "offsetY": "2px",
+        "blur": "8px",
+        "spread": "0px",
+        "color": "#00000026"
+      }
+    }
+  },
+  "typography": {
+    "body": {
+      "md": {
+        "$type": "typography",
+        "$value": {
+          "fontFamily": "Inter",
+          "fontSize": "16px",
+          "fontWeight": 400,
+          "lineHeight": 1.5,
+          "letterSpacing": "0px"
+        }
+      }
+    }
+  },
+  "spacing": {
+    "1": { "$type": "dimension", "$value": "4px" },
+    "2": { "$type": "dimension", "$value": "8px" },
+    "3": { "$type": "dimension", "$value": "16px" }
+  }
+}
+```
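
A minimal sketch, not part of this commit, of how the plugin side might read values out of this format before binding them. `resolveTokenValue` is named in the planned shared utilities, but this body is an assumed implementation; `pxToNumber` and `hexToRgba` are hypothetical helpers. The conversion to 0..1 channels reflects how Figma represents RGB colors; 8-digit hex is treated as `#rrggbbaa`.

```javascript
// Look up a token by dotted path (e.g. "radius.md") and return its $value.
function resolveTokenValue(tokens, dottedPath) {
  let node = tokens;
  for (const key of dottedPath.split(".")) {
    if (node === null || typeof node !== "object" || !(key in node)) return undefined;
    node = node[key];
  }
  return node !== null && typeof node === "object" && "$value" in node
    ? node.$value
    : undefined;
}

// "8px" -> 8. Returns undefined for anything that is not a px dimension.
function pxToNumber(dimension) {
  const match = /^(-?\d+(?:\.\d+)?)px$/.exec(String(dimension).trim());
  return match ? Number(match[1]) : undefined;
}

// "#rrggbb" or "#rrggbbaa" -> channels in the 0..1 range.
function hexToRgba(hex) {
  const match = /^#([0-9a-f]{6})([0-9a-f]{2})?$/i.exec(String(hex).trim());
  if (!match) return undefined;
  const channel = (i) => parseInt(match[1].slice(i, i + 2), 16) / 255;
  const a = match[2] === undefined ? 1 : parseInt(match[2], 16) / 255;
  return { r: channel(0), g: channel(2), b: channel(4), a };
}

// Subset of the sample output above.
const tokens = {
  color: { brand: { primary: { $type: "color", $value: "#005aa3" } } },
  radius: { md: { $type: "dimension", $value: "8px" } },
  shadow: { sm: { $type: "shadow", $value: { blur: "8px", color: "#00000026" } } },
};

console.log(pxToNumber(resolveTokenValue(tokens, "radius.md")));             // 8
console.log(hexToRgba(resolveTokenValue(tokens, "shadow.sm").color).a.toFixed(3)); // 0.149
console.log(resolveTokenValue(tokens, "radius.xl"));                         // undefined
```

Returning `undefined` for a missing path leaves the caller free to decide whether to skip the binding or fall back to a default.
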
+---
+## COMPETITIVE ADVANTAGE
+Building this fills a genuine market gap:
+- **Tokens Studio** (1M+ installs) = token management, no component generation
+- **Figr Identity** = generates components but from brand config, not YOUR tokens
+- **story.to.design** = needs full Storybook pipeline as intermediary
+- **MCP bridges** = non-deterministic AI interpretation
+- **Us** = DTCG JSON in, deterministic Figma components out. Nobody else does this.
+### Strategic Position
+```
+[Extract from website] -> [Analyze & Score] -> [Generate Components in Figma]
+     Part 1 (DONE)          Part 1 (DONE)          Part 2 (THIS)
+```
+We become the only tool that goes from URL to complete Figma design system with components — fully automated.
+---
+## OPEN QUESTIONS FOR THIS SESSION
+1. Should component definitions live in JSON (data-driven) or be hardcoded in JS (simpler)?
+2. Should we generate all 60 Button variants at once, or let user pick which variants?
+3. How to handle missing tokens? (e.g., site has no shadow tokens — skip shadow on buttons or use defaults?)
+4. Should we support dark mode variants from the start, or add later?
+5. Icon system — use a bundled icon set (Lucide?) or just placeholder frames?

agents/llm_agents.py CHANGED

@@ -364,7 +364,7 @@ For each area: THINK → ACT → OBSERVE → VERIFY.
   "palette_strategy": "complementary|analogous|triadic|monochromatic|random",
   "cohesion_score": N,
   "cohesion_notes": "...",
-  "naming_map": {},
   "typography_notes": "Heading: Inter 700, Body: Inter 400. Clean hierarchy.",
   "spacing_notes": "8px grid, 92% aligned.",
   "radius_notes": "Rounded style: 4px inputs, 8px cards.",
@@ -391,7 +391,7 @@ Return ONLY valid JSON."""
 ## SHADOWS
 {shadow_data}
-Use ReAct for each area. Name EVERY color in naming_map."""
     def __init__(self, hf_client):
         self.hf_client = hf_client

   "palette_strategy": "complementary|analogous|triadic|monochromatic|random",
   "cohesion_score": N,
   "cohesion_notes": "...",
+  "naming_map": {},  // Optional: ONLY semantic role suggestions (brand.primary, text.secondary, etc.)
   "typography_notes": "Heading: Inter 700, Body: Inter 400. Clean hierarchy.",
   "spacing_notes": "8px grid, 92% aligned.",
   "radius_notes": "Rounded style: 4px inputs, 8px cards.",
 ## SHADOWS
 {shadow_data}
+Use ReAct for each area. If you see clear semantic roles (brand primary, text color, etc.), suggest them in naming_map. Otherwise leave naming_map empty — the rule-based classifier handles naming."""
     def __init__(self, hf_client):
         self.hf_client = hf_client

docs/CONTEXT.md CHANGED

@@ -1,797 +1,190 @@
-# Design System Extractor v2 — Master Context File
 > **Upload this file to refresh Claude's context when continuing work on this project.**
-**Last Updated:** January 2026
 ---
-## 📁 Files Changed in Latest Session
-| File | What Changed |
-|------|--------------|
-| `agents/extractor.py` | Enhanced 7-source extraction (DOM, CSS vars, SVG, inline, stylesheets, external CSS, page scan) |
-| `agents/firecrawl_extractor.py` | **NEW** Agent 1B for deep CSS parsing |
-| `agents/semantic_analyzer.py` | **NEW** Agent 1C for semantic color categorization (brand/text/bg/border) |
-| `core/preview_generator.py` | AS-IS previews + Color Ramps sorted by brand priority |
-| `app.py` | Stage 1 UI now has 6 preview tabs including Semantic Colors |
-| `docs/CONTEXT.md` | Updated with semantic analyzer, full architecture diagrams |
 ---
-## 🎯 Project Goal
-Build a **semi-automated, human-in-the-loop agentic system** that:
 1. Reverse-engineers a design system from a live website
-2. Reconstructs and upgrades it into a modern, scalable design system
-3. Outputs production-ready JSON tokens (Figma Tokens Studio compatible)
-**Philosophy:** This is a design-aware co-pilot, NOT a magic button. Humans decide, agents propose.
----
-## 🤔 Why This Project? (Market Differentiation)
-### The Problem We Solve
-| Pain Point | Who Has It | Current Solutions | Why They Fail |
-|------------|------------|-------------------|---------------|
-| Legacy websites with no design system | Enterprise teams | Manual audit (weeks) | Time-consuming, error-prone |
-| Inconsistent design tokens scattered in CSS | Agencies inheriting projects | Figma plugins (style extractors) | Only extract from Figma, not live sites |
-| Need to modernize without breaking existing | Product teams | Design system generators | Generate new, don't reverse-engineer existing |
-| AA compliance gaps unknown | Accessibility teams | Contrast checkers | Check one color at a time, no system view |
-### Existing Tools & Their Gaps
-| Tool | What It Does | Gap We Fill |
-|------|--------------|-------------|
-| **Figma Tokens Studio** | Manages tokens in Figma | Doesn't extract from websites |
-| **Style Dictionary** | Transforms tokens to code | Needs tokens first (we create them) |
-| **Polypane/VisBug** | Inspect live sites | No systematic extraction or upgrade |
-| **AI Design Tools** (Galileo, Uizard) | Generate new designs | Don't reverse-engineer existing |
-| **CSS Stats** | Analyze CSS files | Statistics only, no actionable tokens |
-| **Chromatic/Percy** | Visual regression | Compare, don't extract or upgrade |
-### Our Unique Value Proposition
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│                     WHAT MAKES US DIFFERENT                                 │
-├─────────────────────────────────────────────────────────────────────────────┤
-│                                                                             │
-│  1. REVERSE-ENGINEERING (not generation)                                    │
-│     • Extracts from LIVE websites, not design files                         │
-│     • Preserves what's working, upgrades what's broken                      │
-│     • Respects existing brand decisions                                     │
-│                                                                             │
-│  2. MULTI-AGENT REASONING (not single LLM)                                  │
-│     • Two analysts with different perspectives                              │
-│     • HEAD compiler resolves conflicts                                      │
-│     • Shows reasoning, not just results                                     │
-│                                                                             │
-│  3. HUMAN-IN-THE-LOOP (not magic button)                                    │
-│     • Designer reviews every stage                                          │
-│     • Accept/reject individual tokens                                       │
-│     • Choose from upgrade OPTIONS, not forced decisions                     │
-│                                                                             │
-│  4. VISUAL PREVIEWS (not just data tables)                                  │
-│     • Typography rendered in actual detected font                           │
-│     • Color ramps with AA compliance per shade                              │
-│     • See before you export                                                 │
-│                                                                             │
-│  5. COST-TRANSPARENT (not black box)                                        │
-│     • Shows token usage and cost per analysis                               │
-│     • Uses HF free tier ($0.10/mo) or Pro ($2/mo)                          │
-│     • ~$0.05 per full analysis                                              │
-│                                                                             │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-### Target Users
-| User | Use Case | Value |
-|------|----------|-------|
-| **UX Managers** (like you!) | Modernize legacy booking platforms | Weeks → Hours |
-| **Design System Teams** | Audit and standardize existing properties | Systematic, not ad-hoc |
-| **Agencies** | Onboard client projects with no documentation | Instant design inventory |
-| **Accessibility Consultants** | AA compliance audit with fixes | Full palette view |
-| **Developers** | Get production-ready tokens from designer's website | No manual translation |
-### Why Not Just Use [X]?
-**"Why not just inspect the CSS manually?"**
-→ You could, but it takes weeks for a complex site. We do it in minutes with systematic coverage.
-**"Why not use Figma's native styles?"**
-→ Many legacy sites were never in Figma. We extract from the source of truth: the live website.
-**"Why do you need AI? Can't rules handle this?"**
-→ Rules extract tokens. AI understands *design intent* — why is this color used here? What scale was intended? Where does it deviate from best practices?
-**"Isn't this just CSS Stats with AI?"**
-→ CSS Stats tells you what exists. We tell you what it *should* be and give you actionable upgrade paths.
----
-## 🏗️ Architecture Overview
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│                              TECH STACK                                     │
-├─────────────────────────────────────────────────────────────────────────────┤
-│  Frontend:        Gradio (long-scroll, sectioned UI with live preview)      │
-│  Orchestration:   LangGraph (agent state management & workflow)             │
-│  Models:          HuggingFace Inference Providers (Novita, Groq, etc.)     │
-│  Hosting:         Hugging Face Spaces                                       │
-│  Storage:         HF Spaces persistent storage                              │
-│  Output:          Platform-agnostic JSON tokens (Figma Tokens Studio)       │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
----
-## 🧠 Model Assignments
-### Stage 2: Multi-Agent Analysis (4 Named Agents + Rule Engine)
-| Agent | Persona | Model | Temperature | Cost |
-|-------|---------|-------|-------------|------|
-| **Rule Engine** | — (deterministic) | None | — | FREE |
-| **AURORA** | Brand Color Analyst | `Qwen/Qwen2.5-72B-Instruct` | 0.4 | ~Free (HF PRO) |
-| **ATLAS** | Benchmark Advisor | `meta-llama/Llama-3.3-70B-Instruct` | 0.25 | ~Free (HF PRO) |
-| **SENTINEL** | Best Practices Auditor | `Qwen/Qwen2.5-72B-Instruct` | 0.2 | ~Free (HF PRO) |
-| **NEXUS** | Head Synthesizer | `meta-llama/Llama-3.3-70B-Instruct` | 0.3 | ~$0.001 |
-**Architecture:**
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│  LAYER 1: DETERMINISTIC (Free — $0.00)                                      │
-│  ├─ WCAG Contrast Checker (actual FG/BG pairs, not just vs white)           │
-│  ├─ Type Scale Detection (ratio math, variance, standard comparison)        │
-│  ├─ Spacing Grid Analysis (GCD math, alignment %)                           │
-│  └─ Color Statistics (unique, near-duplicates, hue distribution)            │
-│                                                                             │
-│  LAYER 2: 4 AI AGENTS (~$0.003 total)                                       │
-│                                                                             │
-│   Rule Engine Results                                                        │
-│         │                                                                    │
-│    ┌────┼────────────────┐                                                   │
-│    ↓    ↓                ↓                                                   │
-│  ┌────────┐  ┌────────┐  ┌──────────┐                                       │
-│  │ AURORA │  │ ATLAS  │  │ SENTINEL │   (analyze in parallel)               │
-│  │ Brand  │  │ Bench- │  │ Best     │                                       │
-│  │ Colors │  │ marks  │  │ Practices│                                       │
-│  │Qwen 72B│  │Llama70B│  │ Qwen 72B │                                       │
-│  └───┬────┘  └───┬────┘  └────┬─────┘                                       │
-│      └───────────┼────────────┘                                              │
-│                  ↓                                                            │
-│           ┌───────────┐                                                      │
-│           │   NEXUS   │  (final synthesis)                                   │
-│           │ Llama 70B │                                                      │
-│           │ • Resolve │                                                      │
-│           │ • Score   │                                                      │
-│           │ • Top 3   │                                                      │
-│           └───────────┘                                                      │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
-### Other Agents
-| Agent | Role | Model | Provider | Why |
-|-------|------|-------|----------|-----|
-| **Agent 1** | Crawler & Extractor | None (Rule-based) | — | Pure CSS extraction, no LLM needed |
-| **Agent 2** | Normalizer | `microsoft/Phi-3.5-mini-instruct` | Novita | Fast, great structured output |
-| **Agent 4** | Generator | `mistralai/Codestral-22B-v0.1` | Novita | Code specialist, JSON formatting |
-### Provider Configuration
-Default provider: **Novita** (configurable in `config/agents.yaml`)
-Available providers (via HuggingFace Inference Providers):
-- **novita** - Default, good balance
-- **groq** - Fastest
-- **cerebras** - Ultra-fast
-- **sambanova** - Good for Llama
-- **together** - Wide model selection
-### Cost Tracking
-Estimated cost per Stage 2 analysis: **~$0.003**
-- Rule Engine: $0.00 (free — pure math)
-- AURORA + ATLAS + SENTINEL: ~Free within HF PRO ($9/mo subscription)
-- NEXUS: ~$0.001
-- HuggingFace PRO tier: $9/month (covers inference for all models)
----
-## 👁️ Visual Previews
-### Stage 1: AS-IS Previews (No Enhancements)
-Shows raw extracted values exactly as found on the website:
-| Preview | What It Shows |
-|---------|---------------|
-| **Typography** | Actual font rendered with detected styles |
-| **Colors** | Simple swatches with hex, frequency, context, AA status |
-| **Spacing** | Visual bars representing each spacing value |
-| **Radius** | Boxes with each border-radius applied |
-| **Shadows** | Cards with each box-shadow applied |
-### Stage 2: Enhanced Previews (Upgraded)
-Shows proposed upgrades and improvements:
-| Preview | What It Shows |
-|---------|---------------|
-| **Typography** | Type scale comparison (1.2, 1.25, 1.333 ratios) |
-| **Color Ramps** | 11 shades (50-950) with AA compliance per shade |
----
-## 🔍 Enhanced Extraction (Agent 1)
-Agent 1 now extracts from **5 sources** to capture ALL colors:
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│                    ENHANCED EXTRACTION SOURCES                               │
-├─────────────────────────────────────────────────────────────────────────────┤
-│                                                                             │
-│  1. DOM Computed Styles                                                     │
-│     • window.getComputedStyle(element)                                      │
-│     • Captures: color, background-color, border-color, etc.                 │
-│                                                                             │
-│  2. CSS Variables                                                           │
-│     • :root { --primary-color: #3860be; }                                  │
-│     • Parses all stylesheets for CSS custom properties                     │
-│                                                                             │
-│  3. SVG Colors                                                              │
-│     • <svg fill="#00c4cc">                                                 │
-│     • <path stroke="#3860be">                                              │
-│                                                                             │
-│  4. Inline Styles                                                           │
-│     • <div style="background-color: #bcd432;">                             │
-│     • Parses style attributes for color values                             │
-│                                                                             │
-│  5. Stylesheet Rules                                                        │
-│     • Parses CSS rules that may not be applied to visible elements         │
-│     • Catches hover states, pseudo-elements, etc.                          │
-│                                                                             │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
----
-## 📋 Enhanced Logging
-### Stage 1 Extraction Logs
-Shows detailed extraction progress:
-```
-============================================================
-🖥️ DESKTOP EXTRACTION (1440px)
-============================================================
-📡 Enhanced extraction from 5 sources:
-   1. DOM computed styles (getComputedStyle)
-   2. CSS variables (:root { --color: })
-   3. SVG colors (fill, stroke)
-   4. Inline styles (style='color:')
-   5. Stylesheet rules (CSS files)
-   6. External CSS files (fetch & parse)
-   7. Page content scan (brute-force)
-📊 EXTRACTION RESULTS:
-   Colors:     45 unique
-   Typography: 12 styles
-   Spacing:    28 values
-   Radius:     8 values
-   Shadows:    4 values
-🎨 CSS Variables found: 15
-   --primary-color: #3860be
-   --accent-color: #00c4cc
-   --brand-lime: #bcd432
-   ... and 12 more
-🔄 Normalizing (deduping, naming)...
-   ✅ Normalized: 32 colors, 10 typography, 18 spacing
-============================================================
-🔥 FIRECRAWL CSS EXTRACTION
-============================================================
-   🌐 Scraping: https://example.com
-   ✅ Page scraped (125000 chars)
-   📝 Parsing <style> blocks...
-      Found 5 style blocks
-   🔗 Finding linked CSS files...
-      Found 8 CSS files
-   📄 Fetching: main.css...
-      ✅ Parsed (234 colors)
-   📄 Fetching: theme.css...
-      ✅ Parsed (45 colors)
-📊 FIRECRAWL RESULTS:
-   CSS files parsed:    8
-   Style blocks parsed: 5
-   CSS variables found: 23
-   Unique colors found: 156
-   🎨 Top colors found:
-      #06b2c4 (used 45x)
-      #c1df1f (used 38x)
-      #373737 (used 120x)
-🔀 Merging Firecrawl colors with Playwright extraction...
-   ✅ Added 12 new colors from Firecrawl
-   📊 Total colors now: 44
-============================================================
-🧠 SEMANTIC COLOR ANALYSIS
-============================================================
-   📊 Analyzing 143 colors...
-   Using rule-based analysis (no LLM)
-📊 SEMANTIC ANALYSIS RESULTS:
-   🎨 BRAND COLORS:
-      primary: #06b2c4 (high)
-         └─ Most frequent saturated color on interactive elements (freq: 33)
-      secondary: #c1df1f (medium)
-         └─ Second most frequent brand color (freq: 15)
-   📝 TEXT COLORS:
-      primary: #373737 (high)
-      secondary: #666666 (medium)
-   🖼️ BACKGROUND COLORS:
-      primary: #ffffff (high)
-      secondary: #f5f5f5 (medium)
-   📈 SUMMARY:
-      Total colors analyzed: 143
-      Brand colors found: 2
-      Clear hierarchy: Yes
-      Analysis method: rule-based
-```
-### Stage 2 LLM Analysis Logs (With Semantic Context)
-Shows detailed reasoning from each agent WITH semantic context:
-```
-============================================================
-🧠 STAGE 2: MULTI-AGENT ANALYSIS
-============================================================
-🧠 SEMANTIC CONTEXT FROM STAGE 1:
-   Brand Primary: #06b2c4
-   Text Primary: #373737
-   Analysis Method: rule-based
-=======================================================
-🤖 LLM 1: meta-llama/Llama-3.1-70B-Instruct
-=======================================================
-   Provider: novita
-   💰 Cost: $0.29/M in, $0.59/M out
-   📝 Task: Typography, Colors, AA, Spacing analysis
-   🧠 Semantic context: Yes  ← NEW: LLM knows color roles!
-   📊 LLM 1 FINDINGS:
-   COLORS (with semantic context):
-   ├─ Brand Primary (#06b2c4): "Fails AA on white (3.2:1)"
-   ├─ Suggested fix: "#0891a8 (4.6:1)"
-   └─ Score: 6/10
-=======================================================
-🎯 HEAD: Compiling final recommendations...
-=======================================================
-   📥 INPUT: Analyzing outputs from LLM 1 + LLM 2 + Rules + Semantic...
-   📊 HEAD SYNTHESIS:
-   COLOR RECOMMENDATIONS (per semantic role):
-   ├─ brand.primary: #06b2c4 → Keep for branding, use #0891a8 for text
-   ├─ text.primary: #373737 → Keep (passes AA)
-   └─ Generate ramps for: brand.primary, brand.secondary, neutral
-```
----
-## 🤖 Agent Personas
-### Agent 1A: Website Crawler & Enhanced Extractor
-- **Persona:** Meticulous Design Archaeologist
-- **Tool:** Playwright
-- **Job:**
-  - Auto-discover 10+ pages from base URL
-  - Crawl Desktop (1440px) + Mobile (375px) separately
-  - Scroll to bottom + wait for network idle
-  - **ENHANCED: Extract from 7 sources:**
-    1. DOM computed styles (`getComputedStyle`)
-    2. CSS variables (`:root { --primary: #xxx }`)
-    3. SVG colors (`fill`, `stroke` attributes)
-    4. Inline styles (`style="background-color: #xxx"`)
-    5. Stylesheet rules (CSS files, hover states, pseudo-elements)
-    6. External CSS files (fetch & parse to bypass CORS)
-    7. Page content scan (brute-force regex on HTML)
-- **Output:** Raw tokens with frequency, context, confidence, source type
-### Agent 1B: Firecrawl CSS Deep Diver
-- **Persona:** CSS Deep Diver
-- **Tool:** Firecrawl / httpx fallback
-- **Job:**
-  - Fetch and parse ALL linked CSS files
-  - Extract colors from CSS rules and variables
-  - Bypass CORS restrictions
-  - Find colors missed by DOM inspection
-- **Output:** Additional colors merged into main extraction
-### Agent 1C: Semantic Color Analyzer (NEW - LLM)
-- **Persona:** Design System Semanticist
-- **Tool:** Rule-based analysis (LLM optional)
-- **Job:**
-  - Analyze colors based on actual CSS usage (not guessing)
-  - Categorize into semantic roles:
-    - **Brand Colors:** Used on buttons, CTAs, links (interactive elements)
-    - **Text Colors:** Used with `color` property on p, span, h1-h6
-    - **Background Colors:** Used with `background-color` on containers
-    - **Border Colors:** Used with `border-color` properties
-    - **Feedback Colors:** Error (red), success (green), warning (yellow)
-  - Detect color hierarchy (primary → secondary → muted)
-- **Input:** Colors WITH context data (css_properties, elements, frequency)
-- **Output:** Semantic categorization with confidence levels
-- **Why:** Stage 2 LLMs can now give SPECIFIC recommendations per role
-### Agent 2: Token Normalizer & Structurer
-- **Persona:** Design System Librarian
-- **Job:**
-  - Clean noisy extraction, dedupe
-  - Infer naming patterns
-  - Tag tokens as: `detected` | `inferred` | `low-confidence`
-- **Output:** Structured token sets with metadata
-### Agent 3: Design System Best Practices Advisor
-- **Persona:** Senior Staff Design Systems Architect
-- **Job:**
-  - Research modern DS patterns (Material, Polaris, Carbon, etc.)
-  - Propose upgrade OPTIONS (not decisions)
-  - Suggest: type scales (3 options), spacing (8px), color ramps (AA compliant), naming conventions
-- **Output:** Option sets with rationale
-### Agent 4: Plugin & JSON Generator
-- **Persona:** Automation Engineer
-- **Job:**
-  - Convert finalized tokens to Figma-compatible JSON
-  - Generate: typography, color (with tints/shades), spacing variables
-  - Maintain Desktop + Mobile + version metadata
-- **Output:** Production-ready JSON (flat structure for Figma Tokens Studio)
----
-## 🖥️ UI Stages (3 Stages)
-### Stage 1: Extraction Review (AS-IS)
-- **Purpose:** Trust building — show exactly what was extracted
-- **Shows:**
-  - Token tables (colors, typography, spacing)
-  - **6 Visual Preview Tabs (AS-IS, no enhancements):**
-    1. 🔤 Typography — actual font rendered
-    2. 🎨 Colors — simple swatches sorted by frequency (no ramps)
-    3. 🧠 Semantic Colors — colors organized by usage (brand/text/bg/border)
-    4. 📏 Spacing — visual bars
-    5. 🔘 Radius — rounded boxes
-    6. 🌑 Shadows — shadow cards
-- **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile
-### Stage 2: Upgrade Playground (MOST IMPORTANT)
-- **Purpose:** Decision-making through live visuals
-- **Shows:**
-  - Side-by-side option selector + live preview
-  - **Color Ramps (50-950 shades with AA compliance)**
-  - Type scale options (1.2, 1.25, 1.333)
-  - **Semantic-aware recommendations:** "Your brand primary #06b2c4 fails AA, consider #0891a8"
-- **Human Actions:** Select type scale A/B/C, spacing system, color ramps — preview updates instantly
-### Stage 3: Final Review & Export
-- **Purpose:** Confidence before export
-- **Shows:** Token preview, JSON tree, diff view (original vs final)
-- **Human Actions:** Download JSON, save version, label version
----
-## 📁 Project Structure
-```
-design-system-extractor/
-├── app.py                          # Gradio main entry point
-├── requirements.txt
-├── README.md
-│
-├── config/
-│   ├── .env.example                # Environment variables template
-│   ├── agents.yaml                 # Agent personas & configurations
-│   └── settings.py                 # Application settings
-│
-├── agents/
-│   ├── __init__.py
-│   ├── state.py                    # LangGraph state definitions
-│   ├── graph.py                    # LangGraph workflow orchestration
-│   ├── crawler.py                  # Agent 1A: Website crawler
-│   ├── extractor.py                # Agent 1A: Token extraction (7 sources)
-│   ├── firecrawl_extractor.py      # Agent 1B: Deep CSS parsing
-│   ├── semantic_analyzer.py        # Agent 1C: Semantic color categorization
-│   ├── normalizer.py               # Agent 2: Token normalization
-│   ├── advisor.py                  # Agent 3: Best practices
-│   ├── stage2_graph.py             # Stage 2 multi-agent LLM workflow
-│   └── generator.py                # Agent 4: JSON generator
-│
-├── core/
-│   ├── __init__.py
-│   ├── color_utils.py              # Color analysis, contrast, ramps
-│   ├── preview_generator.py        # HTML preview generation
-│   ├── hf_inference.py             # HuggingFace LLM inference
-│   └── token_schema.py             # Token data structures (Pydantic)
-│
-├── ui/
-│   └── __init__.py
-│
-├── templates/
-│
-├── storage/
-│   └── __init__.py
-│
-├── tests/
-│   └── __init__.py
-│
-└── docs/
-    └── CONTEXT.md                  # THIS FILE - upload for context refresh
-```
----
-## 🔧 Key Technical Decisions
 | Decision | Choice | Rationale |
 |----------|--------|-----------|
-| Viewports | Fixed 1440px + 375px | Simplicity, covers main use cases |
-| Scrolling | Bottom + network idle | Captures lazy-loaded content |
-| Infinite scroll | Skip | Avoid complexity |
-| Modals | Manual trigger | User decides what to capture |
-| Color ramps | 5-10 shades, AA compliant | Industry standard |
-| Type scales | 3 options (1.25, 1.333, 1.414) | User selects |
-| Spacing | 8px base system | Modern standard |
-| ML models | Minimal, rule-based preferred | Simplicity, reliability |
-| Versioning | HF Spaces persistent storage | Built-in, free |
-| Preview | Gradio + iframe (best for dynamic) | Smooth updates |
 ---
-## 📊 Token Schema (Core Data Structures)
-```python
-class TokenSource(Enum):
-    DETECTED = "detected"       # Directly found in CSS
-    INFERRED = "inferred"       # Derived from patterns
-    UPGRADED = "upgraded"       # User-selected improvement
-class Confidence(Enum):
-    HIGH = "high"               # 10+ occurrences
-    MEDIUM = "medium"           # 3-9 occurrences
-    LOW = "low"                 # 1-2 occurrences
-class Viewport(Enum):
-    DESKTOP = "desktop"         # 1440px
-    MOBILE = "mobile"           # 375px
-```
-### Token Types:
-- **ColorToken:** value, frequency, contexts, elements, contrast ratios
-- **TypographyToken:** family, size, weight, line-height, elements
-- **SpacingToken:** value, frequency, contexts, fits_base_8
-- **RadiusToken:** value, frequency, elements
-- **ShadowToken:** value, frequency, elements
----
-## 🔄 LangGraph Workflow
-```
-                    ┌─────────────┐
-                    │   START     │
-                    └──────┬──────┘
-                           │
-                           ▼
-                    ┌─────────────┐
-                    │ URL Input   │
-                    └──────┬──────┘
-                           │
-                           ▼
-              ┌────────────────────────┐
-              │  Agent 1: Discover     │
-              │  (find pages)          │
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  HUMAN: Confirm pages  │◄─── Checkpoint 1
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  Agent 1: Extract      │
-              │  (crawl & extract)     │
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  Agent 2: Normalize    │
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  HUMAN: Review tokens  │◄─── Checkpoint 2 (Stage 1 UI)
-              └───────────┬────────────┘
-                          │
-          ┌───────────────┴───────────────┐
-          │                               │
-          ▼                               ▼
-┌──────────────────┐            ┌──────────────────┐
-│ Agent 3: Advise  │            │  (parallel)      │
-│ (best practices) │            │                  │
-└────────┬─────────┘            └──────────────────┘
-         │
-         ▼
-┌────────────────────────┐
-│  HUMAN: Select options │◄─── Checkpoint 3 (Stage 2 UI)
-└───────────┬────────────┘
-            │
-            ▼
-┌────────────────────────┐
-│  Agent 4: Generate     │
-│  (final JSON)          │
-└───────────┬────────────┘
-            │
-            ▼
-┌────────────────────────┐
-│  HUMAN: Export         │◄─── Checkpoint 4 (Stage 3 UI)
-└───────────┬────────────┘
-            │
-            ▼
-      ┌─────────┐
-      │   END   │
-      └─────────┘
 ```
----
-## 🚦 Human-in-the-Loop Rules
-1. **No irreversible automation**
-2. **Agents propose → Humans decide**
-3. **Every auto action must be:**
-   - Visible
-   - Reversible
-   - Previewed
----
-## 📦 Output JSON Format
-```json
-{
-  "metadata": {
-    "source_url": "https://example.com",
-    "extracted_at": "2025-01-23T10:00:00Z",
-    "version": "v1-recovered",
-    "viewport": "desktop"
-  },
-  "colors": {
-    "primary": {
-      "50": { "value": "#e6f2ff", "source": "upgraded" },
-      "500": { "value": "#007bff", "source": "detected" },
-      "900": { "value": "#001a33", "source": "upgraded" }
-    }
-  },
-  "typography": {
-    "heading-xl": {
-      "fontFamily": "Inter",
-      "fontSize": "32px",
-      "fontWeight": 700,
-      "lineHeight": "1.2",
-      "source": "detected"
-    }
-  },
-  "spacing": {
-    "xs": { "value": "4px", "source": "upgraded" },
-    "sm": { "value": "8px", "source": "detected" },
-    "md": { "value": "16px", "source": "detected" }
-  }
-}
 ```
----
-## 🛠️ Implementation Phases & Current Status
-### Phase 1 ✅ COMPLETE
-- [x] Project structure
-- [x] Configuration files
-- [x] Token schema (Pydantic models)
-- [x] Agent 1: Crawler (page discovery)
-- [x] Agent 1: Enhanced Extractor (5-source extraction)
-- [x] Agent 2: Normalizer
-- [x] Stage 1 UI with 5 AS-IS preview tabs
-- [x] LangGraph basic workflow
-- [x] JSON export (flat structure for Figma)
-### Phase 2 ✅ MOSTLY COMPLETE
-- [x] Agent 3: Multi-LLM Advisor (Qwen + Llama + HEAD)
-- [x] Stage 2 UI (Upgrade Playground)
-- [x] Live preview system (typography, color ramps)
-- [x] Enhanced LLM logging with reasoning
-- [ ] Accept/Reject checkbox wiring to export
-### Phase 3 🔄 IN PROGRESS
-- [ ] Agent 4: Generator (component patterns)
-- [ ] Stage 3 UI (diff view)
-- [ ] Arabic page filtering
-### Phase 4 ⏳ PENDING
-- [ ] Full LangGraph orchestration
-- [ ] HF Spaces deployment
-- [ ] Persistent storage
-- [ ] MCP Claude / Figma plugin integration (Part 2 of article)
----
-## 🐛 Known Issues & Pending Fixes
-| Issue | Status | Fix |
-|-------|--------|-----|
-| Arabic pages included | Pending | Filter `/ar/` URLs in crawler |
-| Accept/Reject not wired | Pending | Export should respect checkbox state |
-| Stage 1 vs Stage 2 preview confusion | ✅ Fixed | Stage 1 now shows AS-IS (no ramps) |
-| Colors missed from CSS variables | ✅ Fixed | Enhanced 5-source extraction |
-| JSON nested structure | ✅ Fixed | Flat structure for Figma compatibility |
----
-## 🔑 Environment Variables
-```env
-# Required
-HF_TOKEN=your_huggingface_token
-# Model Configuration (defaults shown — diverse providers)
-AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct      # Microsoft - Fast naming
-AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct    # Meta - Strong reasoning
-AGENT4_MODEL=mistralai/Codestral-22B-v0.1         # Mistral - Code/JSON
-# Optional
-DEBUG=true
-LOG_LEVEL=INFO
-```
 ---
-## 📝 Notes for Claude
-When continuing this project:
-1. **Check current phase** in Implementation Phases section
-2. **Review agent personas** in agents.yaml for consistent behavior
-3. **Follow token schema** defined in core/token_schema.py
-4. **Maintain LangGraph state** consistency across agents
-5. **Use Gradio components** from ui/components.py for consistency
-6. **Test with** real websites before deployment
-7. **Enhanced extraction** captures from 5 sources — check logs to verify
-8. **Stage 1 = AS-IS** (no ramps), **Stage 2 = Enhanced** (with ramps)
 ---
-*Last updated: 2025-01-23*

+# Design System Extractor v3.2 — Master Context File
 > **Upload this file to refresh Claude's context when continuing work on this project.**
+**Last Updated:** February 2026
 ---
+## Current Status
+| Component | Status | Version |
+|-----------|--------|---------|
+| Token Extraction (Part 1) | COMPLETE | v3.2 |
+| Color Classification | COMPLETE | v3.1 |
+| DTCG Compliance | COMPLETE | v3.2 |
+| Naming Authority Chain | COMPLETE | v3.2 |
+| Figma Plugin (Visual Spec) | COMPLETE | v7 |
+| Component Generation (Part 2) | RESEARCH DONE | - |
+| Tests | 113 passing | - |
 ---
+## Project Goal
+Build a **semi-automated, human-in-the-loop system** that:
 1. Reverse-engineers a design system from a live website
+2. Classifies colors deterministically by CSS evidence
+3. Audits against industry benchmarks and best practices
+4. Outputs W3C DTCG v1 compliant JSON
+5. Generates Figma Variables, Styles, and Visual Spec pages
+6. (Part 2) Auto-generates Figma components from tokens
+**Philosophy:** AI as copilot, not autopilot. Humans decide, agents propose.
+---
+## Architecture (v3.2)
+```
++--------------------------------------------------+
+|  LAYER 1: EXTRACTION + NORMALIZATION (Free)       |
+|  +- Crawler + 7-Source Extractor (Playwright)     |
+|  +- Normalizer: colors, radius, shadows, typo     |
+|  +- Firecrawl: deep CSS parsing                   |
++--------------------------------------------------+
+|  LAYER 2: CLASSIFICATION + RULE ENGINE (Free)     |
+|  +- Color Classifier (815 lines, deterministic)   |
+|  +- WCAG Contrast Checker (actual FG/BG pairs)    |
+|  +- Type Scale Detection (ratio math)             |
+|  +- Spacing Grid Analysis (GCD math)              |
++--------------------------------------------------+
+|  LAYER 3: 4 AI AGENTS (~$0.003)                   |
+|  +- AURORA   - Brand Advisor        (Qwen 72B)   |
+|  +- ATLAS    - Benchmark Advisor    (Llama 70B)   |
+|  +- SENTINEL - Best Practices Audit (Qwen 72B)   |
+|  +- NEXUS    - Head Synthesizer     (Llama 70B)   |
++--------------------------------------------------+
+|  EXPORT: W3C DTCG v1 Compliant JSON               |
+|  +- $type, $value, $description, $extensions      |
+|  +- Figma Plugin: Variables + Styles + Visual Spec|
++--------------------------------------------------+
+```
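The EXPORT layer above lists the four DTCG fields the exporter emits. A minimal sketch of what one exported color token could look like follows. The namespace `com.example.extractor`, the helper names, and the plain hex string used for `$value` are all assumptions made for illustration; the real exporter and the DTCG color module may represent color values differently.

```python
import json


def to_dtcg_color(hex_value, description, frequency, confidence, evidence):
    """Build one DTCG-style color token as a plain dict (illustrative only)."""
    return {
        "$type": "color",
        # Assumption: a hex string for brevity; a structured color object may be required.
        "$value": hex_value,
        "$description": description,
        "$extensions": {
            # Hypothetical reverse-domain namespace, not taken from the project.
            "com.example.extractor": {
                "frequency": frequency,
                "confidence": confidence,
                "evidence": evidence,
            }
        },
    }


def nest(path, token):
    """Place a token at a dotted path such as 'color.brand.primary'."""
    root = {}
    node = root
    parts = path.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = token
    return root


token = to_dtcg_color(
    "#06b2c4", "Brand primary", 33, "high", ["background-color on button"]
)
tree = nest("color.brand.primary", token)
print(json.dumps(tree, indent=2))
```

Keeping extraction metadata under a namespaced `$extensions` key leaves `$type` and `$value` clean for tools that only read the standard fields.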
+### Naming Authority Chain (v3.2)
+```
+1. Color Classifier (PRIMARY) - deterministic, covers ALL colors
+   +- CSS evidence -> category -> token name
+   +- 100% reproducible, logged with evidence
+2. AURORA LLM (SECONDARY) - semantic role enhancer ONLY
+   +- Can promote "color.blue.500" -> "color.brand.primary"
+   +- CANNOT rename palette colors
+   +- filter_aurora_naming_map() enforces boundary
+3. Normalizer (FALLBACK) - preliminary hue+shade names
+```
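The boundary in step 2 of the chain above can be sketched as a filter over the LLM's proposed renames. This is not the project's `filter_aurora_naming_map()`, whose signature is not shown here; the function name, the prefix list, and the collision rule below are hypothetical and only illustrate the idea that AURORA may promote a palette name to a semantic role but may not invent a new palette name.

```python
# Hypothetical set of semantic namespaces an LLM proposal is allowed to target.
SEMANTIC_PREFIXES = (
    "color.brand.",
    "color.text.",
    "color.bg.",
    "color.border.",
    "color.feedback.",
)


def filter_naming_map_sketch(classifier_names, llm_naming_map):
    """Split LLM rename proposals into accepted and rejected maps.

    A proposal is accepted only if the source name came from the classifier,
    the target is a semantic role, and the target is not already taken.
    """
    accepted, rejected = {}, {}
    for old_name, new_name in llm_naming_map.items():
        known = old_name in classifier_names
        semantic = new_name.startswith(SEMANTIC_PREFIXES)
        collides = new_name in classifier_names or new_name in accepted.values()
        if known and semantic and not collides:
            accepted[old_name] = new_name
        else:
            rejected[old_name] = new_name
    return accepted, rejected


classifier_names = {"color.blue.500", "color.neutral.900"}
proposals = {
    "color.blue.500": "color.brand.primary",      # promotion to a semantic role
    "color.neutral.900": "color.charcoal.900",    # palette rename, not allowed
    "color.pink.300": "color.brand.accent",       # source unknown to the classifier
}
accepted, rejected = filter_naming_map_sketch(classifier_names, proposals)
print(accepted)
print(sorted(rejected))
```

Returning the rejected proposals alongside the accepted ones makes it possible to log why an LLM suggestion was dropped, which fits the chain's emphasis on reproducible, evidence-backed names.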
+---
+## File Structure
+```
+design-system-extractor-v3/
++-- app.py                          # Main Gradio app (~5000 lines)
++-- CLAUDE.md                       # Project context and architecture
++-- PART2_COMPONENT_GENERATION.md   # Part 2 research + plan
+|
++-- agents/
+|   +-- crawler.py                  # Page discovery
+|   +-- extractor.py                # Playwright 7-source extraction
+|   +-- firecrawl_extractor.py      # Deep CSS parsing
+|   +-- normalizer.py               # Token normalization (~950 lines)
+|   +-- llm_agents.py               # AURORA, ATLAS, SENTINEL, NEXUS
+|   +-- semantic_analyzer.py        # DEPRECATED in v3.2
+|   +-- stage2_graph.py             # DEPRECATED in v3.2
+|
++-- core/
+|   +-- color_classifier.py         # Rule-based classification (815 lines)
+|   +-- color_utils.py              # Color math (hex/RGB/HSL, contrast)
+|   +-- rule_engine.py              # Type scale, WCAG, spacing grid (~1100 lines)
+|   +-- hf_inference.py             # HuggingFace Inference API client
+|   +-- token_schema.py             # Pydantic models
+|
++-- config/
+|   +-- settings.py                 # Configuration
+|
++-- tests/
+|   +-- test_stage1_extraction.py   # 82 deterministic tests
+|   +-- test_agent_evals.py         # 27 LLM agent schema/behavior tests
+|   +-- test_stage2_pipeline.py     # Pipeline integration tests
+|
++-- output_json/
+|   +-- figma-plugin-extracted/
+|       +-- figma-design-token-creator 5/
+|           +-- src/code.js          # Figma plugin (~1200 lines)
+|           +-- src/ui.html          # Plugin UI (~500 lines)
+|
++-- docs/
+    +-- MEDIUM_ARTICLE_EPISODE_6.md  # Medium article
+    +-- LINKEDIN_POST_EPISODE_6.md   # LinkedIn post
+    +-- IMAGE_GUIDE_EPISODE_6.md     # Image specs for article
+    +-- FIGMA_SPECIMEN_IDEAS.md      # Visual spec layout reference
+    +-- CONTEXT.md                   # THIS FILE
+```
+---
+## Model Assignments
+| Agent | Model | Temperature | Role |
+|-------|-------|-------------|------|
+| Rule Engine | None | - | WCAG, type scale, spacing (FREE) |
+| Color Classifier | None | - | CSS evidence -> category (FREE) |
+| AURORA | Qwen/Qwen2.5-72B-Instruct | 0.4 | Brand advisor (SECONDARY) |
+| ATLAS | meta-llama/Llama-3.3-70B-Instruct | 0.25 | Benchmark comparison |
+| SENTINEL | Qwen/Qwen2.5-72B-Instruct | 0.2 | Best practices audit |
+| NEXUS | meta-llama/Llama-3.3-70B-Instruct | 0.3 | Final synthesis |
+**Total cost per analysis:** ~$0.003
+---
+## Key Technical Decisions
 | Decision | Choice | Rationale |
 |----------|--------|-----------|
+| Color naming | Numeric shades (50-900) | Never words (light/dark/base) |
+| Naming authority | Classifier PRIMARY, LLM SECONDARY | One source of truth |
+| Export format | W3C DTCG v1 | Industry standard (Oct 2025) |
+| Token metadata | $extensions (namespaced) | Frequency, confidence, evidence |
+| Radius processing | Parse, deduplicate, sort, name | none/sm/md/lg/xl/2xl/full |
+| Shadow processing | Parse, sort by blur, name | xs/sm/md/lg/xl (always 5 levels) |
+| Accessibility | Actual FG/BG pairs from DOM | Not just color vs white |
+| Figma output | Variables + Styles + Visual Spec | Auto-generated specimen page |
+| LLM role | Advisory only, never naming authority | Deterministic reproducibility |
 ---
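The accessibility row above checks actual foreground/background pairs rather than every color against white. A minimal sketch of that check follows, using the standard WCAG 2.x relative luminance and contrast ratio formulas; the function names and the example pairs are illustrative, not taken from `core/rule_engine.py`.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to its linear value."""
    c = channel_8bit / 255
    # The WCAG 2.x text uses 0.03928 here; for 8-bit channels both thresholds
    # select the same branch for every possible value.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a 6-digit hex color such as '#373737'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def audit_pairs(pairs, threshold=4.5):
    """Return (fg, bg, ratio, passes) for each pair; 4.5 is AA for normal text."""
    results = []
    for fg, bg in pairs:
        ratio = contrast_ratio(fg, bg)
        results.append((fg, bg, round(ratio, 2), ratio >= threshold))
    return results


# Hypothetical pairs, as they might be collected from DOM elements.
for row in audit_pairs([("#373737", "#ffffff"), ("#ffffff", "#ffffff")]):
    print(row)
```

Auditing pairs instead of single colors avoids flagging a color that is only ever used on a dark background, and catches low-contrast combinations that a white-only check would miss.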
+## Execution Status
+### Part 1: Token Extraction + Analysis (COMPLETE)
 ```
+PHASE 1: NORMALIZER       [DONE]
+PHASE 2: STAGE 2 AGENTS   [DONE]
+PHASE 3: EXPORT + DTCG    [DONE]
+PHASE 4: EXTRACTION IMPROVEMENTS [NOT STARTED]
+  4a. Font family detection (still returns "sans-serif")
+  4b. Rule engine: radius grid analysis
+  4c. Rule engine: shadow elevation analysis
 ```
+### Part 2: Component Generation (RESEARCH COMPLETE)
+**Decision:** Custom Figma Plugin (Option A)
+**Scope:** 5 MVP components, ~86 variants, ~1400 lines new plugin code
+**See:** `PART2_COMPONENT_GENERATION.md` for full details
 ---
+## GitHub
+- **Repository:** https://github.com/hiriazmo/design-system-extractor-v3
+- **Latest commit:** `6b43e51` (DTCG compliance + naming authority)
+- **Tests:** 113 passing
 ---
+*Last updated: 2026-02-23*

docs/IMAGE_GUIDE_EPISODE_6.md CHANGED

@@ -1,188 +1,252 @@
-# 📸 Image Guide for Episode 6 Article
-## Required Images (8-10 total)
 ### 1. Hero Image
 **What:** Screenshot of the Gradio interface showing the full pipeline output
 **Where:** After title, before first section
 **Specs:** 1200x630px (LinkedIn preview size)
-**Content:** Show the Visual Previews section with colors, typography, and NEXUS synthesis visible
 ### 2. Complete Workflow Diagram
-**What:** The 8-step pipeline: Website → Agents → Figma → Compare
 **Where:** After "The Complete Workflow" section
 **Specs:** 1200x800px
 **Content:**
 ```
-🌐 Website URL
-     ↓
-🤖 AI Agents (7-source extraction)
-     ↓
-📄 AS-IS JSON
-     ↓
-🔌 Figma Plugin (Import)
-     ↓
-📋 AS-IS Specimen (Review)
-     ↓
-🧠 Rule Engine + 4 AI Agents (Stage 2)
-     ↓
-☑️ Accept/Reject (Human Decision)
-     ↓
-📄 TO-BE JSON → 🔌 Figma → 📋 TO-BE Specimen
-```
-### 3. Two-Layer Architecture Diagram
-**What:** Layer 1 (Deterministic, Free) + Layer 2 (4 Named Agents)
 **Where:** After "Architecture Overview" section
-**Specs:** 1200x600px
 **Content:**
 ```
-┌─────────────────────────────────────────────────┐
-│  LAYER 1: DETERMINISTIC (Free — $0.00)          │
-│  ├─ Crawler + 7-Source Extractor + Normalizer   │
-│  ├─ Semantic Color Analyzer (rule-based)        │
-│  ├─ WCAG Contrast Checker (math)                │
-│  ├─ Type Scale Detection (ratio math)           │
-│  ├─ Spacing Grid Analysis (GCD math)            │
-│  └─ Color Statistics (deduplication)             │
-├─────────────────────────────────────────────────┤
-│  LAYER 2: 4 AI AGENTS (~$0.003)                 │
-│  ├─ AURORA   — Brand Color Analyst   (Qwen 72B) │
-│  ├─ ATLAS    — Benchmark Advisor   (Llama 70B)  │
-│  ├─ SENTINEL — Best Practices Auditor (Qwen 72B)│
-│  └─ NEXUS    — Head Synthesizer    (Llama 70B)  │
-└─────────────────────────────────────────────────┘
-```
-### 4. Agent Pipeline Flow
-**What:** Show the 4 named agents with their flow: parallel analysis → synthesis
-**Where:** After "Layer 2" section header
 **Specs:** 1200x500px
 **Content:**
 ```
-   Rule Engine Results
-         │
-    ┌────┼────────────────┐
-    ↓    ↓                ↓
-┌──────┐ ┌──────┐ ┌────────┐
-│AURORA│ │ATLAS │ │SENTINEL│
-│Brand │ │Bench │ │Audit   │
-│Qwen  │ │Llama │ │Qwen    │
-└──┬───┘ └──┬───┘ └───┬────┘
-   └────────┼──────────┘
-            ↓
-      ┌──────────┐
-      │  NEXUS   │
-      │Synthesis │
-      │ Llama 70B│
-      └──────────┘
-            ↓
     Final Recommendations
 ```
-### 5. 7 Extraction Sources Visual
 **What:** Show the 7 different methods of extraction
-**Where:** After "Stage 1: Extraction" section
 **Specs:** 1000x600px
 **Content:**
 ```
-┌─────────────┐  ┌─────────────┐  ┌─────────────┐
-│ 1. Computed │  │ 2. CSS      │  │ 3. Inline   │
-│    Styles   │  │    Variables│  │    Styles   │
-└─────────────┘  └─────────────┘  └─────────────┘
-┌─────────────┐  ┌─────────────┐  ┌─────────────┐
-│ 4. SVG      │  │ 5. External │  │ 6. Style    │
-│    Attrs    │  │    CSS Files│  │    Blocks   │
-└─────────────┘  └─────────────┘  └─────────────┘
-┌─────────────────────────────────────────────────┐
-│              7. Firecrawl Deep Parser            │
-└─────────────────────────────────────────────────┘
 ```
-### 6. Rule Engine Output Screenshot
-**What:** Screenshot of actual rule engine output in the Gradio logs panel
 **Where:** After "The Rule Engine" section
 **Specs:** 1200x600px
-**Content:** Show the actual emoji-formatted output:
-- 📐 TYPE SCALE ANALYSIS
-- ♿ ACCESSIBILITY CHECK
-- 📏 SPACING GRID
-- 📊 CONSISTENCY SCORE
-### 7. NEXUS Synthesis Output
-**What:** Screenshot of the final synthesis with scores, top 3 actions, color recommendations
 **Where:** After "Agent 4: NEXUS" section
 **Specs:** 1200x700px
-**Content:** Show the final output with:
 - Executive summary
-- Scores dashboard (overall, accessibility, consistency, organization)
 - Top 3 actions with impact/effort
-- Color recommendations with accept/reject checkboxes
-### 8. Benchmark Comparison Table
-**What:** Screenshot of the benchmark comparison showing match percentages
-**Where:** After "Agent 2: ATLAS" section
-**Specs:** 1000x400px
 **Content:** Show:
-- 🥇 Polaris: 87% match
-- 🥈 Material 3: 77% match
-- 🥉 Atlassian: 76% match
-### 9. Before/After Comparison
 **What:** Side-by-side showing AS-IS vs TO-BE
 **Where:** After "Comparing AS-IS vs TO-BE" section
 **Specs:** 1200x500px
 **Content:**
 ```
 AS-IS                          TO-BE
-─────                          ─────
-Type: ~1.18 (random)    →     1.25 (Major Third)
-Brand: #06b2c4 (AA: 3.2) →   #048391 (AA: 4.5)
-Spacing: Mixed           →    8px grid
-Colors: 143 unique       →    ~20 semantic
-Score: 52/100            →    78/100
 ```
-### 10. Cost Comparison Table
-**What:** Visual table comparing V1 vs V2 costs + model assignments
 **Where:** After "Cost & Model Strategy" section
 **Specs:** 1000x400px
 **Content:**
 ```
-Agent       Model        Cost
-────────────────────────────
-Rule Engine  None         $0.00
-AURORA       Qwen 72B     ~Free (HF PRO)
-ATLAS        Llama 70B    ~Free (HF PRO)
-SENTINEL     Qwen 72B     ~Free (HF PRO)
-NEXUS        Llama 70B    ~$0.001
-─────────────────────────────
-TOTAL                     ~$0.003
 ```
-### 11. Figma Specimen (If Available)
-**What:** Screenshot of the Figma specimen page after JSON import
-**Where:** After "The Figma Bridge" section
-**Specs:** 1200x700px
-**Content:** Show Typography + Semantic Colors + Spacing display
 ---
 ## Image Creation Tools
 **Recommended:**
-1. **Figma** — Architecture diagrams, pipeline flows, tech stack
-2. **Screenshot tool** — Gradio interface captures (use dark mode)
-3. **Excalidraw** — Quick hand-drawn style diagrams (for the architecture)
 **Tips:**
 - Use dark background screenshots (Gradio dark mode)
 - Add subtle drop shadows to screenshots
-- Keep consistent color scheme (blues + cyans match brand color #06b2c4)
 - Use the agent names (AURORA, ATLAS, SENTINEL, NEXUS) in diagram labels
-- Color-code: Layer 1 = green (free), Layer 2 = blue (AI)
 ---
@@ -191,15 +255,17 @@ TOTAL                     ~$0.003
 ```
 episode6-hero-dashboard.png
 episode6-workflow-8steps.png
-episode6-architecture-2layers.png
 episode6-agent-pipeline.png
 episode6-extraction-7sources.png
 episode6-rule-engine-output.png
 episode6-nexus-synthesis.png
-episode6-benchmark-comparison.png
 episode6-before-after.png
-episode6-cost-table.png
-episode6-figma-specimen.png
 ```
 ---
@@ -215,3 +281,6 @@ Before taking screenshots:
 - [ ] Set consistent window size (1440px wide)
 - [ ] Run a real analysis so outputs are populated
 - [ ] Ensure agent names (AURORA, ATLAS, etc.) are visible in logs

+# Image Guide for Episode 6 Article (v3.2)
+## Required Images (13 total)
 ### 1. Hero Image
 **What:** Screenshot of the Gradio interface showing the full pipeline output
 **Where:** After title, before first section
 **Specs:** 1200x630px (LinkedIn preview size)
+**Content:** Show the Visual Spec page in Figma with colors, typography, and agent synthesis visible
 ### 2. Complete Workflow Diagram
+**What:** The 8-step pipeline: Website -> Agents -> Figma -> Compare
 **Where:** After "The Complete Workflow" section
 **Specs:** 1200x800px
 **Content:**
 ```
+Website URL
+     |
+7-Source Extraction (Playwright + Firecrawl)
+     |
+Normalizer (radius, shadows, colors)
+     |
+Color Classifier (deterministic)
+     |
+Rule Engine (WCAG, type scale, spacing)
+     |
+DTCG JSON (AS-IS)
+     |
+Figma Plugin -> Variables + Visual Spec
+     |
+4 AI Agents (AURORA, ATLAS, SENTINEL, NEXUS)
+     |
+Accept/Reject -> DTCG JSON (TO-BE)
+     |
+Figma Plugin -> Compare AS-IS vs TO-BE
+```
+### 3. Three-Layer Architecture Diagram
+**What:** Layer 1 (Extraction) + Layer 2 (Classification + Rules) + Layer 3 (4 Agents)
 **Where:** After "Architecture Overview" section
+**Specs:** 1200x700px
 **Content:**
 ```
++--------------------------------------------------+
+|  LAYER 1: EXTRACTION + NORMALIZATION (Free)       |
+|  +- 7-Source Extractor + Normalizer               |
+|  +- Radius/Shadow/Color normalization             |
++--------------------------------------------------+
+|  LAYER 2: CLASSIFICATION + RULE ENGINE (Free)     |
+|  +- Color Classifier (815 lines, deterministic)   |
+|  +- WCAG + Type Scale + Spacing Grid              |
++--------------------------------------------------+
+|  LAYER 3: 4 AI AGENTS (~$0.003)                   |
+|  +- (AURORA, ATLAS, SENTINEL) -> NEXUS            |
++--------------------------------------------------+
+```
+### 4. Naming Authority Chain (NEW - V3 Key Innovation)
+**What:** Diagram showing the V2 chaos vs V3 clean authority
+**Where:** After "The Naming Authority Chain" section
 **Specs:** 1200x500px
 **Content:**
 ```
+V2 (BROKEN):                         V3 (FIXED):
++----------+                         +------------------+
+|Normalizer| -> "blue.light"         |Color Classifier  | -> PRIMARY
++----------+                         |  (deterministic) |
++----------+                         +------------------+
+| Export   | -> "blue.500"                    |
++----------+                         +------------------+
++----------+                         |AURORA (advisory) | -> SECONDARY
+| AURORA   | -> "brand.primary"      |  roles only      |
++----------+                         +------------------+
+                                              |
+    = CHAOS in Figma                 +------------------+
+                                     |Normalizer        | -> FALLBACK
+                                     +------------------+
+                                         = CLEAN output
+```
+### 5. Agent Pipeline Flow
+**What:** Show the 4 named agents with their flow: parallel analysis -> synthesis
+**Where:** After "Layer 3" section header
+**Specs:** 1200x500px
+**Content:**
+```
+   Rule Engine + Classifier Results
+         |
+    +----+----------------+
+    v    v                v
++------+ +------+ +--------+
+|AURORA| |ATLAS | |SENTINEL|
+|Brand | |Bench | |Audit   |
+|Qwen  | |Llama | |Qwen    |
++--+---+ +--+---+ +---+----+
+   +--------+----------+
+            v
+      +----------+
+      |  NEXUS   |
+      |Synthesis |
+      | Llama 70B|
+      +----------+
+            v
     Final Recommendations
 ```
+### 6. 7 Extraction Sources Visual
 **What:** Show the 7 different methods of extraction
+**Where:** After "Extraction: 7 Sources" section
 **Specs:** 1000x600px
 **Content:**
 ```
++-------------+  +-------------+  +-------------+
+| 1. Computed |  | 2. CSS      |  | 3. Inline   |
+|    Styles   |  |    Variables|  |    Styles   |
++-------------+  +-------------+  +-------------+
++-------------+  +-------------+  +-------------+
+| 4. SVG      |  | 5. External |  | 6. Style    |
+|    Attrs    |  |    CSS Files|  |    Blocks   |
++-------------+  +-------------+  +-------------+
++-------------------------------------------------+
+|              7. Firecrawl Deep Parser            |
++-------------------------------------------------+
 ```
+### 7. Color Classifier Output (NEW)
+**What:** Show the classifier's evidence-based categorization
+**Where:** After "The Color Classifier" section
+**Specs:** 1200x600px
+**Content:**
+```
+[CLASSIFY] #06b2c4 -> BRAND
+  Evidence: background-color on <button> (freq=33)
+[CLASSIFY] #373737 -> TEXT
+  Evidence: color on <p> (freq=120)
+[CLASSIFY] #ffffff -> BG
+  Evidence: background-color on <body> (freq=1)
+[DEDUP] #1a1a1a merged with #1b1b1b (dist=1.7)
+Category Caps: brand(3) text(3) bg(3) border(3) feedback(4) palette(rest)
+```
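The `[DEDUP]` line above reports a distance of 1.7 between `#1a1a1a` and `#1b1b1b`. That figure is consistent with plain Euclidean distance in RGB space (sqrt(3) is about 1.73), so the following is a minimal sketch under that assumption. The classifier's real metric, threshold, and function names do not appear in this diff; `hex_to_rgb`, `rgb_distance`, `dedupe_colors`, and the threshold of 3.0 are hypothetical names and values chosen for illustration.

```python
import math


def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 integers."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def rgb_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colors in RGB space (assumed metric)."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


def dedupe_colors(colors_by_freq: dict[str, int], threshold: float = 3.0) -> dict[str, int]:
    """Merge near-identical colors into the most frequent one.

    Colors are visited from most to least frequent, so the survivor of each
    merge is the color the site actually uses most. The threshold is a
    hypothetical value, not one taken from the project.
    """
    kept: dict[str, int] = {}
    for color, freq in sorted(colors_by_freq.items(), key=lambda kv: -kv[1]):
        match = next((k for k in kept if rgb_distance(k, color) <= threshold), None)
        if match is None:
            kept[color] = freq
        else:
            kept[match] += freq  # fold the near-duplicate's usage into the survivor
    return kept
```

Visiting colors in descending frequency keeps the result deterministic for a given input, which is in line with the "same input, same output" goal the guide stresses for the classifier.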
+### 8. Rule Engine Output
+**What:** Screenshot of actual rule engine output
 **Where:** After "The Rule Engine" section
 **Specs:** 1200x600px
+**Content:** Show the emoji-formatted output:
+- TYPE SCALE ANALYSIS (ratio, variance, recommendation)
+- ACCESSIBILITY CHECK (actual pairs, not just vs white)
+- SPACING GRID (GCD, alignment %)
+- CONSISTENCY SCORE
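For reference when checking the accessibility screenshot, the contrast math behind an AA check on actual foreground/background pairs can be sketched as below. This follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are hypothetical and are not taken from the project's rule engine.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light.

    WCAG 2.x text uses 0.03928 as the cutoff; 0.04045 is the sRGB value.
    Both give identical results for 8-bit channels.
    """
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colors; argument order does not matter."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

Because `contrast_ratio` takes both colors as arguments, the same function covers pairs found on the page as well as the simpler "versus white" check.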
+### 9. NEXUS Synthesis Output
+**What:** Screenshot of the final synthesis with scores and top 3 actions
 **Where:** After "Agent 4: NEXUS" section
 **Specs:** 1200x700px
+**Content:** Show final output with:
 - Executive summary
+- Scores (overall, accessibility, consistency, organization)
 - Top 3 actions with impact/effort
+- Color recommendations with accept/reject
+### 10. DTCG JSON Example (NEW)
+**What:** Code block showing the W3C DTCG format with $extensions
+**Where:** After "W3C DTCG v1 Compliance" section
+**Specs:** 1000x500px
 **Content:** Show:
+```json
+{
+  "color": {
+    "brand": {
+      "primary": {
+        "$type": "color",
+        "$value": "#005aa3",
+        "$extensions": {
+          "com.design-system-extractor": {
+            "frequency": 47,
+            "confidence": "high"
+          }
+        }
+      }
+    }
+  }
+}
+```
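A token shaped like the JSON above could be assembled with a few lines of Python. This is only a sketch: the extension key and field names are copied from the example, while `make_color_token` and `nest` are hypothetical helpers, not the project's exporter.

```python
import json

# Extension namespace copied from the example above.
EXTENSION_KEY = "com.design-system-extractor"


def make_color_token(value: str, frequency: int, confidence: str) -> dict:
    """Build one DTCG-style color token with extraction metadata in $extensions."""
    return {
        "$type": "color",
        "$value": value,
        "$extensions": {
            EXTENSION_KEY: {"frequency": frequency, "confidence": confidence}
        },
    }


def nest(path: str, token: dict) -> dict:
    """Place a token at a dotted path, e.g. 'color.brand.primary'."""
    tree: dict = {}
    node = tree
    *parents, leaf = path.split(".")
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = token
    return tree


doc = nest("color.brand.primary", make_color_token("#005aa3", 47, "high"))
print(json.dumps(doc, indent=2))
```

Keeping tool-specific metadata under a namespaced `$extensions` key means consumers that only read `$type` and `$value` can ignore it.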
+### 11. Figma Visual Spec Page (NEW)
+**What:** Screenshot of the auto-generated visual spec in Figma
+**Where:** After "The Custom Figma Plugin" section
+**Specs:** 1200x700px
+**Content:** Show:
+- Typography frame (Desktop + Mobile) with font metadata
+- Color frame organized by semantic role (brand/text/bg/border/feedback)
+- AA compliance badges on each swatch
+- Radius display, Spacing scale, Shadow elevation
+### 12. Before/After Comparison
 **What:** Side-by-side showing AS-IS vs TO-BE
 **Where:** After "Comparing AS-IS vs TO-BE" section
 **Specs:** 1200x500px
 **Content:**
 ```
 AS-IS                          TO-BE
+-----                          -----
+Type: ~1.18 (random)    ->    1.25 (Major Third)
+Brand: #06b2c4 (AA: 3.2) ->  #048391 (AA: 4.5)
+Spacing: Mixed           ->   8px grid
+Colors: 143 unique       ->   ~20 semantic
+Radius: raw CSS          ->   none/sm/md/lg/xl/full
+Shadows: unsorted        ->   xs/sm/md/lg/xl
+Score: 52/100            ->   78/100
 ```
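The "Type" row compares a measured ratio with a named scale. One way to measure the AS-IS ratio and find the closest named scale is sketched below. The averaging approach and the `detect_type_scale` name are assumptions for illustration. The project's own detection and its recommendation logic are not shown in this diff.

```python
from statistics import mean

# Common modular type scales and their ratios.
NAMED_SCALES = {
    "Minor Second": 1.067,
    "Major Second": 1.125,
    "Minor Third": 1.2,
    "Major Third": 1.25,
    "Perfect Fourth": 1.333,
    "Augmented Fourth": 1.414,
    "Perfect Fifth": 1.5,
    "Golden Ratio": 1.618,
}


def detect_type_scale(font_sizes_px: list[float]) -> tuple[float, str]:
    """Return (average step ratio, nearest named scale) for a set of font sizes."""
    sizes = sorted(set(font_sizes_px))
    if len(sizes) < 2:
        raise ValueError("need at least two distinct font sizes")
    # Ratio between each size and the next larger one.
    ratios = [larger / smaller for smaller, larger in zip(sizes, sizes[1:])]
    average = mean(ratios)
    nearest = min(NAMED_SCALES, key=lambda name: abs(NAMED_SCALES[name] - average))
    return average, nearest
```

For example, `detect_type_scale([16, 20, 25, 31.25])` measures a uniform 1.25 step and reports "Major Third". Irregular sizes produce an average that sits between named scales, which is the situation the AS-IS column describes.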
+### 13. V1 vs V2 vs V3 Evolution (NEW)
+**What:** Table showing the version progression
 **Where:** After "Cost & Model Strategy" section
 **Specs:** 1000x400px
 **Content:**
 ```
+Version    Cost       Naming         LLM Role        Output
+-------    -------    ----------     ----------      --------
+V1         $0.50      LLM decides    Everything      Unreliable
+V2         $0.003     3 systems      Split w/ rules  Naming chaos
+V3         $0.003     1 authority    Advisory only   Clean DTCG
 ```
 ---
 ## Image Creation Tools
 **Recommended:**
+1. **Figma** - Architecture diagrams, pipeline flows, tech stack
+2. **Screenshot tool** - Gradio interface captures (use dark mode)
+3. **Excalidraw** - Quick hand-drawn style diagrams
 **Tips:**
 - Use dark background screenshots (Gradio dark mode)
 - Add subtle drop shadows to screenshots
+- Keep consistent color scheme (blues match brand)
 - Use the agent names (AURORA, ATLAS, SENTINEL, NEXUS) in diagram labels
+- Color-code: Layer 1 = green (free), Layer 2 = blue (rules), Layer 3 = purple (AI)
+- NEW: Include W3C DTCG logo/badge where format is mentioned
+- NEW: Show the naming authority chain prominently - it's the V3 key story
 ---
 ```
 episode6-hero-dashboard.png
 episode6-workflow-8steps.png
+episode6-architecture-3layers.png
+episode6-naming-authority.png
 episode6-agent-pipeline.png
 episode6-extraction-7sources.png
+episode6-color-classifier.png
 episode6-rule-engine-output.png
 episode6-nexus-synthesis.png
+episode6-dtcg-json.png
+episode6-figma-visual-spec.png
 episode6-before-after.png
+episode6-v1-v2-v3-evolution.png
 ```
 ---
 - [ ] Set consistent window size (1440px wide)
 - [ ] Run a real analysis so outputs are populated
 - [ ] Ensure agent names (AURORA, ATLAS, etc.) are visible in logs
+- [ ] Ensure color classifier evidence logs are visible
+- [ ] Capture the Figma visual spec page with AA badges
+- [ ] Show DTCG format in JSON export preview

docs/LINKEDIN_POST_EPISODE_6.md CHANGED

@@ -1,4 +1,4 @@
-# LinkedIn Post - Episode 6: Design System Extractor
 ## Main Post (Copy-Paste Ready)
@@ -6,145 +6,162 @@
 Every designer has done this: Open DevTools. Inspect element. Copy hex code. Paste to spreadsheet. Recreate in Figma. Repeat 200 times.
-I spent 3–5 days manually extracting design tokens from websites. Then more time recreating them in Figma as variables.
-So I built a semi-automated workflow with 4 named AI agents + a free rule engine 👇
-**The Architecture:**
-Layer 1 (FREE — $0.00, <1 second):
-🔢 Rule Engine — WCAG contrast checker (pure math)
-🔢 Type scale detection + spacing grid analysis
-🔢 Color deduplication + statistics
-Layer 2 (~$0.003, 4 specialized agents):
-🎨 AURORA — identifies brand colors from usage context (Qwen 72B)
-📊 ATLAS — benchmarks against 8 industry design systems (Llama 70B)
-✅ SENTINEL — prioritizes fixes by business impact (Qwen 72B)
-🧠 NEXUS — synthesizes everything, resolves contradictions (Llama 70B)
-**The Complete Pipeline:**
-🌐 Website URL → 🤖 AI Agents → 📄 AS-IS JSON → 🔌 Figma Plugin → Variables
-                                      ↓
-                            🧠 AI Analysis (Stage 2)
-                                      ↓
-            ☑️ Accept/Reject → 📄 TO-BE JSON → 🔌 Figma Plugin → Modernized Variables
-**My V1 used LLMs for everything.**
-❌ Cost: $0.50–1.00/run
-❌ LLMs hallucinate math
-**V2 flipped the approach:**
-✅ Deterministic code handles certainty. LLMs handle ambiguity.
-✅ ~100-300x cheaper. More accurate. Always produces output.
-The rule engine does 80% of the work for $0.
-The agents handle the 20% that requires judgment.
 **Real results:**
-• 143 colors extracted (semantically categorized)
-• 220 FG/BG pairs checked for AA compliance
-• Benchmarked against Material 3, Polaris, Atlassian + 5 more
-• Type scale: random → 1.25 Major Third
-• Brand color: AA 3.2 → 4.5 (with my approval)
-• Time: 3–5 days → ~15 minutes
-• Cost: ~$0.003
 The key? **I stay in control.** AI recommends, I decide.
-📄 Full workflow + architecture: [Medium link]
-🚀 Try it: [HuggingFace Space link]
-💻 Code: [GitHub link]
-This is Episode 6 of "AI in My Daily Work."
 What design workflows are you automating?
-#UXDesign #AIEngineering #DesignSystems #Figma #HuggingFace #Accessibility #WCAG #MultiAgent #DesignTokens #BuildInPublic
 ---
 ## First Comment (Post Immediately After)
-🔗 **Resources:**
-📄 **Medium Article:** [link]
-Complete architecture breakdown + Figma integration workflow
-🚀 **Live Demo:** [HuggingFace Space link]
 Try it with any website URL
-💻 **GitHub:** [link]
-Open source — star it if useful!
 ---
-**The 4 Named Agents:**
-🎨 **AURORA** — "33 buttons + 12 CTAs using #06b2c4 = brand primary" (context LLMs understand, rules can't)
-📊 **ATLAS** — "87% aligned to Polaris. Closing the type scale gap takes 1 hour." (trade-off reasoning)
-✅ **SENTINEL** — "67 AA failures. Fix brand primary first — affects 40% of interactions." (impact prioritization)
-🧠 **NEXUS** — Synthesizes all 3 agents + rule engine → executive summary + top 3 actions
 ---
 **Previous Episodes:**
-• Episode 5: UX Friction Analysis (7 agents + Databricks)
-• Episode 4: UI Regression Testing
-• Episode 3: Review Intelligence System
-What should I build for Episode 7? Drop ideas below 👇
 ---
-## Alternative Version (Story-Driven)
 ---
 "Can you audit their design system and document it in Figma?"
-A 3–5 day task. I've done it dozens of times.
-DevTools → Inspect → Copy hex → Spreadsheet → Figma Variables → Repeat
-This time I built something different:
-A semi-automated workflow where:
-🔢 A free rule engine checks WCAG, type scale, spacing (pure math — $0)
-🎨 AURORA identifies brand colors from 143 extracted colors
-📊 ATLAS benchmarks against 8 industry design systems
-✅ SENTINEL prioritizes fixes by business impact
-🧠 NEXUS synthesizes everything into a final action plan
-🔌 A Figma plugin imports the JSON directly as variables
-The difference? **I stay in control.**
-AI doesn't auto-apply changes. It recommends:
-"Brand primary #06b2c4 fails AA (3.2:1). Suggest #048391 (4.5:1)."
-I decide if that's right for the brand.
-15 minutes. $0.003. Full design system documented and in Figma.
-📄 How I built it: [Medium link]
-🚀 Demo: [HuggingFace link]
 Episode 6 of "AI in My Daily Work"
-#DesignSystems #AIAgents #UXDesign #Figma #Automation #HuggingFace #WCAG
 ---
 ## Image Suggestions
-1. **Hero:** Architecture diagram (Layer 1 deterministic + Layer 2 four named agents)
-2. **Before/After:** AS-IS specimen vs TO-BE specimen in Figma
-3. **Agent Output:** Screenshot of NEXUS synthesis with scores
-4. **Figma Specimen:** Typography + Semantic Colors display
 ---
@@ -154,21 +171,22 @@ Primary (always include):
 #UXDesign #AIEngineering #DesignSystems #Figma #HuggingFace
 Secondary (mix based on audience):
-#DesignTokens #MultiAgent #Accessibility #WCAG #BuildInPublic #Automation #LLM
 ---
 ## Posting Strategy
-**Best time:** Tuesday–Thursday, 8–10 AM your timezone
-**Key messages:**
-1. Free rule engine does 80% of the work (cost optimization story)
-2. 4 named agents with specific roles (not generic "LLM 1, LLM 2")
-3. Semi-automation with human control (not full automation)
-4. The Figma integration + specimen view sets this apart
 **Differentiation from Episode 5:**
 - Episode 5 = UX friction analysis (GA4 + Clarity + Databricks)
-- Episode 6 = Design system extraction (Playwright + Figma + HuggingFace)
 - Same philosophy: deterministic code for certainty, LLMs for ambiguity

+# LinkedIn Post - Episode 6: Design System Extractor v3.2
 ## Main Post (Copy-Paste Ready)
 Every designer has done this: Open DevTools. Inspect element. Copy hex code. Paste to spreadsheet. Recreate in Figma. Repeat 200 times.
+I spent 3-5 days manually extracting design tokens from websites. Then more time recreating them in Figma as variables.
+So I built a 3-layer system: deterministic extraction + rule-based color classifier + 4 AI agents.
+**The Architecture (v3.2):**
+Layer 1 (FREE, <1 second):
+- 7-source extraction (Playwright + Firecrawl)
+- Normalizer: radius, shadows, colors all cleaned and named
+Layer 2 (FREE, <1 second):
+- Color Classifier (815 lines, deterministic): CSS evidence -> category -> token name
+- Rule Engine: WCAG contrast (actual FG/BG pairs), type scale detection, spacing grid
+- 113 tests passing, 100% reproducible
+Layer 3 (~$0.003, 4 specialized agents):
+- AURORA: brand color advisor (Qwen 72B) — advisory only, can't override classifier
+- ATLAS: benchmarks against 8 industry design systems (Llama 70B)
+- SENTINEL: prioritizes fixes by business impact (Qwen 72B)
+- NEXUS: synthesizes everything, resolves contradictions (Llama 70B)
+**The Pipeline:**
+Website URL -> 7-Source Extraction -> Color Classifier -> Rule Engine -> DTCG JSON
+-> Figma Plugin -> Variables + Styles + Auto-Generated Visual Spec Page
+-> AI Analysis -> Accept/Reject -> TO-BE JSON -> Compare in Figma
+**My biggest lesson building V1 -> V2 -> V3:**
+V1: LLMs for everything. $0.50/run. Hallucinated contrast ratios.
+V2: Rules + LLM split. $0.003/run. But 3 naming systems fighting in exports.
+V3: Rules + Classifier + Advisory LLM. $0.003/run. ONE naming authority. Clean output.
+The fix wasn't better AI. It was a clear authority chain:
+1. Color Classifier (PRIMARY) - deterministic, covers ALL colors
+2. AURORA LLM (SECONDARY) - can only suggest semantic roles
+3. Normalizer (FALLBACK) - hue + numeric shade
 **Real results:**
+- 143 colors extracted, classified, and named (deterministically)
+- 220 FG/BG pairs checked for AA compliance
+- Radius: raw CSS garbage -> none/sm/md/lg/xl/full (normalized)
+- Shadows: unsorted -> xs/sm/md/lg/xl (5 progressive levels)
+- Benchmarked against Material 3, Polaris, Atlassian + 5 more
+- Output: W3C DTCG v1 compliant JSON with $extensions metadata
+- Figma: auto-generated visual spec with AA badges
+- Time: 3-5 days -> ~15 minutes
+- Cost: ~$0.003
 The key? **I stay in control.** AI recommends, I decide.
+Full workflow + architecture: [Medium link]
+Try it: [HuggingFace Space link]
+Code: [GitHub link]
+Episode 6 of "AI in My Daily Work"
 What design workflows are you automating?
+#UXDesign #AIEngineering #DesignSystems #Figma #HuggingFace #Accessibility #WCAG #DesignTokens #W3CDTCG #BuildInPublic
 ---
 ## First Comment (Post Immediately After)
+**Resources:**
+Medium Article: [link]
+Complete architecture breakdown + V1 -> V2 -> V3 evolution + Figma integration
+Live Demo: [HuggingFace Space link]
 Try it with any website URL
+GitHub: [link]
+Open source - star it if useful!
 ---
+**The Naming Authority Problem (V3's key insight):**
+V2 had THREE competing systems naming colors:
+- Normalizer: "color.blue.light" (word-based)
+- Export layer: "color.blue.500" (numeric)
+- AURORA LLM: "brand.primary" (whatever it wanted)
+Result in Figma: blue.300, blue.dark, blue.light, blue.base — ALL in the same export.
+V3 fix: ONE authority. Color Classifier (deterministic) is PRIMARY. AURORA is advisory only — it can suggest "this blue should be brand.primary" but can't rename palette colors.
+`filter_aurora_naming_map()` enforces the boundary. Clean Figma output, every time.
+---
+**What's Next — Episode 7: Automated Component Generation**
+Researched 30+ tools. Found a genuine market gap:
+No production tool takes DTCG JSON and outputs Figma components with variants.
+Building it. Button (60 variants), TextInput, Card, Toast, Checkbox/Radio.
+Figma Plugin API supports everything: createComponent(), combineAsVariants(), setBoundVariable().
+Same tokens in = same components out. Fully deterministic.
 ---
 **Previous Episodes:**
+- Episode 5: UX Friction Analysis (7 agents + Databricks)
+- Episode 4: UI Regression Testing
+- Episode 3: Review Intelligence System
+What should I build after Episode 7? Drop ideas below
 ---
+## Alternative Version (Shorter, Story-Driven)
 ---
 "Can you audit their design system and document it in Figma?"
+3-5 days of DevTools, spreadsheets, and manual Figma work.
+I built something different. Three versions, actually.
+V1: Used LLMs for everything. $0.50/run. They hallucinate math.
+V2: Split into rules + AI. $0.003/run. But three systems fought over color names. Figma output was chaos.
+V3: Clear authority chain. One color classifier (deterministic, 815 lines). LLMs are advisory only. W3C DTCG-compliant JSON. Auto-generated visual spec in Figma.
+What it does now:
+- 7-source extraction from any website
+- Rule-based color classification (brand/text/bg/border/feedback)
+- WCAG AA check on 220 actual FG/BG pairs
+- 4 AI agents for brand analysis, benchmarking, audit, synthesis
+- W3C standard JSON output
+- Figma plugin: variables + styles + visual spec page
+15 minutes. $0.003. I stay in control.
+Full architecture: [Medium link]
+Demo: [HuggingFace link]
 Episode 6 of "AI in My Daily Work"
+#DesignSystems #AIAgents #UXDesign #Figma #Automation #HuggingFace #WCAG #W3CDTCG
 ---
 ## Image Suggestions
+1. **Hero:** V1 vs V2 vs V3 comparison table showing the evolution
+2. **Architecture:** 3-layer diagram (Extraction -> Classification+Rules -> 4 Agents)
+3. **Naming Authority:** Before/after showing Figma chaos vs clean output
+4. **Figma Visual Spec:** Screenshot of auto-generated spec page
+5. **Agent Output:** NEXUS synthesis with scores + top 3 actions
 ---
 #UXDesign #AIEngineering #DesignSystems #Figma #HuggingFace
 Secondary (mix based on audience):
+#DesignTokens #W3CDTCG #Accessibility #WCAG #BuildInPublic #Automation #MultiAgent
 ---
 ## Posting Strategy
+**Best time:** Tuesday-Thursday, 8-10 AM your timezone
+**Key messages for V3:**
+1. V1 -> V2 -> V3 evolution story (naming authority problem)
+2. Color Classifier (815 lines, deterministic) as key innovation
+3. W3C DTCG v1 compliance — standards over proprietary formats
+4. Figma visual spec auto-generation
+5. Component generation gap (Episode 7 teaser)
 **Differentiation from Episode 5:**
 - Episode 5 = UX friction analysis (GA4 + Clarity + Databricks)
+- Episode 6 = Design system extraction (Playwright + Classifier + Figma + HuggingFace)
 - Same philosophy: deterministic code for certainty, LLMs for ambiguity

docs/MEDIUM_ARTICLE_EPISODE_6.md CHANGED

@@ -1,10 +1,10 @@
-# 🚅 AI in My Daily Work — Episode 6: Reverse-Engineering Design Systems with 4 AI Agents + a Free Rule Engine
-## A Semi-Automated Workflow: From Website URL to Figma-Ready Design System
-*How I built a system that extracts any website's design tokens and audits them like a senior design team — for ~$0.003 per run.*
-[IMAGE: Hero - Complete workflow showing Website → AI Agents → Figma]
 ---
@@ -22,23 +22,25 @@ Whether it's analyzing a competitor, inheriting a legacy project, or bringing co
 6. Repeat for spacing, shadows, border radius...
 7. Spend days organizing into a coherent system
 8. Manually recreate in Figma as variables
-I've done this dozens of times. It takes **3–5 days** for a single website. And by the time you're done, something has already changed.
 I wanted a system that could think like a design team:
-- a **data engineer** validating extraction quality
 - an **analyst** identifying brand colors and patterns
 - a **senior reviewer** benchmarking against industry standards
 - and a **chief architect** synthesizing everything into action
-So I built one.
 ---
 ## The Solution (In One Sentence)
-I built a 4-agent system backed by a free rule engine that acts like an entire design audit team: data extraction + WCAG compliance + benchmark comparison + brand analysis + prioritized recommendations. It runs on HuggingFace Spaces, costs ~$0.003 per analysis, and feeds directly into Figma via a custom plugin.
 ---
@@ -49,292 +51,373 @@ I built a 4-agent system backed by a free rule engine that acts like an entire d
 Here's the end-to-end process I now use:
 ```
-┌──────────────────────────────────────────────────────────────┐
-│                    MY DESIGN SYSTEM WORKFLOW                    │
-├──────────────────────────────────────────────────────────────┤
-│                                                                │
-│  STEP 1: Extract AS-IS (AI Agent App)                         │
-│  ──────────────────────────────────────                       │
-│  • Enter website URL                                          │
-│  • AI auto-discovers pages                                    │
-│  • Extracts colors, typography, spacing, shadows, radius      │
-│  • Rule Engine checks WCAG + type scale + spacing grid        │
-│  • Download AS-IS JSON file                                   │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 2: Import to Figma (My Plugin)                          │
-│  ────────────────────────────────────                         │
-│  • Open Figma                                                 │
-│  • Upload AS-IS JSON via custom plugin                        │
-│  • Plugin creates Variables automatically                     │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 3: View AS-IS Specimen (Figma)                          │
-│  ────────────────────────────────────                         │
-│  • Visual display of current design system                    │
-│  • Typography (Desktop + Mobile), Colors, Spacing, etc.       │
-│  • Review what exists before modernizing                      │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 4: AI Analysis (AI Agent App - Stage 2)                 │
-│  ─────────────────────────────────────────────                │
-│  • Free Rule Engine: WCAG, type scale, spacing grid           │
-│  • AURORA: Brand color identification                         │
-│  • ATLAS: Industry benchmark comparison (8 systems)           │
-│  • SENTINEL: Best practices audit with priorities             │
-│  • NEXUS: Final synthesis resolving all contradictions         │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 5: Accept/Reject Suggestions (AI Agent App)             │
-│  ─────────────────────────────────────────────────            │
-│  • Review each recommendation                                 │
-│  • Accept ☑️ or Reject ☐ individually                         │
-│  • I stay in control of what changes                          │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 6: Export TO-BE (AI Agent App - Stage 3)                │
-│  ─────────────────────────────────────────────                │
-│  • Generate modernized TO-BE JSON                             │
-│  • Contains accepted improvements                             │
-│  • Download new JSON file                                     │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 7: Import TO-BE to Figma (My Plugin)                    │
-│  ──────────────────────────────────────────                   │
-│  • Upload TO-BE JSON via same plugin                          │
-│  • Figma Variables update with new values                     │
-│                                                                │
-│                           ↓                                    │
-│                                                                │
-│  STEP 8: View TO-BE Specimen (Figma)                          │
-│  ────────────────────────────────────                         │
-│  • Visual display of modernized design system                 │
-│  • Compare AS-IS vs TO-BE                                     │
-│  • Ready to use in production                                 │
-│                                                                │
-└──────────────────────────────────────────────────────────────┘
 ```
-**Total time:** ~15 minutes (vs 3–5 days manual)
 ---
-## Architecture Overview: Two Layers, Four Agents
 My first attempt (V1) made a classic mistake:
 **I used a large language model for everything.**
-### Why Two Layers?
-My V1 mistake: Used LLMs for everything
-❌ Cost: $0.50–1.00 per run
-❌ Speed: 15+ seconds for basic math
-❌ Accuracy: LLMs hallucinate contrast ratios
-The fix: **Not every task needs AI. Some need good engineering.**
-V2 flipped the approach.
-> **Deterministic code handles certainty. LLMs handle ambiguity.**
-This led to a two-layer architecture.
-[IMAGE: Architecture diagram — Layer 1 (Deterministic) → Layer 2 (4 Named Agents)]
 ```
-┌─────────────────────────────────────────────────┐
-│  LAYER 1: DETERMINISTIC (Free — $0.00)          │
-│  ├─ Crawler + 7-Source Extractor + Normalizer   │
-│  ├─ Semantic Color Analyzer (rule-based)        │
-│  ├─ WCAG Contrast Checker (math)                │
-│  ├─ Type Scale Detection (ratio math)           │
-│  ├─ Spacing Grid Analysis (GCD math)            │
-│  └─ Color Statistics (deduplication)             │
-├─────────────────────────────────────────────────┤
-│  LAYER 2: 4 AI AGENTS (~$0.003)                 │
-│  ├─ AURORA   — Brand Color Analyst   (Qwen 72B) │
-│  ├─ ATLAS    — Benchmark Advisor   (Llama 70B)  │
-│  ├─ SENTINEL — Best Practices Auditor (Qwen 72B)│
-│  └─ NEXUS    — Head Synthesizer    (Llama 70B)  │
-└─────────────────────────────────────────────────┘
 ```
----
-## Layer 1: Deterministic Intelligence (No LLM)
-These agents do the heavy lifting — no LLMs involved.
-### Stage 1: Extraction
-A Playwright-powered browser visits each page at **two viewports** (1440px desktop + 375px mobile) and extracts every design token from **7 sources**:
-[IMAGE: 7 Extraction Sources diagram]
 ```
-Source 1: Computed Styles → What the browser actually renders
-Source 2: CSS Variables    → --primary-color, --spacing-md
-Source 3: Inline Styles    → style="color: #06b2c4"
-Source 4: SVG Attributes   → fill, stroke colors
-Source 5: Stylesheets      → External .css files
-Source 6: Style Blocks     → <style> tags
-Source 7: Firecrawl        → Deep CSS parsing (bypasses CORS)
 ```
-A **Normalizer** then deduplicates (exact match + Delta-E color distance), infers semantic roles from frequency, and assigns suggested names like `brand.primary`, `text.secondary`.
-A **Semantic Analyzer** categorizes every color by *actual CSS usage*:
-| Role | Detection Method |
-|------|------------------|
-| Brand | Saturated colors on buttons, CTAs, links |
-| Text | Low saturation with `color` property |
-| Background | Used with `background-color` on containers |
-| Border | Used with `border-color` properties |
-| Feedback | Red=error, Green=success, Yellow=warning |
 **Cost: $0.00 | Runtime: ~90 seconds**
-The user reviews these tokens before anything touches an LLM.
-### The Rule Engine (The Single Biggest Optimization)
-After extraction, a rule engine runs every check that can be done with pure math:
 ```
-📐 TYPE SCALE ANALYSIS
-├─ Detected Ratio: 1.167
-├─ Closest Standard: Minor Third (1.2)
-├─ Consistent: ⚠️ No (variance: 0.24)
-└─ 💡 Recommendation: 1.25 (Major Third)
-♿ ACCESSIBILITY CHECK (WCAG AA/AAA)
-├─ Colors Analyzed: 210
-├─ FG/BG Pairs Checked: 220
-├─ AA Pass: 143 ✅
-├─ AA Fail (real FG/BG pairs): 67 ❌
-│  ├─ fg:#06b2c4 on bg:#ffffff → 💡 Fix: #048391 (4.5:1)
-│  ├─ fg:#999999 on bg:#ffffff → 💡 Fix: #757575 (4.6:1)
-│  └─ ... and 62 more
-📏 SPACING GRID
-├─ Detected Base: 1px (GCD)
-├─ Grid Aligned: ⚠️ 0%
-└─ 💡 Recommendation: 8px grid
-📊 CONSISTENCY SCORE: 52/100
 ```
-Not just "color vs white" — it tests **actual foreground/background pairs** found on the page. And algorithmically generates compliant alternatives.
-This entire layer runs **in under 1 second** and costs nothing beyond compute — the single biggest cost optimization in the system.
 ---
-## Layer 2: AI Analysis & Interpretation (4 Named Agents)
-This is where language models actually add value — tasks that require **context, reasoning, and judgment**.
-[IMAGE: Agent pipeline diagram — AURORA → ATLAS → SENTINEL → NEXUS]
 ---
-### Agent 1: AURORA — Brand Color Analyst
 **Model:** Qwen 72B (HuggingFace PRO)
-**Cost:** Free within PRO subscription ($9/month)
-**Temperature:** 0.4
-**The Challenge:** The rule engine found 143 colors. Which one is the *brand* primary?
-A rule engine can count that `#06b2c4` appears in 33 buttons. But it can't reason: "33 buttons + 12 CTAs + dominant accent positioning = this is almost certainly the brand primary." That requires **context understanding**.
-**Sample Output:**
 ```
 AURORA's Analysis:
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-🎨 Brand Primary:  #06b2c4 (confidence: HIGH)
-   └─ 33 buttons, 12 CTAs, dominant accent
-🎨 Brand Secondary: #c1df1f (confidence: MEDIUM)
-   └─ 15 accent elements, secondary CTA
 Palette Strategy: Complementary
 Cohesion Score: 7/10
-   └─ "Clear hierarchy, accent colors differentiated"
-Self-Evaluation: confidence=8/10, data=good
 ```
 ---
 ### Agent 2: ATLAS — Benchmark Advisor
 **Model:** Llama 3.3 70B (128K context)
-**Cost:** Free within PRO subscription
-**Temperature:** 0.25
 **Unique Capability:** Industry benchmarking against **8 design systems** (Material 3, Polaris, Atlassian, Carbon, Apple HIG, Tailwind, Ant, Chakra).
 [IMAGE: Benchmark comparison table from the UI]
-This agent doesn't just pick the closest match — it reasons about **effort vs. value**:
 ```
 ATLAS's Recommendation:
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-🥇 Shopify Polaris: 87% match
 Alignment Changes:
-  ├─ Type scale: 1.17 → 1.25 (effort: medium)
-  ├─ Spacing grid: mixed → 4px (effort: high)
-  └─ Base size: 16px → 16px (already aligned ✅)
 Pros: Closest match, e-commerce proven, well-documented
 Cons: Spacing migration is significant effort
-🥈 Alternative: Material 3 (77% match)
-  └─ "Stronger mobile patterns, but 8px grid
        requires more restructuring"
 ```
-ATLAS's Value Add:
-> "You're 87% aligned to Polaris already. Closing the gap on type scale takes ~1 hour and makes your system industry-standard. **Priority: MEDIUM.**"
 ---
 ### Agent 3: SENTINEL — Best Practices Auditor
 **Model:** Qwen 72B
-**Cost:** Free within PRO subscription
-**Temperature:** 0.2 (strict, consistent)
-**The Challenge:** The rule engine says "67 AA failures." But which ones matter most?
 SENTINEL prioritizes by **business impact** — not just severity:
 ```
 SENTINEL's Audit:
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Overall Score: 68/100
 Checks:
-  ├─ ✅ Type Scale Standard (1.25 ratio)
-  ├─ ⚠️ Type Scale Consistency (variance 0.18)
-  ├─ ✅ Base Size Accessible (16px)
-  ├─ ❌ AA Compliance (67 failures)
-  ├─ ⚠️ Spacing Grid (0% aligned)
-  └─ ❌ Near-Duplicates (351 pairs)
 Priority Fixes:
   #1 Fix brand color AA compliance
      Impact: HIGH | Effort: 5 min
-     → "Affects 40% of interactive elements"
   #2 Consolidate near-duplicate colors
      Impact: MEDIUM | Effort: 2 hours
@@ -343,95 +426,135 @@ Priority Fixes:
      Impact: MEDIUM | Effort: 1 hour
 ```
 ---
-### Agent 4: NEXUS — Head Synthesizer (Final Output)
 **Model:** Llama 3.3 70B (128K context)
-**Cost:** ~$0.001
-**Temperature:** 0.3
-NEXUS is the senior architect. It takes outputs from **all three agents + the rule engine** and synthesizes a final recommendation — **resolving contradictions**, weighting scores, and producing the executive summary the user sees.
-If ATLAS says "close to Polaris" but SENTINEL says "spacing misaligned," NEXUS reconciles: *"Align to Polaris type scale now (low effort) but defer spacing migration (high effort)."*
 ```
 NEXUS Final Synthesis:
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-📝 Executive Summary:
 "Your design system scores 68/100. Critical:
 67 color pairs fail AA. Top action: fix brand
 primary contrast (5 min, high impact)."
-📊 Scores:
-  ├─ Overall:       68/100
-  ├─ Accessibility:  45/100
-  ├─ Consistency:    75/100
-  └─ Organization:   70/100
-🎯 Top 3 Actions:
-  1. Fix brand color AA (#06b2c4 → #048391)
      Impact: HIGH | Effort: 5 min
   2. Align type scale to 1.25
      Impact: MEDIUM | Effort: 1 hour
-  3. Consolidate 143 → ~20 semantic colors
      Impact: MEDIUM | Effort: 2 hours
-🎨 Color Recommendations:
-  ├─ ✅ brand.primary: #06b2c4 → #048391 (auto-accept)
-  ├─ ✅ text.secondary: #999999 → #757575 (auto-accept)
-  └─ ❌ brand.accent: #FF6B35 → #E65100 (user decides)
 ```
 ---
-## The Figma Bridge: JSON → Variables → Specimen
 [IMAGE: Figma plugin UI showing import options]
-I built a custom Figma plugin that closes the loop:
-1. **Imports JSON** → Creates Figma Variables
-2. **Maps token types:**
-   - Colors → Color Variables
-   - Typography → Text Styles
-   - Spacing → Number Variables
-   - Radius → Number Variables
-   - Shadows → Effect Styles
-3. **Generates a Specimen Page** — visual display of the entire system
-The plugin handles both AS-IS and TO-BE imports identically — just different JSON files.
-### Viewing the Specimen
-[IMAGE: Figma specimen page showing all tokens visually]
 ```
-┌─────────────────────────────────────────────────────────────┐
-│  🎨 BRAND        📝 TEXT         🖼️ BACKGROUND    🚨 FEEDBACK │
-├─────────────────────────────────────────────────────────────┤
-│  ┌────┐ ┌────┐   ┌────┐ ┌────┐   ┌────┐ ┌────┐   ┌────┐    │
-│  │Prim│ │Sec │   │Prim│ │Sec │   │Prim│ │Sec │   │Err │    │
-│  └────┘ └────┘   └────┘ └────┘   └────┘ └────┘   └────┘    │
-│  #06b2c4 #c1df1f #373737 #666666 #fff   #f5f5f5  #dc2626   │
-│  AA:⚠️   AA:⚠️   AA:✓    AA:✓                     AA:✓     │
-└─────────────────────────────────────────────────────────────┘
 ```
 ---
 ## Comparing AS-IS vs TO-BE
 [IMAGE: Side-by-side comparison of AS-IS and TO-BE specimens]
-The real power is seeing the transformation:
 | Token | AS-IS | TO-BE | Change |
 |-------|-------|-------|--------|
-| Type Scale | ~1.18 (random) | 1.25 (Major Third) | ✓ Consistent |
-| brand.primary | #06b2c4 | #048391 | AA: 3.2 → 4.5 |
-| Spacing Grid | Mixed | 8px base | ✓ Standardized |
-| Color Ramps | None | 50-950 | ✓ Generated |
-| Unique Colors | 143 | ~20 semantic | ✓ Consolidated |
 ---
@@ -439,13 +562,16 @@ The real power is seeing the transformation:
 | Metric | Manual Process | My Workflow |
 |--------|---------------|-------------|
-| Time | 3–5 days | ~15 minutes |
 | Cost | Designer salary | ~$0.003 |
-| Coverage | ~50 colors | 143 colors (7 sources) |
 | Accuracy | Human error | Computed styles (exact) |
 | Accessibility | Manual spot checks | Full AA/AAA (all 220 pairs) |
 | Benchmarking | Subjective | 8 industry systems compared |
-| Figma Ready | Hours more | Instant (JSON plugin) |
 ---
@@ -457,8 +583,10 @@ Different agents use different models — intentionally.
 | Agent | Model | Why This Model | Cost |
 |-------|-------|---------------|------|
 | Rule Engine | None | Math doesn't need AI | $0.00 |
-| AURORA | Qwen 72B | Creative color reasoning | ~Free (HF PRO) |
 | ATLAS | Llama 3.3 70B | 128K context for benchmarks | ~Free (HF PRO) |
 | SENTINEL | Qwen 72B | Strict, consistent evaluation | ~Free (HF PRO) |
 | NEXUS | Llama 3.3 70B | 128K context for synthesis | ~$0.001 |
@@ -466,10 +594,10 @@ Different agents use different models — intentionally.
 For designer-scale usage (weekly runs), inference costs are effectively negligible, with HuggingFace PRO ($9/month) covering most models.
-Compared to V1 (LLM-for-everything):
-- **~100–300x cost reduction**
-- **Faster execution** (rule engine: <1s vs LLM: 15s for the same math)
-- **Better accuracy** (LLMs hallucinate math; rule engines don't)
 ---
@@ -479,11 +607,12 @@ The system **always produces output**, even when components fail:
 | If This Fails... | What Happens |
 |-------------------|-------------|
-| LLM agents down | Rule engine analysis still works (free) |
 | Firecrawl unavailable | DOM-only extraction (slightly fewer tokens) |
 | Benchmark fetch fails | Hardcoded fallback data from 8 systems |
 | NEXUS synthesis fails | `create_fallback_synthesis()` from rule engine |
-| **Entire AI layer** | **Full rule-engine-only report — still useful** |
 ---
@@ -492,18 +621,21 @@ The system **always produces output**, even when components fail:
 [IMAGE: Tech stack diagram with logos]
 **AI Agent App:**
-- Playwright (browser automation, 7-source extraction)
 - Firecrawl (deep CSS parsing)
 - Gradio (UI framework)
 - Qwen/Qwen2.5-72B-Instruct (AURORA + SENTINEL)
 - meta-llama/Llama-3.3-70B-Instruct (ATLAS + NEXUS)
 - HuggingFace Spaces (hosting) + HF Inference API
 - Docker (containerized deployment)
 **Figma Integration:**
-- Custom Figma Plugin
-- Variables API
-- Tokens Studio compatible JSON
 ---
@@ -513,17 +645,23 @@ The system **always produces output**, even when components fail:
 If rules can do it faster and cheaper — use rules. My WCAG checker is 100% accurate. An LLM's contrast ratio calculation? Maybe 85% accurate, and 100x slower.
-The rule engine does 80% of the work for $0.
-### 2. Industry Benchmarks Are Gold
-Without benchmarks: "Your type scale is inconsistent" → *PM nods*
-With benchmarks: "You're 87% aligned to Shopify Polaris. Closing the gap takes 1 hour and makes your system industry-standard." → *PM schedules meeting*
 Time to build benchmark database: 1 day.
 Value: Transforms analysis into prioritized action.
-### 3. Semi-Automation > Full Automation
 I don't want AI to make all decisions. The workflow has human checkpoints:
 - Review AS-IS in Figma before modernizing
@@ -532,21 +670,17 @@ I don't want AI to make all decisions. The workflow has human checkpoints:
 AI as **copilot**, not autopilot.
-### 4. Specialized Agents > One Big Prompt
 One mega-prompt doing brand analysis + benchmark comparison + accessibility audit + synthesis = confused, unfocused output. Four agents, each with a single responsibility = sharp, reliable analysis.
-### 5. The JSON Bridge Works
-JSON is the perfect interchange format:
-- AI agents export JSON
-- Figma plugin imports JSON
-- No direct integration needed
-- Each tool does what it's best at
-### 6. Semantic Context Changes Everything
-Raw hex values are useless. Knowing that `#06b2c4` is the **brand primary used on 33 buttons** changes how you evaluate it — and how agents reason about it.
 ---
@@ -554,22 +688,24 @@ Raw hex values are useless. Knowing that `#06b2c4` is the **brand primary used o
 **On HuggingFace Spaces:** I'm using HF Spaces as the hosting platform with a Gradio frontend running in Docker. The LLM models (Qwen 72B, Llama 3.3 70B) are called via HuggingFace Inference API. Browser automation (Playwright + Chromium) runs inside the container.
-**On the Data:** This system works on **live websites** — point it at any URL and it extracts real design tokens from the actual DOM. No synthetic data. The architecture, LLM integrations, and rule engine are production-ready.
 ---
 ## Try It Yourself
 **AI Agent App:**
-- 🚀 Live Demo: [HuggingFace Space link]
-- 💻 GitHub: [Repository link]
 **Workflow:**
-1. Enter website URL → Extract AS-IS
-2. Download JSON → Import to Figma
-3. Review specimen → Run AI analysis
-4. Accept suggestions → Export TO-BE
-5. Import to Figma → Compare specimens
 ---
@@ -579,21 +715,36 @@ AI engineering isn't about fancy models or complex architecture. It's about know
 It's **compression** — compressing days of manual audit, multiple expert perspectives, and industry benchmarking into something a team can act on Monday morning.
-Instead of 3–5 days reviewing DevTools, your team gets:
-> "Top 3 issues, ranked by impact, with specific fixes, benchmark alignment, and a Figma-ready specimen to compare before and after."
 That's AI amplifying design systems impact.
-🔗 Full code on GitHub: [link]
 ---
-## What's Next
 **Coming in Episode 7:**
-- Auto-generating Figma components from tokens
-- Component pattern detection (buttons, cards, forms)
-- Design system documentation generation
 ---
@@ -620,8 +771,8 @@ I'm Riaz, a UX Design Manager with 10+ years of experience in consumer apps. I c
 ---
-#AIAgents #DesignSystems #UXDesign #Figma #MultiAgentSystems #DesignTokens #Automation #AIEngineering #HuggingFace #WCAG
 ---
-*Published on Medium • ~10 min read*

+# AI in My Daily Work — Episode 6: Reverse-Engineering Design Systems with 4 AI Agents, a Rule-Based Color Classifier & a Free Rule Engine
+## A Semi-Automated Workflow: From Website URL to Figma-Ready Design System (v3.2)
+*How I built a system that extracts any website's design tokens, classifies colors deterministically, audits them like a senior design team, and generates a visual spec in Figma — for ~$0.003 per run.*
+[IMAGE: Hero - Complete workflow showing Website -> AI Agents -> Figma Visual Spec]
 ---
 6. Repeat for spacing, shadows, border radius...
 7. Spend days organizing into a coherent system
 8. Manually recreate in Figma as variables
+9. Manually build a visual spec page
+I've done this dozens of times. It takes **3-5 days** for a single website. And by the time you're done, something has already changed.
 I wanted a system that could think like a design team:
+- a **data engineer** extracting and normalizing every token
+- a **color scientist** classifying colors by actual CSS usage (not guessing)
 - an **analyst** identifying brand colors and patterns
 - a **senior reviewer** benchmarking against industry standards
 - and a **chief architect** synthesizing everything into action
+So I built one. Three versions later, here's what works.
 ---
 ## The Solution (In One Sentence)
+I built a 3-layer system — deterministic extraction + rule-based color classification + 4 AI agents — that acts like an entire design audit team. It outputs W3C DTCG-compliant JSON that feeds directly into Figma via a custom plugin that auto-generates a visual spec page. Cost: ~$0.003 per analysis.
 ---
 Here's the end-to-end process I now use:
 ```
++--------------------------------------------------------------+
+|                    MY DESIGN SYSTEM WORKFLOW                    |
++--------------------------------------------------------------+
+|                                                                |
+|  STEP 1: Extract AS-IS (AI Agent App)                         |
+|  ----------------------------------------                     |
+|  * Enter website URL                                          |
+|  * AI auto-discovers pages                                    |
+|  * Extracts colors, typography, spacing, shadows, radius      |
+|  * Normalizes: dedup, sort, name (radius, shadows, colors)    |
+|  * Color Classifier: deterministic role assignment             |
+|  * Rule Engine: WCAG + type scale + spacing grid              |
+|  * Download AS-IS JSON (W3C DTCG v1 format)                  |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 2: Import to Figma (My Plugin)                          |
+|  ----------------------------------------                     |
+|  * Open Figma                                                 |
+|  * Upload AS-IS JSON via custom plugin                        |
+|  * Plugin auto-detects DTCG format                            |
+|  * Creates Variables + Paint/Text/Effect Styles                |
+|  * Auto-generates Visual Spec Page                            |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 3: View AS-IS Visual Spec (Figma)                       |
+|  ----------------------------------------                     |
+|  * Typography (Desktop + Mobile) with AA badges               |
+|  * Colors organized by semantic role                           |
+|  * Spacing scale, Radius display, Shadow elevation            |
+|  * Review what exists before modernizing                      |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 4: AI Analysis (AI Agent App - Stage 2)                 |
+|  ----------------------------------------                     |
+|  * Free Rule Engine: WCAG, type scale, spacing grid           |
+|  * AURORA: Brand color identification (advisory)              |
+|  * ATLAS: Industry benchmark comparison (8 systems)           |
+|  * SENTINEL: Best practices audit with priorities             |
+|  * NEXUS: Final synthesis resolving all contradictions         |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 5: Accept/Reject Suggestions (AI Agent App)             |
+|  ----------------------------------------                     |
+|  * Review each recommendation                                 |
+|  * Accept or Reject individually                              |
+|  * I stay in control of what changes                          |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 6: Export TO-BE (AI Agent App - Stage 3)                |
+|  ----------------------------------------                     |
+|  * Generate modernized TO-BE JSON (DTCG compliant)            |
+|  * Contains accepted improvements                             |
+|  * Download new JSON file                                     |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 7: Import TO-BE to Figma (My Plugin)                    |
+|  ----------------------------------------                     |
+|  * Upload TO-BE JSON via same plugin                          |
+|  * Figma Variables update with new values                     |
+|  * New Visual Spec generated for comparison                   |
+|                                                                |
+|                           |                                    |
+|                           v                                    |
+|                                                                |
+|  STEP 8: Compare AS-IS vs TO-BE (Figma)                       |
+|  ----------------------------------------                     |
+|  * Side-by-side visual spec pages                             |
+|  * See exactly what changed and why                           |
+|  * Ready to use in production                                 |
+|                                                                |
++--------------------------------------------------------------+
 ```
+**Total time:** ~15 minutes (vs 3-5 days manual)
 ---
+## Architecture Overview: Three Layers, One Clear Authority Chain
 My first attempt (V1) made a classic mistake:
 **I used a large language model for everything.**
+V1 cost $0.50-1.00 per run, took 15+ seconds for basic math, and LLMs hallucinated contrast ratios.
+V2 split the work into rules vs AI. Better, but a new problem emerged: **three competing naming systems** for colors. The normalizer used word-based shades ("blue.light"), the export layer used numeric shades ("blue.500"), and the LLM agent used whatever it felt like ("brand.primary"). The output in Figma was chaos.
+V3 fixed this with a clear authority chain and a dedicated color classifier:
+> **Rule-based code handles certainty. LLMs handle ambiguity. And there's ONE naming authority.**
+[IMAGE: Architecture diagram - Layer 1 (Extraction) -> Layer 2 (Classification + Analysis) -> Layer 3 (4 Named Agents)]
+```
++--------------------------------------------------+
+|  LAYER 1: EXTRACTION + NORMALIZATION (Free)       |
+|  +- Crawler + 7-Source Extractor (Playwright)     |
+|  +- Normalizer: colors, radius, shadows, typo     |
+|  |   +- Radius: parse, deduplicate, sort, name   |
+|  |   +- Shadows: parse, sort by blur, name        |
+|  |   +- Colors: hue + numeric shade (50-900)      |
+|  +- Firecrawl: deep CSS parsing (bypass CORS)     |
++--------------------------------------------------+
+|  LAYER 2: CLASSIFICATION + RULE ENGINE (Free)     |
+|  +- Color Classifier (815 lines, deterministic)   |
+|  |   +- CSS evidence -> category -> token name    |
+|  |   +- Capped: brand(3), text(3), bg(3), etc.   |
+|  |   +- Every decision logged with evidence       |
+|  +- WCAG Contrast Checker (actual FG/BG pairs)    |
+|  +- Type Scale Detection (ratio math)             |
+|  +- Spacing Grid Analysis (GCD math)              |
+|  +- Color Statistics (deduplication)               |
++--------------------------------------------------+
+|  LAYER 3: 4 AI AGENTS (~$0.003)                   |
+|  +- AURORA   - Brand Advisor        (Qwen 72B)   |
+|  +- ATLAS    - Benchmark Advisor    (Llama 70B)   |
+|  +- SENTINEL - Best Practices Audit (Qwen 72B)   |
+|  +- NEXUS    - Head Synthesizer     (Llama 70B)   |
++--------------------------------------------------+
+```
+### The Naming Authority Chain (V3's Key Innovation)
+This was the single hardest problem to solve. In V2, three systems produced color names:
+| System | Convention | Example | Problem |
+|--------|-----------|---------|---------|
+| Normalizer | Word shades | `color.blue.light` | Inconsistent |
+| Export function | Numeric shades | `color.blue.500` | Conflicts |
+| AURORA LLM | Whatever it wants | `brand.primary` | Unpredictable |
+**Result in Figma: `blue.300`, `blue.dark`, `blue.light`, `blue.base` in the same export. Unusable.**
+V3 established a clear chain:
 ```
+1. Color Classifier (PRIMARY) - deterministic, covers ALL colors
+   +- Rule-based: CSS evidence -> category -> token name
+   +- 100% reproducible, logged with evidence
+2. AURORA LLM (SECONDARY) - semantic role enhancer ONLY
+   +- Can promote "color.blue.500" -> "color.brand.primary"
+   +- CANNOT rename palette colors
+   +- Only brand/text/bg/border/feedback roles accepted
+3. Normalizer (FALLBACK) - preliminary hue+shade names
+   +- Only used if classifier hasn't run yet
 ```
+One naming authority. No conflicts. Clean Figma output every time.
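The chain above can be sketched in a few lines of Python. This is an illustrative sketch, not the code from the repository: `resolve_name`, `is_semantic`, and the dictionary shapes are hypothetical, and the rule that AURORA only promotes names that are not already semantic is an assumption drawn from "CANNOT rename palette colors".

```python
from typing import Dict, Optional

# Role prefixes AURORA is allowed to assign, as listed in the article.
SEMANTIC_ROLES = ("brand.", "text.", "bg.", "border.", "feedback.")


def is_semantic(name: str) -> bool:
    """True if a token name such as 'color.brand.primary' carries a semantic role."""
    bare = name[len("color."):] if name.startswith("color.") else name
    return bare.startswith(SEMANTIC_ROLES)


def resolve_name(
    hex_value: str,
    classifier: Dict[str, str],   # hex -> name from the color classifier
    aurora: Dict[str, str],       # current name -> name proposed by AURORA
    normalizer: Dict[str, str],   # hex -> preliminary hue+shade name
) -> Optional[str]:
    """Pick exactly one token name for a color, following the authority chain."""
    # Classifier is primary; the normalizer is only a fallback.
    name = classifier.get(hex_value) or normalizer.get(hex_value)
    if name is None:
        return None
    # AURORA may promote a palette name to a semantic role, nothing else.
    # (Assumption: names that are already semantic are left untouched.)
    promoted = aurora.get(name)
    if promoted and is_semantic(promoted) and not is_semantic(name):
        return promoted
    return name
```

With a classifier entry `#06b2c4 -> color.blue.500` and an AURORA proposal `color.blue.500 -> color.brand.primary`, the promotion is accepted; a proposal such as `color.ocean.primary` is ignored and the classifier name stands.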
+---
+## Layer 1: Extraction + Normalization (No LLM)
+### Extraction: 8 Sources
+A Playwright-powered browser visits each page at **two viewports** (1440px desktop + 375px mobile) and extracts every design token from **8 sources**:
+[IMAGE: 8 Extraction Sources diagram]
 ```
+--- Playwright (7 internal sources) ---
+Source 1: Computed Styles   -> What the browser actually renders
+Source 2: CSS Variables     -> --primary-color, --spacing-md
+Source 3: Inline Styles     -> style="color: #06b2c4"
+Source 4: SVG Attributes    -> fill, stroke colors
+Source 5: Stylesheets       -> CSS rules, hover states, pseudo-elements
+Source 6: External CSS      -> Fetched & parsed CSS files
+Source 7: Page Scan         -> Brute-force regex on style blocks
+--- Separate deep extraction ---
+Source 8: Firecrawl         -> Deep CSS parsing (bypasses CORS)
 ```
+### Normalization: Not Just Dedup
+The normalizer in V2 was a major pain point. Colors got named, but radius and shadows were passed through raw. Multi-value CSS like `"0px 0px 16px 16px"` became garbage tokens. Percentage values like `"50%"` couldn't be used in Figma.
+V3's normalizer actually processes everything:
+**Colors:** Deduplicate by exact hex + RGB distance < 30. Assign hue family + numeric shade (50-900). Never use words like "light" or "dark" for shades. Add role hints from CSS context for the classifier.
+**Radius:** Parse multi-value shorthand (take max), convert rem/em/% to px, deduplicate by resolved value, sort by size, name semantically (none/sm/md/lg/xl/2xl/full). A raw extraction of `["8px", "0px 0px 16px 16px", "50%", "1rem", "0.25rem"]` (where `1rem` resolves to 16px and merges with the 16px shorthand) becomes:
+```
+radius.sm   = 4px    (from 0.25rem context)
+radius.md   = 8px
+radius.xl   = 16px   (max of 0 0 16 16)
+radius.full = 9999px (from 50%)
+```
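The parse, convert, and deduplicate steps for radius can be sketched as follows. This is a hedged illustration rather than the normalizer's real code: `to_px`, `normalize_radius`, the 16px root font size, and the rule that percentages of 50 or more map to 9999px are assumptions. Semantic naming is left out because the article does not give the size thresholds.

```python
import re
from typing import List, Optional

REM_PX = 16.0     # assumed root font size for rem/em conversion
FULL_PX = 9999.0  # assumed "pill" value, matching radius.full in the example

_LENGTH = re.compile(r"(-?\d*\.?\d+)(px|rem|em|%)?")


def to_px(token: str) -> Optional[float]:
    """Convert one CSS length token to pixels, or None if it cannot be resolved."""
    m = _LENGTH.fullmatch(token.strip())
    if m is None:
        return None
    value, unit = float(m.group(1)), m.group(2) or "px"
    if unit in ("rem", "em"):
        return value * REM_PX
    if unit == "%":
        # Assumption: 50% or more means a fully rounded shape.
        return FULL_PX if value >= 50 else None
    return value


def normalize_radius(raw_values: List[str]) -> List[float]:
    """Resolve each raw radius to px (max of shorthand), dedupe, and sort."""
    resolved = set()
    for raw in raw_values:
        parts = [px for px in (to_px(p) for p in raw.split()) if px is not None]
        if parts:
            resolved.add(max(parts))  # multi-value shorthand: take the largest corner
    return sorted(resolved)
```

Using a set keyed on the resolved pixel value is what makes `1rem` and `16px` collapse into a single token.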
+**Shadows:** Parse CSS shadow strings into components (offset, blur, spread, color). Filter out spread-only (border simulation) and inset shadows. Sort by blur radius. Deduplicate by blur bucket. Name by elevation (xs/sm/md/lg/xl). If fewer than 5 shadows are extracted, interpolate so the output always has 5 elevation levels.
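A minimal sketch of the shadow pipeline, under stated assumptions: `Shadow`, `parse_shadow`, and `elevation_scale` are hypothetical names, a "blur bucket" is simplified to the exact blur value, and the interpolation step for fewer than five shadows is omitted because the article does not describe how it works.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class Shadow(NamedTuple):
    x: float
    y: float
    blur: float
    spread: float
    inset: bool


_COLOR_FN = re.compile(r"(?:rgba?|hsla?)\([^)]*\)")
_LENGTH = re.compile(r"-?\d*\.?\d+(?:px)?")
ELEVATIONS = ("xs", "sm", "md", "lg", "xl")


def parse_shadow(css: str) -> Optional[Shadow]:
    """Parse one box-shadow value into numeric components (color is dropped)."""
    tokens = _COLOR_FN.sub(" ", css).split()  # remove rgb()/hsl() so their digits are ignored
    lengths = [float(t[:-2] if t.endswith("px") else t)
               for t in tokens if _LENGTH.fullmatch(t)]
    if len(lengths) < 2:  # offset-x and offset-y are mandatory
        return None
    x, y, blur, spread = (lengths + [0.0, 0.0])[:4]
    return Shadow(x, y, blur, spread, "inset" in tokens)


def elevation_scale(raw: List[str]) -> List[Tuple[str, Shadow]]:
    """Filter, dedupe by blur, sort, and name up to five elevation levels."""
    by_blur: Dict[float, Shadow] = {}
    for css in raw:
        shadow = parse_shadow(css)
        if shadow is None or shadow.inset:
            continue
        if shadow.blur == 0 and shadow.x == 0 and shadow.y == 0:
            continue  # spread-only shadow, typically a simulated border
        by_blur.setdefault(shadow.blur, shadow)  # keep the first shadow seen per blur value
    ordered = [by_blur[b] for b in sorted(by_blur)]
    return list(zip(ELEVATIONS, ordered))  # zip stops after five levels
```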
 **Cost: $0.00 | Runtime: ~90 seconds**
+---
+## Layer 2: Color Classification + Rule Engine (No LLM)
+### The Color Classifier (V3's Biggest Addition)
+This is 815 lines of deterministic code that replaced what AURORA used to do badly.
+**The problem it solves:** Given 30+ extracted colors, which is the brand primary? Which are text colors? Which are backgrounds?
+An LLM can reason about this, but inconsistently. The same color might be called "brand.primary" in one run and "accent.main" in the next. And it only named 10 colors, leaving the rest in chaos.
+The classifier uses CSS evidence:
+```
+CSS Evidence -> Category:
+  background-color on <button> + saturated + freq>5 -> BRAND
+  color on <p>/<span> + low saturation              -> TEXT
+  background-color on <div>/<body> + neutral         -> BG
+  border-color + low saturation                      -> BORDER
+  red hue + sat>0.6 + low freq                       -> FEEDBACK (error)
+  everything else                                    -> PALETTE (by hue.shade)
+```
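The rule table above translates almost directly into an ordered chain of checks. The sketch below is illustrative only: the function name `classify`, the numeric cut-offs for "saturated", "low saturation", "neutral", and "low freq", and the red hue window are all assumptions, since the article states only `freq>5` and `sat>0.6` explicitly.

```python
import colorsys


def _hue_sat(hex_color: str):
    """Return (hue in degrees, HLS saturation 0..1) for a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, _lightness, sat = colorsys.rgb_to_hls(r, g, b)
    return hue * 360, sat


def classify(hex_color: str, css_property: str, tag: str, freq: int) -> str:
    """Assign a category from CSS evidence; rules are checked in table order."""
    hue, sat = _hue_sat(hex_color)
    if css_property == "background-color" and tag == "button" and sat > 0.5 and freq > 5:
        return "brand"      # assumed: "saturated" means sat > 0.5
    if css_property == "color" and tag in ("p", "span") and sat < 0.2:
        return "text"       # assumed: "low saturation" means sat < 0.2
    if css_property == "background-color" and tag in ("div", "body") and sat < 0.1:
        return "bg"         # assumed: "neutral" means sat < 0.1
    if css_property == "border-color" and sat < 0.2:
        return "border"
    if (hue < 15 or hue > 345) and sat > 0.6 and freq <= 5:
        return "feedback"   # assumed: red window of +/-15 degrees, "low freq" <= 5
    return "palette"
```

Because the checks are plain comparisons on extracted data, the same input always yields the same category, which is the reproducibility property the classifier is built around.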
+**Key features:**
+- **Aggressive deduplication**: Colors within RGB distance < 30 AND same category get merged (13 text grays become 3)
+- **Capped categories**: brand (max 3), text (max 3), bg (max 3), border (max 3), feedback (max 4), palette (rest)
+- **User-selectable naming convention**: semantic, tailwind, or material
+- **Every decision logged with evidence**: `[DEDUP] merged #1a1a1a with #1b1b1b (dist=1.7)`, `[CLASSIFY] #06b2c4 -> brand (background-color on <button>, freq=33)`
+**Cost: $0.00 | Reproducible: 100% | Runtime: <100ms**
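The deduplication rule can be illustrated with Euclidean distance in RGB space, which is consistent with the logged example (`#1a1a1a` vs `#1b1b1b` gives a distance of about 1.7). The function names and the choice to keep the most frequent color as the representative are assumptions for this sketch.

```python
import math
from typing import List, Tuple

Color = Tuple[str, str, int]  # (hex, category, frequency)


def hex_to_rgb(hex_color: str) -> Tuple[int, int, int]:
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a: str, b: str) -> float:
    """Euclidean distance between two #rrggbb colors in RGB space."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


def dedupe(colors: List[Color], threshold: float = 30.0):
    """Merge colors that share a category and lie closer than `threshold`."""
    kept: List[Color] = []
    log: List[str] = []
    # Assumption: the most frequent color in a cluster is the one that survives.
    for hex_value, category, freq in sorted(colors, key=lambda c: -c[2]):
        match = next(
            (k for k in kept
             if k[1] == category and rgb_distance(k[0], hex_value) < threshold),
            None,
        )
        if match is None:
            kept.append((hex_value, category, freq))
        else:
            dist = rgb_distance(match[0], hex_value)
            log.append(f"[DEDUP] merged {match[0]} with {hex_value} (dist={dist:.1f})")
    return kept, log
```

Requiring the same category before merging keeps a near-black text color from swallowing a near-black background that happens to sit within the threshold.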
+### The Rule Engine
+After classification, the rule engine runs every check that can be done with pure math:
 ```
+TYPE SCALE ANALYSIS
++- Detected Ratio: 1.167
++- Closest Standard: Minor Third (1.2)
++- Consistent: Warning (variance: 0.24)
++- Recommendation: 1.25 (Major Third)
+ACCESSIBILITY CHECK (WCAG AA/AAA)
++- Colors Analyzed: 210
++- FG/BG Pairs Checked: 220
++- AA Pass: 143
++- AA Fail (real FG/BG pairs): 67
+|  +- fg:#06b2c4 on bg:#ffffff -> Fix: #048391 (4.5:1)
+|  +- fg:#999999 on bg:#ffffff -> Fix: #757575 (4.6:1)
+|  +- ... and 62 more
+SPACING GRID
++- Detected Base: 1px (GCD)
++- Grid Aligned: Warning 0%
++- Recommendation: 8px grid
+CONSISTENCY SCORE: 52/100
 ```
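The type scale and spacing checks in that report are plain arithmetic. A minimal sketch, assuming the detected ratio is the mean of ratios between consecutive unique font sizes and that spacing values are integer pixels; the function names and the table of named scales are illustrative, not taken from the rule engine.

```python
import math
from functools import reduce
from statistics import mean
from typing import List, Tuple

# Commonly used modular scale names and ratios.
STANDARD_SCALES = {
    "Minor Second": 1.067,
    "Major Second": 1.125,
    "Minor Third": 1.2,
    "Major Third": 1.25,
    "Perfect Fourth": 1.333,
    "Augmented Fourth": 1.414,
    "Perfect Fifth": 1.5,
    "Golden Ratio": 1.618,
}


def detect_ratio(font_sizes: List[float]) -> float:
    """Mean ratio between consecutive unique sizes (needs at least two sizes)."""
    sizes = sorted(set(font_sizes))
    return mean(b / a for a, b in zip(sizes, sizes[1:]))


def closest_standard(ratio: float) -> Tuple[str, float]:
    """Return the (name, ratio) of the named scale nearest to `ratio`."""
    return min(STANDARD_SCALES.items(), key=lambda kv: abs(kv[1] - ratio))


def spacing_base(spacings_px: List[int]) -> int:
    """Greatest common divisor of all spacing values, in pixels."""
    return reduce(math.gcd, spacings_px)
```

A single off-grid value such as 5px is enough to pull the GCD down to 1px, which is why a detected base of 1px signals an unaligned grid rather than a deliberate one.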
+Not just "color vs white" — it tests **actual foreground/background pairs** found on the page. And algorithmically generates AA-compliant alternatives.
+This entire layer runs **in under 1 second** and costs nothing — the single biggest cost optimization in the system.
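For reference, the contrast math follows the WCAG 2.x definitions of relative luminance and contrast ratio. The sketch below implements those published formulas; the function names are illustrative and the search for a compliant replacement color is not shown, since the article does not describe how alternatives are generated.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per WCAG 2.x."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (_linearize(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

Running every extracted foreground/background pair through `passes_aa` is a deterministic loop, so it produces the same pass/fail counts on every run.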
 ---
+## Layer 3: AI Analysis & Interpretation (4 Named Agents)
+This is where language models actually add value — tasks that require **context, reasoning, and judgment**. But in V3, they're advisory only. They don't control naming.
+[IMAGE: Agent pipeline diagram - AURORA -> ATLAS -> SENTINEL -> NEXUS]
 ---
+### Agent 1: AURORA — Brand Color Advisor
 **Model:** Qwen 72B (HuggingFace PRO)
+**Role change in V3:** Advisory only. Cannot rename colors. Can promote palette colors to semantic roles.
+**What AURORA does now:**
+The color classifier handles the naming. AURORA's job shifted to:
+- Identify brand strategy (complementary? analogous? monochrome?)
+- Suggest which palette colors deserve semantic roles (e.g., "color.blue.500 should be color.brand.primary")
+- Assess palette cohesion (score 1-10)
+- Provide reasoning that helps designers understand the brand's color story
+**The key constraint:** `filter_aurora_naming_map()` strips any non-semantic names from AURORA's output. If AURORA tries to rename `color.blue.500` to `color.ocean.primary`, it's rejected. Only `brand.`, `text.`, `bg.`, `border.`, `feedback.` role assignments pass through.
 ```
 AURORA's Analysis:
+------------------------------------------
+Brand Primary:  #06b2c4 (confidence: HIGH)
+  +- 33 buttons, 12 CTAs, dominant accent
+  +- Classifier already tagged as brand
+Brand Secondary: #c1df1f (confidence: MEDIUM)
+  +- 15 accent elements, secondary CTA
 Palette Strategy: Complementary
 Cohesion Score: 7/10
+  +- "Clear hierarchy, accent colors differentiated"
 ```
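The filtering constraint can be sketched roughly as follows. This is not the real `filter_aurora_naming_map()`; the function name, the `{old_name: proposed_name}` dict shape, and the optional `color.` prefix handling are assumptions made for illustration.

```python
# Role prefixes taken from the article; everything else here is hypothetical.
SEMANTIC_PREFIXES = ("brand.", "text.", "bg.", "border.", "feedback.")


def filter_naming_map_sketch(naming_map: dict[str, str]) -> dict[str, str]:
    """Keep only proposals whose new name is a semantic role assignment."""
    kept = {}
    for old_name, proposed in naming_map.items():
        # Assumption: names may carry a leading "color." namespace.
        role = proposed.removeprefix("color.")
        if role.startswith(SEMANTIC_PREFIXES):
            kept[old_name] = proposed
    return kept


proposals = {
    "color.blue.500": "color.brand.primary",   # semantic role: kept
    "color.teal.300": "color.ocean.primary",   # creative rename: dropped
}
filtered = filter_naming_map_sketch(proposals)
```

Because the allow-list is a plain tuple of prefixes, the filter stays deterministic no matter what the model returns.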
 ---
 ### Agent 2: ATLAS — Benchmark Advisor
 **Model:** Llama 3.3 70B (128K context)
 **Unique Capability:** Industry benchmarking against **8 design systems** (Material 3, Polaris, Atlassian, Carbon, Apple HIG, Tailwind, Ant, Chakra).
 [IMAGE: Benchmark comparison table from the UI]
+This agent reasons about **effort vs. value**:
 ```
 ATLAS's Recommendation:
+------------------------------------------
+1st: Shopify Polaris (87% match)
 Alignment Changes:
+  +- Type scale: 1.17 -> 1.25 (effort: medium)
+  +- Spacing grid: mixed -> 4px (effort: high)
+  +- Base size: 16px -> 16px (already aligned)
 Pros: Closest match, e-commerce proven, well-documented
 Cons: Spacing migration is significant effort
+2nd: Material 3 (77% match)
+  +- "Stronger mobile patterns, but 8px grid
        requires more restructuring"
 ```
+ATLAS adds the context that turns analysis into action:
+> "You're 87% aligned to Polaris already. Closing the gap on type scale takes ~1 hour and makes your system industry-standard."
 ---
 ### Agent 3: SENTINEL — Best Practices Auditor
 **Model:** Qwen 72B
+**V3 improvement:** SENTINEL must cite specific data from the rule engine, and a cross-reference critic validates that its scores match the actual data.
 SENTINEL prioritizes by **business impact** — not just severity:
 ```
 SENTINEL's Audit:
+------------------------------------------
 Overall Score: 68/100
 Checks:
+  +- PASS:    Type Scale Standard (1.25 ratio)
+  +- WARNING: Type Scale Consistency (variance 0.18)
+  +- PASS:    Base Size Accessible (16px)
+  +- FAIL:    AA Compliance (67 failures)
+  +- WARNING: Spacing Grid (0% aligned)
+  +- FAIL:    Near-Duplicates (351 pairs)
 Priority Fixes:
   #1 Fix brand color AA compliance
      Impact: HIGH | Effort: 5 min
+     -> "Affects 40% of interactive elements"
   #2 Consolidate near-duplicate colors
      Impact: MEDIUM | Effort: 2 hours
      Impact: MEDIUM | Effort: 1 hour
 ```
+**V3's grounding rule:** If the rule engine says 67 AA failures, SENTINEL's AA check **must** be "fail." A cross-reference critic catches contradictions.
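A cross-reference check of this kind could look like the sketch below. The field names (`aa_failures`, `aa_compliance`) and the function itself are hypothetical; the source only states the rule, not how the critic is implemented.

```python
def find_contradictions(rule_engine: dict, sentinel_checks: dict) -> list[str]:
    """Return human-readable contradictions between computed data and LLM verdicts."""
    contradictions = []
    aa_failures = rule_engine.get("aa_failures", 0)
    aa_status = sentinel_checks.get("aa_compliance")
    # Grounding rule from the article: any computed AA failure forces a "fail".
    if aa_failures > 0 and aa_status != "fail":
        contradictions.append(
            f"aa_compliance is '{aa_status}' but the rule engine "
            f"reports {aa_failures} AA failures"
        )
    return contradictions


grounded = find_contradictions({"aa_failures": 67}, {"aa_compliance": "fail"})
ungrounded = find_contradictions({"aa_failures": 67}, {"aa_compliance": "pass"})
```

An empty list means the audit agrees with the computed data; anything else can be sent back to the agent or overridden with the rule engine's verdict.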
 ---
+### Agent 4: NEXUS — Head Synthesizer
 **Model:** Llama 3.3 70B (128K context)
+NEXUS takes outputs from **all three agents + the rule engine** and synthesizes a final recommendation using a two-perspective evaluation:
+- **Perspective A (Accessibility-First):** Weights AA compliance at 40%
+- **Perspective B (Balanced):** Equal weights across dimensions
+It evaluates both, then picks the perspective that best reflects the actual data.
 ```
 NEXUS Final Synthesis:
+------------------------------------------
+Executive Summary:
 "Your design system scores 68/100. Critical:
 67 color pairs fail AA. Top action: fix brand
 primary contrast (5 min, high impact)."
+Scores:
+  +- Overall:       68/100
+  +- Accessibility:  45/100
+  +- Consistency:    75/100
+  +- Organization:   70/100
+Top 3 Actions:
+  1. Fix brand color AA (#06b2c4 -> #048391)
      Impact: HIGH | Effort: 5 min
   2. Align type scale to 1.25
      Impact: MEDIUM | Effort: 1 hour
+  3. Consolidate 143 -> ~20 semantic colors
      Impact: MEDIUM | Effort: 2 hours
+Color Recommendations:
+  +- PASS:   brand.primary: #06b2c4 -> #048391 (auto-accept)
+  +- PASS:   text.secondary: #999999 -> #757575 (auto-accept)
+  +- REJECT: brand.accent: #FF6B35 -> #E65100 (user decides)
 ```
 ---
+## The Figma Bridge: DTCG JSON -> Variables -> Visual Spec
 [IMAGE: Figma plugin UI showing import options]
+### W3C DTCG v1 Compliance
+V3's export follows the W3C Design Tokens Community Group (DTCG) specification, which reached its first stable version in October 2025:
+```json
+{
+  "color": {
+    "brand": {
+      "primary": {
+        "$type": "color",
+        "$value": "#005aa3",
+        "$description": "[classifier] brand: primary_action",
+        "$extensions": {
+          "com.design-system-extractor": {
+            "frequency": 47,
+            "confidence": "high",
+            "category": "brand",
+            "evidence": ["background-color on <a>", "background-color on <button>"]
+          }
+        }
+      }
+    }
+  },
+  "radius": {
+    "md": { "$type": "dimension", "$value": "8px" }
+  },
+  "shadow": {
+    "sm": {
+      "$type": "shadow",
+      "$value": {
+        "offsetX": "0px", "offsetY": "2px",
+        "blur": "8px", "spread": "0px",
+        "color": "#00000026"
+      }
+    }
+  }
+}
+```
+Every token includes `$type`, `$value`, and `$description`. Colors include `$extensions` with extraction metadata (frequency, confidence, category, evidence). This means any DTCG-compatible tool can consume our output.
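To illustrate how a consumer might read this structure, here is a small sketch that walks the nested groups and collects every token by its dotted path. The helper name `flatten_tokens` is an assumption, and the sample tree is a trimmed copy of the JSON above.

```python
def flatten_tokens(tree: dict, prefix: str = "") -> dict[str, dict]:
    """Map dotted paths (e.g. 'color.brand.primary') to token objects."""
    flat = {}
    for key, node in tree.items():
        # Skip group-level metadata such as "$description" and non-dict leaves.
        if key.startswith("$") or not isinstance(node, dict):
            continue
        path = f"{prefix}.{key}" if prefix else key
        if "$value" in node:
            flat[path] = node          # a token: it carries a "$value"
        else:
            flat.update(flatten_tokens(node, path))  # a group: recurse
    return flat


sample = {
    "color": {"brand": {"primary": {"$type": "color", "$value": "#005aa3"}}},
    "radius": {"md": {"$type": "dimension", "$value": "8px"}},
}
tokens = flatten_tokens(sample)
```

The only rule the walker relies on is that an object with a `$value` key is a token and anything else is a group, so it needs no knowledge of specific token names.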
+### The Custom Figma Plugin
+The plugin closes the loop:
+1. **Auto-detects DTCG format** (vs legacy JSON)
+2. **Creates Figma Variables** — Color, Number, and String variable collections
+3. **Creates Styles** — Paint styles, Text styles, Effect styles
+4. **Generates Visual Spec Page** — Separate frames for typography, colors, spacing, radius, shadows
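The format auto-detection in step 1 can be approximated by looking for `$value` keys anywhere in the file. The plugin itself runs in Figma's JavaScript sandbox, so this Python sketch only illustrates the idea; the function name and the shape of the "legacy" example are assumptions.

```python
def looks_like_dtcg(data: object) -> bool:
    """Return True if any nested object carries a DTCG-style "$value" key."""
    if isinstance(data, dict):
        if "$value" in data:
            return True
        return any(looks_like_dtcg(child) for child in data.values())
    return False


dtcg_file = {"radius": {"md": {"$type": "dimension", "$value": "8px"}}}
legacy_file = {"radius": {"md": "8px"}}  # hypothetical legacy shape
```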
+[IMAGE: Figma visual spec page showing all tokens]
 ```
++-------------------------------------------------------------+
+|  BRAND        TEXT           BACKGROUND      FEEDBACK        |
++-------------------------------------------------------------+
+|  +----+ +----+   +----+ +----+   +----+ +----+   +----+     |
+|  |Prim| |Sec |   |Prim| |Sec |   |Prim| |Sec |   |Err |     |
+|  +----+ +----+   +----+ +----+   +----+ +----+   +----+     |
+|  #005aa3 #c1df1f #373737 #666666 #fff   #f5f5f5  #dc2626    |
+|  AA:Pass AA:Warn AA:Pass AA:Pass                  AA:Pass    |
++-------------------------------------------------------------+
 ```
+The visual spec uses horizontal auto-layout with AA compliance badges on every color swatch. Typography renders in the actual detected font family with size, weight, and line-height metadata.
 ---
 ## Comparing AS-IS vs TO-BE
 [IMAGE: Side-by-side comparison of AS-IS and TO-BE specimens]
 | Token | AS-IS | TO-BE | Change |
 |-------|-------|-------|--------|
+| Type Scale | ~1.18 (random) | 1.25 (Major Third) | Consistent |
+| brand.primary | #06b2c4 | #048391 | AA: 3.2 -> 4.5 |
+| Spacing Grid | Mixed | 8px base | Standardized |
+| Color Ramps | None | 50-950 | Generated |
+| Unique Colors | 143 | ~20 semantic | Consolidated |
+| Radius | Raw CSS garbage | none/sm/md/lg/xl/full | Normalized |
+| Shadows | Unsorted, unnamed | xs/sm/md/lg/xl (5 levels) | Progressive |
 ---
 | Metric | Manual Process | My Workflow |
 |--------|---------------|-------------|
+| Time | 3-5 days | ~15 minutes |
 | Cost | Designer salary | ~$0.003 |
+| Coverage | ~50 colors | 143 colors (8 sources) |
 | Accuracy | Human error | Computed styles (exact) |
 | Accessibility | Manual spot checks | Full AA/AAA (all 220 pairs) |
 | Benchmarking | Subjective | 8 industry systems compared |
+| Color naming | Manual | Deterministic classifier (100% reproducible) |
+| Radius/shadows | Copy raw CSS | Normalized, sorted, named |
+| Figma ready | Hours more | Instant (DTCG plugin + visual spec) |
+| Format | Proprietary | W3C DTCG v1 standard |
 ---
 | Agent | Model | Why This Model | Cost |
 |-------|-------|---------------|------|
+| Normalizer | None | Math doesn't need AI | $0.00 |
+| Color Classifier | None (815 lines) | Deterministic, reproducible | $0.00 |
 | Rule Engine | None | Math doesn't need AI | $0.00 |
+| AURORA | Qwen 72B | Creative brand reasoning | ~Free (HF PRO) |
 | ATLAS | Llama 3.3 70B | 128K context for benchmarks | ~Free (HF PRO) |
 | SENTINEL | Qwen 72B | Strict, consistent evaluation | ~Free (HF PRO) |
 | NEXUS | Llama 3.3 70B | 128K context for synthesis | ~$0.001 |
 For designer-scale usage (weekly runs), inference costs are effectively negligible, with HuggingFace PRO ($9/month) covering most models.
+The V1-to-V3 journey:
+- **V1:** LLM for everything. $0.50-1.00/run. Hallucinated contrast ratios.
+- **V2:** Rules + LLM split. $0.003/run. But 3 naming systems fighting.
+- **V3:** Rules + Classifier + Advisory LLM. $0.003/run. One naming authority. Clean output.
 ---
 | If This Fails... | What Happens |
 |-------------------|-------------|
+| LLM agents down | Color classifier + rule engine still works (free) |
 | Firecrawl unavailable | DOM-only extraction (slightly fewer tokens) |
 | Benchmark fetch fails | Hardcoded fallback data from 8 systems |
 | NEXUS synthesis fails | `create_fallback_synthesis()` from rule engine |
+| AURORA returns garbage | `filter_aurora_naming_map()` strips invalid names |
+| **Entire AI layer** | **Full classifier + rule-engine-only report - still useful** |
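The degradation pattern in the table can be sketched as a wrapper that tries the LLM path and falls back to rule-engine data on any failure. The wrapper and both stand-in callables are hypothetical; the real `create_fallback_synthesis()` signature is not shown in this article.

```python
from typing import Callable


def synthesize_with_fallback(
    rule_results: dict,
    llm_synthesize: Callable[[dict], dict],
    fallback: Callable[[dict], dict],
) -> dict:
    """Prefer the LLM synthesis; fall back to a rule-engine-only report."""
    try:
        return {"source": "llm", **llm_synthesize(rule_results)}
    except Exception as exc:  # broad on purpose: any agent failure degrades
        return {"source": "rule_engine", "error": str(exc), **fallback(rule_results)}


def broken_llm(_results: dict) -> dict:
    """Stand-in for an agent call that fails."""
    raise RuntimeError("inference API timeout")


def rule_only_summary(results: dict) -> dict:
    """Stand-in for a fallback built purely from computed data."""
    return {"summary": f"{results['aa_failures']} color pairs fail AA"}


report = synthesize_with_fallback({"aa_failures": 67}, broken_llm, rule_only_summary)
```

Tagging the result with its `source` lets the UI tell the user when they are looking at the rule-engine-only report.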
 ---
 [IMAGE: Tech stack diagram with logos]
 **AI Agent App:**
+- Playwright (browser automation, 8-source extraction)
 - Firecrawl (deep CSS parsing)
 - Gradio (UI framework)
 - Qwen/Qwen2.5-72B-Instruct (AURORA + SENTINEL)
 - meta-llama/Llama-3.3-70B-Instruct (ATLAS + NEXUS)
 - HuggingFace Spaces (hosting) + HF Inference API
 - Docker (containerized deployment)
+- 148 tests (82 deterministic + 27 agent evals + 35 live evals + 4 pipeline)
 **Figma Integration:**
+- Custom Figma Plugin (v7)
+- W3C DTCG v1 compliant JSON
+- Variables API + Paint/Text/Effect Styles
+- Auto-generated Visual Spec pages
+- Tokens Studio compatible
 ---
 If rules can do it faster and cheaper — use rules. My WCAG checker is 100% accurate. An LLM's contrast ratio calculation? Maybe 85% accurate, and 100x slower.
+The rule engine + color classifier do 90% of the work for $0.
+### 2. The Naming Authority Problem Is Real
+V2's biggest failure was organizational, not technical: three systems produced color names with no clear hierarchy. The fix wasn't better AI; it was a clear authority chain: classifier is PRIMARY, LLM is SECONDARY (advisory only), normalizer is FALLBACK.
+**Lesson:** When multiple systems touch the same data, establish ONE authority. Don't merge competing outputs.
+### 3. Industry Benchmarks Are Gold
+Without benchmarks: "Your type scale is inconsistent" -- *PM nods*
+With benchmarks: "You're 87% aligned to Shopify Polaris. Closing the gap takes 1 hour and makes your system industry-standard." -- *PM schedules meeting*
 Time to build benchmark database: 1 day.
 Value: Transforms analysis into prioritized action.
+### 4. Semi-Automation > Full Automation
 I don't want AI to make all decisions. The workflow has human checkpoints:
 - Review AS-IS in Figma before modernizing
 AI as **copilot**, not autopilot.
+### 5. Specialized Agents > One Big Prompt
 One mega-prompt doing brand analysis + benchmark comparison + accessibility audit + synthesis = confused, unfocused output. Four agents, each with a single responsibility = sharp, reliable analysis.
+### 6. W3C Standards Matter
+Adopting the DTCG v1 spec (October 2025) means our JSON output works with Tokens Studio, Style Dictionary v4, and any tool that follows the standard. Custom formats create lock-in. Standards create ecosystems.
+### 7. Deterministic Classification Beats LLM Classification
+AURORA (LLM) named 10 colors per run, inconsistently. The color classifier names ALL colors, every time, with logged evidence. For categorization tasks where you have structured input data (CSS properties, element types, frequency), rules beat LLMs on accuracy, speed, cost, and reproducibility.
 ---
 **On HuggingFace Spaces:** I'm using HF Spaces as the hosting platform with a Gradio frontend running in Docker. The LLM models (Qwen 72B, Llama 3.3 70B) are called via HuggingFace Inference API. Browser automation (Playwright + Chromium) runs inside the container.
+**On the Data:** This system works on **live websites** — point it at any URL and it extracts real design tokens from the actual DOM. No synthetic data. The architecture, LLM integrations, and rule engine are production-ready with 148 passing tests.
+**On the Standard:** The W3C DTCG specification reached stable v1 in October 2025. Our output includes `$type`, `$value`, `$description`, and `$extensions` with namespaced metadata. Any DTCG-compatible tool can consume it.
 ---
 ## Try It Yourself
 **AI Agent App:**
+- Live Demo: [HuggingFace Space link]
+- GitHub: [Repository link]
 **Workflow:**
+1. Enter website URL -> Extract AS-IS
+2. Download DTCG JSON -> Import to Figma
+3. Review visual spec -> Run AI analysis
+4. Accept suggestions -> Export TO-BE
+5. Import to Figma -> Compare visual specs
 ---
 It's **compression** — compressing days of manual audit, multiple expert perspectives, and industry benchmarking into something a team can act on Monday morning.
+Instead of 3-5 days reviewing DevTools, your team gets:
+> "Top 3 issues, ranked by impact, with specific fixes, benchmark alignment, and a Figma-ready visual spec to compare before and after."
 That's AI amplifying design systems impact.
+Full code on GitHub: [link]
 ---
+## What's Next: Automated Component Generation (Part 2)
+The token extraction and analysis story is complete. But design systems aren't just tokens — they're **components**.
+After exhaustive research into 30+ tools (Tokens Studio, Figr Identity, Figma Make, MCP bridges, story.to.design, and more), I found a genuine market gap:
+**No production tool takes DTCG JSON and outputs Figma components with proper variants.**
+Every tool either:
+- Imports tokens as variables (but doesn't create components)
+- Creates components from brand config (but can't consume YOUR tokens)
+- Uses AI to write to Figma (but is non-deterministic)
+- Needs a full Storybook pipeline as intermediary
+So I'm building it. The Figma Plugin API supports everything needed: `createComponent()`, `combineAsVariants()`, `setBoundVariable()`. Our existing plugin already imports tokens and creates variables.
 **Coming in Episode 7:**
+- Auto-generating Figma components from extracted tokens
+- Button (60 variants), TextInput (8), Card, Toast, Checkbox/Radio
+- Token-to-component binding: `color.brand.primary` -> Button fill, `radius.md` -> Button corners
+- Fully deterministic: same tokens in = same components out
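As a rough preview of token-to-component binding, the sketch below resolves a binding table against a DTCG token tree. The binding schema, property names (`fill`, `cornerRadius`), and helpers are assumptions for illustration only; the actual plugin would bind Figma variables through the Plugin API rather than copy raw values.

```python
# Hypothetical binding table: component property -> token path.
BUTTON_BINDINGS = {
    "fill": "color.brand.primary",
    "cornerRadius": "radius.md",
}


def lookup(tokens: dict, path: str) -> str:
    """Follow a dotted path through nested groups and return the token's $value."""
    node = tokens
    for part in path.split("."):
        node = node[part]
    return node["$value"]


def resolve_bindings(tokens: dict, bindings: dict[str, str]) -> dict[str, str]:
    """Resolve every binding; sorted keys keep the output order stable."""
    return {prop: lookup(tokens, bindings[prop]) for prop in sorted(bindings)}


extracted = {
    "color": {"brand": {"primary": {"$type": "color", "$value": "#005aa3"}}},
    "radius": {"md": {"$type": "dimension", "$value": "8px"}},
}
button_props = resolve_bindings(extracted, BUTTON_BINDINGS)
```

No model call is involved, so the same token file always yields the same component properties.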
 ---
+#AIAgents #DesignSystems #UXDesign #Figma #MultiAgentSystems #DesignTokens #Automation #AIEngineering #HuggingFace #WCAG #W3CDTCG
 ---
+*Published on Medium - ~12 min read*