riazmo/Design-System-Extractor-2
riazmo committed on Jan 27

Commit 64ebfd5 (verified) · 1 Parent(s): efdcf76

Upload CONTEXT.md

Files changed (1)

docs/CONTEXT.md +111 -38


@@ -10,10 +10,12 @@
 | File | What Changed |
 |------|--------------|
-| `agents/extractor.py` | Enhanced 5-source extraction (DOM, CSS vars, SVG, inline, stylesheets) |
-| `core/preview_generator.py` | Added AS-IS previews for Stage 1 (colors, spacing, radius, shadows) |
-| `app.py` | Stage 1 UI now has 5 preview tabs, enhanced logging shows extraction sources |
-| `docs/CONTEXT.md` | Updated with all changes, enhanced architecture diagrams |
 ---
@@ -314,17 +316,79 @@ Shows detailed extraction progress:
 🔀 Merging Firecrawl colors with Playwright extraction...
    ✅ Added 12 new colors from Firecrawl
    📊 Total colors now: 44
 ```
-### Stage 2 LLM Analysis Logs
-Shows detailed reasoning from each agent:
-- What they analyzed
-- Scores per category (Typography, Colors, AA, Spacing)
-- Specific findings and recommendations
-- Confidence levels
-- How HEAD resolved disagreements
-- Cost per call and total
 ---
@@ -347,7 +411,7 @@ Shows detailed reasoning from each agent:
     7. Page content scan (brute-force regex on HTML)
 - **Output:** Raw tokens with frequency, context, confidence, source type
-### Agent 1B: Firecrawl CSS Deep Diver (NEW)
 - **Persona:** CSS Deep Diver
 - **Tool:** Firecrawl / httpx fallback
 - **Job:**
@@ -357,6 +421,22 @@ Shows detailed reasoning from each agent:
   - Find colors missed by DOM inspection
 - **Output:** Additional colors merged into main extraction
 ### Agent 2: Token Normalizer & Structurer
 - **Persona:** Design System Librarian
 - **Job:**
@@ -389,12 +469,13 @@ Shows detailed reasoning from each agent:
 - **Purpose:** Trust building — show exactly what was extracted
 - **Shows:**
   - Token tables (colors, typography, spacing)
-  - **5 Visual Preview Tabs (AS-IS, no enhancements):**
     1. 🔤 Typography — actual font rendered
-    2. 🎨 Colors — simple swatches (no ramps)
-    3. 📏 Spacing — visual bars
-    4. 🔘 Radius — rounded boxes
-    5. 🌑 Shadows — shadow cards
 - **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile
 ### Stage 2: Upgrade Playground (MOST IMPORTANT)
@@ -403,6 +484,7 @@ Shows detailed reasoning from each agent:
   - Side-by-side option selector + live preview
   - **Color Ramps (50-950 shades with AA compliance)**
   - Type scale options (1.2, 1.25, 1.333)
 - **Human Actions:** Select type scale A/B/C, spacing system, color ramps — preview updates instantly
 ### Stage 3: Final Review & Export
@@ -429,44 +511,35 @@ design-system-extractor/
 │   ├── __init__.py
 │   ├── state.py                    # LangGraph state definitions
 │   ├── graph.py                    # LangGraph workflow orchestration
-│   ├── crawler.py                  # Agent 1: Website crawler
-│   ├── extractor.py                # Agent 1: Token extraction
 │   ├── normalizer.py               # Agent 2: Token normalization
 │   ├── advisor.py                  # Agent 3: Best practices
 │   └── generator.py                # Agent 4: JSON generator
 │
 ├── core/
 │   ├── __init__.py
-│   ├── browser.py                  # Playwright browser management
-│   ├── css_parser.py               # CSS/computed style extraction
 │   ├── color_utils.py              # Color analysis, contrast, ramps
-│   ├── typography_utils.py         # Type scale detection & generation
-│   ├── spacing_utils.py            # Spacing pattern detection
 │   └── token_schema.py             # Token data structures (Pydantic)
 │
 ├── ui/
-│   ├── __init__.py
-│   ├── components.py               # Reusable Gradio components
-│   ├── stage1_extraction.py        # Stage 1 UI
-│   ├── stage2_upgrade.py           # Stage 2 UI
-│   ├── stage3_export.py            # Stage 3 UI
-│   └── preview_generator.py        # HTML preview generation
 │
 ├── templates/
-│   ├── preview.html                # Live preview base template
-│   └── specimen.html               # Design system specimen template
 │
 ├── storage/
-│   └── persistence.py              # HF Spaces storage management
 │
 ├── tests/
-│   ├── test_crawler.py
-│   ├── test_extractor.py
-│   └── test_normalizer.py
 │
 └── docs/
-    ├── CONTEXT.md                  # THIS FILE - upload for context refresh
-    └── API.md                      # API documentation
 ```
 ---

 | File | What Changed |
 |------|--------------|
+| `agents/extractor.py` | Enhanced 7-source extraction (DOM, CSS vars, SVG, inline, stylesheets, external CSS, page scan) |
+| `agents/firecrawl_extractor.py` | **NEW** Agent 1B for deep CSS parsing |
+| `agents/semantic_analyzer.py` | **NEW** Agent 1C for semantic color categorization (brand/text/bg/border) |
+| `core/preview_generator.py` | AS-IS previews + Color Ramps sorted by brand priority |
+| `app.py` | Stage 1 UI now has 6 preview tabs including Semantic Colors |
+| `docs/CONTEXT.md` | Updated with semantic analyzer, full architecture diagrams |
 ---
 🔀 Merging Firecrawl colors with Playwright extraction...
    ✅ Added 12 new colors from Firecrawl
    📊 Total colors now: 44
+============================================================
+🧠 SEMANTIC COLOR ANALYSIS
+============================================================
+   📊 Analyzing 143 colors...
+   Using rule-based analysis (no LLM)
+📊 SEMANTIC ANALYSIS RESULTS:
+   🎨 BRAND COLORS:
+      primary: #06b2c4 (high)
+         └─ Most frequent saturated color on interactive elements (freq: 33)
+      secondary: #c1df1f (medium)
+         └─ Second most frequent brand color (freq: 15)
+   📝 TEXT COLORS:
+      primary: #373737 (high)
+      secondary: #666666 (medium)
+   🖼️ BACKGROUND COLORS:
+      primary: #ffffff (high)
+      secondary: #f5f5f5 (medium)
+   📈 SUMMARY:
+      Total colors analyzed: 143
+      Brand colors found: 2
+      Clear hierarchy: Yes
+      Analysis method: rule-based
 ```
+### Stage 2 LLM Analysis Logs (With Semantic Context)
+Shows detailed reasoning from each agent WITH semantic context:
+```
+============================================================
+🧠 STAGE 2: MULTI-AGENT ANALYSIS
+============================================================
+🧠 SEMANTIC CONTEXT FROM STAGE 1:
+   Brand Primary: #06b2c4
+   Text Primary: #373737
+   Analysis Method: rule-based
+=======================================================
+🤖 LLM 1: meta-llama/Llama-3.1-70B-Instruct
+=======================================================
+   Provider: novita
+   💰 Cost: $0.29/M in, $0.59/M out
+   📝 Task: Typography, Colors, AA, Spacing analysis
+   🧠 Semantic context: Yes  ← NEW: LLM knows color roles!
+   📊 LLM 1 FINDINGS:
+   COLORS (with semantic context):
+   ├─ Brand Primary (#06b2c4): "Fails AA on white (3.2:1)"
+   ├─ Suggested fix: "#0891a8 (4.6:1)"
+   └─ Score: 6/10
+=======================================================
+🎯 HEAD: Compiling final recommendations...
+=======================================================
+   📥 INPUT: Analyzing outputs from LLM 1 + LLM 2 + Rules + Semantic...
+   📊 HEAD SYNTHESIS:
+   COLOR RECOMMENDATIONS (per semantic role):
+   ├─ brand.primary: #06b2c4 → Keep for branding, use #0891a8 for text
+   ├─ text.primary: #373737 → Keep (passes AA)
+   └─ Generate ramps for: brand.primary, brand.secondary, neutral
+```
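The AA findings in the log above can be sanity-checked without an LLM. The following is a minimal sketch of the WCAG 2.x contrast-ratio formula; the function names are hypothetical and not taken from `core/color_utils.py`. The ratios quoted in the sample log are as reported by the LLM, so a direct computation may not reproduce them exactly.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Running `passes_aa("#06b2c4", "#ffffff")` and `passes_aa("#373737", "#ffffff")` is a cheap way to cross-check the per-role verdicts before they reach the HEAD synthesis step.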
 ---
     7. Page content scan (brute-force regex on HTML)
 - **Output:** Raw tokens with frequency, context, confidence, source type
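The page content scan (source 7) can be pictured as a regex pass over raw HTML that counts each color it finds. This is a minimal sketch under assumptions: the pattern, the function names, and the choice to track only hex colors are illustrative, not the actual code in `agents/extractor.py`.

```python
import re
from collections import Counter

# Assumed pattern: 3- or 6-digit hex colors not followed by another hex digit.
# A brute-force scan like this can also catch URL fragments such as "#add",
# which is one reason such hits would deserve a low confidence score.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})(?![0-9a-fA-F])")


def normalize_hex(color: str) -> str:
    """Lowercase the color and expand '#abc' to '#aabbcc'."""
    digits = color.lstrip("#").lower()
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits


def scan_page_colors(html: str) -> Counter:
    """Return a frequency count of normalized hex colors found in the HTML."""
    return Counter(normalize_hex(match) for match in HEX_COLOR.findall(html))
```

The resulting counter supplies the frequency part of a raw token; context, confidence, and source type would have to be attached by the caller.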
+### Agent 1B: Firecrawl CSS Deep Diver
 - **Persona:** CSS Deep Diver
 - **Tool:** Firecrawl / httpx fallback
 - **Job:**
   - Find colors missed by DOM inspection
 - **Output:** Additional colors merged into main extraction
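The merge step behind the "Added 12 new colors from Firecrawl" log line might look like the sketch below. It assumes colors are held as a mapping from hex string to frequency and that frequencies are summed for colors both extractors found; the real data structures and merge policy are not shown in this document, so `merge_colors` is a hypothetical helper.

```python
def _normalize(color: str) -> str:
    """Lowercase the color and expand '#abc' to '#aabbcc'."""
    digits = color.strip().lstrip("#").lower()
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits


def merge_colors(
    playwright: dict[str, int], firecrawl: dict[str, int]
) -> tuple[dict[str, int], int]:
    """Merge two color->frequency maps; return the merged map and the
    number of colors that only the second (Firecrawl) source contributed."""
    merged: dict[str, int] = {}
    for color, freq in playwright.items():
        key = _normalize(color)
        merged[key] = merged.get(key, 0) + freq

    added = 0
    for color, freq in firecrawl.items():
        key = _normalize(color)
        if key not in merged:
            added += 1  # genuinely new color from the deep CSS pass
        merged[key] = merged.get(key, 0) + freq
    return merged, added
```

Normalizing before comparing matters here: without it, `#FFF` from one source and `#ffffff` from the other would be counted as two colors.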
+### Agent 1C: Semantic Color Analyzer (NEW - LLM)
+- **Persona:** Design System Semanticist
+- **Tool:** Rule-based analysis (LLM optional)
+- **Job:**
+  - Analyze colors based on actual CSS usage (not guessing)
+  - Categorize into semantic roles:
+    - **Brand Colors:** Used on buttons, CTAs, links (interactive elements)
+    - **Text Colors:** Used with `color` property on p, span, h1-h6
+    - **Background Colors:** Used with `background-color` on containers
+    - **Border Colors:** Used with `border-color` properties
+    - **Feedback Colors:** Error (red), success (green), warning (yellow)
+  - Detect color hierarchy (primary → secondary → muted)
+- **Input:** Colors WITH context data (css_properties, elements, frequency)
+- **Output:** Semantic categorization with confidence levels
+- **Why:** Stage 2 LLMs can now give SPECIFIC recommendations per role
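The rule-based path described above can be sketched as a few ordered checks over each color's usage context. Everything here is an assumption for illustration: the `ColorUsage` shape, the element sets, the 0.3 saturation threshold, and the rule order are not taken from `agents/semantic_analyzer.py`, and feedback colors are left out for brevity.

```python
import colorsys
from dataclasses import dataclass

INTERACTIVE = {"button", "a"}  # assumed stand-ins for buttons, CTAs, links
TEXT_ELEMENTS = {"p", "span", "h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class ColorUsage:
    hex_value: str            # '#rrggbb'
    css_properties: set[str]  # e.g. {'color', 'background-color'}
    elements: set[str]        # tag names the color was seen on
    frequency: int


def saturation(hex_value: str) -> float:
    """HLS saturation in [0, 1]; greys return 0."""
    r, g, b = (int(hex_value[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return colorsys.rgb_to_hls(r, g, b)[2]


def role_of(usage: ColorUsage) -> str:
    """Assign one semantic role from observed usage, most specific rule first."""
    if usage.elements & INTERACTIVE and saturation(usage.hex_value) > 0.3:
        return "brand"
    if "color" in usage.css_properties and usage.elements & TEXT_ELEMENTS:
        return "text"
    if "background-color" in usage.css_properties:
        return "background"
    if any(prop.startswith("border") for prop in usage.css_properties):
        return "border"
    return "unknown"


def categorize(colors: list[ColorUsage]) -> dict[str, dict[str, str]]:
    """Group colors by role; within a role, rank by frequency."""
    roles: dict[str, dict[str, str]] = {}
    for usage in sorted(colors, key=lambda u: u.frequency, reverse=True):
        bucket = roles.setdefault(role_of(usage), {})
        rank = ("primary", "secondary")[len(bucket)] if len(bucket) < 2 else f"muted-{len(bucket) - 1}"
        bucket[rank] = usage.hex_value
    return roles
```

Ranking by frequency within each role is what yields the primary → secondary → muted hierarchy; a production analyzer would likely also emit a confidence level per assignment.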
 ### Agent 2: Token Normalizer & Structurer
 - **Persona:** Design System Librarian
 - **Job:**
 - **Purpose:** Trust building — show exactly what was extracted
 - **Shows:**
   - Token tables (colors, typography, spacing)
+  - **6 Visual Preview Tabs (AS-IS, no enhancements):**
     1. 🔤 Typography — actual font rendered
+    2. 🎨 Colors — simple swatches sorted by frequency (no ramps)
+    3. 🧠 Semantic Colors — colors organized by usage (brand/text/bg/border)
+    4. 📏 Spacing — visual bars
+    5. 🔘 Radius — rounded boxes
+    6. 🌑 Shadows — shadow cards
 - **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile
 ### Stage 2: Upgrade Playground (MOST IMPORTANT)
   - Side-by-side option selector + live preview
   - **Color Ramps (50-950 shades with AA compliance)**
   - Type scale options (1.2, 1.25, 1.333)
+  - **Semantic-aware recommendations:** "Your brand primary #06b2c4 fails AA, consider #0891a8"
 - **Human Actions:** Select type scale A/B/C, spacing system, color ramps — preview updates instantly
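A 50-950 ramp can be approximated by mixing the base color toward white for the lighter steps and toward black for the darker ones. This is a minimal sketch, assuming the base color sits at step 500 and that linear RGB mixing is acceptable; `generate_ramp` is a hypothetical name, and the AA compliance check mentioned above is not included.

```python
STEPS = (50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 950)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def _mix(start: tuple, end: tuple, t: float) -> tuple:
    """Linear interpolation between two RGB triples, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def generate_ramp(base_hex: str) -> dict[int, str]:
    """Build an 11-step ramp keyed by shade number, with the base at 500."""
    base = tuple(int(base_hex[i:i + 2], 16) for i in (1, 3, 5))
    ramp = {}
    for step in STEPS:
        if step < 500:
            rgb = _mix(base, WHITE, (500 - step) / 500)  # 50 -> 90% toward white
        else:
            rgb = _mix(base, BLACK, (step - 500) / 500)  # 950 -> 90% toward black
        ramp[step] = "#%02x%02x%02x" % rgb
    return ramp
```

Calling `generate_ramp("#06b2c4")` for each color the HEAD step selects (brand primary, brand secondary, neutral) would give one ramp per semantic role. Perceptual color spaces generally produce more even ramps than RGB mixing, so treat this as a placeholder for the real algorithm.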
 ### Stage 3: Final Review & Export
 │   ├── __init__.py
 │   ├── state.py                    # LangGraph state definitions
 │   ├── graph.py                    # LangGraph workflow orchestration
+│   ├── crawler.py                  # Agent 1A: Website crawler
+│   ├── extractor.py                # Agent 1A: Token extraction (7 sources)
+│   ├── firecrawl_extractor.py      # Agent 1B: Deep CSS parsing
+│   ├── semantic_analyzer.py        # Agent 1C: Semantic color categorization
 │   ├── normalizer.py               # Agent 2: Token normalization
 │   ├── advisor.py                  # Agent 3: Best practices
+│   ├── stage2_graph.py             # Stage 2 multi-agent LLM workflow
 │   └── generator.py                # Agent 4: JSON generator
 │
 ├── core/
 │   ├── __init__.py
 │   ├── color_utils.py              # Color analysis, contrast, ramps
+│   ├── preview_generator.py        # HTML preview generation
+│   ├── hf_inference.py             # HuggingFace LLM inference
 │   └── token_schema.py             # Token data structures (Pydantic)
 │
 ├── ui/
+│   └── __init__.py
 │
 ├── templates/
 │
 ├── storage/
+│   └── __init__.py
 │
 ├── tests/
+│   └── __init__.py
 │
 └── docs/
+    └── CONTEXT.md                  # THIS FILE - upload for context refresh
 ```
 ---