| # Design System Extractor v2: Master Context File | |
| > **Upload this file to refresh Claude's context when continuing work on this project.** | |
| **Last Updated:** January 2026 | |
| --- | |
| ## Files Changed in Latest Session | |
| | File | What Changed | | |
| |------|--------------| | |
| | `agents/extractor.py` | Enhanced 7-source extraction (DOM, CSS vars, SVG, inline, stylesheets, external CSS, page scan) | | |
| | `agents/firecrawl_extractor.py` | **NEW** Agent 1B for deep CSS parsing | | |
| | `agents/semantic_analyzer.py` | **NEW** Agent 1C for semantic color categorization (brand/text/bg/border) | | |
| | `core/preview_generator.py` | AS-IS previews + Color Ramps sorted by brand priority | | |
| | `app.py` | Stage 1 UI now has 6 preview tabs including Semantic Colors | | |
| | `docs/CONTEXT.md` | Updated with semantic analyzer, full architecture diagrams | | |
| --- | |
| ## Project Goal | |
| Build a **semi-automated, human-in-the-loop agentic system** that: | |
| 1. Reverse-engineers a design system from a live website | |
| 2. Reconstructs and upgrades it into a modern, scalable design system | |
| 3. Outputs production-ready JSON tokens (Figma Tokens Studio compatible) | |
| **Philosophy:** This is a design-aware co-pilot, NOT a magic button. Humans decide, agents propose. | |
| --- | |
| ## Why This Project? (Market Differentiation) | |
| ### The Problem We Solve | |
| | Pain Point | Who Has It | Current Solutions | Why They Fail | | |
| |------------|------------|-------------------|---------------| | |
| | Legacy websites with no design system | Enterprise teams | Manual audit (weeks) | Time-consuming, error-prone | | |
| | Inconsistent design tokens scattered in CSS | Agencies inheriting projects | Figma plugins (style extractors) | Only extract from Figma, not live sites | | |
| | Need to modernize without breaking existing | Product teams | Design system generators | Generate new, don't reverse-engineer existing | | |
| | AA compliance gaps unknown | Accessibility teams | Contrast checkers | Check one color at a time, no system view | | |
| ### Existing Tools & Their Gaps | |
| | Tool | What It Does | Gap We Fill | | |
| |------|--------------|-------------| | |
| | **Figma Tokens Studio** | Manages tokens in Figma | Doesn't extract from websites | | |
| | **Style Dictionary** | Transforms tokens to code | Needs tokens first (we create them) | | |
| | **Polypane/VisBug** | Inspect live sites | No systematic extraction or upgrade | | |
| | **AI Design Tools** (Galileo, Uizard) | Generate new designs | Don't reverse-engineer existing | | |
| | **CSS Stats** | Analyze CSS files | Statistics only, no actionable tokens | | |
| | **Chromatic/Percy** | Visual regression | Compare, don't extract or upgrade | | |
| ### Our Unique Value Proposition | |
| ``` | |
| WHAT MAKES US DIFFERENT | |
| ─────────────────────── | |
| 1. REVERSE-ENGINEERING (not generation) | |
|    • Extracts from LIVE websites, not design files | |
|    • Preserves what's working, upgrades what's broken | |
|    • Respects existing brand decisions | |
| 2. MULTI-AGENT REASONING (not single LLM) | |
|    • Two analysts with different perspectives | |
|    • HEAD compiler resolves conflicts | |
|    • Shows reasoning, not just results | |
| 3. HUMAN-IN-THE-LOOP (not magic button) | |
|    • Designer reviews every stage | |
|    • Accept/reject individual tokens | |
|    • Choose from upgrade OPTIONS, not forced decisions | |
| 4. VISUAL PREVIEWS (not just data tables) | |
|    • Typography rendered in actual detected font | |
|    • Color ramps with AA compliance per shade | |
|    • See before you export | |
| 5. COST-TRANSPARENT (not black box) | |
|    • Shows token usage and cost per analysis | |
|    • Uses HF free tier ($0.10/mo) or Pro ($2/mo) | |
|    • ~$0.05 per full analysis | |
| ``` | |
| ### Target Users | |
| | User | Use Case | Value | | |
| |------|----------|-------| | |
| | **UX Managers** (like you!) | Modernize legacy booking platforms | Weeks → Hours | | |
| | **Design System Teams** | Audit and standardize existing properties | Systematic, not ad-hoc | | |
| | **Agencies** | Onboard client projects with no documentation | Instant design inventory | | |
| | **Accessibility Consultants** | AA compliance audit with fixes | Full palette view | | |
| | **Developers** | Get production-ready tokens from designer's website | No manual translation | | |
| ### Why Not Just Use [X]? | |
| **"Why not just inspect the CSS manually?"** | |
| → You could, but it takes weeks for a complex site. We do it in minutes with systematic coverage. | |
| **"Why not use Figma's native styles?"** | |
| → Many legacy sites were never in Figma. We extract from the source of truth: the live website. | |
| **"Why do you need AI? Can't rules handle this?"** | |
| → Rules extract tokens. AI interprets *design intent*: why is this color used here? What scale was intended? Where does it deviate from best practices? | |
| **"Isn't this just CSS Stats with AI?"** | |
| → CSS Stats tells you what exists. We tell you what it *should* be and give you actionable upgrade paths. | |
| --- | |
| ## Architecture Overview | |
| ``` | |
| TECH STACK | |
| ────────── | |
| Frontend:      Gradio (long-scroll, sectioned UI with live preview) | |
| Orchestration: LangGraph (agent state management & workflow) | |
| Models:        HuggingFace Inference Providers (Novita, Groq, etc.) | |
| Hosting:       Hugging Face Spaces | |
| Storage:       HF Spaces persistent storage | |
| Output:        Platform-agnostic JSON tokens (Figma Tokens Studio) | |
| ``` | |
| --- | |
| ## Model Assignments | |
| ### Stage 2: Multi-Agent Analysis | |
| | Agent | Role | Model | Provider | Cost | | |
| |-------|------|-------|----------|------| | |
| | **LLM 1** | Design Analyst 1 | `Qwen/Qwen2.5-72B-Instruct` | Novita | $0.29/M in, $0.59/M out | | |
| | **LLM 2** | Design Analyst 2 | `meta-llama/Llama-3.3-70B-Instruct` | Novita | $0.59/M in, $0.79/M out | | |
| | **HEAD** | Compiler | `meta-llama/Llama-3.3-70B-Instruct` | Novita | $0.59/M in, $0.79/M out | | |
| | **Rules** | Calculations | None (Rule-based) | - | FREE | | |
| **Architecture:** | |
| ``` | |
|                 PARALLEL ANALYSIS | |
|  LLM 1 (Qwen)      LLM 2 (Llama)     Rule Engine | |
|  ┌──────────┐      ┌──────────┐      ┌──────────┐ | |
|  │ Global   │      │ Western  │      │ Math     │ | |
|  │ Design   │      │ Design   │      │ Only     │ | |
|  │ Patterns │      │ Patterns │      │ (FREE)   │ | |
|  └────┬─────┘      └────┬─────┘      └────┬─────┘ | |
|       │                 │                 │ | |
|       └─────────────────┼─────────────────┘ | |
|                         │ | |
|                         ▼ | |
|                ┌─────────────────┐ | |
|                │ HEAD COMPILER   │ | |
|                │                 │ | |
|                │ • Compare views │ | |
|                │ • Resolve diff  │ | |
|                │ • Final recs    │ | |
|                └─────────────────┘ | |
| ``` | |
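The fan-out/fan-in shape of the diagram can be sketched in plain Python. This is a minimal illustration, not the project's `stage2_graph.py`: the `analyst` stub stands in for a real LLM call, and every function name and returned field here is hypothetical.

```python
import asyncio


async def analyst(name: str, tokens: dict) -> dict:
    """Stand-in for one LLM analyst call (hypothetical; returns canned output)."""
    await asyncio.sleep(0)  # placeholder for the network round trip
    return {"agent": name, "type_scale": 1.25 if name == "llm_1" else 1.333}


def rule_engine(tokens: dict) -> dict:
    """Deterministic check that needs no LLM: spacing values off the 8px grid."""
    off_grid = [v for v in tokens["spacing"] if v % 8 != 0]
    return {"agent": "rules", "off_grid_spacing": off_grid}


def head_compile(views: list) -> dict:
    """Compare the analysts' views and surface disagreement instead of hiding it."""
    scales = [v["type_scale"] for v in views if "type_scale" in v]
    rules = next(v for v in views if v["agent"] == "rules")
    return {
        "type_scale_options": sorted(set(scales)),
        "analysts_agree": len(set(scales)) == 1,
        "off_grid_spacing": rules["off_grid_spacing"],
    }


async def run_stage2(tokens: dict) -> dict:
    # Both analysts run concurrently; the rule engine is cheap and runs inline.
    llm_views = await asyncio.gather(
        analyst("llm_1", tokens), analyst("llm_2", tokens)
    )
    return head_compile([*llm_views, rule_engine(tokens)])


result = asyncio.run(run_stage2({"spacing": [4, 8, 10, 16]}))
print(result)
```

When the analysts disagree, the compiler in this sketch returns both values as options, which matches the rule that agents propose and humans decide.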
| ### Other Agents | |
| | Agent | Role | Model | Provider | Why | | |
| |-------|------|-------|----------|-----| | |
| | **Agent 1** | Crawler & Extractor | None (Rule-based) | - | Pure CSS extraction, no LLM needed | | |
| | **Agent 2** | Normalizer | `microsoft/Phi-3.5-mini-instruct` | Novita | Fast, great structured output | | |
| | **Agent 4** | Generator | `mistralai/Codestral-22B-v0.1` | Novita | Code specialist, JSON formatting | | |
| ### Provider Configuration | |
| Default provider: **Novita** (configurable in `config/agents.yaml`) | |
| Available providers (via HuggingFace Inference Providers): | |
| - **novita** - Default, good balance | |
| - **groq** - Fastest | |
| - **cerebras** - Ultra-fast | |
| - **sambanova** - Good for Llama | |
| - **together** - Wide model selection | |
| ### Cost Tracking | |
| Estimated cost per Stage 2 analysis: **~$0.05** | |
| - Free tier: $0.10/month of included inference credits | |
| - Pro tier: $2.00/month of included inference credits (with the $9/mo Pro subscription) | |
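The per-analysis estimate can be reproduced from the per-million-token prices in the Stage 2 table. The sketch below assumes hypothetical token counts for each call; only the prices come from this document.

```python
# USD per million tokens (input, output), taken from the Stage 2 model table.
PRICES_PER_M = {
    "Qwen/Qwen2.5-72B-Instruct": (0.29, 0.59),
    "meta-llama/Llama-3.3-70B-Instruct": (0.59, 0.79),
}


def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD of one call, given its input and output token counts."""
    price_in, price_out = PRICES_PER_M[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


# Hypothetical token counts for LLM 1, LLM 2, and the HEAD compiler.
calls = [
    ("Qwen/Qwen2.5-72B-Instruct", 20_000, 4_000),
    ("meta-llama/Llama-3.3-70B-Instruct", 20_000, 4_000),
    ("meta-llama/Llama-3.3-70B-Instruct", 12_000, 3_000),
]
total = sum(call_cost(*c) for c in calls)
print(f"Estimated Stage 2 cost: ${total:.4f}")
```

With these assumed counts the total lands in the same range as the quoted estimate. Real usage depends on how many tokens were extracted and how long the prompts are.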
| --- | |
| ## Visual Previews | |
| ### Stage 1: AS-IS Previews (No Enhancements) | |
| Shows raw extracted values exactly as found on the website: | |
| | Preview | What It Shows | | |
| |---------|---------------| | |
| | **Typography** | Actual font rendered with detected styles | | |
| | **Colors** | Simple swatches with hex, frequency, context, AA status | | |
| | **Spacing** | Visual bars representing each spacing value | | |
| | **Radius** | Boxes with each border-radius applied | | |
| | **Shadows** | Cards with each box-shadow applied | | |
| ### Stage 2: Enhanced Previews (Upgraded) | |
| Shows proposed upgrades and improvements: | |
| | Preview | What It Shows | | |
| |---------|---------------| | |
| | **Typography** | Type scale comparison (1.2, 1.25, 1.333 ratios) | | |
| | **Color Ramps** | 11 shades (50-950) with AA compliance per shade | | |
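A type scale multiplies a base size by a fixed ratio at each step. The sketch below shows how the three ratios compared in the preview diverge. The 16px base and the function name are assumptions, not values taken from the project.

```python
def type_scale(base_px: float, ratio: float, steps: int) -> list:
    """Return `steps` font sizes, starting at base_px and growing by `ratio`."""
    return [round(base_px * ratio ** i, 2) for i in range(steps)]


for ratio in (1.2, 1.25, 1.333):
    print(ratio, type_scale(16, ratio, 5))
```

Larger ratios separate heading levels more strongly but run out of usable sizes sooner on mobile. That trade-off is why the choice is presented as an option and not made automatically.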
| --- | |
| ## Enhanced Extraction (Agent 1) | |
| Agent 1 now extracts from **7 sources**, so colors missed by one method can still be picked up by another: | |
| ``` | |
| ENHANCED EXTRACTION SOURCES | |
| ─────────────────────────── | |
| 1. DOM Computed Styles | |
|    • window.getComputedStyle(element) | |
|    • Captures: color, background-color, border-color, etc. | |
| 2. CSS Variables | |
|    • :root { --primary-color: #3860be; } | |
|    • Parses all stylesheets for CSS custom properties | |
| 3. SVG Colors | |
|    • <svg fill="#00c4cc"> | |
|    • <path stroke="#3860be"> | |
| 4. Inline Styles | |
|    • <div style="background-color: #bcd432;"> | |
|    • Parses style attributes for color values | |
| 5. Stylesheet Rules | |
|    • Parses CSS rules that may not be applied to visible elements | |
|    • Catches hover states, pseudo-elements, etc. | |
| 6. External CSS Files | |
|    • Fetches and parses linked CSS files to bypass CORS | |
| 7. Page Content Scan | |
|    • Brute-force regex scan of the page HTML | |
| ``` | |
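The CSS-variable and regex-scan sources can be illustrated with a small standard-library sketch. The patterns and function names are assumptions for illustration, not the code in `agents/extractor.py`. The regex only handles 3- and 6-digit hex colors and will also match ID selectors that happen to look like hex, such as `#add`.

```python
import re
from collections import Counter

# Custom property declarations such as "--primary-color: #3860be"
CSS_VAR = re.compile(r"(--[\w-]+)\s*:\s*([^;}]+)")
# 6-digit or 3-digit hex colors; 8-digit (alpha) hex is deliberately ignored
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def normalize_hex(value: str) -> str:
    """Lowercase the color and expand shorthand (#fff -> #ffffff)."""
    h = value.lower().lstrip("#")
    if len(h) == 3:
        h = "".join(c * 2 for c in h)
    return "#" + h


def scan_css(css: str):
    """Return (custom properties, color frequency counter) for a CSS string."""
    variables = {name: val.strip() for name, val in CSS_VAR.findall(css)}
    colors = Counter(normalize_hex(m) for m in HEX_COLOR.findall(css))
    return variables, colors


sample = """
:root { --primary-color: #3860be; --accent-color: #00C4CC; }
.btn { background-color: #3860BE; border: 1px solid #fff; }
"""
variables, colors = scan_css(sample)
print(variables)
print(colors.most_common())
```

Normalizing before counting means `#3860be` and `#3860BE` are counted as one color, which keeps the frequency data meaningful for later stages.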
| --- | |
| ## Enhanced Logging | |
| ### Stage 1 Extraction Logs | |
| Shows detailed extraction progress: | |
| ``` | |
| ============================================================ | |
| DESKTOP EXTRACTION (1440px) | |
| ============================================================ | |
| Enhanced extraction from 7 sources: | |
|   1. DOM computed styles (getComputedStyle) | |
|   2. CSS variables (:root { --color: }) | |
|   3. SVG colors (fill, stroke) | |
|   4. Inline styles (style='color:') | |
|   5. Stylesheet rules (CSS files) | |
|   6. External CSS files (fetch & parse) | |
|   7. Page content scan (brute-force) | |
| EXTRACTION RESULTS: | |
|   Colors: 45 unique | |
|   Typography: 12 styles | |
|   Spacing: 28 values | |
|   Radius: 8 values | |
|   Shadows: 4 values | |
| CSS Variables found: 15 | |
|   --primary-color: #3860be | |
|   --accent-color: #00c4cc | |
|   --brand-lime: #bcd432 | |
|   ... and 12 more | |
| Normalizing (deduping, naming)... | |
| ✓ Normalized: 32 colors, 10 typography, 18 spacing | |
| ============================================================ | |
| FIRECRAWL CSS EXTRACTION | |
| ============================================================ | |
| Scraping: https://example.com | |
| ✓ Page scraped (125000 chars) | |
| Parsing <style> blocks... | |
|   Found 5 style blocks | |
| Finding linked CSS files... | |
|   Found 8 CSS files | |
| Fetching: main.css... | |
|   ✓ Parsed (234 colors) | |
| Fetching: theme.css... | |
|   ✓ Parsed (45 colors) | |
| FIRECRAWL RESULTS: | |
|   CSS files parsed: 8 | |
|   Style blocks parsed: 5 | |
|   CSS variables found: 23 | |
|   Unique colors found: 156 | |
| Top colors found: | |
|   #06b2c4 (used 45x) | |
|   #c1df1f (used 38x) | |
|   #373737 (used 120x) | |
| Merging Firecrawl colors with Playwright extraction... | |
| ✓ Added 12 new colors from Firecrawl | |
| Total colors now: 44 | |
| ============================================================ | |
| SEMANTIC COLOR ANALYSIS | |
| ============================================================ | |
| Analyzing 143 colors... | |
| Using rule-based analysis (no LLM) | |
| SEMANTIC ANALYSIS RESULTS: | |
| BRAND COLORS: | |
|   primary: #06b2c4 (high) | |
|   └─ Most frequent saturated color on interactive elements (freq: 33) | |
|   secondary: #c1df1f (medium) | |
|   └─ Second most frequent brand color (freq: 15) | |
| TEXT COLORS: | |
|   primary: #373737 (high) | |
|   secondary: #666666 (medium) | |
| BACKGROUND COLORS: | |
|   primary: #ffffff (high) | |
|   secondary: #f5f5f5 (medium) | |
| SUMMARY: | |
|   Total colors analyzed: 143 | |
|   Brand colors found: 2 | |
|   Clear hierarchy: Yes | |
|   Analysis method: rule-based | |
| ``` | |
| ### Stage 2 LLM Analysis Logs (With Semantic Context) | |
| Shows detailed reasoning from each agent WITH semantic context: | |
| ``` | |
| ============================================================ | |
| STAGE 2: MULTI-AGENT ANALYSIS | |
| ============================================================ | |
| SEMANTIC CONTEXT FROM STAGE 1: | |
|   Brand Primary: #06b2c4 | |
|   Text Primary: #373737 | |
|   Analysis Method: rule-based | |
| ======================================================= | |
| LLM 1: Qwen/Qwen2.5-72B-Instruct | |
| ======================================================= | |
| Provider: novita | |
| Cost: $0.29/M in, $0.59/M out | |
| Task: Typography, Colors, AA, Spacing analysis | |
| Semantic context: Yes (NEW: LLM knows color roles) | |
| LLM 1 FINDINGS: | |
| COLORS (with semantic context): | |
|   ├─ Brand Primary (#06b2c4): "Fails AA on white (3.2:1)" | |
|   ├─ Suggested fix: "#0891a8 (4.6:1)" | |
|   └─ Score: 6/10 | |
| ======================================================= | |
| HEAD: Compiling final recommendations... | |
| ======================================================= | |
| INPUT: Analyzing outputs from LLM 1 + LLM 2 + Rules + Semantic... | |
| HEAD SYNTHESIS: | |
| COLOR RECOMMENDATIONS (per semantic role): | |
|   ├─ brand.primary: #06b2c4 → Keep for branding, use #0891a8 for text | |
|   ├─ text.primary: #373737 → Keep (passes AA) | |
|   └─ Generate ramps for: brand.primary, brand.secondary, neutral | |
| ``` | |
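The AA checks referenced in these logs rest on the WCAG contrast-ratio formula, which is deterministic and can run in the rule engine without an LLM. The sketch below is a standard-library implementation of that formula. The function names are assumptions and not necessarily those in `core/color_utils.py`. The ratios quoted in the sample log are analyst output, so a direct calculation may give different numbers.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#373737'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


for color in ("#06b2c4", "#373737"):
    print(color, round(contrast_ratio(color, "#ffffff"), 2), passes_aa(color, "#ffffff"))
```

Running a check like this over every color, and every shade of a generated ramp, gives the system-wide view that single-color contrast checkers lack.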
| --- | |
| ## Agent Personas | |
| ### Agent 1A: Website Crawler & Enhanced Extractor | |
| - **Persona:** Meticulous Design Archaeologist | |
| - **Tool:** Playwright | |
| - **Job:** | |
| - Auto-discover 10+ pages from base URL | |
| - Crawl Desktop (1440px) + Mobile (375px) separately | |
| - Scroll to bottom + wait for network idle | |
| - **ENHANCED: Extract from 7 sources:** | |
| 1. DOM computed styles (`getComputedStyle`) | |
| 2. CSS variables (`:root { --primary: #xxx }`) | |
| 3. SVG colors (`fill`, `stroke` attributes) | |
| 4. Inline styles (`style="background-color: #xxx"`) | |
| 5. Stylesheet rules (CSS files, hover states, pseudo-elements) | |
| 6. External CSS files (fetch & parse to bypass CORS) | |
| 7. Page content scan (brute-force regex on HTML) | |
| - **Output:** Raw tokens with frequency, context, confidence, source type | |
| ### Agent 1B: Firecrawl CSS Deep Diver | |
| - **Persona:** CSS Deep Diver | |
| - **Tool:** Firecrawl / httpx fallback | |
| - **Job:** | |
| - Fetch and parse ALL linked CSS files | |
| - Extract colors from CSS rules and variables | |
| - Bypass CORS restrictions | |
| - Find colors missed by DOM inspection | |
| - **Output:** Additional colors merged into main extraction | |
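Merging Agent 1B's colors into the main extraction comes down to a case-insensitive union keyed on the hex value. A minimal sketch follows. It assumes both extractors produce `{hex: frequency}` maps and that frequencies should be summed when a color appears in both, neither of which is confirmed by this document.

```python
def merge_colors(primary: dict, extra: dict):
    """Merge two {hex: frequency} maps; return (merged, newly_added_hexes)."""
    merged = {}
    for color, freq in primary.items():
        key = color.lower()
        merged[key] = merged.get(key, 0) + freq
    added = []
    for color, freq in extra.items():
        key = color.lower()
        if key not in merged:
            added.append(key)  # only seen by the second extractor
        merged[key] = merged.get(key, 0) + freq
    return merged, added


playwright_colors = {"#06B2C4": 33, "#373737": 120}  # hypothetical sample data
firecrawl_colors = {"#06b2c4": 12, "#c1df1f": 38}
merged, added = merge_colors(playwright_colors, firecrawl_colors)
print(f"Added {len(added)} new colors; total now {len(merged)}")
```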
| ### Agent 1C: Semantic Color Analyzer (NEW - rule-based, LLM optional) | |
| - **Persona:** Design System Semanticist | |
| - **Tool:** Rule-based analysis (LLM optional) | |
| - **Job:** | |
| - Analyze colors based on actual CSS usage (not guessing) | |
| - Categorize into semantic roles: | |
| - **Brand Colors:** Used on buttons, CTAs, links (interactive elements) | |
| - **Text Colors:** Used with `color` property on p, span, h1-h6 | |
| - **Background Colors:** Used with `background-color` on containers | |
| - **Border Colors:** Used with `border-color` properties | |
| - **Feedback Colors:** Error (red), success (green), warning (yellow) | |
| - Detect color hierarchy (primary → secondary → muted) | |
| - **Input:** Colors WITH context data (css_properties, elements, frequency) | |
| - **Output:** Semantic categorization with confidence levels | |
| - **Why:** Stage 2 LLMs can now give SPECIFIC recommendations per role | |
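The role rules listed above can be sketched as a small classifier over the context data. The `css_properties` and `elements` field names come from the Input line. The rule order, the tag sets, and the function name are assumptions, and feedback colors are left out because they need hue analysis, not usage context.

```python
INTERACTIVE = {"button", "a"}
TEXT_TAGS = {"p", "span", "h1", "h2", "h3", "h4", "h5", "h6"}


def classify_color(usage: dict) -> str:
    """Assign a semantic role from how a color is actually used in CSS."""
    props = set(usage["css_properties"])
    elements = set(usage["elements"])
    if elements & INTERACTIVE:
        return "brand"  # used on buttons, CTAs, or links
    if "color" in props and elements & TEXT_TAGS:
        return "text"
    if "background-color" in props:
        return "background"
    if any(p.startswith("border") and p.endswith("color") for p in props):
        return "border"
    return "unclassified"


samples = {
    "#06b2c4": {"css_properties": ["background-color"], "elements": ["button"]},
    "#373737": {"css_properties": ["color"], "elements": ["p", "h2"]},
    "#f5f5f5": {"css_properties": ["background-color"], "elements": ["div"]},
    "#dddddd": {"css_properties": ["border-top-color"], "elements": ["div"]},
}
roles = {color: classify_color(usage) for color, usage in samples.items()}
print(roles)
```

Because the rules read only recorded usage, the output is reproducible. That reproducibility is what lets the Stage 2 analysts treat the roles as fixed context.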
| ### Agent 2: Token Normalizer & Structurer | |
| - **Persona:** Design System Librarian | |
| - **Job:** | |
| - Clean noisy extraction, dedupe | |
| - Infer naming patterns | |
| - Tag tokens as: `detected` | `inferred` | `low-confidence` | |
| - **Output:** Structured token sets with metadata | |
| ### Agent 3: Design System Best Practices Advisor | |
| - **Persona:** Senior Staff Design Systems Architect | |
| - **Job:** | |
| - Research modern DS patterns (Material, Polaris, Carbon, etc.) | |
| - Propose upgrade OPTIONS (not decisions) | |
| - Suggest: type scales (3 options), spacing (8px), color ramps (AA compliant), naming conventions | |
| - **Output:** Option sets with rationale | |
| ### Agent 4: Plugin & JSON Generator | |
| - **Persona:** Automation Engineer | |
| - **Job:** | |
| - Convert finalized tokens to Figma-compatible JSON | |
| - Generate: typography, color (with tints/shades), spacing variables | |
| - Maintain Desktop + Mobile + version metadata | |
| - **Output:** Production-ready JSON (flat structure for Figma Tokens Studio) | |
| --- | |
| ## UI Stages (3 Stages) | |
| ### Stage 1: Extraction Review (AS-IS) | |
| - **Purpose:** Trust building: show exactly what was extracted | |
| - **Shows:** | |
| - Token tables (colors, typography, spacing) | |
| - **6 Visual Preview Tabs (AS-IS, no enhancements):** | |
| 1. Typography: actual font rendered | |
| 2. Colors: simple swatches sorted by frequency (no ramps) | |
| 3. Semantic Colors: colors organized by usage (brand/text/bg/border) | |
| 4. Spacing: visual bars | |
| 5. Radius: rounded boxes | |
| 6. Shadows: shadow cards | |
| - **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile | |
| ### Stage 2: Upgrade Playground (MOST IMPORTANT) | |
| - **Purpose:** Decision-making through live visuals | |
| - **Shows:** | |
| - Side-by-side option selector + live preview | |
| - **Color Ramps (50-950 shades with AA compliance)** | |
| - Type scale options (1.2, 1.25, 1.333) | |
| - **Semantic-aware recommendations:** "Your brand primary #06b2c4 fails AA, consider #0891a8" | |
| - **Human Actions:** Select type scale A/B/C, spacing system, color ramps → preview updates instantly | |
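One way to produce an 11-step ramp (50-950) is to hold a color's hue and saturation fixed and step through preset lightness values. The sketch below does that with the standard-library `colorsys` module. The lightness table and function name are assumptions, and the approach does not pin the original color to the 500 step or check AA by itself.

```python
import colorsys

# Hypothetical target lightness for each ramp step (lighter to darker).
LIGHTNESS = {
    50: 0.97, 100: 0.94, 200: 0.86, 300: 0.77, 400: 0.66, 500: 0.55,
    600: 0.45, 700: 0.36, 800: 0.27, 900: 0.19, 950: 0.12,
}


def color_ramp(base_hex: str) -> dict:
    """Return {step: hex} keeping the base hue and saturation, varying lightness."""
    r, g, b = (int(base_hex[i:i + 2], 16) / 255 for i in (1, 3, 5))
    hue, _, sat = colorsys.rgb_to_hls(r, g, b)
    ramp = {}
    for step, light in LIGHTNESS.items():
        rr, gg, bb = colorsys.hls_to_rgb(hue, light, sat)
        ramp[str(step)] = "#{:02x}{:02x}{:02x}".format(
            round(rr * 255), round(gg * 255), round(bb * 255)
        )
    return ramp


ramp = color_ramp("#06b2c4")
for step, value in ramp.items():
    print(step, value)
```

Each generated shade can then be passed through a contrast check to label its AA status in the preview.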
| ### Stage 3: Final Review & Export | |
| - **Purpose:** Confidence before export | |
| - **Shows:** Token preview, JSON tree, diff view (original vs final) | |
| - **Human Actions:** Download JSON, save version, label version | |
| --- | |
| ## Project Structure | |
| ``` | |
| design-system-extractor/ | |
| ├── app.py                      # Gradio main entry point | |
| ├── requirements.txt | |
| ├── README.md | |
| │ | |
| ├── config/ | |
| │   ├── .env.example            # Environment variables template | |
| │   ├── agents.yaml             # Agent personas & configurations | |
| │   └── settings.py             # Application settings | |
| │ | |
| ├── agents/ | |
| │   ├── __init__.py | |
| │   ├── state.py                # LangGraph state definitions | |
| │   ├── graph.py                # LangGraph workflow orchestration | |
| │   ├── crawler.py              # Agent 1A: Website crawler | |
| │   ├── extractor.py            # Agent 1A: Token extraction (7 sources) | |
| │   ├── firecrawl_extractor.py  # Agent 1B: Deep CSS parsing | |
| │   ├── semantic_analyzer.py    # Agent 1C: Semantic color categorization | |
| │   ├── normalizer.py           # Agent 2: Token normalization | |
| │   ├── advisor.py              # Agent 3: Best practices | |
| │   ├── stage2_graph.py         # Stage 2 multi-agent LLM workflow | |
| │   └── generator.py            # Agent 4: JSON generator | |
| │ | |
| ├── core/ | |
| │   ├── __init__.py | |
| │   ├── color_utils.py          # Color analysis, contrast, ramps | |
| │   ├── preview_generator.py    # HTML preview generation | |
| │   ├── hf_inference.py         # HuggingFace LLM inference | |
| │   └── token_schema.py         # Token data structures (Pydantic) | |
| │ | |
| ├── ui/ | |
| │   └── __init__.py | |
| │ | |
| ├── templates/ | |
| │ | |
| ├── storage/ | |
| │   └── __init__.py | |
| │ | |
| ├── tests/ | |
| │   └── __init__.py | |
| │ | |
| └── docs/ | |
|     └── CONTEXT.md              # THIS FILE - upload for context refresh | |
| ``` | |
| --- | |
| ## Key Technical Decisions | |
| | Decision | Choice | Rationale | | |
| |----------|--------|-----------| | |
| | Viewports | Fixed 1440px + 375px | Simplicity, covers main use cases | | |
| | Scrolling | Bottom + network idle | Captures lazy-loaded content | | |
| | Infinite scroll | Skip | Avoid complexity | | |
| | Modals | Manual trigger | User decides what to capture | | |
| | Color ramps | 11 shades (50-950), AA compliant | Industry standard | | |
| | Type scales | 3 options (1.2, 1.25, 1.333) | User selects | | |
| | Spacing | 8px base system | Modern standard | | |
| | ML models | Minimal, rule-based preferred | Simplicity, reliability | | |
| | Versioning | HF Spaces persistent storage | Built-in, free | | |
| | Preview | Gradio + iframe (best for dynamic) | Smooth updates | | |
| --- | |
| ## Token Schema (Core Data Structures) | |
| ```python | |
| class TokenSource(Enum): | |
| DETECTED = "detected" # Directly found in CSS | |
| INFERRED = "inferred" # Derived from patterns | |
| UPGRADED = "upgraded" # User-selected improvement | |
| class Confidence(Enum): | |
| HIGH = "high" # 10+ occurrences | |
| MEDIUM = "medium" # 3-9 occurrences | |
| LOW = "low" # 1-2 occurrences | |
| class Viewport(Enum): | |
| DESKTOP = "desktop" # 1440px | |
| MOBILE = "mobile" # 375px | |
| ``` | |
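The occurrence thresholds in the `Confidence` comments map directly onto a small helper. This is an illustrative sketch. The `confidence_for` name is hypothetical, and the enum is redeclared only so the block runs by itself.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"      # 10+ occurrences
    MEDIUM = "medium"  # 3-9 occurrences
    LOW = "low"        # 1-2 occurrences


def confidence_for(frequency: int) -> Confidence:
    """Map how often a token was seen to a confidence level."""
    if frequency >= 10:
        return Confidence.HIGH
    if frequency >= 3:
        return Confidence.MEDIUM
    return Confidence.LOW


for freq in (1, 3, 9, 10, 120):
    print(freq, confidence_for(freq).value)
```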
| ### Token Types: | |
| - **ColorToken:** value, frequency, contexts, elements, contrast ratios | |
| - **TypographyToken:** family, size, weight, line-height, elements | |
| - **SpacingToken:** value, frequency, contexts, fits_base_8 | |
| - **RadiusToken:** value, frequency, elements | |
| - **ShadowToken:** value, frequency, elements | |
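The `fits_base_8` flag on a spacing token, and the upgrade it suggests, can be sketched as follows. Both helper names are hypothetical. The `step` parameter is an assumption that allows a 4px half-step, since the sample output includes a 4px value.

```python
import math


def fits_base_8(px: int) -> bool:
    """True when a spacing value sits exactly on the 8px grid."""
    return px % 8 == 0


def snap_to_grid(px: float, step: int = 8) -> int:
    """Nearest multiple of `step` (ties round up), never below one step."""
    return max(step, math.floor(px / step + 0.5) * step)


for value in (4, 10, 12, 13, 16):
    print(value, fits_base_8(value), snap_to_grid(value))
```

In the UI the snapped value would be offered as a proposal next to the detected one, so the designer can keep an off-grid value that is intentional.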
| --- | |
| ## LangGraph Workflow | |
| ``` | |
|       ┌─────────────┐ | |
|       │    START    │ | |
|       └──────┬──────┘ | |
|              │ | |
|              ▼ | |
|       ┌─────────────┐ | |
|       │  URL Input  │ | |
|       └──────┬──────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ Agent 1: Discover      │ | |
|  │ (find pages)           │ | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ HUMAN: Confirm pages   │◄── Checkpoint 1 | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ Agent 1: Extract       │ | |
|  │ (crawl & extract)      │ | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ Agent 2: Normalize     │ | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ HUMAN: Review tokens   │◄── Checkpoint 2 (Stage 1 UI) | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ├───────────────────────┐ | |
|              │                       │ | |
|              ▼                       ▼ | |
|     ┌──────────────────┐    ┌──────────────────┐ | |
|     │ Agent 3: Advise  │    │ (parallel)       │ | |
|     │ (best practices) │    │                  │ | |
|     └────────┬─────────┘    └──────────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ HUMAN: Select options  │◄── Checkpoint 3 (Stage 2 UI) | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ Agent 4: Generate      │ | |
|  │ (final JSON)           │ | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|  ┌────────────────────────┐ | |
|  │ HUMAN: Export          │◄── Checkpoint 4 (Stage 3 UI) | |
|  └───────────┬────────────┘ | |
|              │ | |
|              ▼ | |
|         ┌─────────┐ | |
|         │   END   │ | |
|         └─────────┘ | |
| ``` | |
| --- | |
| ## Human-in-the-Loop Rules | |
| 1. **No irreversible automation** | |
| 2. **Agents propose → Humans decide** | |
| 3. **Every auto action must be:** | |
| - Visible | |
| - Reversible | |
| - Previewed | |
| --- | |
| ## Output JSON Format | |
| ```json | |
| { | |
| "metadata": { | |
| "source_url": "https://example.com", | |
| "extracted_at": "2025-01-23T10:00:00Z", | |
| "version": "v1-recovered", | |
| "viewport": "desktop" | |
| }, | |
| "colors": { | |
| "primary": { | |
| "50": { "value": "#e6f2ff", "source": "upgraded" }, | |
| "500": { "value": "#007bff", "source": "detected" }, | |
| "900": { "value": "#001a33", "source": "upgraded" } | |
| } | |
| }, | |
| "typography": { | |
| "heading-xl": { | |
| "fontFamily": "Inter", | |
| "fontSize": "32px", | |
| "fontWeight": 700, | |
| "lineHeight": "1.2", | |
| "source": "detected" | |
| } | |
| }, | |
| "spacing": { | |
| "xs": { "value": "4px", "source": "upgraded" }, | |
| "sm": { "value": "8px", "source": "detected" }, | |
| "md": { "value": "16px", "source": "detected" } | |
| } | |
| } | |
| ``` | |
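The example above is nested, while the export is described elsewhere in this file as flat. One plausible reading, shown here purely as an assumption, is that nested groups are collapsed into dot-separated keys. The `flatten_tokens` helper is hypothetical, and it treats any dict carrying a `source` field as a leaf token.

```python
def flatten_tokens(group: dict, prefix: str = "") -> dict:
    """Collapse nested token groups into {"a.b.c": token} entries."""
    flat = {}
    for key, node in group.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(node, dict) and "source" not in node:
            flat.update(flatten_tokens(node, path))  # still a group: recurse
        else:
            flat[path] = node  # leaf token (has a "source" field)
    return flat


tokens = {
    "colors": {
        "primary": {
            "50": {"value": "#e6f2ff", "source": "upgraded"},
            "500": {"value": "#007bff", "source": "detected"},
        }
    },
    "spacing": {"sm": {"value": "8px", "source": "detected"}},
}
flat = flatten_tokens(tokens)
for path, token in flat.items():
    print(path, token["value"])
```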
| --- | |
| ## Implementation Phases & Current Status | |
| ### Phase 1: COMPLETE | |
| - [x] Project structure | |
| - [x] Configuration files | |
| - [x] Token schema (Pydantic models) | |
| - [x] Agent 1: Crawler (page discovery) | |
| - [x] Agent 1: Enhanced Extractor (5-source extraction) | |
| - [x] Agent 2: Normalizer | |
| - [x] Stage 1 UI with 5 AS-IS preview tabs | |
| - [x] LangGraph basic workflow | |
| - [x] JSON export (flat structure for Figma) | |
| ### Phase 2: MOSTLY COMPLETE | |
| - [x] Agent 3: Multi-LLM Advisor (Qwen + Llama + HEAD) | |
| - [x] Stage 2 UI (Upgrade Playground) | |
| - [x] Live preview system (typography, color ramps) | |
| - [x] Enhanced LLM logging with reasoning | |
| - [ ] Accept/Reject checkbox wiring to export | |
| ### Phase 3: IN PROGRESS | |
| - [ ] Agent 4: Generator (component patterns) | |
| - [ ] Stage 3 UI (diff view) | |
| - [ ] Arabic page filtering | |
| ### Phase 4: PENDING | |
| - [ ] Full LangGraph orchestration | |
| - [ ] HF Spaces deployment | |
| - [ ] Persistent storage | |
| - [ ] MCP Claude / Figma plugin integration (Part 2 of article) | |
| --- | |
| ## Known Issues & Pending Fixes | |
| | Issue | Status | Fix | | |
| |-------|--------|-----| | |
| | Arabic pages included | Pending | Filter `/ar/` URLs in crawler | | |
| | Accept/Reject not wired | Pending | Export should respect checkbox state | | |
| | Stage 1 vs Stage 2 preview confusion | Fixed | Stage 1 now shows AS-IS (no ramps) | | |
| | Colors missed from CSS variables | Fixed | Enhanced multi-source extraction (now 7 sources) | | |
| | JSON nested structure | Fixed | Flat structure for Figma compatibility | | |
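The pending Arabic-page fix amounts to a URL filter applied before crawling. A minimal sketch, assuming locale pages are identified by an `ar` path segment. The function name is hypothetical and the crawler's real discovery list may look different.

```python
from urllib.parse import urlparse


def is_arabic_page(url: str) -> bool:
    """True when the URL path contains an 'ar' segment, e.g. /ar/ or /ar/about."""
    segments = [s for s in urlparse(url).path.lower().split("/") if s]
    return "ar" in segments


discovered = [
    "https://example.com/",
    "https://example.com/ar/",
    "https://example.com/ar/booking",
    "https://example.com/articles/design",
]
to_crawl = [u for u in discovered if not is_arabic_page(u)]
print(to_crawl)
```

Matching whole path segments, not the substring `/ar`, keeps pages such as `/articles/design` in the crawl.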
| --- | |
| ## Environment Variables | |
| ```env | |
| # Required | |
| HF_TOKEN=your_huggingface_token | |
| # Model Configuration (defaults shown, diverse providers) | |
| AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct # Microsoft - Fast naming | |
| AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct # Meta - Strong reasoning | |
| AGENT4_MODEL=mistralai/Codestral-22B-v0.1 # Mistral - Code/JSON | |
| # Optional | |
| DEBUG=true | |
| LOG_LEVEL=INFO | |
| ``` | |
| --- | |
| ## Notes for Claude | |
| When continuing this project: | |
| 1. **Check current phase** in Implementation Phases section | |
| 2. **Review agent personas** in agents.yaml for consistent behavior | |
| 3. **Follow token schema** defined in core/token_schema.py | |
| 4. **Maintain LangGraph state** consistency across agents | |
| 5. **Use Gradio components** from ui/components.py for consistency | |
| 6. **Test with** real websites before deployment | |
| 7. **Enhanced extraction** captures from 7 sources; check the logs to verify | |
| 8. **Stage 1 = AS-IS** (no ramps), **Stage 2 = Enhanced** (with ramps) | |
| --- | |
| *Last updated: 2025-01-23* | |