Design System Extractor v2: Master Context File
Upload this file to refresh Claude's context when continuing work on this project.
Last Updated: January 2026
Files Changed in Latest Session
| File | What Changed |
|---|---|
| `agents/extractor.py` | Enhanced 7-source extraction (DOM, CSS vars, SVG, inline, stylesheets, external CSS, page scan) |
| `agents/firecrawl_extractor.py` | NEW Agent 1B for deep CSS parsing |
| `agents/semantic_analyzer.py` | NEW Agent 1C for semantic color categorization (brand/text/bg/border) |
| `core/preview_generator.py` | AS-IS previews + Color Ramps sorted by brand priority |
| `app.py` | Stage 1 UI now has 6 preview tabs including Semantic Colors |
| `docs/CONTEXT.md` | Updated with semantic analyzer, full architecture diagrams |
Project Goal
Build a semi-automated, human-in-the-loop agentic system that:
- Reverse-engineers a design system from a live website
- Reconstructs and upgrades it into a modern, scalable design system
- Outputs production-ready JSON tokens (Figma Tokens Studio compatible)
Philosophy: This is a design-aware co-pilot, NOT a magic button. Humans decide, agents propose.
Why This Project? (Market Differentiation)
The Problem We Solve
| Pain Point | Who Has It | Current Solutions | Why They Fail |
|---|---|---|---|
| Legacy websites with no design system | Enterprise teams | Manual audit (weeks) | Time-consuming, error-prone |
| Inconsistent design tokens scattered in CSS | Agencies inheriting projects | Figma plugins (style extractors) | Only extract from Figma, not live sites |
| Need to modernize without breaking existing | Product teams | Design system generators | Generate new, don't reverse-engineer existing |
| AA compliance gaps unknown | Accessibility teams | Contrast checkers | Check one color at a time, no system view |
Existing Tools & Their Gaps
| Tool | What It Does | Gap We Fill |
|---|---|---|
| Figma Tokens Studio | Manages tokens in Figma | Doesn't extract from websites |
| Style Dictionary | Transforms tokens to code | Needs tokens first (we create them) |
| Polypane/VisBug | Inspect live sites | No systematic extraction or upgrade |
| AI Design Tools (Galileo, Uizard) | Generate new designs | Don't reverse-engineer existing |
| CSS Stats | Analyze CSS files | Statistics only, no actionable tokens |
| Chromatic/Percy | Visual regression | Compare, don't extract or upgrade |
Our Unique Value Proposition
What makes us different:
1. Reverse-engineering (not generation)
   - Extracts from LIVE websites, not design files
   - Preserves what's working, upgrades what's broken
   - Respects existing brand decisions
2. Multi-agent reasoning (not single LLM)
   - Two analysts with different perspectives
   - HEAD compiler resolves conflicts
   - Shows reasoning, not just results
3. Human-in-the-loop (not magic button)
   - Designer reviews every stage
   - Accept/reject individual tokens
   - Choose from upgrade OPTIONS, not forced decisions
4. Visual previews (not just data tables)
   - Typography rendered in actual detected font
   - Color ramps with AA compliance per shade
   - See before you export
5. Cost-transparent (not black box)
   - Shows token usage and cost per analysis
   - Uses HF free tier ($0.10/mo) or Pro ($2/mo)
   - ~$0.05 per full analysis
Target Users
| User | Use Case | Value |
|---|---|---|
| UX Managers (like you!) | Modernize legacy booking platforms | Weeks → Hours |
| Design System Teams | Audit and standardize existing properties | Systematic, not ad-hoc |
| Agencies | Onboard client projects with no documentation | Instant design inventory |
| Accessibility Consultants | AA compliance audit with fixes | Full palette view |
| Developers | Get production-ready tokens from designer's website | No manual translation |
Why Not Just Use [X]?
"Why not just inspect the CSS manually?" → You could, but it takes weeks for a complex site. We do it in minutes with systematic coverage.
"Why not use Figma's native styles?" → Many legacy sites were never in Figma. We extract from the source of truth: the live website.
"Why do you need AI? Can't rules handle this?" → Rules extract tokens. AI understands design intent: why is this color used here? What scale was intended? Where does it deviate from best practices?
"Isn't this just CSS Stats with AI?" → CSS Stats tells you what exists. We tell you what it should be and give you actionable upgrade paths.
Architecture Overview
Tech stack:
- Frontend: Gradio (long-scroll, sectioned UI with live preview)
- Orchestration: LangGraph (agent state management & workflow)
- Models: HuggingFace Inference Providers (Novita, Groq, etc.)
- Hosting: Hugging Face Spaces
- Storage: HF Spaces persistent storage
- Output: Platform-agnostic JSON tokens (Figma Tokens Studio)
Model Assignments
Stage 2: Multi-Agent Analysis
| Agent | Role | Model | Provider | Cost |
|---|---|---|---|---|
| LLM 1 | Design Analyst 1 | `Qwen/Qwen2.5-72B-Instruct` | Novita | $0.29/M in, $0.59/M out |
| LLM 2 | Design Analyst 2 | `meta-llama/Llama-3.3-70B-Instruct` | Novita | $0.59/M in, $0.79/M out |
| HEAD | Compiler | `meta-llama/Llama-3.3-70B-Instruct` | Novita | $0.59/M in, $0.79/M out |
| Rules | Calculations | None (Rule-based) | - | FREE |
Architecture:
Parallel analysis: three analysts run side by side, and all three outputs feed the HEAD compiler.
- LLM 1 (Qwen): global design patterns
- LLM 2 (Llama): Western design patterns
- Rule Engine: math only (FREE)

HEAD compiler:
- Compares the analysts' views
- Resolves differences
- Produces the final recommendations
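The fan-out/fan-in shape above can be sketched in a few lines. This is only an illustration using the standard library, not the project's LangGraph workflow: the function names, the stubbed analyst outputs, and the "agree or flag as conflict" rule are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def analyst_one(tokens: dict) -> dict:
    # Hypothetical stub for LLM 1 (Qwen); a real version would call the model.
    return {"type_scale": 1.25, "base_spacing": 8}


def analyst_two(tokens: dict) -> dict:
    # Hypothetical stub for LLM 2 (Llama).
    return {"type_scale": 1.2, "base_spacing": 8}


def rule_engine(tokens: dict) -> dict:
    # Rule-based pass: plain arithmetic, no model call.
    return {"color_count": len(tokens.get("colors", []))}


def head_compile(view_one: dict, view_two: dict, rules: dict) -> dict:
    """Keep values both analysts agree on; surface the rest as conflicts."""
    agreed, conflicts = {}, {}
    for key in sorted(set(view_one) | set(view_two)):
        a, b = view_one.get(key), view_two.get(key)
        if a == b:
            agreed[key] = a
        else:
            conflicts[key] = [a, b]
    return {"agreed": agreed, "conflicts": conflicts, "rules": rules}


def run_stage_two(tokens: dict) -> dict:
    # Fan out: the three analyses are independent, so run them concurrently.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(fn, tokens)
            for fn in (analyst_one, analyst_two, rule_engine)
        ]
        view_one, view_two, rules = (f.result() for f in futures)
    # Fan in: one compile step sees all three outputs.
    return head_compile(view_one, view_two, rules)


result = run_stage_two({"colors": ["#3860be", "#00c4cc", "#bcd432"]})
```

Keeping disagreements in a separate `conflicts` bucket, instead of silently picking a winner, fits the human-in-the-loop goal: the designer sees both proposals and chooses.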
Other Agents
| Agent | Role | Model | Provider | Why |
|---|---|---|---|---|
| Agent 1 | Crawler & Extractor | None (Rule-based) | - | Pure CSS extraction, no LLM needed |
| Agent 2 | Normalizer | `microsoft/Phi-3.5-mini-instruct` | Novita | Fast, great structured output |
| Agent 4 | Generator | `mistralai/Codestral-22B-v0.1` | Novita | Code specialist, JSON formatting |
Provider Configuration
Default provider: Novita (configurable in config/agents.yaml)
Available providers (via HuggingFace Inference Providers):
- novita - Default, good balance
- groq - Fastest
- cerebras - Ultra-fast
- sambanova - Good for Llama
- together - Wide model selection
Cost Tracking
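Estimated cost per Stage 2 analysis: ~$0.05
Monthly inference credit included with each Hugging Face plan:
- Free tier: $0.10/month (about 2 analyses)
- Pro tier: $2.00/month (about 40 analyses; $9/mo subscription)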
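To see how the ~$0.05 figure relates to the per-token prices in the Stage 2 table, here is a rough calculation. The prices come from that table; the token counts per call are invented for illustration (picked so the total lands near the stated estimate), and the helper names are hypothetical.

```python
import math

# USD per million tokens (input, output), from the Stage 2 model table.
PRICES = {
    "Qwen/Qwen2.5-72B-Instruct": (0.29, 0.59),
    "meta-llama/Llama-3.3-70B-Instruct": (0.59, 0.79),
}


def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD of one model call."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


# ASSUMED token counts per call; real usage depends on the site analyzed.
calls = [
    ("Qwen/Qwen2.5-72B-Instruct", 30_000, 6_000),          # LLM 1
    ("meta-llama/Llama-3.3-70B-Instruct", 30_000, 6_000),  # LLM 2
    ("meta-llama/Llama-3.3-70B-Instruct", 20_000, 4_000),  # HEAD compiler
]

total = sum(call_cost(*call) for call in calls)


def analyses_per_month(budget_usd: float) -> int:
    """Whole analyses that fit in a monthly budget at the cost above."""
    return math.floor(budget_usd / total)


print(f"Estimated cost per analysis: ${total:.4f}")
print(f"Free tier ($0.10): {analyses_per_month(0.10)} analyses")
print(f"Pro tier ($2.00): {analyses_per_month(2.00)} analyses")
```

The rule-based steps add nothing to the total, which is why the table lists them as FREE.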
Visual Previews
Stage 1: AS-IS Previews (No Enhancements)
Shows raw extracted values exactly as found on the website:
| Preview | What It Shows |
|---|---|
| Typography | Actual font rendered with detected styles |
| Colors | Simple swatches with hex, frequency, context, AA status |
| Spacing | Visual bars representing each spacing value |
| Radius | Boxes with each border-radius applied |
| Shadows | Cards with each box-shadow applied |
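The "AA status" shown on each color swatch presumably comes from a WCAG contrast check. A minimal sketch of that check follows, using the WCAG 2.x relative-luminance and contrast-ratio formulas. The function names are hypothetical, and testing against a white background is an assumption; the project may compare against other backgrounds.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    # WCAG AA: 4.5:1 for normal text, 3:1 for large text.
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


for color in ("#3860be", "#bcd432"):
    ratio = contrast_ratio(color, "#ffffff")
    print(color, f"{ratio:.2f}:1", "AA" if passes_aa(color, "#ffffff") else "fails AA")
```

Running the same check over every extracted color is what gives the system view that one-color-at-a-time contrast checkers lack.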
Stage 2: Enhanced Previews (Upgraded)
Shows proposed upgrades and improvements:
| Preview | What It Shows |
|---|---|
| Typography | Type scale comparison (1.2, 1.25, 1.333 ratios) |
| Color Ramps | 11 shades (50-950) with AA compliance per shade |
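The type scale comparison can be illustrated with a small calculation: build a modular scale from a base size and a ratio, then check which candidate ratio best matches the sizes found on the site. This is a sketch under assumptions; the 16px base, the function names, and the "mean step ratio" heuristic are illustrative, not the project's actual logic.

```python
CANDIDATE_RATIOS = (1.2, 1.25, 1.333)


def type_scale(base_px: float, ratio: float, steps: int) -> list[float]:
    """Font sizes for a modular scale: base * ratio**n for n = 0..steps-1."""
    return [round(base_px * ratio ** n, 2) for n in range(steps)]


def closest_ratio(detected_sizes: list[float]) -> float:
    """Pick the candidate ratio nearest the mean step between detected sizes."""
    sizes = sorted(detected_sizes)
    step_ratios = [larger / smaller for smaller, larger in zip(sizes, sizes[1:])]
    mean_step = sum(step_ratios) / len(step_ratios)
    return min(CANDIDATE_RATIOS, key=lambda r: abs(r - mean_step))


# Hypothetical sizes detected on a site, in px.
detected = [16, 20, 25, 31]
best = closest_ratio(detected)
print("Closest ratio:", best)
for ratio in CANDIDATE_RATIOS:
    print(ratio, type_scale(16, ratio, 5))
```

Printing all three candidate scales side by side mirrors the preview: the designer sees the options and picks one, with the closest match as a suggestion only.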
Enhanced Extraction (Agent 1)
Agent 1 now extracts from 7 sources to capture ALL colors:
1. DOM Computed Styles
   - `window.getComputedStyle(element)`
   - Captures: color, background-color, border-color, etc.
2. CSS Variables
   - `:root { --primary-color: #3860be; }`
   - Parses all stylesheets for CSS custom properties
3. SVG Colors
   - `<svg fill="#00c4cc">`
   - `<path stroke="#3860be">`
4. Inline Styles
   - `<div style="background-color: #bcd432;">`
   - Parses style attributes for color values
5. Stylesheet Rules
   - Parses CSS rules that may not be applied to visible elements
   - Catches hover states, pseudo-elements, etc.
6. External CSS Files
   - Fetches and parses externally linked CSS files
7. Page Content Scan
   - Brute-force scan of the page content for color values
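The text-based sources (CSS variables, SVG attributes, inline styles) can be approximated with regular expressions over raw markup. The sketch below is a simplified illustration, not the code in `agents/extractor.py`: the function names are hypothetical, it only recognizes 3- and 6-digit hex colors, and it skips `rgb()`, `hsl()`, and named colors. DOM computed styles need a real browser and are out of scope here.

```python
import re
from collections import Counter

# 6-digit or 3-digit hex colors; the trailing \b rejects longer hex runs.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")
# CSS custom property declarations such as "--primary-color: #3860be".
CSS_VARIABLE = re.compile(r"(--[\w-]+)\s*:\s*([^;}]+)")


def normalize_hex(value: str) -> str:
    """Lowercase and expand shorthand, so #FFF and #ffffff count as one color."""
    h = value.lstrip("#").lower()
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)
    return "#" + h


def extract_css_variables(css_text: str) -> dict[str, str]:
    return {name: value.strip() for name, value in CSS_VARIABLE.findall(css_text)}


def extract_hex_colors(text: str) -> Counter:
    """Count each distinct hex color found anywhere in the text."""
    return Counter(normalize_hex(match) for match in HEX_COLOR.findall(text))


sample = """
:root { --primary-color: #3860be; }
<svg fill="#00c4cc"><path stroke="#3860be"></svg>
<div style="background-color: #BCD432;"></div>
"""

variables = extract_css_variables(sample)
colors = extract_hex_colors(sample)
print(variables)
print(colors.most_common())
```

Counting occurrences instead of only collecting a set gives the frequency figure shown next to each swatch in the Stage 1 color preview.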
Enhanced Logging
Stage 1 Extraction Logs
Shows detailed extraction progress:
```
DESKTOP EXTRACTION (1440px)
Enhanced extraction from 5 sources:
  - DOM computed styles (getComputedStyle)
  - CSS variables (:root { --color: })
  - SVG colors (fill, stroke)
  - Inline styles (style='color:')
  - Stylesheet rules (CSS files)
  - External CSS files (fetch & parse)
  - Page content scan (brute-force)
EXTRACTION RESULTS:
  Colors: 45 unique
  Typography: 12 styles
  Spacing: 28 values
  Radius: 8 values
  Shadows: 4 values
CSS Variables found: 15
  --primary-color: #3860be
  --accent-color: #00c4cc
  --brand-lime: #bcd432
  ... and 12 more
Normalizing (deduping, naming)...
Normalized: 32 colors, 10 typography, 18 spacing
============================================================
FIRECRAWL CSS EXTRACTION
Scraping: https://example.com
Page scraped (125000 chars)
Parsing