riazmo committed on
Commit 9f5ee50 · verified · 1 Parent(s): a23a2ee

Upload 20 files
Dockerfile ADDED
@@ -0,0 +1,60 @@
+ FROM python:3.11-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies for Playwright
+ RUN apt-get update && apt-get install -y \
+     wget \
+     gnupg \
+     ca-certificates \
+     fonts-liberation \
+     libasound2 \
+     libatk-bridge2.0-0 \
+     libatk1.0-0 \
+     libatspi2.0-0 \
+     libcups2 \
+     libdbus-1-3 \
+     libdrm2 \
+     libgbm1 \
+     libgtk-3-0 \
+     libnspr4 \
+     libnss3 \
+     libxcomposite1 \
+     libxdamage1 \
+     libxfixes3 \
+     libxkbcommon0 \
+     libxrandr2 \
+     xdg-utils \
+     libu2f-udev \
+     libvulkan1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Install Playwright browsers
+ RUN playwright install chromium
+ RUN playwright install-deps chromium
+
+ # Copy application code
+ COPY . .
+
+ # Create non-root user for security
+ RUN useradd -m -u 1000 user
+ USER user
+
+ # Set environment variables
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH \
+     GRADIO_SERVER_NAME=0.0.0.0 \
+     GRADIO_SERVER_PORT=7860
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run the application
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,10 +1,233 @@
  ---
- title: Design System Extractor 2
- emoji: 🐠
- colorFrom: blue
- colorTo: purple
+ title: Design System Extractor v2
+ emoji: 🎨
+ colorFrom: purple
+ colorTo: blue
  sdk: docker
  pinned: false
+ license: mit
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Design System Extractor v2
+
+ > 🎨 A semi-automated, human-in-the-loop agentic system that reverse-engineers design systems from live websites.
+
+ ## 🎯 What It Does
+
+ When you have a website but no design system documentation (common when the original Sketch/Figma files are lost), this tool helps you:
+
+ 1. **Crawl** your website to discover pages
+ 2. **Extract** design tokens (colors, typography, spacing, shadows)
+ 3. **Review** and validate extracted tokens with visual previews
+ 4. **Upgrade** your system with modern best practices (optional)
+ 5. **Export** production-ready JSON tokens for Figma/code
+
+ ## 🧠 Philosophy
+
+ This is **not a magic button** — it's a design-aware co-pilot.
+
+ - **Agents propose → Humans decide**
+ - **Every action is visible, reversible, and previewed**
+ - **No irreversible automation**
+
+ ## 🏗️ Architecture
+
+ ```
+ ┌──────────────────────────────────────────────────────────────┐
+ │                          TECH STACK                          │
+ ├──────────────────────────────────────────────────────────────┤
+ │ Frontend:      Gradio (interactive UI with live preview)     │
+ │ Orchestration: LangGraph (agent workflow management)         │
+ │ Models:        Claude API (reasoning) + Rule-based           │
+ │ Browser:       Playwright (crawling & extraction)            │
+ │ Hosting:       Hugging Face Spaces                           │
+ └──────────────────────────────────────────────────────────────┘
+ ```
+
+ ### Agent Personas
+
+ | Agent | Persona | Job |
+ |-------|---------|-----|
+ | **Agent 1** | Design Archaeologist | Discover pages, extract raw tokens |
+ | **Agent 2** | Design System Librarian | Normalize, dedupe, structure tokens |
+ | **Agent 3** | Senior DS Architect | Recommend upgrades (type scales, spacing, a11y) |
+ | **Agent 4** | Automation Engineer | Generate final JSON for Figma/code |
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ - Python 3.11+
+ - Node.js (for some dependencies)
+
+ ### Installation
+
+ ```bash
+ # Clone the repository
+ git clone <repo-url>
+ cd design-system-extractor
+
+ # Create virtual environment
+ python -m venv venv
+ source venv/bin/activate  # or `venv\Scripts\activate` on Windows
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Install Playwright browsers
+ playwright install chromium
+
+ # Copy environment file
+ cp config/.env.example config/.env
+ # Edit .env and add your ANTHROPIC_API_KEY
+ ```
+
+ ### Running
+
+ ```bash
+ python app.py
+ ```
+
+ Open `http://localhost:7860` in your browser.
+
+ ## 📖 Usage Guide
+
+ ### Stage 1: Discovery
+
+ 1. Enter your website URL (e.g., `https://example.com`)
+ 2. Click "Discover Pages"
+ 3. Review discovered pages and select which to extract from
+ 4. Ensure you have a mix of page types (homepage, listing, detail, etc.)
+
+ ### Stage 2: Extraction
+
+ 1. Choose viewport (Desktop 1440px or Mobile 375px)
+ 2. Click "Extract Tokens"
+ 3. Review extracted:
+    - **Colors**: With frequency, context, and AA compliance
+    - **Typography**: Font families, sizes, weights
+    - **Spacing**: Values with 8px grid fit indicators
+ 4. Accept or reject individual tokens
+
+ ### Stage 3: Export
+
+ 1. Review final token set
+ 2. Export as JSON
+ 3. Import into Figma via Tokens Studio or your plugin
+
+ ## 📁 Project Structure
+
+ ```
+ design-system-extractor/
+ ├── app.py                  # Main Gradio application
+ ├── requirements.txt
+ ├── README.md
+ │
+ ├── config/
+ │   ├── .env.example        # Environment template
+ │   ├── agents.yaml         # Agent personas & settings
+ │   └── settings.py         # Configuration loader
+ │
+ ├── agents/
+ │   ├── state.py            # LangGraph state definitions
+ │   ├── graph.py            # Workflow orchestration
+ │   ├── crawler.py          # Agent 1: Page discovery
+ │   ├── extractor.py        # Agent 1: Token extraction
+ │   ├── normalizer.py       # Agent 2: Normalization
+ │   ├── advisor.py          # Agent 3: Best practices
+ │   └── generator.py        # Agent 4: JSON generation
+ │
+ ├── core/
+ │   ├── token_schema.py     # Pydantic data models
+ │   └── color_utils.py      # Color analysis utilities
+ │
+ ├── ui/
+ │   └── (Gradio components)
+ │
+ └── docs/
+     └── CONTEXT.md          # Context file for AI assistance
+ ```
+
+ ## 🔧 Configuration
+
+ ### Environment Variables
+
+ ```env
+ # Required
+ ANTHROPIC_API_KEY=your_key_here
+
+ # Optional
+ DEBUG=false
+ LOG_LEVEL=INFO
+ BROWSER_HEADLESS=true
+ ```
+
+ ### Agent Configuration
+
+ Agent personas and behavior are defined in `config/agents.yaml`. This includes:
+
+ - Extraction targets (colors, typography, spacing)
+ - Naming conventions
+ - Confidence thresholds
+ - Upgrade options
+
+ ## 🛠️ Development
+
+ ### Running Tests
+
+ ```bash
+ pytest tests/
+ ```
+
+ ### Adding New Features
+
+ 1. Update token schema in `core/token_schema.py`
+ 2. Add agent logic in `agents/`
+ 3. Update UI in `app.py`
+ 4. Update `docs/CONTEXT.md` for AI assistance
+
+ ## 📦 Output Format
+
+ Tokens are exported in a platform-agnostic JSON format:
+
+ ```json
+ {
+   "metadata": {
+     "source_url": "https://example.com",
+     "version": "v1-recovered",
+     "viewport": "desktop"
+   },
+   "colors": {
+     "primary-500": {
+       "value": "#007bff",
+       "source": "detected",
+       "contrast_white": 4.5
+     }
+   },
+   "typography": {
+     "heading-lg": {
+       "fontFamily": "Inter",
+       "fontSize": "24px",
+       "fontWeight": 700
+     }
+   },
+   "spacing": {
+     "md": {
+       "value": "16px",
+       "source": "detected"
+     }
+   }
+ }
+ ```
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please read the contribution guidelines first.
+
+ ## 📄 License
+
+ MIT
+
+ ---
+
+ Built with ❤️ for designers who've lost their source files.
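The exported JSON above is plain data, so downstream tooling can consume it without this repo's code. A minimal sketch of that idea, using the README's example export verbatim; the `to_css_variables` helper is hypothetical, not part of this project:

```python
import json

# Token export matching the README's example schema
# (top-level keys shown there: metadata, colors, typography, spacing).
exported = json.loads("""
{
  "metadata": {"source_url": "https://example.com", "version": "v1-recovered", "viewport": "desktop"},
  "colors": {"primary-500": {"value": "#007bff", "source": "detected", "contrast_white": 4.5}},
  "typography": {"heading-lg": {"fontFamily": "Inter", "fontSize": "24px", "fontWeight": 700}},
  "spacing": {"md": {"value": "16px", "source": "detected"}}
}
""")

def to_css_variables(tokens: dict) -> str:
    """Flatten color and spacing tokens into CSS custom properties."""
    lines = [":root {"]
    for name, tok in tokens.get("colors", {}).items():
        lines.append(f"  --color-{name}: {tok['value']};")
    for name, tok in tokens.get("spacing", {}).items():
        lines.append(f"  --space-{name}: {tok['value']};")
    lines.append("}")
    return "\n".join(lines)

print(to_css_variables(exported))
```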
agents/__init__.py ADDED
@@ -0,0 +1,32 @@
+ """
+ Agents for Design System Extractor v2.
+
+ This package contains the LangGraph agents:
+ - Agent 1: Crawler & Extractor (Design Archaeologist)
+ - Agent 2: Normalizer (Design System Librarian)
+ - Agent 3: Advisor (Senior Staff DS Architect)
+ - Agent 4: Generator (Automation Engineer)
+ """
+
+ from agents.state import (
+     AgentState,
+     create_initial_state,
+     get_stage_progress,
+ )
+
+ from agents.graph import (
+     build_workflow_graph,
+     WorkflowRunner,
+     create_workflow,
+ )
+
+ __all__ = [
+     # State
+     "AgentState",
+     "create_initial_state",
+     "get_stage_progress",
+     # Graph
+     "build_workflow_graph",
+     "WorkflowRunner",
+     "create_workflow",
+ ]
agents/crawler.py ADDED
@@ -0,0 +1,350 @@
+ """
+ Agent 1: Website Crawler
+ Design System Extractor v2
+
+ Persona: Meticulous Design Archaeologist
+
+ Responsibilities:
+ - Auto-discover pages from base URL
+ - Classify page types (homepage, listing, detail, etc.)
+ - Prepare page list for user confirmation
+ """
+
+ import asyncio
+ import re
+ from urllib.parse import urljoin, urlparse
+ from typing import Optional, Callable
+ from datetime import datetime
+
+ from playwright.async_api import async_playwright, Browser, Page, BrowserContext
+
+ from core.token_schema import DiscoveredPage, PageType, Viewport
+ from config.settings import get_settings
+
+
+ class PageDiscoverer:
+     """
+     Discovers pages from a website for design system extraction.
+
+     This is the first part of Agent 1's job — finding pages before
+     the human confirms which ones to crawl.
+     """
+
+     def __init__(self):
+         self.settings = get_settings()
+         self.browser: Optional[Browser] = None
+         self.context: Optional[BrowserContext] = None
+         self.visited_urls: set[str] = set()
+         self.discovered_pages: list[DiscoveredPage] = []
+
+     async def __aenter__(self):
+         """Async context manager entry."""
+         await self._init_browser()
+         return self
+
+     async def __aexit__(self, exc_type, exc_val, exc_tb):
+         """Async context manager exit."""
+         await self._close_browser()
+
+     async def _init_browser(self):
+         """Initialize Playwright browser."""
+         playwright = await async_playwright().start()
+         self.browser = await playwright.chromium.launch(
+             headless=self.settings.browser.headless
+         )
+         self.context = await self.browser.new_context(
+             viewport={
+                 "width": self.settings.viewport.desktop_width,
+                 "height": self.settings.viewport.desktop_height,
+             },
+             user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
+         )
+
+     async def _close_browser(self):
+         """Close browser and cleanup."""
+         if self.context:
+             await self.context.close()
+         if self.browser:
+             await self.browser.close()
+
+     def _normalize_url(self, url: str, base_url: str) -> Optional[str]:
+         """Normalize and validate URL."""
+         # Handle relative URLs
+         if not url.startswith(('http://', 'https://')):
+             url = urljoin(base_url, url)
+
+         parsed = urlparse(url)
+         base_parsed = urlparse(base_url)
+
+         # Only allow same domain
+         if parsed.netloc != base_parsed.netloc:
+             return None
+
+         # Remove fragments and normalize
+         normalized = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
+
+         # Remove trailing slash for consistency
+         if normalized.endswith('/') and len(normalized) > len(f"{parsed.scheme}://{parsed.netloc}/"):
+             normalized = normalized.rstrip('/')
+
+         return normalized
+
+     def _classify_page_type(self, url: str, title: str = "") -> PageType:
+         """
+         Classify page type based on URL patterns and title.
+
+         This is a heuristic — not perfect, but good enough for discovery.
+         """
+         url_lower = url.lower()
+         title_lower = title.lower() if title else ""
+
+         # Check URL patterns
+         patterns = {
+             PageType.HOMEPAGE: [r'/$', r'/home$', r'/index'],
+             PageType.LISTING: [r'/products', r'/catalog', r'/list', r'/category', r'/collection', r'/search'],
+             PageType.DETAIL: [r'/product/', r'/item/', r'/detail/', r'/p/', r'/[a-z-]+/\d+'],
+             PageType.FORM: [r'/contact', r'/form', r'/apply', r'/submit', r'/register'],
+             PageType.AUTH: [r'/login', r'/signin', r'/signup', r'/auth', r'/account'],
+             PageType.CHECKOUT: [r'/cart', r'/checkout', r'/basket', r'/payment'],
+             PageType.MARKETING: [r'/landing', r'/promo', r'/campaign', r'/offer'],
+             PageType.ABOUT: [r'/about', r'/team', r'/company', r'/story'],
+             PageType.CONTACT: [r'/contact', r'/support', r'/help'],
+         }
+
+         for page_type, url_patterns in patterns.items():
+             for pattern in url_patterns:
+                 if re.search(pattern, url_lower):
+                     return page_type
+
+         # Check title patterns
+         title_patterns = {
+             PageType.HOMEPAGE: ['home', 'welcome'],
+             PageType.LISTING: ['products', 'catalog', 'collection', 'browse'],
+             PageType.DETAIL: ['product', 'item'],
+             PageType.AUTH: ['login', 'sign in', 'sign up', 'register'],
+             PageType.ABOUT: ['about', 'our story', 'team'],
+             PageType.CONTACT: ['contact', 'get in touch', 'support'],
+         }
+
+         for page_type, keywords in title_patterns.items():
+             for keyword in keywords:
+                 if keyword in title_lower:
+                     return page_type
+
+         return PageType.OTHER
+
+     async def _extract_links(self, page: Page, base_url: str) -> list[str]:
+         """Extract all internal links from a page."""
+         links = await page.evaluate("""
+             () => {
+                 const links = Array.from(document.querySelectorAll('a[href]'));
+                 return links.map(a => a.href).filter(href =>
+                     href &&
+                     !href.startsWith('javascript:') &&
+                     !href.startsWith('mailto:') &&
+                     !href.startsWith('tel:') &&
+                     !href.includes('#')
+                 );
+             }
+         """)
+
+         # Normalize and filter
+         valid_links = []
+         for link in links:
+             normalized = self._normalize_url(link, base_url)
+             if normalized and normalized not in self.visited_urls:
+                 valid_links.append(normalized)
+
+         return list(set(valid_links))
+
+     async def _get_page_title(self, page: Page) -> str:
+         """Get page title."""
+         try:
+             return await page.title()
+         except Exception:
+             return ""
+
+     async def discover(
+         self,
+         base_url: str,
+         max_pages: Optional[int] = None,
+         progress_callback: Optional[Callable[[float], None]] = None
+     ) -> list[DiscoveredPage]:
+         """
+         Discover pages from a website.
+
+         Args:
+             base_url: The starting URL
+             max_pages: Maximum pages to discover (default from settings)
+             progress_callback: Optional callback for progress updates
+
+         Returns:
+             List of discovered pages
+         """
+         max_pages = max_pages or self.settings.crawl.max_pages
+
+         async with self:
+             # Start with homepage
+             normalized_base = self._normalize_url(base_url, base_url)
+             if not normalized_base:
+                 raise ValueError(f"Invalid base URL: {base_url}")
+
+             queue = [normalized_base]
+             self.visited_urls = set()
+             self.discovered_pages = []
+
+             while queue and len(self.discovered_pages) < max_pages:
+                 current_url = queue.pop(0)
+
+                 if current_url in self.visited_urls:
+                     continue
+
+                 self.visited_urls.add(current_url)
+
+                 try:
+                     page = await self.context.new_page()
+
+                     # Navigate to page
+                     await page.goto(
+                         current_url,
+                         wait_until="networkidle",
+                         timeout=self.settings.browser.timeout
+                     )
+
+                     # Get page info
+                     title = await self._get_page_title(page)
+                     page_type = self._classify_page_type(current_url, title)
+                     depth = len(urlparse(current_url).path.split('/')) - 1
+
+                     # Create discovered page
+                     discovered = DiscoveredPage(
+                         url=current_url,
+                         title=title,
+                         page_type=page_type,
+                         depth=depth,
+                         selected=True,
+                     )
+                     self.discovered_pages.append(discovered)
+
+                     # Extract links for further crawling
+                     new_links = await self._extract_links(page, base_url)
+
+                     # Prioritize certain page types
+                     priority_patterns = ['/product', '/listing', '/category', '/about', '/contact']
+                     priority_links = [l for l in new_links if any(p in l.lower() for p in priority_patterns)]
+                     other_links = [l for l in new_links if l not in priority_links]
+
+                     # Add to queue (priority first)
+                     for link in priority_links + other_links:
+                         if link not in self.visited_urls and link not in queue:
+                             queue.append(link)
+
+                     await page.close()
+
+                     # Progress callback
+                     if progress_callback:
+                         progress = len(self.discovered_pages) / max_pages
+                         progress_callback(min(progress, 1.0))
+
+                     # Rate limiting
+                     await asyncio.sleep(self.settings.crawl.crawl_delay_ms / 1000)
+
+                 except Exception as e:
+                     # Log error but continue
+                     discovered = DiscoveredPage(
+                         url=current_url,
+                         title="",
+                         page_type=PageType.OTHER,
+                         depth=0,
+                         selected=False,
+                         error=str(e),
+                     )
+                     self.discovered_pages.append(discovered)
+
+         return self.discovered_pages
+
+     def get_pages_by_type(self) -> dict[PageType, list[DiscoveredPage]]:
+         """Group discovered pages by type."""
+         grouped: dict[PageType, list[DiscoveredPage]] = {}
+         for page in self.discovered_pages:
+             if page.page_type not in grouped:
+                 grouped[page.page_type] = []
+             grouped[page.page_type].append(page)
+         return grouped
+
+     def get_suggested_pages(self, min_pages: Optional[int] = None) -> list[DiscoveredPage]:
+         """
+         Get suggested pages for extraction.
+
+         Ensures diversity of page types and prioritizes key templates.
+         """
+         min_pages = min_pages or self.settings.crawl.min_pages
+
+         # Priority order for page types
+         priority_types = [
+             PageType.HOMEPAGE,
+             PageType.LISTING,
+             PageType.DETAIL,
+             PageType.FORM,
+             PageType.MARKETING,
+             PageType.AUTH,
+             PageType.ABOUT,
+             PageType.CONTACT,
+             PageType.OTHER,
+         ]
+
+         selected = []
+         grouped = self.get_pages_by_type()
+
+         # First pass: get at least one of each priority type
+         for page_type in priority_types:
+             if page_type in grouped and grouped[page_type]:
+                 # Take the first (usually shallowest) page of this type
+                 page = sorted(grouped[page_type], key=lambda p: p.depth)[0]
+                 if page not in selected:
+                     selected.append(page)
+
+         # Second pass: fill up to min_pages with remaining pages
+         remaining = [p for p in self.discovered_pages if p not in selected and not p.error]
+         remaining.sort(key=lambda p: p.depth)
+
+         while len(selected) < min_pages and remaining:
+             selected.append(remaining.pop(0))
+
+         # Mark as selected
+         for page in selected:
+             page.selected = True
+
+         return selected
+
+
+ # =============================================================================
+ # CONVENIENCE FUNCTIONS
+ # =============================================================================
+
+ async def discover_pages(base_url: str, max_pages: int = 20) -> list[DiscoveredPage]:
+     """Convenience function to discover pages."""
+     discoverer = PageDiscoverer()
+     return await discoverer.discover(base_url, max_pages)
+
+
+ async def quick_discover(base_url: str) -> dict:
+     """Quick discovery returning summary dict."""
+     pages = await discover_pages(base_url)
+
+     return {
+         "total_found": len(pages),
+         "by_type": {
+             pt.value: len([p for p in pages if p.page_type == pt])
+             for pt in PageType
+         },
+         "pages": [
+             {
+                 "url": p.url,
+                 "title": p.title,
+                 "type": p.page_type.value,
+                 "selected": p.selected,
+             }
+             for p in pages
+         ],
+     }
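In the crawler above, `_normalize_url` is the gatekeeper for the crawl frontier: it resolves relative links, rejects cross-domain URLs, and drops fragments, query strings, and trailing slashes so the same page is never queued twice. A standalone, stdlib-only sketch of the same logic (independent of the `PageDiscoverer` class and its settings) shows the expected behavior:

```python
from typing import Optional
from urllib.parse import urljoin, urlparse

def normalize_url(url: str, base_url: str) -> Optional[str]:
    """Mirror of PageDiscoverer._normalize_url: same-domain only, no fragments/queries."""
    # Resolve relative URLs against the base
    if not url.startswith(("http://", "https://")):
        url = urljoin(base_url, url)

    parsed = urlparse(url)
    base = urlparse(base_url)

    # External links are excluded from the crawl
    if parsed.netloc != base.netloc:
        return None

    # Rebuilding from scheme/netloc/path drops fragments and query strings
    normalized = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

    # Strip trailing slash, but keep the bare root URL intact
    if normalized.endswith("/") and len(normalized) > len(f"{parsed.scheme}://{parsed.netloc}/"):
        normalized = normalized.rstrip("/")

    return normalized

base = "https://example.com"
print(normalize_url("/about/", base))              # relative path resolved, slash stripped
print(normalize_url("https://other.com/x", base))  # cross-domain -> None
```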
agents/extractor.py ADDED
@@ -0,0 +1,608 @@
+ """
+ Agent 1: Token Extractor
+ Design System Extractor v2
+
+ Persona: Meticulous Design Archaeologist
+
+ Responsibilities:
+ - Crawl pages at specified viewport
+ - Extract computed styles from all elements
+ - Collect colors, typography, spacing, radius, shadows
+ - Track frequency and context for each token
+ """
+
+ import asyncio
+ import re
+ from typing import Optional, Callable
+ from datetime import datetime
+ from collections import defaultdict
+
+ from playwright.async_api import async_playwright, Browser, Page, BrowserContext
+
+ from core.token_schema import (
+     Viewport,
+     ExtractedTokens,
+     ColorToken,
+     TypographyToken,
+     SpacingToken,
+     RadiusToken,
+     ShadowToken,
+     FontFamily,
+     TokenSource,
+     Confidence,
+ )
+ from core.color_utils import (
+     normalize_hex,
+     parse_color,
+     get_contrast_with_white,
+     get_contrast_with_black,
+     check_wcag_compliance,
+ )
+ from config.settings import get_settings
+
+
+ class TokenExtractor:
+     """
+     Extracts design tokens from web pages.
+
+     This is the second part of Agent 1's job — after pages are confirmed,
+     we crawl and extract all CSS values.
+     """
+
+     def __init__(self, viewport: Viewport = Viewport.DESKTOP):
+         self.settings = get_settings()
+         self.viewport = viewport
+         self.browser: Optional[Browser] = None
+         self.context: Optional[BrowserContext] = None
+
+         # Token collection
+         self.colors: dict[str, ColorToken] = {}
+         self.typography: dict[str, TypographyToken] = {}
+         self.spacing: dict[str, SpacingToken] = {}
+         self.radius: dict[str, RadiusToken] = {}
+         self.shadows: dict[str, ShadowToken] = {}
+
+         # Font tracking
+         self.font_families: dict[str, FontFamily] = {}
+
+         # Statistics
+         self.total_elements = 0
+         self.errors: list[str] = []
+         self.warnings: list[str] = []
+
+     async def __aenter__(self):
+         """Async context manager entry."""
+         await self._init_browser()
+         return self
+
+     async def __aexit__(self, exc_type, exc_val, exc_tb):
+         """Async context manager exit."""
+         await self._close_browser()
+
+     async def _init_browser(self):
+         """Initialize Playwright browser."""
+         playwright = await async_playwright().start()
+         self.browser = await playwright.chromium.launch(
+             headless=self.settings.browser.headless
+         )
+
+         # Set viewport based on extraction mode
+         if self.viewport == Viewport.DESKTOP:
+             width = self.settings.viewport.desktop_width
+             height = self.settings.viewport.desktop_height
+         else:
+             width = self.settings.viewport.mobile_width
+             height = self.settings.viewport.mobile_height
+
+         self.context = await self.browser.new_context(
+             viewport={"width": width, "height": height},
+             user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"
+         )
+
+     async def _close_browser(self):
+         """Close browser and cleanup."""
+         if self.context:
+             await self.context.close()
+         if self.browser:
+             await self.browser.close()
+
+     async def _scroll_page(self, page: Page):
+         """Scroll page to load lazy content."""
+         await page.evaluate("""
+             async () => {
+                 const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
+                 const height = document.body.scrollHeight;
+                 const step = window.innerHeight;
+
+                 for (let y = 0; y < height; y += step) {
+                     window.scrollTo(0, y);
+                     await delay(100);
+                 }
+
+                 // Scroll back to top
+                 window.scrollTo(0, 0);
+             }
+         """)
+
+         # Wait for network idle after scrolling
+         await page.wait_for_load_state("networkidle", timeout=self.settings.browser.network_idle_timeout)
+
+     async def _extract_styles_from_page(self, page: Page) -> dict:
+         """
+         Extract computed styles from all elements on the page.
+
+         This is the core extraction logic — we call getComputedStyle on every element.
+         """
+         styles_data = await page.evaluate("""
+             () => {
+                 const elements = document.querySelectorAll('*');
+                 const results = {
+                     colors: [],
+                     typography: [],
+                     spacing: [],
+                     radius: [],
+                     shadows: [],
+                     elements_count: elements.length,
+                 };
+
+                 const colorProperties = [
+                     'color', 'background-color', 'border-color',
+                     'border-top-color', 'border-right-color',
+                     'border-bottom-color', 'border-left-color',
+                     'outline-color', 'text-decoration-color',
+                 ];
+
+                 const spacingProperties = [
+                     'margin-top', 'margin-right', 'margin-bottom', 'margin-left',
+                     'padding-top', 'padding-right', 'padding-bottom', 'padding-left',
+                     'gap', 'row-gap', 'column-gap',
+                 ];
+
+                 elements.forEach(el => {
+                     const tag = el.tagName.toLowerCase();
+                     const styles = window.getComputedStyle(el);
+
+                     // Skip invisible elements
+                     if (styles.display === 'none' || styles.visibility === 'hidden') {
+                         return;
+                     }
+
+                     // --- COLORS ---
+                     colorProperties.forEach(prop => {
+                         const value = styles.getPropertyValue(prop);
+                         if (value && value !== 'rgba(0, 0, 0, 0)' && value !== 'transparent') {
+                             results.colors.push({
+                                 value: value,
+                                 property: prop,
+                                 element: tag,
+                                 context: prop.includes('background') ? 'background' :
+                                          prop.includes('border') ? 'border' : 'text',
+                             });
+                         }
+                     });
+
+                     // --- TYPOGRAPHY ---
+                     const fontFamily = styles.getPropertyValue('font-family');
+                     const fontSize = styles.getPropertyValue('font-size');
+                     const fontWeight = styles.getPropertyValue('font-weight');
+                     const lineHeight = styles.getPropertyValue('line-height');
+                     const letterSpacing = styles.getPropertyValue('letter-spacing');
+
+                     if (fontSize && fontFamily) {
+                         results.typography.push({
+                             fontFamily: fontFamily,
+                             fontSize: fontSize,
+                             fontWeight: fontWeight,
+                             lineHeight: lineHeight,
+                             letterSpacing: letterSpacing,
+                             element: tag,
+                         });
+                     }
+
+                     // --- SPACING ---
+                     spacingProperties.forEach(prop => {
+                         const value = styles.getPropertyValue(prop);
+                         if (value && value !== '0px' && value !== 'auto' && value !== 'normal') {
+                             const px = parseFloat(value);
+                             if (!isNaN(px) && px > 0 && px < 500) {
+                                 results.spacing.push({
+                                     value: value,
+                                     valuePx: Math.round(px),
+                                     property: prop,
+                                     context: prop.includes('margin') ? 'margin' :
+                                              prop.includes('padding') ? 'padding' : 'gap',
+                                 });
+                             }
+                         }
+                     });
+
+                     // --- BORDER RADIUS ---
+                     const radiusProps = [
+                         'border-radius', 'border-top-left-radius',
+                         'border-top-right-radius', 'border-bottom-left-radius',
+                         'border-bottom-right-radius',
+                     ];
+
+                     radiusProps.forEach(prop => {
+                         const value = styles.getPropertyValue(prop);
+                         if (value && value !== '0px') {
+                             results.radius.push({
+                                 value: value,
+                                 element: tag,
+                             });
+                         }
+                     });
+
+                     // --- BOX SHADOW ---
+                     const shadow = styles.getPropertyValue('box-shadow');
+                     if (shadow && shadow !== 'none') {
+                         results.shadows.push({
+                             value: shadow,
+                             element: tag,
+                         });
+                     }
+                 });
+
+                 return results;
+             }
+         """)
+
+         return styles_data
+
+     def _process_color(self, color_data: dict) -> Optional[str]:
+         """Process and normalize a color value."""
+         value = color_data.get("value", "")
+
+         # Parse and normalize
+         parsed = parse_color(value)
+         if not parsed:
+             return None
+
+         return parsed.hex
+
+     def _aggregate_colors(self, raw_colors: list[dict]):
+         """Aggregate color data from extraction."""
+         for color_data in raw_colors:
+             hex_value = self._process_color(color_data)
+             if not hex_value:
+                 continue
+
+             if hex_value not in self.colors:
+                 # Calculate contrast ratios
+                 contrast_white = get_contrast_with_white(hex_value)
+                 contrast_black = get_contrast_with_black(hex_value)
+                 compliance = check_wcag_compliance(hex_value, "#ffffff")
+
+                 self.colors[hex_value] = ColorToken(
+                     value=hex_value,
+                     frequency=0,
+                     contexts=[],
+                     elements=[],
+                     css_properties=[],
+                     contrast_white=round(contrast_white, 2),
+                     contrast_black=round(contrast_black, 2),
+                     wcag_aa_large_text=compliance["aa_large_text"],
+                     wcag_aa_small_text=compliance["aa_normal_text"],
+                 )
+
+             # Update frequency and context
+             token = self.colors[hex_value]
+             token.frequency += 1
+
+             context = color_data.get("context", "")
+             if context and context not in token.contexts:
+                 token.contexts.append(context)
+
+             element = color_data.get("element", "")
+             if element and element not in token.elements:
+                 token.elements.append(element)
+
+             prop = color_data.get("property", "")
+             if prop and prop not in token.css_properties:
+                 token.css_properties.append(prop)
+
+     def _aggregate_typography(self, raw_typography: list[dict]):
+         """Aggregate typography data from extraction."""
+         for typo_data in raw_typography:
+             # Create unique key
+             font_family = typo_data.get("fontFamily", "")
+             font_size = typo_data.get("fontSize", "")
+             font_weight = typo_data.get("fontWeight", "400")
+             line_height = typo_data.get("lineHeight", "normal")
+
+             key = f"{font_size}|{font_weight}|{font_family[:50]}"
+
+             if key not in self.typography:
+                 # Parse font size to px
+                 font_size_px = None
+                 if font_size.endswith("px"):
+                     try:
+                         font_size_px = float(font_size.replace("px", ""))
+                     except ValueError:
+                         pass
+
+                 # Parse line height
+                 line_height_computed = None
+                 if line_height and line_height != "normal":
+                     if line_height.endswith("px") and font_size_px:
+                         try:
+                             lh_px = float(line_height.replace("px", ""))
+                             line_height_computed = round(lh_px / font_size_px, 2)
+                         except ValueError:
+                             pass
+                     else:
+                         try:
+                             line_height_computed = float(line_height)
+                         except ValueError:
+                             pass
+
+                 self.typography[key] = TypographyToken(
+                     font_family=font_family.split(",")[0].strip().strip('"\''),
+                     font_size=font_size,
+                     font_size_px=font_size_px,
+                     font_weight=int(font_weight) if font_weight.isdigit() else 400,
+                     line_height=line_height,
+                     line_height_computed=line_height_computed,
+                     letter_spacing=typo_data.get("letterSpacing"),
+                     frequency=0,
+                     elements=[],
+                 )
+
+             # Update
+             token = self.typography[key]
+             token.frequency += 1
+
+             element = typo_data.get("element", "")
+             if element and element not in token.elements:
+                 token.elements.append(element)
+
+             # Track font families
+             primary_font = token.font_family
+             if primary_font not in self.font_families:
+                 self.font_families[primary_font] = FontFamily(
+                     name=primary_font,
+                     fallbacks=[f.strip().strip('"\'') for f in font_family.split(",")[1:]],
365
+ frequency=0,
366
+ )
367
+ self.font_families[primary_font].frequency += 1
368
+
369
+ def _aggregate_spacing(self, raw_spacing: list[dict]):
370
+ """Aggregate spacing data from extraction."""
371
+ for space_data in raw_spacing:
372
+ value = space_data.get("value", "")
373
+ value_px = space_data.get("valuePx", 0)
374
+
375
+ key = str(value_px)
376
+
377
+ if key not in self.spacing:
378
+ self.spacing[key] = SpacingToken(
379
+ value=f"{value_px}px",
380
+ value_px=value_px,
381
+ frequency=0,
382
+ contexts=[],
383
+ properties=[],
384
+ fits_base_4=value_px % 4 == 0,
385
+ fits_base_8=value_px % 8 == 0,
386
+ )
387
+
388
+ token = self.spacing[key]
389
+ token.frequency += 1
390
+
391
+ context = space_data.get("context", "")
392
+ if context and context not in token.contexts:
393
+ token.contexts.append(context)
394
+
395
+ prop = space_data.get("property", "")
396
+ if prop and prop not in token.properties:
397
+ token.properties.append(prop)
398
+
399
+ def _aggregate_radius(self, raw_radius: list[dict]):
400
+ """Aggregate border radius data."""
401
+ for radius_data in raw_radius:
402
+ value = radius_data.get("value", "")
403
+
404
+ # Normalize to simple format
405
+ # "8px 8px 8px 8px" -> "8px"
406
+ parts = value.split()
407
+ if len(set(parts)) == 1:
408
+ value = parts[0]
409
+
410
+ if value not in self.radius:
411
+ value_px = None
412
+ if value.endswith("px"):
413
+ try:
414
+ value_px = int(float(value.replace("px", "")))
415
+ except ValueError:
416
+ pass
417
+
418
+ self.radius[value] = RadiusToken(
419
+ value=value,
420
+ value_px=value_px,
421
+ frequency=0,
422
+ elements=[],
423
+ fits_base_4=value_px % 4 == 0 if value_px else False,
424
+ fits_base_8=value_px % 8 == 0 if value_px else False,
425
+ )
426
+
427
+ token = self.radius[value]
428
+ token.frequency += 1
429
+
430
+ element = radius_data.get("element", "")
431
+ if element and element not in token.elements:
432
+ token.elements.append(element)
433
+
434
+ def _aggregate_shadows(self, raw_shadows: list[dict]):
435
+ """Aggregate box shadow data."""
436
+ for shadow_data in raw_shadows:
437
+ value = shadow_data.get("value", "")
438
+
439
+ if value not in self.shadows:
440
+ self.shadows[value] = ShadowToken(
441
+ value=value,
442
+ frequency=0,
443
+ elements=[],
444
+ )
445
+
446
+ token = self.shadows[value]
447
+ token.frequency += 1
448
+
449
+ element = shadow_data.get("element", "")
450
+ if element and element not in token.elements:
451
+ token.elements.append(element)
452
+
453
+ def _calculate_confidence(self, frequency: int) -> Confidence:
454
+ """Calculate confidence level based on frequency."""
455
+ if frequency >= 10:
456
+ return Confidence.HIGH
457
+ elif frequency >= 3:
458
+ return Confidence.MEDIUM
459
+ return Confidence.LOW
460
+
461
+ def _detect_spacing_base(self) -> Optional[int]:
462
+ """Detect the base spacing unit (4 or 8)."""
463
+ fits_4 = sum(1 for s in self.spacing.values() if s.fits_base_4)
464
+ fits_8 = sum(1 for s in self.spacing.values() if s.fits_base_8)
465
+
466
+ total = len(self.spacing)
467
+ if total == 0:
468
+ return None
469
+
470
+ # If 80%+ values fit base 8, use 8
471
+ if fits_8 / total >= 0.8:
472
+ return 8
473
+ # If 80%+ values fit base 4, use 4
474
+ elif fits_4 / total >= 0.8:
475
+ return 4
476
+
477
+ return None
478
+
479
+ async def extract(
480
+ self,
481
+ pages: list[str],
482
+ progress_callback: Optional[Callable[[float], None]] = None
483
+ ) -> ExtractedTokens:
484
+ """
485
+ Extract tokens from a list of pages.
486
+
487
+ Args:
488
+ pages: List of URLs to crawl
489
+ progress_callback: Optional callback for progress updates
490
+
491
+ Returns:
492
+ ExtractedTokens with all discovered tokens
493
+ """
494
+ start_time = datetime.now()
495
+ pages_crawled = []
496
+
497
+ async with self:
498
+ for i, url in enumerate(pages):
499
+ try:
500
+ page = await self.context.new_page()
501
+
502
+ # Navigate
503
+ await page.goto(
504
+ url,
505
+ wait_until="domcontentloaded",
506
+ timeout=self.settings.browser.timeout
507
+ )
508
+
509
+ # Scroll to load lazy content
510
+ await self._scroll_page(page)
511
+
512
+ # Extract styles
513
+ styles = await self._extract_styles_from_page(page)
514
+
515
+ # Aggregate
516
+ self._aggregate_colors(styles.get("colors", []))
517
+ self._aggregate_typography(styles.get("typography", []))
518
+ self._aggregate_spacing(styles.get("spacing", []))
519
+ self._aggregate_radius(styles.get("radius", []))
520
+ self._aggregate_shadows(styles.get("shadows", []))
521
+
522
+ self.total_elements += styles.get("elements_count", 0)
523
+ pages_crawled.append(url)
524
+
525
+ await page.close()
526
+
527
+ # Progress callback
528
+ if progress_callback:
529
+ progress_callback((i + 1) / len(pages))
530
+
531
+ # Rate limiting
532
+ await asyncio.sleep(self.settings.crawl.crawl_delay_ms / 1000)
533
+
534
+ except Exception as e:
535
+ self.errors.append(f"Error extracting {url}: {str(e)}")
536
+
537
+ # Calculate confidence for all tokens
538
+ for token in self.colors.values():
539
+ token.confidence = self._calculate_confidence(token.frequency)
540
+ for token in self.typography.values():
541
+ token.confidence = self._calculate_confidence(token.frequency)
542
+ for token in self.spacing.values():
543
+ token.confidence = self._calculate_confidence(token.frequency)
544
+
545
+ # Detect spacing base
546
+ spacing_base = self._detect_spacing_base()
547
+
548
+ # Mark outliers in spacing
549
+ if spacing_base:
550
+ for token in self.spacing.values():
551
+ if spacing_base == 8 and not token.fits_base_8:
552
+ token.is_outlier = True
553
+ elif spacing_base == 4 and not token.fits_base_4:
554
+ token.is_outlier = True
555
+
556
+ # Determine primary font
557
+ if self.font_families:
558
+ primary_font = max(self.font_families.values(), key=lambda f: f.frequency)
559
+ primary_font.usage = "primary"
560
+
561
+ # Build result
562
+ end_time = datetime.now()
563
+ duration_ms = int((end_time - start_time).total_seconds() * 1000)
564
+
565
+ return ExtractedTokens(
566
+ viewport=self.viewport,
567
+ source_url=pages[0] if pages else "",
568
+ pages_crawled=pages_crawled,
569
+ colors=list(self.colors.values()),
570
+ typography=list(self.typography.values()),
571
+ spacing=list(self.spacing.values()),
572
+ radius=list(self.radius.values()),
573
+ shadows=list(self.shadows.values()),
574
+ font_families=list(self.font_families.values()),
575
+ spacing_base=spacing_base,
576
+ extraction_timestamp=start_time,
577
+ extraction_duration_ms=duration_ms,
578
+ total_elements_analyzed=self.total_elements,
579
+ unique_colors=len(self.colors),
580
+ unique_font_sizes=len(set(t.font_size for t in self.typography.values())),
581
+ unique_spacing_values=len(self.spacing),
582
+ errors=self.errors,
583
+ warnings=self.warnings,
584
+ )
585
+
586
+
587
+ # =============================================================================
588
+ # CONVENIENCE FUNCTIONS
589
+ # =============================================================================
590
+
591
+ async def extract_from_pages(
592
+ pages: list[str],
593
+ viewport: Viewport = Viewport.DESKTOP
594
+ ) -> ExtractedTokens:
595
+ """Convenience function to extract tokens from pages."""
596
+ extractor = TokenExtractor(viewport=viewport)
597
+ return await extractor.extract(pages)
598
+
599
+
600
+ async def extract_both_viewports(pages: list[str]) -> tuple[ExtractedTokens, ExtractedTokens]:
601
+ """Extract tokens from both desktop and mobile viewports."""
602
+ desktop_extractor = TokenExtractor(viewport=Viewport.DESKTOP)
603
+ mobile_extractor = TokenExtractor(viewport=Viewport.MOBILE)
604
+
605
+ desktop_result = await desktop_extractor.extract(pages)
606
+ mobile_result = await mobile_extractor.extract(pages)
607
+
608
+ return desktop_result, mobile_result
agents/graph.py ADDED
@@ -0,0 +1,540 @@
+ """
+ LangGraph Workflow Orchestration
+ Design System Extractor v2
+
+ Defines the main workflow graph with agents, checkpoints, and transitions.
+ """
+
+ from typing import Literal
+ from datetime import datetime
+ from langgraph.graph import StateGraph, END
+ from langgraph.checkpoint.memory import MemorySaver
+
+ from agents.state import AgentState, create_initial_state, get_stage_progress
+ from core.token_schema import Viewport
+
+
+ # =============================================================================
+ # NODE FUNCTIONS (Agent Entry Points)
+ # =============================================================================
+
+ async def discover_pages(state: AgentState) -> AgentState:
+     """
+     Agent 1 - Part 1: Discover pages from base URL.
+
+     This node:
+     1. Takes the base URL
+     2. Crawls to find linked pages
+     3. Classifies page types (homepage, listing, detail, etc.)
+     4. Returns discovered pages for user confirmation
+     """
+     from agents.crawler import PageDiscoverer
+
+     state["current_stage"] = "discover"
+     state["stage_started_at"] = datetime.now()
+
+     try:
+         discoverer = PageDiscoverer()
+         pages = await discoverer.discover(state["base_url"])
+
+         state["discovered_pages"] = pages
+         state["awaiting_human_input"] = True
+         state["checkpoint_name"] = "confirm_pages"
+
+     except Exception as e:
+         state["errors"].append(f"Discovery failed: {str(e)}")
+
+     return state
+
+
+ async def extract_tokens_desktop(state: AgentState) -> AgentState:
+     """
+     Agent 1 - Part 2a: Extract tokens from desktop viewport.
+     """
+     from agents.extractor import TokenExtractor
+
+     state["current_stage"] = "extract"
+
+     try:
+         extractor = TokenExtractor(viewport=Viewport.DESKTOP)
+         result = await extractor.extract(
+             pages=state["pages_to_crawl"],
+             progress_callback=lambda p: state.update({"desktop_crawl_progress": p})
+         )
+
+         state["desktop_extraction"] = result
+
+     except Exception as e:
+         state["errors"].append(f"Desktop extraction failed: {str(e)}")
+
+     return state
+
+
+ async def extract_tokens_mobile(state: AgentState) -> AgentState:
+     """
+     Agent 1 - Part 2b: Extract tokens from mobile viewport.
+     """
+     from agents.extractor import TokenExtractor
+
+     try:
+         extractor = TokenExtractor(viewport=Viewport.MOBILE)
+         result = await extractor.extract(
+             pages=state["pages_to_crawl"],
+             progress_callback=lambda p: state.update({"mobile_crawl_progress": p})
+         )
+
+         state["mobile_extraction"] = result
+
+     except Exception as e:
+         state["errors"].append(f"Mobile extraction failed: {str(e)}")
+
+     return state
+
+
+ async def normalize_tokens(state: AgentState) -> AgentState:
+     """
+     Agent 2: Normalize and structure extracted tokens.
+     """
+     from agents.normalizer import TokenNormalizer
+
+     state["current_stage"] = "normalize"
+     state["stage_started_at"] = datetime.now()
+
+     try:
+         normalizer = TokenNormalizer()
+
+         if state["desktop_extraction"]:
+             state["desktop_normalized"] = normalizer.normalize(state["desktop_extraction"])
+
+         if state["mobile_extraction"]:
+             state["mobile_normalized"] = normalizer.normalize(state["mobile_extraction"])
+
+         # After normalization, wait for human review
+         state["awaiting_human_input"] = True
+         state["checkpoint_name"] = "review_tokens"
+
+     except Exception as e:
+         state["errors"].append(f"Normalization failed: {str(e)}")
+
+     return state
+
+
+ async def generate_recommendations(state: AgentState) -> AgentState:
+     """
+     Agent 3: Generate upgrade recommendations.
+     """
+     from agents.advisor import DesignSystemAdvisor
+
+     state["current_stage"] = "advise"
+     state["stage_started_at"] = datetime.now()
+
+     try:
+         advisor = DesignSystemAdvisor()
+         recommendations = await advisor.analyze_and_recommend(
+             desktop=state["desktop_normalized"],
+             mobile=state["mobile_normalized"],
+         )
+
+         state["upgrade_recommendations"] = recommendations
+
+         # Wait for human to select upgrades
+         state["awaiting_human_input"] = True
+         state["checkpoint_name"] = "select_upgrades"
+
+     except Exception as e:
+         state["errors"].append(f"Recommendation generation failed: {str(e)}")
+
+     return state
+
+
+ async def generate_final_tokens(state: AgentState) -> AgentState:
+     """
+     Agent 4: Generate final token JSON.
+     """
+     from agents.generator import TokenGenerator
+
+     state["current_stage"] = "generate"
+     state["stage_started_at"] = datetime.now()
+
+     try:
+         generator = TokenGenerator()
+
+         # Build selection config from user choices
+         selections = {
+             "type_scale": state["selected_type_scale"],
+             "spacing_system": state["selected_spacing_system"],
+             "naming_convention": state["selected_naming_convention"],
+             "color_ramps": state["selected_color_ramps"],
+             "a11y_fixes": state["selected_a11y_fixes"],
+         }
+
+         if state["desktop_normalized"]:
+             state["desktop_final"] = generator.generate(
+                 normalized=state["desktop_normalized"],
+                 selections=selections,
+                 version=state["version_label"],
+             )
+
+         if state["mobile_normalized"]:
+             state["mobile_final"] = generator.generate(
+                 normalized=state["mobile_normalized"],
+                 selections=selections,
+                 version=state["version_label"],
+             )
+
+         # Wait for human to approve export
+         state["awaiting_human_input"] = True
+         state["checkpoint_name"] = "approve_export"
+
+     except Exception as e:
+         state["errors"].append(f"Token generation failed: {str(e)}")
+
+     return state
+
+
+ async def complete_workflow(state: AgentState) -> AgentState:
+     """
+     Final node: Mark workflow as complete.
+     """
+     state["current_stage"] = "export"
+     state["awaiting_human_input"] = False
+     state["checkpoint_name"] = None
+
+     return state
+
+
+ # =============================================================================
+ # HUMAN CHECKPOINT HANDLERS
+ # =============================================================================
+
+ def handle_page_confirmation(state: AgentState, confirmed_pages: list[str]) -> AgentState:
+     """Handle human confirmation of pages to crawl."""
+     state["pages_to_crawl"] = confirmed_pages
+     state["awaiting_human_input"] = False
+     state["checkpoint_name"] = None
+     return state
+
+
+ def handle_token_review(
+     state: AgentState,
+     color_decisions: dict[str, bool],
+     typography_decisions: dict[str, bool],
+     spacing_decisions: dict[str, bool],
+ ) -> AgentState:
+     """Handle human review of extracted tokens."""
+     state["accepted_colors"] = [k for k, v in color_decisions.items() if v]
+     state["rejected_colors"] = [k for k, v in color_decisions.items() if not v]
+     state["accepted_typography"] = [k for k, v in typography_decisions.items() if v]
+     state["rejected_typography"] = [k for k, v in typography_decisions.items() if not v]
+     state["accepted_spacing"] = [k for k, v in spacing_decisions.items() if v]
+     state["rejected_spacing"] = [k for k, v in spacing_decisions.items() if not v]
+
+     state["awaiting_human_input"] = False
+     state["checkpoint_name"] = None
+     return state
+
+
+ def handle_upgrade_selection(
+     state: AgentState,
+     type_scale: str | None,
+     spacing_system: str | None,
+     naming_convention: str | None,
+     color_ramps: dict[str, bool],
+     a11y_fixes: list[str],
+ ) -> AgentState:
+     """Handle human selection of upgrade options."""
+     state["selected_type_scale"] = type_scale
+     state["selected_spacing_system"] = spacing_system
+     state["selected_naming_convention"] = naming_convention
+     state["selected_color_ramps"] = color_ramps
+     state["selected_a11y_fixes"] = a11y_fixes
+
+     state["awaiting_human_input"] = False
+     state["checkpoint_name"] = None
+     return state
+
+
+ def handle_export_approval(state: AgentState, version_label: str) -> AgentState:
+     """Handle human approval of final export."""
+     state["version_label"] = version_label
+     state["awaiting_human_input"] = False
+     state["checkpoint_name"] = None
+     return state
+
+
+ # =============================================================================
+ # ROUTING FUNCTIONS
+ # =============================================================================
+
+ def route_after_discovery(state: AgentState) -> Literal["wait_for_pages", "extract"]:
+     """Route after discovery: wait for human or continue."""
+     if state["awaiting_human_input"]:
+         return "wait_for_pages"
+     return "extract"
+
+
+ def route_after_extraction(state: AgentState) -> Literal["normalize", "error"]:
+     """Route after extraction: normalize or handle error."""
+     if state["desktop_extraction"] is None and state["mobile_extraction"] is None:
+         return "error"
+     return "normalize"
+
+
+ def route_after_normalization(state: AgentState) -> Literal["wait_for_review", "advise"]:
+     """Route after normalization: wait for review or continue."""
+     if state["awaiting_human_input"]:
+         return "wait_for_review"
+     return "advise"
+
+
+ def route_after_recommendations(state: AgentState) -> Literal["wait_for_selection", "generate"]:
+     """Route after recommendations: wait for selection or continue."""
+     if state["awaiting_human_input"]:
+         return "wait_for_selection"
+     return "generate"
+
+
+ def route_after_generation(state: AgentState) -> Literal["wait_for_approval", "complete"]:
+     """Route after generation: wait for approval or complete."""
+     if state["awaiting_human_input"]:
+         return "wait_for_approval"
+     return "complete"
+
+
+ # =============================================================================
+ # GRAPH BUILDER
+ # =============================================================================
+
+ def build_workflow_graph() -> StateGraph:
+     """
+     Build the main LangGraph workflow.
+
+     Flow:
+     1. discover_pages -> [human confirms pages]
+     2. extract_desktop, then extract_mobile (sequential for now; parallel planned)
+     3. normalize_tokens -> [human reviews tokens]
+     4. generate_recommendations -> [human selects upgrades]
+     5. generate_final_tokens -> [human approves export]
+     6. complete
+     """
+
+     # Create the graph
+     workflow = StateGraph(AgentState)
+
+     # -------------------------------------------------------------------------
+     # ADD NODES
+     # -------------------------------------------------------------------------
+
+     # Discovery
+     workflow.add_node("discover", discover_pages)
+
+     # Extraction (will be parallel in subgraph)
+     workflow.add_node("extract_desktop", extract_tokens_desktop)
+     workflow.add_node("extract_mobile", extract_tokens_mobile)
+
+     # Normalization
+     workflow.add_node("normalize", normalize_tokens)
+
+     # Advisor
+     workflow.add_node("advise", generate_recommendations)
+
+     # Generator
+     workflow.add_node("generate", generate_final_tokens)
+
+     # Completion
+     workflow.add_node("complete", complete_workflow)
+
+     # Human checkpoint placeholder nodes (these just pass through)
+     workflow.add_node("wait_for_pages", lambda s: s)
+     workflow.add_node("wait_for_review", lambda s: s)
+     workflow.add_node("wait_for_selection", lambda s: s)
+     workflow.add_node("wait_for_approval", lambda s: s)
+
+     # -------------------------------------------------------------------------
+     # ADD EDGES
+     # -------------------------------------------------------------------------
+
+     # Entry point
+     workflow.set_entry_point("discover")
+
+     # Discovery -> (wait or extract)
+     workflow.add_conditional_edges(
+         "discover",
+         route_after_discovery,
+         {
+             "wait_for_pages": "wait_for_pages",
+             "extract": "extract_desktop",
+         }
+     )
+
+     # After human confirms pages -> extract
+     workflow.add_edge("wait_for_pages", "extract_desktop")
+
+     # Extraction: desktop first, then mobile (sequential edge for now)
+     workflow.add_edge("extract_desktop", "extract_mobile")
+
+     # After extraction -> normalize
+     workflow.add_conditional_edges(
+         "extract_mobile",
+         route_after_extraction,
+         {
+             "normalize": "normalize",
+             "error": END,
+         }
+     )
+
+     # Normalization -> (wait or advise)
+     workflow.add_conditional_edges(
+         "normalize",
+         route_after_normalization,
+         {
+             "wait_for_review": "wait_for_review",
+             "advise": "advise",
+         }
+     )
+
+     # After human reviews -> advise
+     workflow.add_edge("wait_for_review", "advise")
+
+     # Advisor -> (wait or generate)
+     workflow.add_conditional_edges(
+         "advise",
+         route_after_recommendations,
+         {
+             "wait_for_selection": "wait_for_selection",
+             "generate": "generate",
+         }
+     )
+
+     # After human selects upgrades -> generate
+     workflow.add_edge("wait_for_selection", "generate")
+
+     # Generation -> (wait or complete)
+     workflow.add_conditional_edges(
+         "generate",
+         route_after_generation,
+         {
+             "wait_for_approval": "wait_for_approval",
+             "complete": "complete",
+         }
+     )
+
+     # After human approves -> complete
+     workflow.add_edge("wait_for_approval", "complete")
+
+     # Complete -> END
+     workflow.add_edge("complete", END)
+
+     return workflow
+
+
+ # =============================================================================
+ # WORKFLOW RUNNER
+ # =============================================================================
+
+ class WorkflowRunner:
+     """
+     Manages workflow execution with human-in-the-loop support.
+     """
+
+     def __init__(self):
+         self.graph = build_workflow_graph()
+         self.checkpointer = MemorySaver()
+         self.app = self.graph.compile(checkpointer=self.checkpointer)
+         self.current_state: AgentState | None = None
+         self.thread_id: str | None = None
+
+     async def start(self, base_url: str, thread_id: str | None = None) -> AgentState:
+         """Start a new workflow."""
+         self.thread_id = thread_id or f"workflow_{datetime.now().timestamp()}"
+         self.current_state = create_initial_state(base_url)
+
+         config = {"configurable": {"thread_id": self.thread_id}}
+
+         # Run until first human checkpoint
+         async for event in self.app.astream(self.current_state, config):
+             self.current_state = event
+             if self.current_state.get("awaiting_human_input"):
+                 break
+
+         return self.current_state
+
+     async def resume(self, human_input: dict) -> AgentState:
+         """Resume workflow after human input."""
+         if not self.current_state or not self.thread_id:
+             raise ValueError("No active workflow to resume")
+
+         checkpoint = self.current_state.get("checkpoint_name")
+
+         # Apply human input based on checkpoint
+         if checkpoint == "confirm_pages":
+             self.current_state = handle_page_confirmation(
+                 self.current_state,
+                 human_input.get("confirmed_pages", [])
+             )
+         elif checkpoint == "review_tokens":
+             self.current_state = handle_token_review(
+                 self.current_state,
+                 human_input.get("color_decisions", {}),
+                 human_input.get("typography_decisions", {}),
+                 human_input.get("spacing_decisions", {}),
+             )
+         elif checkpoint == "select_upgrades":
+             self.current_state = handle_upgrade_selection(
+                 self.current_state,
+                 human_input.get("type_scale"),
+                 human_input.get("spacing_system"),
+                 human_input.get("naming_convention"),
+                 human_input.get("color_ramps", {}),
+                 human_input.get("a11y_fixes", []),
+             )
+         elif checkpoint == "approve_export":
+             self.current_state = handle_export_approval(
+                 self.current_state,
+                 human_input.get("version_label", "v1")
+             )
+
+         config = {"configurable": {"thread_id": self.thread_id}}
+
+         # Continue until next checkpoint or completion
+         async for event in self.app.astream(self.current_state, config):
+             self.current_state = event
+             if self.current_state.get("awaiting_human_input"):
+                 break
+
+         return self.current_state
+
+     def get_progress(self) -> dict:
+         """Get current workflow progress."""
+         if not self.current_state:
+             return {"status": "not_started"}
+         return get_stage_progress(self.current_state)
+
+     def get_state(self) -> AgentState | None:
+         """Get current state."""
+         return self.current_state
+
+
+ # =============================================================================
+ # CONVENIENCE FUNCTIONS
+ # =============================================================================
+
+ def create_workflow() -> WorkflowRunner:
+     """Create a new workflow runner instance."""
+     return WorkflowRunner()
+
+
+ async def run_discovery_only(base_url: str) -> list:
+     """Run only the discovery phase (for testing)."""
+     from agents.crawler import PageDiscoverer
+
+     discoverer = PageDiscoverer()
+     return await discoverer.discover(base_url)
+
+
+ async def run_extraction_only(pages: list[str], viewport: Viewport) -> dict:
+     """Run only the extraction phase (for testing)."""
+     from agents.extractor import TokenExtractor
+
+     extractor = TokenExtractor(viewport=viewport)
+     return await extractor.extract(pages)
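Reviewer note: `WorkflowRunner.resume` dispatches on `checkpoint_name` with an if/elif chain. The same contract can be expressed as a handler table; a minimal sketch with plain dicts (the function name and the two sample checkpoints shown are illustrative, not part of this module):

```python
def apply_human_input(state: dict, checkpoint: str, payload: dict) -> dict:
    """Hypothetical mirror of resume()'s dispatch: each checkpoint name maps
    to the fields it writes back into the state before clearing the flags."""
    handlers = {
        "confirm_pages": lambda s, p: s.update(
            pages_to_crawl=p.get("confirmed_pages", [])
        ),
        "approve_export": lambda s, p: s.update(
            version_label=p.get("version_label", "v1")
        ),
    }
    handler = handlers.get(checkpoint)
    if handler is None:
        raise ValueError(f"Unknown checkpoint: {checkpoint}")
    handler(state, payload)
    # Every checkpoint handler ends the same way in graph.py: clear the
    # waiting flag so the graph can stream forward again.
    state["awaiting_human_input"] = False
    state["checkpoint_name"] = None
    return state
```

A table keeps the per-checkpoint logic next to its name and makes adding a new checkpoint a one-line change.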
agents/state.py ADDED
@@ -0,0 +1,292 @@
+ """
+ LangGraph State Definitions
+ Design System Extractor v2
+
+ Defines the state schema and type hints for LangGraph workflow.
+ """
+
+ from typing import TypedDict, Annotated, Sequence, Optional
+ from datetime import datetime
+ from langgraph.graph.message import add_messages
+
+ from core.token_schema import (
+     DiscoveredPage,
+     ExtractedTokens,
+     NormalizedTokens,
+     UpgradeRecommendations,
+     FinalTokens,
+     Viewport,
+ )
+
+
+ # =============================================================================
+ # STATE ANNOTATIONS
+ # =============================================================================
+
+ def merge_lists(left: list, right: list) -> list:
+     """Merge two lists, avoiding duplicates."""
+     seen = set()
+     result = []
+     for item in left + right:
+         key = str(item) if not hasattr(item, 'url') else item.url
+         if key not in seen:
+             seen.add(key)
+             result.append(item)
+     return result
+
+
+ def replace_value(left, right):
+     """Replace left with right (simple override)."""
+     return right if right is not None else left
+
+
+ # =============================================================================
+ # MAIN WORKFLOW STATE
+ # =============================================================================
+
+ class AgentState(TypedDict):
+     """
+     Main state for the LangGraph workflow.
+
+     This state is passed between all agents and accumulates data
+     as the workflow progresses through stages.
+     """
+
+     # -------------------------------------------------------------------------
+     # INPUT
+     # -------------------------------------------------------------------------
+     base_url: str  # The website URL to extract from
+
+     # -------------------------------------------------------------------------
+     # DISCOVERY STAGE (Agent 1 - Part 1)
+     # -------------------------------------------------------------------------
+     discovered_pages: Annotated[list[DiscoveredPage], merge_lists]
+     pages_to_crawl: list[str]  # User-confirmed pages
+
+     # -------------------------------------------------------------------------
+     # EXTRACTION STAGE (Agent 1 - Part 2)
+     # -------------------------------------------------------------------------
+     # Desktop extraction
+     desktop_extraction: Optional[ExtractedTokens]
+     desktop_crawl_progress: float  # 0.0 to 1.0
+
+     # Mobile extraction
+     mobile_extraction: Optional[ExtractedTokens]
+     mobile_crawl_progress: float  # 0.0 to 1.0
+
+     # -------------------------------------------------------------------------
+     # NORMALIZATION STAGE (Agent 2)
+     # -------------------------------------------------------------------------
+     desktop_normalized: Optional[NormalizedTokens]
+     mobile_normalized: Optional[NormalizedTokens]
+
+     # User decisions from Stage 1 review
+     accepted_colors: list[str]  # List of accepted color values
+     rejected_colors: list[str]  # List of rejected color values
+     accepted_typography: list[str]
+     rejected_typography: list[str]
+     accepted_spacing: list[str]
+     rejected_spacing: list[str]
+
+     # -------------------------------------------------------------------------
+     # ADVISOR STAGE (Agent 3)
+     # -------------------------------------------------------------------------
+     upgrade_recommendations: Optional[UpgradeRecommendations]
+
+     # User selections from Stage 2 playground
+     selected_type_scale: Optional[str]  # ID of selected scale
+     selected_spacing_system: Optional[str]
+     selected_naming_convention: Optional[str]
+     selected_color_ramps: dict[str, bool]  # {"primary": True, "secondary": False}
+     selected_a11y_fixes: list[str]  # IDs of accepted fixes
+
+     # -------------------------------------------------------------------------
+     # GENERATION STAGE (Agent 4)
+     # -------------------------------------------------------------------------
+     desktop_final: Optional[FinalTokens]
+     mobile_final: Optional[FinalTokens]
+
+     # Version info
+     version_label: str  # e.g., "v1-recovered", "v2-upgraded"
+
+     # -------------------------------------------------------------------------
113
+ # WORKFLOW METADATA
114
+ # -------------------------------------------------------------------------
115
+ current_stage: str # "discover", "extract", "normalize", "advise", "generate", "export"
116
+
117
+ # Human checkpoints
118
+ awaiting_human_input: bool
119
+ checkpoint_name: Optional[str] # "confirm_pages", "review_tokens", "select_upgrades", "approve_export"
120
+
121
+ # Errors and warnings (accumulated)
122
+ errors: Annotated[list[str], merge_lists]
123
+ warnings: Annotated[list[str], merge_lists]
124
+
125
+ # Messages for LLM agents (if using chat-based agents)
126
+ messages: Annotated[Sequence[dict], add_messages]
127
+
128
+ # Timing
129
+ started_at: Optional[datetime]
130
+ stage_started_at: Optional[datetime]
131
+
132
+
133
+ # =============================================================================
134
+ # STAGE-SPECIFIC STATES (for parallel execution)
135
+ # =============================================================================
136
+
137
+ class DiscoveryState(TypedDict):
138
+ """State for page discovery sub-graph."""
139
+ base_url: str
140
+ discovered_pages: list[DiscoveredPage]
141
+ discovery_complete: bool
142
+ error: Optional[str]
143
+
144
+
145
+ class ExtractionState(TypedDict):
146
+ """State for extraction sub-graph (per viewport)."""
147
+ viewport: Viewport
148
+ pages_to_crawl: list[str]
149
+ extraction_result: Optional[ExtractedTokens]
150
+ progress: float
151
+ current_page: Optional[str]
152
+ error: Optional[str]
153
+
154
+
155
+ class NormalizationState(TypedDict):
156
+ """State for normalization sub-graph."""
157
+ raw_tokens: ExtractedTokens
158
+ normalized_tokens: Optional[NormalizedTokens]
159
+ duplicates_found: list[tuple[str, str]]
160
+ error: Optional[str]
161
+
162
+
163
+ class AdvisorState(TypedDict):
164
+ """State for advisor sub-graph."""
165
+ normalized_desktop: NormalizedTokens
166
+ normalized_mobile: Optional[NormalizedTokens]
167
+ recommendations: Optional[UpgradeRecommendations]
168
+ error: Optional[str]
169
+
170
+
171
+ class GenerationState(TypedDict):
172
+ """State for generation sub-graph."""
173
+ normalized_tokens: NormalizedTokens
174
+ selected_upgrades: dict[str, str]
175
+ final_tokens: Optional[FinalTokens]
176
+ error: Optional[str]
177
+
178
+
179
+ # =============================================================================
180
+ # CHECKPOINT STATES (Human-in-the-loop)
181
+ # =============================================================================
182
+
183
+ class PageConfirmationState(TypedDict):
184
+ """State for page confirmation checkpoint."""
185
+ discovered_pages: list[DiscoveredPage]
186
+ confirmed_pages: list[str]
187
+ user_confirmed: bool
188
+
189
+
190
+ class TokenReviewState(TypedDict):
191
+ """State for token review checkpoint (Stage 1 UI)."""
192
+ desktop_tokens: NormalizedTokens
193
+ mobile_tokens: Optional[NormalizedTokens]
194
+
195
+ # User decisions
196
+ color_decisions: dict[str, bool] # {value: accepted}
197
+ typography_decisions: dict[str, bool]
198
+ spacing_decisions: dict[str, bool]
199
+
200
+ user_confirmed: bool
201
+
202
+
203
+ class UpgradeSelectionState(TypedDict):
204
+ """State for upgrade selection checkpoint (Stage 2 UI)."""
205
+ recommendations: UpgradeRecommendations
206
+ current_tokens: NormalizedTokens
207
+
208
+ # User selections
209
+ selected_options: dict[str, str] # {category: option_id}
210
+
211
+ user_confirmed: bool
212
+
213
+
214
+ class ExportApprovalState(TypedDict):
215
+ """State for export approval checkpoint (Stage 3 UI)."""
216
+ desktop_final: FinalTokens
217
+ mobile_final: Optional[FinalTokens]
218
+
219
+ version_label: str
220
+ user_confirmed: bool
221
+
222
+
223
+ # =============================================================================
224
+ # STATE FACTORY FUNCTIONS
225
+ # =============================================================================
226
+
227
+ def create_initial_state(base_url: str) -> AgentState:
228
+ """Create initial state for a new workflow."""
229
+ return {
230
+ # Input
231
+ "base_url": base_url,
232
+
233
+ # Discovery
234
+ "discovered_pages": [],
235
+ "pages_to_crawl": [],
236
+
237
+ # Extraction
238
+ "desktop_extraction": None,
239
+ "desktop_crawl_progress": 0.0,
240
+ "mobile_extraction": None,
241
+ "mobile_crawl_progress": 0.0,
242
+
243
+ # Normalization
244
+ "desktop_normalized": None,
245
+ "mobile_normalized": None,
246
+ "accepted_colors": [],
247
+ "rejected_colors": [],
248
+ "accepted_typography": [],
249
+ "rejected_typography": [],
250
+ "accepted_spacing": [],
251
+ "rejected_spacing": [],
252
+
253
+ # Advisor
254
+ "upgrade_recommendations": None,
255
+ "selected_type_scale": None,
256
+ "selected_spacing_system": None,
257
+ "selected_naming_convention": None,
258
+ "selected_color_ramps": {},
259
+ "selected_a11y_fixes": [],
260
+
261
+ # Generation
262
+ "desktop_final": None,
263
+ "mobile_final": None,
264
+ "version_label": "v1-recovered",
265
+
266
+ # Workflow
267
+ "current_stage": "discover",
268
+ "awaiting_human_input": False,
269
+ "checkpoint_name": None,
270
+ "errors": [],
271
+ "warnings": [],
272
+ "messages": [],
273
+
274
+ # Timing
275
+ "started_at": datetime.now(),
276
+ "stage_started_at": datetime.now(),
277
+ }
278
+
279
+
280
+ def get_stage_progress(state: AgentState) -> dict:
281
+ """Get progress information for the current workflow."""
282
+ stages = ["discover", "extract", "normalize", "advise", "generate", "export"]
283
+ current_idx = stages.index(state["current_stage"]) if state["current_stage"] in stages else 0
284
+
285
+ return {
286
+ "current_stage": state["current_stage"],
287
+ "stage_index": current_idx,
288
+ "total_stages": len(stages),
289
+ "progress_percent": (current_idx / len(stages)) * 100,
290
+ "awaiting_human": state["awaiting_human_input"],
291
+ "checkpoint": state["checkpoint_name"],
292
+ }
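
The `merge_lists` reducer above can be exercised standalone. A minimal sketch (plain strings and `SimpleNamespace` stand in for the real `DiscoveredPage` models) showing how it deduplicates while preserving first-seen order, keying pages by their `url` attribute:

```python
from types import SimpleNamespace


def merge_lists(left: list, right: list) -> list:
    """Same reducer as in the state module: merge two lists, avoiding duplicates."""
    seen = set()
    result = []
    for item in left + right:
        # Page-like objects are keyed by URL; everything else by string value
        key = str(item) if not hasattr(item, 'url') else item.url
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


# Plain values: duplicates across the two inputs collapse, first occurrence wins
assert merge_lists(["a", "b"], ["b", "c"]) == ["a", "b", "c"]

# Page-like objects with the same URL are treated as the same page
p1 = SimpleNamespace(url="https://example.com/")
p2 = SimpleNamespace(url="https://example.com/")
assert len(merge_lists([p1], [p2])) == 1
```

Because LangGraph calls the reducer with the existing state value as `left` and the node's return value as `right`, repeated node runs cannot inflate `discovered_pages`, `errors`, or `warnings` with duplicates.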
app.py ADDED
@@ -0,0 +1,453 @@
+ """
+ Design System Extractor v2 — Main Application
+ ==============================================
+
+ A semi-automated, human-in-the-loop agentic system that reverse-engineers
+ design systems from live websites.
+
+ Usage:
+     python app.py
+ """
+
+ import os
+ import asyncio
+ import gradio as gr
+ from datetime import datetime
+
+ # Get HF token from environment if available
+ HF_TOKEN_FROM_ENV = os.getenv("HF_TOKEN", "")
+
+ # =============================================================================
+ # GLOBAL STATE
+ # =============================================================================
+
+ current_extraction: dict = {}
+ user_hf_token: str = ""
+
+
+ # =============================================================================
+ # HF TOKEN MANAGEMENT
+ # =============================================================================
+
+ def set_hf_token(token: str) -> str:
+     """Set the HF token globally."""
+     global user_hf_token
+
+     if not token or len(token) < 10:
+         return "❌ Please enter a valid HuggingFace token"
+
+     user_hf_token = token.strip()
+     os.environ["HF_TOKEN"] = user_hf_token
+
+     return "✅ Token saved! You can now use the extractor."
+
+
+ # =============================================================================
+ # LAZY IMPORTS (avoid circular imports at startup)
+ # =============================================================================
+
+ _crawler_module = None
+ _extractor_module = None
+ _schema_module = None
+
+ def get_crawler():
+     global _crawler_module
+     if _crawler_module is None:
+         from agents import crawler as _crawler_module
+     return _crawler_module
+
+ def get_extractor():
+     global _extractor_module
+     if _extractor_module is None:
+         from agents import extractor as _extractor_module
+     return _extractor_module
+
+ def get_schema():
+     global _schema_module
+     if _schema_module is None:
+         from core import token_schema as _schema_module
+     return _schema_module
+
+
+ # =============================================================================
+ # STAGE 1: URL INPUT & PAGE DISCOVERY
+ # =============================================================================
+
+ async def discover_site_pages(url: str, progress=gr.Progress()) -> tuple:
+     """
+     Discover pages from a website URL.
+
+     Returns tuple of (status_message, pages_dataframe, pages_json)
+     """
+     if not url or not url.startswith(("http://", "https://")):
+         return "❌ Please enter a valid URL starting with http:// or https://", None, None
+
+     progress(0, desc="Initializing browser...")
+
+     try:
+         crawler = get_crawler()
+         discoverer = crawler.PageDiscoverer()
+
+         def update_progress(p):
+             progress(p, desc=f"Discovering pages... ({int(p*100)}%)")
+
+         pages = await discoverer.discover(url, progress_callback=update_progress)
+
+         # Format for display
+         pages_data = []
+         for page in pages:
+             pages_data.append({
+                 "Select": page.selected,
+                 "URL": page.url,
+                 "Title": page.title or "(No title)",
+                 "Type": page.page_type.value,
+                 "Status": "✓" if not page.error else f"⚠ {page.error}",
+             })
+
+         # Store for later use
+         current_extraction["discovered_pages"] = pages
+         current_extraction["base_url"] = url
+
+         status = f"✅ Found {len(pages)} pages. Select the pages you want to extract tokens from."
+
+         return status, pages_data, [p.model_dump() for p in pages]
+
+     except Exception as e:
+         import traceback
+         return f"❌ Error: {str(e)}\n\n{traceback.format_exc()}", None, None
+
+
+ async def start_extraction(pages_selection: list, viewport_choice: str, progress=gr.Progress()) -> tuple:
+     """
+     Start token extraction from selected pages.
+
+     Returns tuple of (status, colors_data, typography_data, spacing_data)
+     """
+     # Gradio may deliver the Dataframe value as a pandas DataFrame rather than
+     # a list of dicts; normalize to row dicts before any truthiness checks
+     # (a DataFrame raises on `if not df:`).
+     if hasattr(pages_selection, "to_dict"):
+         pages_selection = pages_selection.to_dict("records")
+
+     if not pages_selection:
+         return "❌ Please select at least one page", None, None, None
+
+     # Get selected URLs
+     selected_urls = []
+     for row in pages_selection:
+         if row.get("Select", False):
+             selected_urls.append(row["URL"])
+
+     if not selected_urls:
+         return "❌ Please select at least one page using the checkboxes", None, None, None
+
+     # Determine viewport
+     schema = get_schema()
+     viewport = schema.Viewport.DESKTOP if viewport_choice == "Desktop (1440px)" else schema.Viewport.MOBILE
+
+     progress(0, desc=f"Starting {viewport.value} extraction...")
+
+     try:
+         extractor_mod = get_extractor()
+         extractor = extractor_mod.TokenExtractor(viewport=viewport)
+
+         def update_progress(p):
+             progress(p, desc=f"Extracting tokens... ({int(p*100)}%)")
+
+         result = await extractor.extract(selected_urls, progress_callback=update_progress)
+
+         # Store result
+         current_extraction[f"{viewport.value}_tokens"] = result
+
+         # Format colors for display
+         colors_data = []
+         for color in sorted(result.colors, key=lambda c: -c.frequency)[:50]:
+             colors_data.append({
+                 "Accept": True,
+                 "Color": color.value,
+                 "Frequency": color.frequency,
+                 "Context": ", ".join(color.contexts[:3]),
+                 "Contrast (White)": f"{color.contrast_white}:1",
+                 "AA Text": "✓" if color.wcag_aa_small_text else "✗",
+                 "Confidence": color.confidence.value,
+             })
+
+         # Format typography for display
+         typography_data = []
+         for typo in sorted(result.typography, key=lambda t: -t.frequency)[:30]:
+             typography_data.append({
+                 "Accept": True,
+                 "Font": typo.font_family,
+                 "Size": typo.font_size,
+                 "Weight": typo.font_weight,
+                 "Line Height": typo.line_height,
+                 "Elements": ", ".join(typo.elements[:3]),
+                 "Frequency": typo.frequency,
+             })
+
+         # Format spacing for display
+         spacing_data = []
+         for space in sorted(result.spacing, key=lambda s: s.value_px)[:20]:
+             spacing_data.append({
+                 "Accept": True,
+                 "Value": space.value,
+                 "Frequency": space.frequency,
+                 "Context": ", ".join(space.contexts[:2]),
+                 "Fits 8px": "✓" if space.fits_base_8 else "",
+                 "Outlier": "⚠" if space.is_outlier else "",
+             })
+
+         # Summary
+         status = f"""✅ Extraction Complete ({viewport.value})
+
+ **Summary:**
+ - Pages crawled: {len(result.pages_crawled)}
+ - Colors found: {len(result.colors)}
+ - Typography styles: {len(result.typography)}
+ - Spacing values: {len(result.spacing)}
+ - Font families: {len(result.font_families)}
+ - Detected spacing base: {result.spacing_base or 'Unknown'}px
+ - Duration: {result.extraction_duration_ms}ms
+ """
+
+         if result.warnings:
+             status += f"\n⚠️ Warnings: {len(result.warnings)}"
+         if result.errors:
+             status += f"\n❌ Errors: {len(result.errors)}"
+
+         return status, colors_data, typography_data, spacing_data
+
+     except Exception as e:
+         import traceback
+         return f"❌ Extraction failed: {str(e)}\n\n{traceback.format_exc()}", None, None, None
+
+
+ def export_tokens_json():
+     """Export current tokens to JSON."""
+     import json
+
+     result = {}
+
+     if "desktop_tokens" in current_extraction:
+         desktop = current_extraction["desktop_tokens"]
+         result["desktop"] = {
+             "colors": [c.model_dump() for c in desktop.colors],
+             "typography": [t.model_dump() for t in desktop.typography],
+             "spacing": [s.model_dump() for s in desktop.spacing],
+             "metadata": desktop.summary(),
+         }
+
+     if "mobile_tokens" in current_extraction:
+         mobile = current_extraction["mobile_tokens"]
+         result["mobile"] = {
+             "colors": [c.model_dump() for c in mobile.colors],
+             "typography": [t.model_dump() for t in mobile.typography],
+             "spacing": [s.model_dump() for s in mobile.spacing],
+             "metadata": mobile.summary(),
+         }
+
+     if not result:
+         return '{"error": "No tokens extracted yet. Please run extraction first."}'
+
+     return json.dumps(result, indent=2, default=str)
+
+
+ # =============================================================================
+ # UI BUILDING
+ # =============================================================================
+
+ def create_ui():
+     """Create the Gradio interface."""
+
+     with gr.Blocks(
+         title="Design System Extractor v2",
+         theme=gr.themes.Soft(),
+     ) as app:
+
+         # Header
+         gr.Markdown("""
+         # 🎨 Design System Extractor v2
+
+         **Reverse-engineer design systems from live websites.**
+
+         Extract colors, typography, and spacing tokens from any website and export to Figma-compatible JSON.
+
+         ---
+         """)
+
+         # =================================================================
+         # CONFIGURATION SECTION
+         # =================================================================
+
+         with gr.Accordion("⚙️ Configuration", open=not bool(HF_TOKEN_FROM_ENV)):
+
+             gr.Markdown("""
+             **HuggingFace Token** is required for AI-powered features (Agents 2-4).
+             Get your token at: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
+
+             *Note: Basic extraction (Agent 1) works without a token.*
+             """)
+
+             with gr.Row():
+                 hf_token_input = gr.Textbox(
+                     label="HuggingFace Token",
+                     placeholder="hf_xxxxxxxxxxxxxxxxxxxx",
+                     type="password",
+                     scale=4,
+                     value=HF_TOKEN_FROM_ENV if HF_TOKEN_FROM_ENV else "",
+                 )
+                 save_token_btn = gr.Button("💾 Save Token", scale=1)
+
+             token_status = gr.Markdown(
+                 "✅ Token loaded from environment" if HF_TOKEN_FROM_ENV else "⏳ Enter your HF token to enable all features"
+             )
+
+             save_token_btn.click(
+                 fn=set_hf_token,
+                 inputs=[hf_token_input],
+                 outputs=[token_status],
+             )
+
+         # =================================================================
+         # STAGE 1: URL Input & Discovery
+         # =================================================================
+
+         with gr.Accordion("📍 Stage 1: Website Discovery", open=True):
+
+             gr.Markdown("""
+             **Step 1:** Enter your website URL and discover pages.
+             The system will automatically find and classify pages for extraction.
+             """)
+
+             with gr.Row():
+                 url_input = gr.Textbox(
+                     label="Website URL",
+                     placeholder="https://example.com",
+                     scale=4,
+                 )
+                 discover_btn = gr.Button("🔍 Discover Pages", variant="primary", scale=1)
+
+             discovery_status = gr.Markdown("")
+
+             pages_table = gr.Dataframe(
+                 headers=["Select", "URL", "Title", "Type", "Status"],
+                 datatype=["bool", "str", "str", "str", "str"],
+                 interactive=True,
+                 label="Discovered Pages",
+                 visible=False,
+             )
+
+             pages_json = gr.JSON(visible=False)
+
+         # =================================================================
+         # STAGE 2: Extraction
+         # =================================================================
+
+         with gr.Accordion("🔬 Stage 2: Token Extraction", open=False):
+
+             gr.Markdown("""
+             **Step 2:** Select pages and viewport, then extract design tokens.
+             """)
+
+             with gr.Row():
+                 viewport_radio = gr.Radio(
+                     choices=["Desktop (1440px)", "Mobile (375px)"],
+                     value="Desktop (1440px)",
+                     label="Viewport",
+                 )
+                 extract_btn = gr.Button("🚀 Extract Tokens", variant="primary")
+
+             extraction_status = gr.Markdown("")
+
+             with gr.Tabs():
+                 with gr.Tab("🎨 Colors"):
+                     colors_table = gr.Dataframe(
+                         headers=["Accept", "Color", "Frequency", "Context", "Contrast (White)", "AA Text", "Confidence"],
+                         datatype=["bool", "str", "number", "str", "str", "str", "str"],
+                         interactive=True,
+                         label="Extracted Colors",
+                     )
+
+                 with gr.Tab("📝 Typography"):
+                     typography_table = gr.Dataframe(
+                         headers=["Accept", "Font", "Size", "Weight", "Line Height", "Elements", "Frequency"],
+                         datatype=["bool", "str", "str", "number", "str", "str", "number"],
+                         interactive=True,
+                         label="Extracted Typography",
+                     )
+
+                 with gr.Tab("📏 Spacing"):
+                     spacing_table = gr.Dataframe(
+                         headers=["Accept", "Value", "Frequency", "Context", "Fits 8px", "Outlier"],
+                         datatype=["bool", "str", "number", "str", "str", "str"],
+                         interactive=True,
+                         label="Extracted Spacing",
+                     )
+
+         # =================================================================
+         # STAGE 3: Export
+         # =================================================================
+
+         with gr.Accordion("📦 Stage 3: Export", open=False):
+
+             gr.Markdown("""
+             **Step 3:** Review and export your design tokens.
+             """)
+
+             with gr.Row():
+                 export_btn = gr.Button("📥 Export JSON", variant="secondary")
+
+             export_output = gr.Code(
+                 label="Exported Tokens (JSON)",
+                 language="json",
+                 lines=20,
+             )
+
+         # =================================================================
+         # EVENT HANDLERS
+         # =================================================================
+
+         # Discovery
+         discover_btn.click(
+             fn=discover_site_pages,
+             inputs=[url_input],
+             outputs=[discovery_status, pages_table, pages_json],
+         ).then(
+             fn=lambda: gr.update(visible=True),
+             outputs=[pages_table],
+         )
+
+         # Extraction
+         extract_btn.click(
+             fn=start_extraction,
+             inputs=[pages_table, viewport_radio],
+             outputs=[extraction_status, colors_table, typography_table, spacing_table],
+         )
+
+         # Export
+         export_btn.click(
+             fn=export_tokens_json,
+             outputs=[export_output],
+         )
+
+         # =================================================================
+         # FOOTER
+         # =================================================================
+
+         gr.Markdown("""
+         ---
+
+         **Design System Extractor v2** | Built with LangGraph + Gradio + HuggingFace
+
+         *A semi-automated co-pilot for design system recovery and modernization.*
+
+         **Models:** Microsoft Phi (Normalizer) • Meta Llama (Advisor) • Mistral Codestral (Generator)
+         """)
+
+     return app
+
+
+ # =============================================================================
+ # MAIN
+ # =============================================================================
+
+ if __name__ == "__main__":
+     app = create_ui()
+     app.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+     )
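
The `get_crawler`/`get_extractor`/`get_schema` helpers above all follow the same lazy-import pattern: a module-level cache plus `import ... as <global>` inside the getter. A standalone sketch of that pattern, using the stdlib `json` module as a stand-in for the real `agents` package:

```python
_json_module = None


def get_json():
    """Import the module on first call only; later calls reuse the cached reference."""
    global _json_module
    if _json_module is None:
        # `import x as name` with a `global` declaration binds at module level,
        # which is exactly how get_crawler() caches the agents.crawler module
        import json as _json_module
    return _json_module


# The same module object is returned on every call after the first
assert get_json() is get_json()
assert get_json().dumps({"ok": True}) == '{"ok": true}'
```

Deferring the imports this way keeps `app.py` importable even while `agents` and `core` still import things from it, and avoids paying Playwright's startup cost until discovery actually runs.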
config/.env.example ADDED
@@ -0,0 +1,143 @@
+ # =============================================================================
+ # Design System Extractor v2 — Environment Variables
+ # =============================================================================
+ # Copy this file to .env and fill in your values
+ # NEVER commit .env to version control
+ # =============================================================================
+
+ # -----------------------------------------------------------------------------
+ # REQUIRED: Hugging Face Token (Pro recommended for best models)
+ # -----------------------------------------------------------------------------
+
+ # HuggingFace Token (for Spaces deployment and model access)
+ # Get yours at: https://huggingface.co/settings/tokens
+ # Pro subscription unlocks: Llama 3.1 405B, Qwen 72B, Command R+, etc.
+ HF_TOKEN=your_huggingface_token_here
+
+ # HuggingFace Space name (for deployment)
+ HF_SPACE_NAME=your-username/design-system-extractor
+
+ # -----------------------------------------------------------------------------
+ # MODEL CONFIGURATION — Diverse Models for Different Tasks
+ # -----------------------------------------------------------------------------
+
+ # === Agent 1 (Crawler/Extractor): NO LLM NEEDED ===
+ # Pure rule-based extraction using Playwright + CSS parsing
+
+ # === Agent 2 (Normalizer): FAST STRUCTURED OUTPUT ===
+ # Task: Token naming, duplicate detection, pattern inference
+ # Needs: Good instruction following, JSON output, SPEED
+ #
+ # Options (pick one):
+ #   - microsoft/Phi-3.5-mini-instruct (Fast, great for structured tasks)
+ #   - mistralai/Mistral-7B-Instruct-v0.3 (Fast, good JSON)
+ #   - google/gemma-2-9b-it (Balanced speed/quality)
+ #   - Qwen/Qwen2.5-7B-Instruct (Good all-rounder)
+ AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct
+
+ # === Agent 3 (Advisor): STRONG REASONING — Most Important! ===
+ # Task: Design system analysis, best practice recommendations, trade-off analysis
+ # Needs: Deep reasoning, design knowledge, creative suggestions
+ #
+ # Options (pick one - Pro tier recommended):
+ #   - meta-llama/Llama-3.1-70B-Instruct (Excellent reasoning, long context)
+ #   - CohereForAI/c4ai-command-r-plus (Great for analysis & recommendations)
+ #   - Qwen/Qwen2.5-72B-Instruct (Strong reasoning, good design knowledge)
+ #   - mistralai/Mixtral-8x22B-Instruct-v0.1 (Large MoE, good balance)
+ #   - meta-llama/Llama-3.1-405B-Instruct (BEST - if you have Pro++)
+ AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct
+
+ # === Agent 4 (Generator): CODE/JSON SPECIALIST ===
+ # Task: Generate Tokens Studio JSON, CSS variables, structured output
+ # Needs: Precise formatting, code generation, schema adherence
+ #
+ # Options (pick one):
+ #   - codellama/CodeLlama-34b-Instruct-hf (Code specialist)
+ #   - bigcode/starcoder2-15b-instruct-v0.1 (Code generation)
+ #   - mistralai/Codestral-22B-v0.1 (Mistral's code model)
+ #   - deepseek-ai/deepseek-coder-33b-instruct (Strong code model)
+ AGENT4_MODEL=mistralai/Codestral-22B-v0.1
+
+ # === Fallback Model (if primary fails) ===
+ FALLBACK_MODEL=mistralai/Mistral-7B-Instruct-v0.3
+
+ # -----------------------------------------------------------------------------
+ # PRESET CONFIGURATIONS
+ # -----------------------------------------------------------------------------
+
+ # Uncomment ONE preset below, or configure individually above
+
+ # --- PRESET: BUDGET (Free tier compatible) ---
+ # AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct
+ # AGENT3_MODEL=mistralai/Mixtral-8x7B-Instruct-v0.1
+ # AGENT4_MODEL=mistralai/Mistral-7B-Instruct-v0.3
+
+ # --- PRESET: BALANCED (Pro tier) ---
+ # AGENT2_MODEL=google/gemma-2-9b-it
+ # AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct
+ # AGENT4_MODEL=mistralai/Codestral-22B-v0.1
+
+ # --- PRESET: MAXIMUM QUALITY (Pro tier) ---
+ # AGENT2_MODEL=google/gemma-2-27b-it
+ # AGENT3_MODEL=meta-llama/Llama-3.1-405B-Instruct
+ # AGENT4_MODEL=deepseek-ai/deepseek-coder-33b-instruct
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: Application Settings
+ # -----------------------------------------------------------------------------
+
+ DEBUG=false
+ LOG_LEVEL=INFO
+ MAX_PAGES=20
+ MIN_PAGES=10
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: Browser Settings (Playwright)
+ # -----------------------------------------------------------------------------
+
+ BROWSER_TYPE=chromium
+ BROWSER_HEADLESS=true
+ BROWSER_TIMEOUT=30000
+ NETWORK_IDLE_TIMEOUT=5000
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: Storage Settings
+ # -----------------------------------------------------------------------------
+
+ STORAGE_PATH=/data
+ ENABLE_PERSISTENCE=true
+ MAX_VERSIONS=10
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: Rate Limiting
+ # -----------------------------------------------------------------------------
+
+ CRAWL_DELAY_MS=1000
+ MAX_CONCURRENT_CRAWLS=3
+ RESPECT_ROBOTS_TXT=true
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: HuggingFace Inference Settings
+ # -----------------------------------------------------------------------------
+
+ USE_HF_INFERENCE_API=true
+ HF_INFERENCE_TIMEOUT=120
+ HF_MAX_NEW_TOKENS=2048
+ HF_TEMPERATURE=0.3
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: UI Settings
+ # -----------------------------------------------------------------------------
+
+ SERVER_PORT=7860
+ SHARE=false
+ UI_THEME=soft
+
+ # -----------------------------------------------------------------------------
+ # OPTIONAL: Feature Flags
+ # -----------------------------------------------------------------------------
+
+ FEATURE_COLOR_RAMPS=true
+ FEATURE_TYPE_SCALES=true
+ FEATURE_A11Y_CHECKS=true
+ FEATURE_PARALLEL_EXTRACTION=true
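
Settings like `MAX_PAGES` or `BROWSER_HEADLESS` arrive from the environment as strings, so the app needs small coercion helpers somewhere. A hedged sketch (the helper names `env_int`/`env_bool` are illustrative, not from the repo) of how these variables might be read with safe defaults:

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting like MAX_PAGES, falling back to a default."""
    raw = os.getenv(name, "").strip()
    return int(raw) if raw.isdigit() else default


def env_bool(name: str, default: bool) -> bool:
    """Read a flag like BROWSER_HEADLESS=true/false, falling back to a default."""
    raw = os.getenv(name, "").strip().lower()
    if raw in ("true", "1", "yes"):
        return True
    if raw in ("false", "0", "no"):
        return False
    return default


os.environ["MAX_PAGES"] = "20"
os.environ["BROWSER_HEADLESS"] = "true"
assert env_int("MAX_PAGES", 10) == 20
assert env_bool("BROWSER_HEADLESS", False) is True
assert env_int("NOT_SET", 5) == 5  # missing variables take the default
```

Centralizing the parsing this way keeps a typo'd `.env` value from crashing the app at import time.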
config/agents.yaml ADDED
@@ -0,0 +1,547 @@
+ # Agent Personas & Configuration
+ # Design System Extractor v2
+
+ # =============================================================================
+ # MODEL CONFIGURATION
+ # =============================================================================
+ # Model assignments for each agent based on task complexity
+ # =============================================================================
+
+ models:
+   # Agent 1: Crawler & Extractor — NO LLM NEEDED (Rule-based)
+   agent_crawler:
+     requires_llm: false
+     notes: "Pure rule-based extraction using Playwright + CSS parsing"
+
+   # Agent 2: Normalizer — LIGHT MODEL
+   agent_normalizer:
+     requires_llm: true
+     primary_model: "Qwen/Qwen2.5-7B-Instruct"
+     fallback_model: "microsoft/Phi-3-mini-4k-instruct"
+     tasks:
+       - "Suggest semantic token names from values"
+       - "Explain duplicate detection rationale"
+       - "Infer naming conventions from class names"
+     temperature: 0.2  # Low temp for consistent naming
+     max_tokens: 1024
+     why_this_model: |
+       7B model is sufficient because:
+       - Tasks are mostly pattern matching and naming
+       - Doesn't require complex reasoning
+       - Fast inference, good for iterative workflows
+
+   # Agent 3: Advisor — STRONG MODEL (Key agent)
+   agent_advisor:
+     requires_llm: true
+     primary_model: "Qwen/Qwen2.5-72B-Instruct"
+     fallback_model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
+     budget_model: "Qwen/Qwen2.5-7B-Instruct"  # If quota constrained
+     tasks:
+       - "Analyze extracted system for patterns and anti-patterns"
+       - "Research and recommend type scales with rationale"
+       - "Suggest spacing systems based on detected patterns"
+       - "Identify accessibility issues and fixes"
+       - "Compare against industry design systems"
+     temperature: 0.4  # Moderate temp for creative recommendations
+     max_tokens: 4096
+     why_this_model: |
+       72B model is critical because:
+       - Needs deep understanding of design systems
+       - Must reason about trade-offs (pros/cons)
+       - Should know Material, Polaris, Carbon, etc.
+       - Quality of recommendations is user-facing
+
+   # Agent 4: Generator — LIGHT MODEL
+   agent_generator:
+     requires_llm: true
+     primary_model: "Qwen/Qwen2.5-7B-Instruct"
+     fallback_model: "mistralai/Mistral-7B-Instruct-v0.3"
+     tasks:
+       - "Format tokens into Tokens Studio JSON"
61
+ - "Generate CSS custom properties"
62
+ - "Structure metadata correctly"
63
+ temperature: 0.1 # Very low temp for consistent formatting
64
+ max_tokens: 8192 # Larger for full JSON output
65
+ why_this_model: |
66
+ 7B model is sufficient because:
67
+ - Structured output generation is formulaic
68
+ - JSON schema is well-defined
69
+ - Speed matters for export flow
70
+
71
+ # =============================================================================
72
+ # FREE TIER vs PRO RECOMMENDATIONS
73
+ # =============================================================================
74
+
75
+ tier_recommendations:
76
+ free_tier:
77
+ description: "For users without HF Pro subscription"
78
+ agent2: "Qwen/Qwen2.5-7B-Instruct"
79
+ agent3: "mistralai/Mixtral-8x7B-Instruct-v0.1" # Best free option for reasoning
80
+ agent4: "Qwen/Qwen2.5-7B-Instruct"
81
+ notes: "Quality will be slightly lower for Agent 3 recommendations"
82
+
83
+ pro_tier:
84
+ description: "For users with HF Pro subscription"
85
+ agent2: "Qwen/Qwen2.5-7B-Instruct"
86
+ agent3: "Qwen/Qwen2.5-72B-Instruct" # Full quality
87
+ agent4: "Qwen/Qwen2.5-7B-Instruct"
88
+ notes: "Best quality, especially for design system recommendations"
89
+
90
+ # =============================================================================
91
+ # AGENT 1: Website Crawler & Extractor
92
+ # =============================================================================
93
+ agent_crawler:
94
+ name: "Design Archaeologist"
95
+ persona: |
96
+ You are a meticulous Design Archaeologist. Your job is to carefully excavate
97
+ design decisions buried in website code. You approach each site with curiosity
98
+ and systematic precision, documenting everything you find without judgment.
99
+
100
+ You understand that live websites are often messy — accumulated decisions from
101
+ multiple designers over years. Your role is to faithfully extract what exists,
102
+ not to fix or improve it yet.
103
+
104
+ responsibilities:
105
+ - Auto-discover pages from base URL (minimum 10)
106
+ - Include key templates: homepage, listing, detail, form, marketing, auth
107
+ - Scroll pages fully (above + below fold)
108
+ - Extract separately for Desktop (1440px) and Mobile (375px)
109
+
110
+ extraction_targets:
111
+ colors:
112
+ - hex values
113
+ - rgb/rgba values
114
+ - usage frequency
115
+ - context (background, text, border, etc.)
116
+ typography:
117
+ - font families
118
+ - font sizes (px, rem, em)
119
+ - line heights
120
+ - font weights
121
+ - letter spacing
122
+ spacing:
123
+ - margin values
124
+ - padding values
125
+ - gap values
126
+ - infer base system (4px, 8px)
127
+ other:
128
+ - border radius
129
+ - box shadows
130
+ - layout signals (containers, grids)
131
+
132
+ tools:
133
+ - playwright (crawling, scrolling, computed styles)
134
+ - css parsing (CSSOM)
135
+
136
+ output_format:
137
+ - raw extracted tokens
138
+ - confidence score per token
139
+ - desktop/mobile separated
140
+ - errors and warnings logged
141
+
142
+ guardrails:
143
+ - Never modify or "fix" extracted values
144
+ - Always preserve original CSS values
145
+ - Log anomalies, don't hide them
146
+ - Respect robots.txt (configurable)
147
+
148
+ # =============================================================================
149
+ # AGENT 2: Token Normalizer & Structurer
150
+ # =============================================================================
151
+ agent_normalizer:
152
+ name: "Design System Librarian"
153
+ persona: |
154
+ You are a methodical Design System Librarian. Your expertise is in organizing
155
+ chaotic information into structured, meaningful categories. You see patterns
156
+ where others see noise.
157
+
158
+ You are careful to distinguish between what you KNOW (detected) and what you
159
+ THINK (inferred). You never overwrite the truth — you annotate it.
160
+
161
+ responsibilities:
162
+ - Clean noisy extraction data
163
+ - Group and merge duplicates (with threshold tolerance)
164
+ - Infer naming patterns from class names
165
+ - Propose initial token names
166
+ - Tag confidence levels
167
+
168
+ naming_conventions:
169
+ colors:
170
+ pattern: "color.{role}.{shade}"
171
+ example: "color.primary.500"
172
+ roles:
173
+ - primary
174
+ - secondary
175
+ - neutral
176
+ - success
177
+ - warning
178
+ - error
179
+ shades:
180
+ - 50
181
+ - 100
182
+ - 200
183
+ - 300
184
+ - 400
185
+ - 500
186
+ - 600
187
+ - 700
188
+ - 800
189
+ - 900
190
+ typography:
191
+ pattern: "font.{category}.{size}"
192
+ example: "font.heading.lg"
193
+ categories:
194
+ - heading
195
+ - body
196
+ - label
197
+ - caption
198
+ sizes:
199
+ - xs
200
+ - sm
201
+ - md
202
+ - lg
203
+ - xl
204
+ - 2xl
205
+ spacing:
206
+ pattern: "space.{size}"
207
+ example: "space.4, space.8"
208
+ note: "based on pixel value / 4"
209
+
210
+ tagging:
211
+ detected: "Directly found in CSS, high confidence"
212
+ inferred: "Derived from patterns, medium confidence"
213
+ low_confidence: "Appears rarely or inconsistently"
214
+
215
+ duplicate_threshold:
216
+ colors: 3 # hex values within 3 steps are potential duplicates
217
+ spacing: 2 # pixel values within 2px are potential duplicates
218
+
219
+ guardrails:
220
+ - Never overwrite extracted truth
221
+ - Always mark inferred vs detected
222
+ - Preserve original values alongside normalized
223
+ - Flag conflicts, don't resolve them silently
224
+
225
+ # =============================================================================
226
+ # AGENT 3: Design System Best Practices Advisor
227
+ # =============================================================================
228
+ agent_advisor:
229
+ name: "Senior Staff Design Systems Architect"
230
+ persona: |
231
+ You are a Senior Staff Design Systems Architect with 15+ years of experience
232
+ building and scaling design systems at major tech companies. You've seen what
233
+ works and what doesn't.
234
+
235
+ Your role is to ADVISE, not DECIDE. You present options with clear rationale,
236
+ letting humans make the final call. You respect existing decisions while
237
+ offering paths to improvement.
238
+
239
+ responsibilities:
240
+ - Analyze extracted system for patterns and anti-patterns
241
+ - Research modern design system best practices
242
+ - Propose upgrade OPTIONS (never auto-apply)
243
+ - Ensure accessibility compliance (AA minimum)
244
+
245
+ research_sources:
246
+ - Material Design (Google)
247
+ - Polaris (Shopify)
248
+ - Carbon (IBM)
249
+ - Fluent (Microsoft)
250
+ - Primer (GitHub)
251
+ - Radix
252
+ - Tailwind CSS
253
+
254
+ upgrade_categories:
255
+ typography_scales:
256
+ - name: "Minor Third"
257
+ ratio: 1.2
258
+ description: "Subtle progression, good for dense UIs"
259
+ - name: "Major Third"
260
+ ratio: 1.25
261
+ description: "Balanced, most popular choice"
262
+ - name: "Perfect Fourth"
263
+ ratio: 1.333
264
+ description: "Strong hierarchy, good for marketing"
265
+ - name: "Golden Ratio"
266
+ ratio: 1.618
267
+ description: "Dramatic, use sparingly"
268
+
269
+ spacing_systems:
270
+ - name: "4px base"
271
+ scale: [4, 8, 12, 16, 20, 24, 32, 40, 48, 64]
272
+ description: "Fine-grained control"
273
+ - name: "8px base"
274
+ scale: [4, 8, 16, 24, 32, 48, 64, 80, 96]
275
+ description: "Industry standard, recommended"
276
+ - name: "Tailwind"
277
+ scale: [4, 8, 12, 16, 20, 24, 32, 40, 48, 56, 64, 80, 96]
278
+ description: "Utility-first, comprehensive"
279
+
280
+ naming_conventions:
281
+ - name: "T-shirt sizes"
282
+ example: "xs, sm, md, lg, xl, 2xl"
283
+ pros: "Intuitive, easy to remember"
284
+ cons: "Limited granularity"
285
+ - name: "Numeric"
286
+ example: "100, 200, 300, 400, 500"
287
+ pros: "Extensible, precise"
288
+ cons: "Less intuitive"
289
+ - name: "Semantic"
290
+ example: "caption, body, subhead, title, display"
291
+ pros: "Meaningful, self-documenting"
292
+ cons: "Harder to extend"
293
+
294
+ color_ramps:
295
+ shades: [50, 100, 200, 300, 400, 500, 600, 700, 800, 900]
296
+ aa_contrast_minimum: 4.5
297
+ aaa_contrast_minimum: 7.0
298
+
299
+ output_format:
300
+ - option sets (never single recommendations)
301
+ - rationale for each option
302
+ - pros and cons
303
+ - accessibility impact
304
+ - migration effort estimate
305
+
306
+ guardrails:
307
+ - Never auto-apply changes
308
+ - Always provide multiple options
309
+ - Respect existing system, suggest improvements
310
+ - Flag accessibility issues prominently
311
+
312
+ # =============================================================================
313
+ # AGENT 4: Plugin & JSON Generator
314
+ # =============================================================================
315
+ agent_generator:
316
+ name: "Automation Engineer"
317
+ persona: |
318
+ You are a precise Automation Engineer. Your job is to transform design
319
+ decisions into machine-readable formats that tools can consume. You care
320
+ deeply about compatibility, versioning, and clean output.
321
+
322
+ You understand that your output will be used by Figma plugins, CSS
323
+ preprocessors, and design tools — so format matters.
324
+
325
+ responsibilities:
326
+ - Convert finalized tokens to standard formats
327
+ - Generate color ramps with AA compliance
328
+ - Maintain viewport separation (Desktop/Mobile)
329
+ - Version all outputs
330
+
331
+ output_formats:
332
+ tokens_studio:
333
+ description: "Compatible with Tokens Studio Figma plugin"
334
+ extension: ".json"
335
+ figma_variables:
336
+ description: "Direct Figma Variables format"
337
+ extension: ".json"
338
+ css_variables:
339
+ description: "CSS custom properties"
340
+ extension: ".css"
341
+ tailwind_config:
342
+ description: "Tailwind CSS configuration"
343
+ extension: ".js"
344
+
345
+ token_structure:
346
+ colors:
347
+ include_ramps: true
348
+ ramp_shades: [50, 100, 200, 300, 400, 500, 600, 700, 800, 900]
349
+ include_contrast: true
350
+ typography:
351
+ include_composite: true # font-family + size + weight + line-height
352
+ include_individual: true
353
+ spacing:
354
+ base: 8
355
+ include_negative: false
356
+
357
+ metadata_fields:
358
+ - source_url
359
+ - extracted_at
360
+ - version
361
+ - viewport
362
+ - token_source (detected/inferred/upgraded)
363
+
364
+ guardrails:
365
+ - Always include metadata
366
+ - Validate JSON before output
367
+ - Preserve source attribution on each token
368
+ - Warn on potential conflicts
369
+
370
+ # =============================================================================
371
+ # LANGGRAPH WORKFLOW CONFIGURATION
372
+ # =============================================================================
373
+ workflow:
374
+ name: "design_system_extraction"
375
+
376
+ checkpoints:
377
+ - id: "confirm_pages"
378
+ description: "Human confirms discovered pages before crawling"
379
+ required: true
380
+ - id: "review_extraction"
381
+ description: "Human reviews extracted tokens (Stage 1 UI)"
382
+ required: true
383
+ - id: "select_upgrades"
384
+ description: "Human selects upgrade options (Stage 2 UI)"
385
+ required: true
386
+ - id: "approve_export"
387
+ description: "Human approves final output (Stage 3 UI)"
388
+ required: true
389
+
390
+ parallel_nodes:
391
+ - name: "viewport_extraction"
392
+ description: "Extract Desktop and Mobile in parallel"
393
+ agents: ["agent_crawler"]
394
+ - name: "research_and_advise"
395
+ description: "Agent 3 can research while human reviews Stage 1"
396
+ agents: ["agent_advisor"]
397
+
398
+ error_handling:
399
+ retry_attempts: 3
400
+ retry_delay_seconds: 5
401
+ log_errors: true
402
+ show_in_ui: true
403
+
404
+ # =============================================================================
405
+ # UI CONFIGURATION
406
+ # =============================================================================
407
+ ui:
408
+ layout: "single_scroll"
409
+
410
+ stages:
411
+ stage1:
412
+ name: "Extraction Review"
413
+ purpose: "Trust building — see what was found"
414
+ components:
415
+ - token_tables
416
+ - color_swatches
417
+ - typography_samples
418
+ - viewport_toggle
419
+ - confidence_indicators
420
+ - accept_reject_controls
421
+
422
+ stage2:
423
+ name: "Upgrade Playground"
424
+ purpose: "Decision making through live visuals"
425
+ components:
426
+ - option_selector
427
+ - live_preview_iframe
428
+ - side_by_side_comparison
429
+ - existing_vs_upgraded_toggle
430
+
431
+ stage3:
432
+ name: "Final Review & Export"
433
+ purpose: "Confidence before export"
434
+ components:
435
+ - token_preview_readonly
436
+ - json_tree_view
437
+ - diff_view
438
+ - viewport_tabs
439
+ - download_buttons
440
+ - version_labeling
441
+
442
+ preview:
443
+ type: "iframe"
444
+ template: "specimen.html"
445
+ update_trigger: "on_selection"
446
+
447
+ # =============================================================================
448
+ # EXTRACTION SETTINGS
449
+ # =============================================================================
450
+ extraction:
451
+ viewports:
452
+ desktop:
453
+ width: 1440
454
+ height: 900
455
+ name: "Desktop"
456
+ mobile:
457
+ width: 375
458
+ height: 812
459
+ name: "Mobile"
460
+
461
+ crawling:
462
+ max_pages: 20
463
+ min_pages: 10
464
+ scroll_behavior: "smooth"
465
+ wait_for_network_idle: true
466
+ network_idle_timeout_ms: 5000
467
+ skip_infinite_scroll: true
468
+ respect_robots_txt: true
469
+
470
+ page_templates:
471
+ required:
472
+ - homepage
473
+ - listing
474
+ - detail
475
+ optional:
476
+ - form
477
+ - marketing
478
+ - auth
479
+ - checkout
480
+ - about
481
+ - contact
482
+
483
+ # =============================================================================
484
+ # COLOR PROCESSING
485
+ # =============================================================================
486
+ color_processing:
487
+ ramp_generation:
488
+ enabled: true
489
+ shades: [50, 100, 200, 300, 400, 500, 600, 700, 800, 900]
490
+ method: "oklch" # or "hsl" or "lab"
491
+
492
+ accessibility:
493
+ check_contrast: true
494
+ minimum_standard: "AA" # or "AAA"
495
+ contrast_pairs:
496
+ - ["text", "background"]
497
+ - ["button-text", "button-background"]
498
+
499
+ duplicate_detection:
500
+ enabled: true
501
+ threshold_delta_e: 3 # CIE Delta E threshold
502
+
503
+ # =============================================================================
504
+ # TYPOGRAPHY PROCESSING
505
+ # =============================================================================
506
+ typography_processing:
507
+ scale_options:
508
+ - name: "Minor Third"
509
+ ratio: 1.2
510
+ - name: "Major Third"
511
+ ratio: 1.25
512
+ - name: "Perfect Fourth"
513
+ ratio: 1.333
514
+
515
+ base_size: 16 # px
516
+
517
+ text_styles:
518
+ - display
519
+ - heading-xl
520
+ - heading-lg
521
+ - heading-md
522
+ - heading-sm
523
+ - body-lg
524
+ - body-md
525
+ - body-sm
526
+ - caption
527
+ - label
528
+
529
+ # =============================================================================
530
+ # SPACING PROCESSING
531
+ # =============================================================================
532
+ spacing_processing:
533
+ base: 8 # px
534
+ scale: [0, 4, 8, 12, 16, 24, 32, 48, 64, 80, 96]
535
+
536
+ names:
537
+ 0: "none"
538
+ 4: "xs"
539
+ 8: "sm"
540
+ 12: "sm-md"
541
+ 16: "md"
542
+ 24: "lg"
543
+ 32: "xl"
544
+ 48: "2xl"
545
+ 64: "3xl"
546
+ 80: "4xl"
547
+ 96: "5xl"
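The type scale ratios and spacing base configured above compose mechanically. As a hedged sketch (the `modular_scale` helper name is mine, not from the repo), a Major Third scale (ratio 1.25) from the 16px `base_size` yields:

```python
def modular_scale(base: float = 16, ratio: float = 1.25,
                  steps_down: int = 2, steps_up: int = 4) -> list[float]:
    """Generate a modular type scale (in px) around a base size."""
    return [round(base * ratio ** i, 2) for i in range(-steps_down, steps_up + 1)]

print(modular_scale())  # [10.24, 12.8, 16.0, 20.0, 25.0, 31.25, 39.06]
```

The agent would snap these raw values to the nearest detected font sizes before proposing them as `heading-sm` through `display` styles.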
config/settings.py ADDED
@@ -0,0 +1,217 @@
+ """
+ Application Settings
+ Design System Extractor v2
+
+ Loads configuration from environment variables and YAML files.
+ """
+
+ import os
+ from pathlib import Path
+ from typing import Optional
+ from dataclasses import dataclass, field
+ from dotenv import load_dotenv
+ import yaml
+
+ # Load environment variables from .env file
+ env_path = Path(__file__).parent / ".env"
+ if env_path.exists():
+     load_dotenv(env_path)
+ else:
+     # Try loading from the parent directory (for development)
+     load_dotenv(Path(__file__).parent.parent / ".env")
+
+
+ @dataclass
+ class HFSettings:
+     """Hugging Face configuration."""
+     hf_token: str = field(default_factory=lambda: os.getenv("HF_TOKEN", ""))
+     hf_space_name: str = field(default_factory=lambda: os.getenv("HF_SPACE_NAME", ""))
+     use_inference_api: bool = field(default_factory=lambda: os.getenv("USE_HF_INFERENCE_API", "true").lower() == "true")
+     inference_timeout: int = field(default_factory=lambda: int(os.getenv("HF_INFERENCE_TIMEOUT", "120")))
+     max_new_tokens: int = field(default_factory=lambda: int(os.getenv("HF_MAX_NEW_TOKENS", "2048")))
+     temperature: float = field(default_factory=lambda: float(os.getenv("HF_TEMPERATURE", "0.3")))
+
+
+ @dataclass
+ class ModelSettings:
+     """Model configuration for each agent — diverse providers."""
+     # Agent 1: Rule-based, no LLM needed
+
+     # Agent 2 (Normalizer): Fast structured output
+     # Default: Microsoft Phi (fast, great structured output)
+     agent2_model: str = field(default_factory=lambda: os.getenv("AGENT2_MODEL", "microsoft/Phi-3.5-mini-instruct"))
+
+     # Agent 3 (Advisor): Strong reasoning - MOST IMPORTANT
+     # Default: Meta Llama 70B (excellent reasoning)
+     agent3_model: str = field(default_factory=lambda: os.getenv("AGENT3_MODEL", "meta-llama/Llama-3.1-70B-Instruct"))
+
+     # Agent 4 (Generator): Code/JSON specialist
+     # Default: Mistral Codestral (code specialist)
+     agent4_model: str = field(default_factory=lambda: os.getenv("AGENT4_MODEL", "mistralai/Codestral-22B-v0.1"))
+
+     # Fallback
+     fallback_model: str = field(default_factory=lambda: os.getenv("FALLBACK_MODEL", "mistralai/Mistral-7B-Instruct-v0.3"))
+
+
+ @dataclass
+ class APISettings:
+     """API key configuration (optional alternatives)."""
+     anthropic_api_key: str = field(default_factory=lambda: os.getenv("ANTHROPIC_API_KEY", ""))
+     openai_api_key: str = field(default_factory=lambda: os.getenv("OPENAI_API_KEY", ""))
+
+
+ @dataclass
+ class BrowserSettings:
+     """Playwright browser configuration."""
+     browser_type: str = field(default_factory=lambda: os.getenv("BROWSER_TYPE", "chromium"))
+     headless: bool = field(default_factory=lambda: os.getenv("BROWSER_HEADLESS", "true").lower() == "true")
+     timeout: int = field(default_factory=lambda: int(os.getenv("BROWSER_TIMEOUT", "30000")))
+     network_idle_timeout: int = field(default_factory=lambda: int(os.getenv("NETWORK_IDLE_TIMEOUT", "5000")))
+
+
+ @dataclass
+ class CrawlSettings:
+     """Website crawling configuration."""
+     max_pages: int = field(default_factory=lambda: int(os.getenv("MAX_PAGES", "20")))
+     min_pages: int = field(default_factory=lambda: int(os.getenv("MIN_PAGES", "10")))
+     crawl_delay_ms: int = field(default_factory=lambda: int(os.getenv("CRAWL_DELAY_MS", "1000")))
+     max_concurrent: int = field(default_factory=lambda: int(os.getenv("MAX_CONCURRENT_CRAWLS", "3")))
+     respect_robots_txt: bool = field(default_factory=lambda: os.getenv("RESPECT_ROBOTS_TXT", "true").lower() == "true")
+
+
+ @dataclass
+ class ViewportSettings:
+     """Viewport configuration for extraction."""
+     desktop_width: int = 1440
+     desktop_height: int = 900
+     mobile_width: int = 375
+     mobile_height: int = 812
+
+
+ @dataclass
+ class StorageSettings:
+     """Persistent storage configuration."""
+     storage_path: str = field(default_factory=lambda: os.getenv("STORAGE_PATH", "/data"))
+     enable_persistence: bool = field(default_factory=lambda: os.getenv("ENABLE_PERSISTENCE", "true").lower() == "true")
+     max_versions: int = field(default_factory=lambda: int(os.getenv("MAX_VERSIONS", "10")))
+
+
+ @dataclass
+ class UISettings:
+     """UI configuration."""
+     server_port: int = field(default_factory=lambda: int(os.getenv("SERVER_PORT", "7860")))
+     share: bool = field(default_factory=lambda: os.getenv("SHARE", "false").lower() == "true")
+     theme: str = field(default_factory=lambda: os.getenv("UI_THEME", "soft"))
+
+
+ @dataclass
+ class FeatureFlags:
+     """Feature toggles."""
+     color_ramps: bool = field(default_factory=lambda: os.getenv("FEATURE_COLOR_RAMPS", "true").lower() == "true")
+     type_scales: bool = field(default_factory=lambda: os.getenv("FEATURE_TYPE_SCALES", "true").lower() == "true")
+     a11y_checks: bool = field(default_factory=lambda: os.getenv("FEATURE_A11Y_CHECKS", "true").lower() == "true")
+     parallel_extraction: bool = field(default_factory=lambda: os.getenv("FEATURE_PARALLEL_EXTRACTION", "true").lower() == "true")
+
+
+ @dataclass
+ class Settings:
+     """Main settings container."""
+     debug: bool = field(default_factory=lambda: os.getenv("DEBUG", "false").lower() == "true")
+     log_level: str = field(default_factory=lambda: os.getenv("LOG_LEVEL", "INFO"))
+
+     hf: HFSettings = field(default_factory=HFSettings)
+     models: ModelSettings = field(default_factory=ModelSettings)
+     api: APISettings = field(default_factory=APISettings)
+     browser: BrowserSettings = field(default_factory=BrowserSettings)
+     crawl: CrawlSettings = field(default_factory=CrawlSettings)
+     viewport: ViewportSettings = field(default_factory=ViewportSettings)
+     storage: StorageSettings = field(default_factory=StorageSettings)
+     ui: UISettings = field(default_factory=UISettings)
+     features: FeatureFlags = field(default_factory=FeatureFlags)
+
+     # Agent configuration loaded from YAML
+     agents_config: dict = field(default_factory=dict)
+
+     def __post_init__(self):
+         """Load agent configuration from YAML after initialization."""
+         self.load_agents_config()
+
+     def load_agents_config(self):
+         """Load agent personas and settings from the YAML file."""
+         yaml_path = Path(__file__).parent / "agents.yaml"
+         if yaml_path.exists():
+             with open(yaml_path, "r") as f:
+                 self.agents_config = yaml.safe_load(f)
+         else:
+             print(f"Warning: agents.yaml not found at {yaml_path}")
+             self.agents_config = {}
+
+     def get_agent_persona(self, agent_name: str) -> str:
+         """Get the persona string for an agent."""
+         agent_key = f"agent_{agent_name}"
+         if agent_key in self.agents_config:
+             return self.agents_config[agent_key].get("persona", "")
+         return ""
+
+     def get_agent_config(self, agent_name: str) -> dict:
+         """Get the full configuration for an agent."""
+         agent_key = f"agent_{agent_name}"
+         return self.agents_config.get(agent_key, {})
+
+     def get_model_for_agent(self, agent_name: str) -> str:
+         """Get the model ID for a specific agent."""
+         model_map = {
+             "normalizer": self.models.agent2_model,
+             "advisor": self.models.agent3_model,
+             "generator": self.models.agent4_model,
+         }
+         return model_map.get(agent_name, self.models.fallback_model)
+
+     def validate(self) -> list[str]:
+         """Validate settings and return a list of errors."""
+         errors = []
+
+         if not self.hf.hf_token:
+             errors.append("HF_TOKEN is required for model inference")
+
+         if self.crawl.max_pages < self.crawl.min_pages:
+             errors.append("MAX_PAGES must be >= MIN_PAGES")
+
+         return errors
+
+
+ # Global settings instance
+ settings = Settings()
+
+
+ def get_settings() -> Settings:
+     """Get the global settings instance."""
+     return settings
+
+
+ def reload_settings() -> Settings:
+     """Reload settings from environment and config files."""
+     global settings
+     settings = Settings()
+     return settings
+
+
+ # Convenience functions
+ def is_debug() -> bool:
+     """Check if debug mode is enabled."""
+     return settings.debug
+
+
+ def get_hf_token() -> str:
+     """Get the Hugging Face token."""
+     return settings.hf.hf_token
+
+
+ def get_agent_persona(agent_name: str) -> str:
+     """Get the persona for an agent."""
+     return settings.get_agent_persona(agent_name)
+
+
+ def get_model_for_agent(agent_name: str) -> str:
+     """Get the model ID for an agent."""
+     return settings.get_model_for_agent(agent_name)
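A detail worth noting in the module above: because every field uses `field(default_factory=lambda: os.getenv(...))`, environment values are read at instantiation time, not at class-definition time. That is what lets `reload_settings()` pick up changes simply by constructing a fresh `Settings()`. A self-contained sketch of the same pattern (the toy `CrawlDefaults` class is illustrative, not the repo's):

```python
import os
from dataclasses import dataclass, field

@dataclass
class CrawlDefaults:
    # The default is computed when the instance is created, so a rebuilt
    # instance sees updated environment variables.
    max_pages: int = field(default_factory=lambda: int(os.getenv("MAX_PAGES", "20")))

os.environ.pop("MAX_PAGES", None)
first = CrawlDefaults()           # MAX_PAGES unset -> falls back to 20
os.environ["MAX_PAGES"] = "5"
second = CrawlDefaults()          # fresh instance re-reads the environment
print(first.max_pages, second.max_pages)  # 20 5
```

Had the field been written as `max_pages: int = int(os.getenv("MAX_PAGES", "20"))`, the value would be frozen at import time and reloading would be a no-op.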
core/__init__.py ADDED
@@ -0,0 +1,61 @@
+ """
+ Core utilities for Design System Extractor v2.
+ """
+
+ from core.token_schema import (
+     TokenSource,
+     Confidence,
+     Viewport,
+     PageType,
+     ColorToken,
+     TypographyToken,
+     SpacingToken,
+     RadiusToken,
+     ShadowToken,
+     ExtractedTokens,
+     NormalizedTokens,
+     FinalTokens,
+     WorkflowState,
+ )
+
+ from core.color_utils import (
+     parse_color,
+     normalize_hex,
+     get_contrast_ratio,
+     check_wcag_compliance,
+     generate_color_ramp,
+     generate_accessible_ramp,
+     categorize_color,
+     suggest_color_name,
+ )
+
+ # HF inference is imported lazily to avoid circular imports.
+ # Use: from core.hf_inference import get_inference_client
+
+ __all__ = [
+     # Enums
+     "TokenSource",
+     "Confidence",
+     "Viewport",
+     "PageType",
+     # Token models
+     "ColorToken",
+     "TypographyToken",
+     "SpacingToken",
+     "RadiusToken",
+     "ShadowToken",
+     # Result models
+     "ExtractedTokens",
+     "NormalizedTokens",
+     "FinalTokens",
+     "WorkflowState",
+     # Color utilities
+     "parse_color",
+     "normalize_hex",
+     "get_contrast_ratio",
+     "check_wcag_compliance",
+     "generate_color_ramp",
+     "generate_accessible_ramp",
+     "categorize_color",
+     "suggest_color_name",
+ ]
core/color_utils.py ADDED
@@ -0,0 +1,462 @@
+ """
+ Color Utilities
+ Design System Extractor v2
+
+ Functions for color analysis, contrast calculation, and ramp generation.
+ """
+
+ import re
+ import colorsys
+ from typing import Optional
+ from dataclasses import dataclass
+
+
+ # =============================================================================
+ # COLOR PARSING
+ # =============================================================================
+
+ @dataclass
+ class ParsedColor:
+     """Parsed color with multiple representations."""
+     hex: str
+     rgb: tuple[int, int, int]
+     hsl: tuple[float, float, float]
+     oklch: Optional[tuple[float, float, float]] = None
+
+
+ def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
+     """Convert a hex color to an RGB tuple."""
+     hex_color = hex_color.lstrip("#")
+     if len(hex_color) == 3:
+         hex_color = "".join([c * 2 for c in hex_color])
+     return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
+
+
+ def rgb_to_hex(r: int, g: int, b: int) -> str:
+     """Convert RGB to a hex color."""
+     return f"#{r:02x}{g:02x}{b:02x}"
+
+
+ def rgb_to_hsl(r: int, g: int, b: int) -> tuple[float, float, float]:
+     """Convert RGB to HSL (degrees, percent, percent)."""
+     r_norm, g_norm, b_norm = r / 255, g / 255, b / 255
+     h, l, s = colorsys.rgb_to_hls(r_norm, g_norm, b_norm)
+     return (h * 360, s * 100, l * 100)
+
+
+ def hsl_to_rgb(h: float, s: float, l: float) -> tuple[int, int, int]:
+     """Convert HSL to RGB."""
+     h_norm, s_norm, l_norm = h / 360, s / 100, l / 100
+     r, g, b = colorsys.hls_to_rgb(h_norm, l_norm, s_norm)
+     return (int(r * 255), int(g * 255), int(b * 255))
+
+
+ def parse_color(color_string: str) -> Optional[ParsedColor]:
+     """
+     Parse any CSS color format to a ParsedColor.
+
+     Supports:
+     - Hex: #fff, #ffffff
+     - RGB: rgb(255, 255, 255)
+     - RGBA: rgba(255, 255, 255, 0.5)
+     - HSL: hsl(0, 100%, 50%)
+     """
+     color_string = color_string.strip().lower()
+
+     # Hex format
+     if color_string.startswith("#"):
+         hex_color = color_string
+         if len(hex_color) == 4:
+             hex_color = f"#{hex_color[1]*2}{hex_color[2]*2}{hex_color[3]*2}"
+         try:
+             rgb = hex_to_rgb(hex_color)
+             hsl = rgb_to_hsl(*rgb)
+             return ParsedColor(hex=hex_color, rgb=rgb, hsl=hsl)
+         except ValueError:
+             return None
+
+     # RGB/RGBA format
+     rgb_match = re.match(r"rgba?\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)", color_string)
+     if rgb_match:
+         r, g, b = int(rgb_match.group(1)), int(rgb_match.group(2)), int(rgb_match.group(3))
+         hex_color = rgb_to_hex(r, g, b)
+         hsl = rgb_to_hsl(r, g, b)
+         return ParsedColor(hex=hex_color, rgb=(r, g, b), hsl=hsl)
+
+     # HSL format
+     hsl_match = re.match(r"hsl\s*\(\s*(\d+)\s*,\s*(\d+)%?\s*,\s*(\d+)%?", color_string)
+     if hsl_match:
+         h, s, l = float(hsl_match.group(1)), float(hsl_match.group(2)), float(hsl_match.group(3))
+         rgb = hsl_to_rgb(h, s, l)
+         hex_color = rgb_to_hex(*rgb)
+         return ParsedColor(hex=hex_color, rgb=rgb, hsl=(h, s, l))
+
+     return None
+
+
+ def normalize_hex(color: str) -> str:
+     """Normalize a hex color to lowercase 6-digit format."""
+     parsed = parse_color(color)
+     return parsed.hex if parsed else color
+
+
+ # =============================================================================
+ # CONTRAST CALCULATIONS (WCAG)
+ # =============================================================================
+
+ def get_luminance(r: int, g: int, b: int) -> float:
+     """
+     Calculate relative luminance according to WCAG 2.1.
+
+     Formula: L = 0.2126 * R + 0.7152 * G + 0.0722 * B
+     where R, G, B are linearized values.
+     """
+     def linearize(c: int) -> float:
+         c_norm = c / 255
+         if c_norm <= 0.04045:
+             return c_norm / 12.92
+         return ((c_norm + 0.055) / 1.055) ** 2.4
+
+     return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
+
+
+ def get_contrast_ratio(color1: str, color2: str) -> float:
+     """
+     Calculate the WCAG contrast ratio between two colors.
+
+     Returns a ratio from 1:1 to 21:1.
+     """
+     parsed1 = parse_color(color1)
+     parsed2 = parse_color(color2)
+
+     if not parsed1 or not parsed2:
+         return 1.0
+
+     l1 = get_luminance(*parsed1.rgb)
+     l2 = get_luminance(*parsed2.rgb)
+
+     lighter = max(l1, l2)
+     darker = min(l1, l2)
+
+     return (lighter + 0.05) / (darker + 0.05)
+
+
+ def check_wcag_compliance(foreground: str, background: str) -> dict:
+     """
+     Check WCAG compliance for a color pair.
+
+     Returns dict with AA and AAA compliance for normal and large text.
+     """
+     ratio = get_contrast_ratio(foreground, background)
+
+ return {
153
+ "contrast_ratio": round(ratio, 2),
154
+ "aa_normal_text": ratio >= 4.5, # AA for normal text
155
+ "aa_large_text": ratio >= 3.0, # AA for large text (18pt+ or 14pt+ bold)
156
+ "aaa_normal_text": ratio >= 7.0, # AAA for normal text
157
+ "aaa_large_text": ratio >= 4.5, # AAA for large text
158
+ }
159
+
160
+
161
+ def get_contrast_with_white(color: str) -> float:
162
+ """Get contrast ratio against white."""
163
+ return get_contrast_ratio(color, "#ffffff")
164
+
165
+
166
+ def get_contrast_with_black(color: str) -> float:
167
+ """Get contrast ratio against black."""
168
+ return get_contrast_ratio(color, "#000000")
169
+
170
+
171
+ def get_best_text_color(background: str) -> str:
172
+ """Determine whether white or black text works better on a background."""
173
+ white_contrast = get_contrast_with_white(background)
174
+ black_contrast = get_contrast_with_black(background)
175
+ return "#ffffff" if white_contrast > black_contrast else "#000000"
176
+
177
+
178
+ # =============================================================================
179
+ # COLOR SIMILARITY & DEDUPLICATION
180
+ # =============================================================================
181
+
182
+ def color_distance(color1: str, color2: str) -> float:
183
+ """
184
+ Calculate perceptual distance between two colors.
185
+
186
+ Uses simple Euclidean distance in RGB space.
187
+ For more accuracy, consider using CIE Lab Delta E.
188
+ """
189
+ parsed1 = parse_color(color1)
190
+ parsed2 = parse_color(color2)
191
+
192
+ if not parsed1 or not parsed2:
193
+ return float("inf")
194
+
195
+ r1, g1, b1 = parsed1.rgb
196
+ r2, g2, b2 = parsed2.rgb
197
+
198
+ return ((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2) ** 0.5
199
+
200
+
201
+ def are_colors_similar(color1: str, color2: str, threshold: float = 10.0) -> bool:
202
+ """Check if two colors are perceptually similar."""
203
+ return color_distance(color1, color2) < threshold
204
+
205
+
206
+ def find_duplicate_colors(colors: list[str], threshold: float = 10.0) -> list[tuple[str, str]]:
207
+ """
208
+ Find pairs of colors that are potentially duplicates.
209
+
210
+ Returns list of (color1, color2) tuples.
211
+ """
212
+ duplicates = []
213
+ normalized = [normalize_hex(c) for c in colors]
214
+
215
+ for i, color1 in enumerate(normalized):
216
+ for color2 in normalized[i + 1:]:
217
+ if are_colors_similar(color1, color2, threshold):
218
+ duplicates.append((color1, color2))
219
+
220
+ return duplicates
221
+
222
+
223
+ def deduplicate_colors(colors: list[str], threshold: float = 10.0) -> list[str]:
224
+ """
225
+ Remove duplicate colors, keeping the first occurrence.
226
+ """
227
+ result = []
228
+ normalized = [normalize_hex(c) for c in colors]
229
+
230
+ for color in normalized:
231
+ is_duplicate = False
232
+ for existing in result:
233
+ if are_colors_similar(color, existing, threshold):
234
+ is_duplicate = True
235
+ break
236
+ if not is_duplicate:
237
+ result.append(color)
238
+
239
+ return result
240
+
241
+
242
+ # =============================================================================
243
+ # COLOR RAMP GENERATION
244
+ # =============================================================================
245
+
246
+ def generate_color_ramp(
247
+ base_color: str,
248
+ shades: list[int] = None,
249
+ method: str = "hsl"
250
+ ) -> dict[str, str]:
251
+ """
252
+ Generate a color ramp from a base color.
253
+
254
+ Args:
255
+ base_color: The base color (typically becomes 500)
256
+ shades: List of shade values (default: [50, 100, 200, 300, 400, 500, 600, 700, 800, 900])
257
+ method: "hsl" (simple) or "oklch" (perceptually uniform, not implemented)
258
+
259
+ Returns:
260
+ Dict mapping shade to hex color.
261
+ """
262
+ if shades is None:
263
+ shades = [50, 100, 200, 300, 400, 500, 600, 700, 800, 900]
264
+
265
+ parsed = parse_color(base_color)
266
+ if not parsed:
267
+ return {}
268
+
269
+ h, s, l = parsed.hsl
270
+ ramp = {}
271
+
272
+ # Base color is 500
273
+ base_shade = 500
274
+
275
+ for shade in shades:
276
+ if shade == base_shade:
277
+ ramp[str(shade)] = parsed.hex
278
+ continue
279
+
280
+ # Calculate lightness adjustment
281
+ # Lighter shades (50-400): increase lightness
282
+ # Darker shades (600-900): decrease lightness
283
+ if shade < base_shade:
284
+ # Lighter: interpolate toward white
285
+ factor = (base_shade - shade) / base_shade
286
+ new_l = l + (100 - l) * factor * 0.85
287
+ # Slightly reduce saturation for very light shades
288
+ new_s = s * (1 - factor * 0.3)
289
+ else:
290
+ # Darker: interpolate toward black
291
+ factor = (shade - base_shade) / (900 - base_shade)
292
+ new_l = l * (1 - factor * 0.85)
293
+ # Increase saturation slightly for dark shades
294
+ new_s = min(100, s * (1 + factor * 0.2))
295
+
296
+ new_rgb = hsl_to_rgb(h, new_s, new_l)
297
+ ramp[str(shade)] = rgb_to_hex(*new_rgb)
298
+
299
+ return ramp
300
+
301
+
302
+ def generate_accessible_ramp(
303
+ base_color: str,
304
+ background: str = "#ffffff"
305
+ ) -> dict[str, dict]:
306
+ """
307
+ Generate a color ramp with accessibility information.
308
+
309
+ Returns dict with color values and their contrast ratios.
310
+ """
311
+ ramp = generate_color_ramp(base_color)
312
+ result = {}
313
+
314
+ for shade, color in ramp.items():
315
+ compliance = check_wcag_compliance(color, background)
316
+ result[shade] = {
317
+ "value": color,
318
+ "contrast_ratio": compliance["contrast_ratio"],
319
+ "aa_text": compliance["aa_normal_text"],
320
+ "best_text_color": get_best_text_color(color),
321
+ }
322
+
323
+ return result
324
+
325
+
326
+ # =============================================================================
327
+ # COLOR CATEGORIZATION
328
+ # =============================================================================
329
+
330
+ def categorize_color(color: str) -> str:
331
+ """
332
+ Categorize a color by its general hue.
333
+
334
+ Returns: "red", "orange", "yellow", "green", "cyan", "blue", "purple", "pink", "neutral"
335
+ """
336
+ parsed = parse_color(color)
337
+ if not parsed:
338
+ return "unknown"
339
+
340
+ h, s, l = parsed.hsl
341
+
342
+ # Neutrals (low saturation or extreme lightness)
343
+ if s < 10 or l < 5 or l > 95:
344
+ return "neutral"
345
+
346
+ # Categorize by hue
347
+ if h < 15 or h >= 345:
348
+ return "red"
349
+ elif h < 45:
350
+ return "orange"
351
+ elif h < 70:
352
+ return "yellow"
353
+ elif h < 150:
354
+ return "green"
355
+ elif h < 190:
356
+ return "cyan"
357
+ elif h < 260:
358
+ return "blue"
359
+ elif h < 290:
360
+ return "purple"
361
+ else:
362
+ return "pink"
363
+
364
+
365
+ def suggest_color_name(color: str, role: str = None) -> str:
366
+ """
367
+ Suggest a semantic name for a color.
368
+
369
+ Args:
370
+ color: The color value
371
+ role: Optional role hint ("primary", "background", "text", etc.)
372
+
373
+ Returns suggested name like "blue-500" or "neutral-100"
374
+ """
375
+ parsed = parse_color(color)
376
+ if not parsed:
377
+ return "unknown"
378
+
379
+ category = categorize_color(color)
380
+ h, s, l = parsed.hsl
381
+
382
+ # Determine shade level based on lightness
383
+ if l >= 95:
384
+ shade = "50"
385
+ elif l >= 85:
386
+ shade = "100"
387
+ elif l >= 75:
388
+ shade = "200"
389
+ elif l >= 65:
390
+ shade = "300"
391
+ elif l >= 55:
392
+ shade = "400"
393
+ elif l >= 45:
394
+ shade = "500"
395
+ elif l >= 35:
396
+ shade = "600"
397
+ elif l >= 25:
398
+ shade = "700"
399
+ elif l >= 15:
400
+ shade = "800"
401
+ else:
402
+ shade = "900"
403
+
404
+ return f"{category}-{shade}"
405
+
406
+
407
+ def group_colors_by_category(colors: list[str]) -> dict[str, list[str]]:
408
+ """
409
+ Group colors by their category.
410
+
411
+ Returns dict mapping category to list of colors.
412
+ """
413
+ groups: dict[str, list[str]] = {}
414
+
415
+ for color in colors:
416
+ category = categorize_color(color)
417
+ if category not in groups:
418
+ groups[category] = []
419
+ groups[category].append(normalize_hex(color))
420
+
421
+ return groups
422
+
423
+
424
+ # =============================================================================
425
+ # UTILITY FUNCTIONS
426
+ # =============================================================================
427
+
428
+ def sort_colors_by_hue(colors: list[str]) -> list[str]:
429
+ """Sort colors by hue, then by lightness."""
430
+ def sort_key(color: str):
431
+ parsed = parse_color(color)
432
+ if not parsed:
433
+ return (0, 0, 0)
434
+ h, s, l = parsed.hsl
435
+ # Neutrals (low saturation) go at the end
436
+ if s < 10:
437
+ return (360 + l, s, l)
438
+ return (h, s, l)
439
+
440
+ return sorted([normalize_hex(c) for c in colors], key=sort_key)
441
+
442
+
443
+ def sort_colors_by_lightness(colors: list[str]) -> list[str]:
444
+ """Sort colors from light to dark."""
445
+ def sort_key(color: str):
446
+ parsed = parse_color(color)
447
+ return -parsed.hsl[2] if parsed else 0
448
+
449
+ return sorted([normalize_hex(c) for c in colors], key=sort_key)
450
+
451
+
452
+ def is_dark_color(color: str) -> bool:
453
+ """Check if a color is considered dark (for text on backgrounds)."""
454
+ parsed = parse_color(color)
455
+ if not parsed:
456
+ return False
457
+ return parsed.hsl[2] < 50
458
+
459
+
460
+ def is_light_color(color: str) -> bool:
461
+ """Check if a color is considered light."""
462
+ return not is_dark_color(color)
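The WCAG contrast math in `get_luminance` and `get_contrast_ratio` above can be exercised on its own; here is a minimal self-contained sketch of the same formulas (function names `luminance` and `contrast_ratio` are illustrative, not part of the module):

```python
# Minimal sketch of the WCAG relative-luminance and contrast-ratio formulas.
def _linearize(c: int) -> float:
    # Undo the sRGB gamma curve for one 0-255 channel.
    c_norm = c / 255
    if c_norm <= 0.04045:
        return c_norm / 12.92
    return ((c_norm + 0.055) / 1.055) ** 2.4

def luminance(r: int, g: int, b: int) -> float:
    # Weighted sum of the linearized channels (WCAG coefficients).
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(rgb1: tuple, rgb2: tuple) -> float:
    # (lighter + 0.05) / (darker + 0.05), ranging from 1.0 to 21.0.
    l1, l2 = luminance(*rgb1), luminance(*rgb2)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
```

Black on white yields the maximum ratio of 21:1, which is why the compliance checks above only need the 3.0/4.5/7.0 thresholds.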
core/hf_inference.py ADDED
@@ -0,0 +1,580 @@
+ """
+ HuggingFace Inference Client
+ Design System Extractor v2
+
+ Handles all LLM inference calls using HuggingFace Inference API.
+ Supports diverse models from different providers for specialized tasks.
+ """
+
+ import os
+ from typing import Optional, AsyncGenerator
+ from dataclasses import dataclass
+ from huggingface_hub import InferenceClient, AsyncInferenceClient
+
+ from config.settings import get_settings
+
+
+ @dataclass
+ class ModelInfo:
+     """Information about a model."""
+     model_id: str
+     provider: str
+     context_length: int
+     strengths: list[str]
+     best_for: str
+     tier: str  # "free", "pro", "pro+"
+
+
+ # =============================================================================
+ # COMPREHENSIVE MODEL REGISTRY — Organized by Provider
+ # =============================================================================
+
+ AVAILABLE_MODELS = {
+     # =========================================================================
+     # META — Llama Family (Best for reasoning)
+     # =========================================================================
+     "meta-llama/Llama-3.1-405B-Instruct": ModelInfo(
+         model_id="meta-llama/Llama-3.1-405B-Instruct",
+         provider="Meta",
+         context_length=128000,
+         strengths=["Best reasoning", "Massive knowledge", "Complex analysis"],
+         best_for="Agent 3 (Advisor) — PREMIUM CHOICE",
+         tier="pro+"
+     ),
+     "meta-llama/Llama-3.1-70B-Instruct": ModelInfo(
+         model_id="meta-llama/Llama-3.1-70B-Instruct",
+         provider="Meta",
+         context_length=128000,
+         strengths=["Excellent reasoning", "Long context", "Design knowledge"],
+         best_for="Agent 3 (Advisor) — RECOMMENDED",
+         tier="pro"
+     ),
+     "meta-llama/Llama-3.1-8B-Instruct": ModelInfo(
+         model_id="meta-llama/Llama-3.1-8B-Instruct",
+         provider="Meta",
+         context_length=128000,
+         strengths=["Fast", "Good reasoning for size", "Long context"],
+         best_for="Budget Agent 3 fallback",
+         tier="free"
+     ),
+
+     # =========================================================================
+     # MISTRAL — European Excellence
+     # =========================================================================
+     "mistralai/Mixtral-8x22B-Instruct-v0.1": ModelInfo(
+         model_id="mistralai/Mixtral-8x22B-Instruct-v0.1",
+         provider="Mistral",
+         context_length=65536,
+         strengths=["Large MoE", "Strong reasoning", "Efficient"],
+         best_for="Agent 3 (Advisor) — Pro alternative",
+         tier="pro"
+     ),
+     "mistralai/Mixtral-8x7B-Instruct-v0.1": ModelInfo(
+         model_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
+         provider="Mistral",
+         context_length=32768,
+         strengths=["Good MoE efficiency", "Solid reasoning"],
+         best_for="Agent 3 (Advisor) — Free tier option",
+         tier="free"
+     ),
+     "mistralai/Mistral-7B-Instruct-v0.3": ModelInfo(
+         model_id="mistralai/Mistral-7B-Instruct-v0.3",
+         provider="Mistral",
+         context_length=32768,
+         strengths=["Fast", "Good instruction following"],
+         best_for="General fallback",
+         tier="free"
+     ),
+     "mistralai/Codestral-22B-v0.1": ModelInfo(
+         model_id="mistralai/Codestral-22B-v0.1",
+         provider="Mistral",
+         context_length=32768,
+         strengths=["Code specialist", "JSON generation", "Structured output"],
+         best_for="Agent 4 (Generator) — RECOMMENDED",
+         tier="pro"
+     ),
+
+     # =========================================================================
+     # COHERE — Command R Family (Analysis & Retrieval)
+     # =========================================================================
+     "CohereForAI/c4ai-command-r-plus": ModelInfo(
+         model_id="CohereForAI/c4ai-command-r-plus",
+         provider="Cohere",
+         context_length=128000,
+         strengths=["Excellent analysis", "RAG optimized", "Long context"],
+         best_for="Agent 3 (Advisor) — Great for research tasks",
+         tier="pro"
+     ),
+     "CohereForAI/c4ai-command-r-v01": ModelInfo(
+         model_id="CohereForAI/c4ai-command-r-v01",
+         provider="Cohere",
+         context_length=128000,
+         strengths=["Good analysis", "Efficient"],
+         best_for="Agent 3 budget option",
+         tier="free"
+     ),
+
+     # =========================================================================
+     # GOOGLE — Gemma Family
+     # =========================================================================
+     "google/gemma-2-27b-it": ModelInfo(
+         model_id="google/gemma-2-27b-it",
+         provider="Google",
+         context_length=8192,
+         strengths=["Strong instruction following", "Good balance"],
+         best_for="Agent 2 (Normalizer) — Quality option",
+         tier="pro"
+     ),
+     "google/gemma-2-9b-it": ModelInfo(
+         model_id="google/gemma-2-9b-it",
+         provider="Google",
+         context_length=8192,
+         strengths=["Fast", "Good instruction following"],
+         best_for="Agent 2 (Normalizer) — Balanced",
+         tier="free"
+     ),
+
+     # =========================================================================
+     # MICROSOFT — Phi Family (Small but Mighty)
+     # =========================================================================
+     "microsoft/Phi-3.5-mini-instruct": ModelInfo(
+         model_id="microsoft/Phi-3.5-mini-instruct",
+         provider="Microsoft",
+         context_length=128000,
+         strengths=["Very fast", "Great structured output", "Long context"],
+         best_for="Agent 2 (Normalizer) — RECOMMENDED",
+         tier="free"
+     ),
+     "microsoft/Phi-3-medium-4k-instruct": ModelInfo(
+         model_id="microsoft/Phi-3-medium-4k-instruct",
+         provider="Microsoft",
+         context_length=4096,
+         strengths=["Fast", "Good for simple tasks"],
+         best_for="Simple naming tasks",
+         tier="free"
+     ),
+
+     # =========================================================================
+     # QWEN — Alibaba Family
+     # =========================================================================
+     "Qwen/Qwen2.5-72B-Instruct": ModelInfo(
+         model_id="Qwen/Qwen2.5-72B-Instruct",
+         provider="Alibaba",
+         context_length=32768,
+         strengths=["Strong reasoning", "Multilingual", "Good design knowledge"],
+         best_for="Agent 3 (Advisor) — Alternative",
+         tier="pro"
+     ),
+     "Qwen/Qwen2.5-32B-Instruct": ModelInfo(
+         model_id="Qwen/Qwen2.5-32B-Instruct",
+         provider="Alibaba",
+         context_length=32768,
+         strengths=["Good balance", "Multilingual"],
+         best_for="Medium-tier option",
+         tier="pro"
+     ),
+     "Qwen/Qwen2.5-Coder-32B-Instruct": ModelInfo(
+         model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
+         provider="Alibaba",
+         context_length=32768,
+         strengths=["Code specialist", "JSON/structured output"],
+         best_for="Agent 4 (Generator) — Alternative",
+         tier="pro"
+     ),
+     "Qwen/Qwen2.5-7B-Instruct": ModelInfo(
+         model_id="Qwen/Qwen2.5-7B-Instruct",
+         provider="Alibaba",
+         context_length=32768,
+         strengths=["Fast", "Good all-rounder"],
+         best_for="General fallback",
+         tier="free"
+     ),
+
+     # =========================================================================
+     # DEEPSEEK — Code Specialists
+     # =========================================================================
+     "deepseek-ai/deepseek-coder-33b-instruct": ModelInfo(
+         model_id="deepseek-ai/deepseek-coder-33b-instruct",
+         provider="DeepSeek",
+         context_length=16384,
+         strengths=["Excellent code generation", "JSON specialist"],
+         best_for="Agent 4 (Generator) — Code focused",
+         tier="pro"
+     ),
+     "deepseek-ai/DeepSeek-V2.5": ModelInfo(
+         model_id="deepseek-ai/DeepSeek-V2.5",
+         provider="DeepSeek",
+         context_length=32768,
+         strengths=["Strong reasoning", "Good code"],
+         best_for="Multi-purpose",
+         tier="pro"
+     ),
+
+     # =========================================================================
+     # BIGCODE — StarCoder Family
+     # =========================================================================
+     "bigcode/starcoder2-15b-instruct-v0.1": ModelInfo(
+         model_id="bigcode/starcoder2-15b-instruct-v0.1",
+         provider="BigCode",
+         context_length=16384,
+         strengths=["Code generation", "Multiple languages"],
+         best_for="Agent 4 (Generator) — Open source code model",
+         tier="free"
+     ),
+ }
+
+
+ # =============================================================================
+ # RECOMMENDED CONFIGURATIONS BY TIER
+ # =============================================================================
+
+ MODEL_PRESETS = {
+     "budget": {
+         "name": "Budget (Free Tier)",
+         "description": "Best free models for each task",
+         "agent2": "microsoft/Phi-3.5-mini-instruct",
+         "agent3": "mistralai/Mixtral-8x7B-Instruct-v0.1",
+         "agent4": "bigcode/starcoder2-15b-instruct-v0.1",
+         "fallback": "mistralai/Mistral-7B-Instruct-v0.3",
+     },
+     "balanced": {
+         "name": "Balanced (Pro Tier)",
+         "description": "Good quality/cost balance",
+         "agent2": "google/gemma-2-9b-it",
+         "agent3": "meta-llama/Llama-3.1-70B-Instruct",
+         "agent4": "mistralai/Codestral-22B-v0.1",
+         "fallback": "Qwen/Qwen2.5-7B-Instruct",
+     },
+     "quality": {
+         "name": "Maximum Quality (Pro+)",
+         "description": "Best models regardless of cost",
+         "agent2": "google/gemma-2-27b-it",
+         "agent3": "meta-llama/Llama-3.1-405B-Instruct",
+         "agent4": "deepseek-ai/deepseek-coder-33b-instruct",
+         "fallback": "meta-llama/Llama-3.1-8B-Instruct",
+     },
+     "diverse": {
+         "name": "Diverse Providers",
+         "description": "One model from each major provider",
+         "agent2": "microsoft/Phi-3.5-mini-instruct",   # Microsoft
+         "agent3": "CohereForAI/c4ai-command-r-plus",   # Cohere
+         "agent4": "mistralai/Codestral-22B-v0.1",      # Mistral
+         "fallback": "meta-llama/Llama-3.1-8B-Instruct",  # Meta
+     },
+ }
+
+
+ # =============================================================================
+ # AGENT-SPECIFIC RECOMMENDATIONS
+ # =============================================================================
+
+ AGENT_MODEL_RECOMMENDATIONS = {
+     "crawler": {
+         "requires_llm": False,
+         "notes": "Pure rule-based extraction using Playwright + CSS parsing"
+     },
+     "extractor": {
+         "requires_llm": False,
+         "notes": "Pure rule-based extraction using Playwright + CSS parsing"
+     },
+     "normalizer": {
+         "requires_llm": True,
+         "task": "Token naming, duplicate detection, pattern inference",
+         "needs": ["Fast inference", "Good instruction following", "Structured output"],
+         "recommended": [
+             ("microsoft/Phi-3.5-mini-instruct", "BEST — Fast, great structured output"),
+             ("google/gemma-2-9b-it", "Good balance of speed and quality"),
+             ("Qwen/Qwen2.5-7B-Instruct", "Reliable all-rounder"),
+         ],
+         "temperature": 0.2,
+     },
+     "advisor": {
+         "requires_llm": True,
+         "task": "Design system analysis, best practice recommendations",
+         "needs": ["Strong reasoning", "Design knowledge", "Creative suggestions"],
+         "recommended": [
+             ("meta-llama/Llama-3.1-70B-Instruct", "BEST — Excellent reasoning"),
+             ("CohereForAI/c4ai-command-r-plus", "Great for analysis tasks"),
+             ("Qwen/Qwen2.5-72B-Instruct", "Strong alternative"),
+             ("mistralai/Mixtral-8x7B-Instruct-v0.1", "Best free option"),
+         ],
+         "temperature": 0.4,
+     },
+     "generator": {
+         "requires_llm": True,
+         "task": "Generate JSON tokens, CSS variables, structured output",
+         "needs": ["Code generation", "JSON formatting", "Schema adherence"],
+         "recommended": [
+             ("mistralai/Codestral-22B-v0.1", "BEST — Mistral's code model"),
+             ("deepseek-ai/deepseek-coder-33b-instruct", "Excellent code specialist"),
+             ("Qwen/Qwen2.5-Coder-32B-Instruct", "Strong code model"),
+             ("bigcode/starcoder2-15b-instruct-v0.1", "Best free option"),
+         ],
+         "temperature": 0.1,
+     },
+ }
+
+
+ # =============================================================================
+ # INFERENCE CLIENT
+ # =============================================================================
+
+ class HFInferenceClient:
+     """
+     Wrapper around HuggingFace Inference API.
+
+     Handles model selection, retries, and fallbacks.
+     """
+
+     def __init__(self):
+         self.settings = get_settings()
+         self.token = self.settings.hf.hf_token
+
+         if not self.token:
+             raise ValueError("HF_TOKEN is required for inference")
+
+         # Create clients
+         self.sync_client = InferenceClient(token=self.token)
+         self.async_client = AsyncInferenceClient(token=self.token)
+
+     def get_model_for_agent(self, agent_name: str) -> str:
+         """Get the appropriate model for an agent."""
+         return self.settings.get_model_for_agent(agent_name)
+
+     def get_temperature_for_agent(self, agent_name: str) -> float:
+         """Get recommended temperature for an agent."""
+         temps = {
+             "normalizer": 0.2,  # Consistent naming
+             "advisor": 0.4,     # Creative recommendations
+             "generator": 0.1,   # Precise formatting
+         }
+         return temps.get(agent_name, 0.3)
+
+     def _build_messages(
+         self,
+         system_prompt: str,
+         user_message: str,
+         examples: list[dict] = None
+     ) -> list[dict]:
+         """Build message list for chat completion."""
+         messages = []
+
+         if system_prompt:
+             messages.append({"role": "system", "content": system_prompt})
+
+         if examples:
+             for example in examples:
+                 messages.append({"role": "user", "content": example["user"]})
+                 messages.append({"role": "assistant", "content": example["assistant"]})
+
+         messages.append({"role": "user", "content": user_message})
+
+         return messages
+
+     def complete(
+         self,
+         agent_name: str,
+         system_prompt: str,
+         user_message: str,
+         examples: list[dict] = None,
+         max_tokens: int = None,
+         temperature: float = None,
+         json_mode: bool = False,
+     ) -> str:
+         """
+         Synchronous completion.
+
+         Args:
+             agent_name: Which agent is making the call (for model selection)
+             system_prompt: System instructions
+             user_message: User input
+             examples: Optional few-shot examples
+             max_tokens: Max tokens to generate
+             temperature: Sampling temperature (uses agent default if not specified)
+             json_mode: If True, instruct model to output JSON
+
+         Returns:
+             Generated text
+         """
+         model = self.get_model_for_agent(agent_name)
+         max_tokens = max_tokens or self.settings.hf.max_new_tokens
+         # 0.0 is a valid temperature, so test for None explicitly
+         if temperature is None:
+             temperature = self.get_temperature_for_agent(agent_name)
+
+         # Build messages
+         if json_mode:
+             system_prompt = f"{system_prompt}\n\nYou must respond with valid JSON only. No markdown, no explanation, just JSON."
+
+         messages = self._build_messages(system_prompt, user_message, examples)
+
+         try:
+             response = self.sync_client.chat_completion(
+                 model=model,
+                 messages=messages,
+                 max_tokens=max_tokens,
+                 temperature=temperature,
+             )
+             return response.choices[0].message.content
+
+         except Exception as e:
+             # Try fallback model
+             fallback = self.settings.models.fallback_model
+             if fallback != model:
+                 print(f"Primary model {model} failed, trying fallback: {fallback}")
+                 response = self.sync_client.chat_completion(
+                     model=fallback,
+                     messages=messages,
+                     max_tokens=max_tokens,
+                     temperature=temperature,
+                 )
+                 return response.choices[0].message.content
+             raise e
+
+     async def complete_async(
+         self,
+         agent_name: str,
+         system_prompt: str,
+         user_message: str,
+         examples: list[dict] = None,
+         max_tokens: int = None,
+         temperature: float = None,
+         json_mode: bool = False,
+     ) -> str:
+         """
+         Asynchronous completion.
+
+         Same parameters as complete().
+         """
+         model = self.get_model_for_agent(agent_name)
+         max_tokens = max_tokens or self.settings.hf.max_new_tokens
+         # 0.0 is a valid temperature, so test for None explicitly
+         if temperature is None:
+             temperature = self.get_temperature_for_agent(agent_name)
+
+         if json_mode:
+             system_prompt = f"{system_prompt}\n\nYou must respond with valid JSON only. No markdown, no explanation, just JSON."
+
+         messages = self._build_messages(system_prompt, user_message, examples)
+
+         try:
+             response = await self.async_client.chat_completion(
+                 model=model,
+                 messages=messages,
+                 max_tokens=max_tokens,
+                 temperature=temperature,
+             )
+             return response.choices[0].message.content
+
+         except Exception as e:
+             fallback = self.settings.models.fallback_model
+             if fallback != model:
+                 print(f"Primary model {model} failed, trying fallback: {fallback}")
+                 response = await self.async_client.chat_completion(
+                     model=fallback,
+                     messages=messages,
+                     max_tokens=max_tokens,
+                     temperature=temperature,
+                 )
+                 return response.choices[0].message.content
+             raise e
+
+     async def stream_async(
+         self,
+         agent_name: str,
+         system_prompt: str,
+         user_message: str,
+         max_tokens: int = None,
+         temperature: float = None,
+     ) -> AsyncGenerator[str, None]:
+         """
+         Async streaming completion.
+
+         Yields tokens as they are generated.
+         """
+         model = self.get_model_for_agent(agent_name)
+         max_tokens = max_tokens or self.settings.hf.max_new_tokens
+         # 0.0 is a valid temperature, so test for None explicitly
+         if temperature is None:
+             temperature = self.get_temperature_for_agent(agent_name)
+
+         messages = self._build_messages(system_prompt, user_message)
+
+         async for chunk in await self.async_client.chat_completion(
+             model=model,
+             messages=messages,
+             max_tokens=max_tokens,
+             temperature=temperature,
+             stream=True,
+         ):
+             if chunk.choices[0].delta.content:
+                 yield chunk.choices[0].delta.content
+
+
+ # =============================================================================
+ # SINGLETON & CONVENIENCE FUNCTIONS
+ # =============================================================================
+
+ _client: Optional[HFInferenceClient] = None
+
+
+ def get_inference_client() -> HFInferenceClient:
+     """Get or create the inference client singleton."""
+     global _client
+     if _client is None:
+         _client = HFInferenceClient()
+     return _client
+
+
+ def complete(
+     agent_name: str,
+     system_prompt: str,
+     user_message: str,
+     **kwargs
+ ) -> str:
+     """Convenience function for sync completion."""
+     client = get_inference_client()
+     return client.complete(agent_name, system_prompt, user_message, **kwargs)
+
+
+ async def complete_async(
+     agent_name: str,
+     system_prompt: str,
+     user_message: str,
+     **kwargs
+ ) -> str:
+     """Convenience function for async completion."""
+     client = get_inference_client()
+     return await client.complete_async(agent_name, system_prompt, user_message, **kwargs)
+
+
+ def get_model_info(model_id: str) -> dict:
+     """Get information about a specific model."""
+     if model_id in AVAILABLE_MODELS:
+         info = AVAILABLE_MODELS[model_id]
+         return {
+             "model_id": info.model_id,
+             "provider": info.provider,
+             "context_length": info.context_length,
+             "strengths": info.strengths,
+             "best_for": info.best_for,
+             "tier": info.tier,
+         }
+     return {"model_id": model_id, "provider": "unknown"}
+
+
+ def get_models_by_provider() -> dict[str, list[str]]:
+     """Get all models grouped by provider."""
+     by_provider = {}
+     for model_id, info in AVAILABLE_MODELS.items():
+         if info.provider not in by_provider:
+             by_provider[info.provider] = []
+         by_provider[info.provider].append(model_id)
+     return by_provider
+
+
+ def get_models_by_tier(tier: str) -> list[str]:
+     """Get all models for a specific tier (free, pro, pro+)."""
+     return [
+         model_id for model_id, info in AVAILABLE_MODELS.items()
+         if info.tier == tier
+     ]
+
+
+ def get_preset_config(preset_name: str) -> dict:
+     """Get a preset model configuration."""
+     return MODEL_PRESETS.get(preset_name, MODEL_PRESETS["balanced"])
core/token_schema.py ADDED
@@ -0,0 +1,473 @@
+ """
+ Token Schema Definitions
+ Design System Extractor v2
+
+ Pydantic models for all token types and extraction results.
+ These are the core data structures used throughout the application.
+ """
+
+ from datetime import datetime
+ from enum import Enum
+ from typing import Optional, Any
+ from pydantic import BaseModel, ConfigDict, Field, field_validator
+
+
+ # =============================================================================
+ # ENUMS
+ # =============================================================================
+
+ class TokenSource(str, Enum):
+     """Origin of a token value."""
+     DETECTED = "detected"  # Directly found in CSS
+     INFERRED = "inferred"  # Derived from patterns
+     UPGRADED = "upgraded"  # User-selected improvement
+     MANUAL = "manual"      # User manually added
+
+
+ class Confidence(str, Enum):
+     """Confidence level for extracted tokens."""
+     HIGH = "high"      # 10+ occurrences, consistent usage
+     MEDIUM = "medium"  # 3-9 occurrences
+     LOW = "low"        # 1-2 occurrences or conflicting
+
+
+ class Viewport(str, Enum):
+     """Viewport type."""
+     DESKTOP = "desktop"  # 1440px width
+     MOBILE = "mobile"    # 375px width
+
+
+ class PageType(str, Enum):
+     """Type of page template."""
+     HOMEPAGE = "homepage"
+     LISTING = "listing"
+     DETAIL = "detail"
+     FORM = "form"
+     MARKETING = "marketing"
+     AUTH = "auth"
+     CHECKOUT = "checkout"
+     ABOUT = "about"
+     CONTACT = "contact"
+     OTHER = "other"
+
+
+ # =============================================================================
+ # BASE TOKEN MODEL
+ # =============================================================================
+
+ class BaseToken(BaseModel):
+     """Base class for all tokens."""
+     source: TokenSource = TokenSource.DETECTED
+     confidence: Confidence = Confidence.MEDIUM
+     frequency: int = 0
+     suggested_name: Optional[str] = None
+
+     # For tracking user decisions
+     accepted: bool = True
+     flagged: bool = False
+     notes: Optional[str] = None
+
+
+ # =============================================================================
+ # COLOR TOKENS
+ # =============================================================================
+
+ class ColorToken(BaseToken):
+     """Extracted color token."""
+     value: str  # hex value (e.g., "#007bff")
+     value_rgb: Optional[str] = None  # "rgb(0, 123, 255)"
+     value_hsl: Optional[str] = None  # "hsl(211, 100%, 50%)"
+
+     # Context information
+     contexts: list[str] = Field(default_factory=list)  # ["background", "text", "border"]
+     elements: list[str] = Field(default_factory=list)  # ["button", "header", "link"]
+     css_properties: list[str] = Field(default_factory=list)  # ["background-color", "color"]
+
+     # Accessibility
+     contrast_white: Optional[float] = None  # Contrast ratio against white
+     contrast_black: Optional[float] = None  # Contrast ratio against black
+     wcag_aa_large_text: bool = False
+     wcag_aa_small_text: bool = False
+     wcag_aaa_large_text: bool = False
+     wcag_aaa_small_text: bool = False
+
+     @field_validator("value")
+     @classmethod
+     def validate_hex(cls, v: str) -> str:
+         """Ensure hex color is properly formatted."""
+         v = v.strip().lower()
+         if not v.startswith("#"):
+             v = f"#{v}"
+         # Convert 3-digit hex to 6-digit
+         if len(v) == 4:
+             v = f"#{v[1]}{v[1]}{v[2]}{v[2]}{v[3]}{v[3]}"
+         return v
+
+
+ class ColorRamp(BaseModel):
+     """Generated color ramp with shades."""
+     base_color: str  # Original extracted color
+     name: str        # e.g., "primary", "neutral"
+     shades: dict[str, str] = Field(default_factory=dict)  # {"50": "#e6f2ff", "500": "#007bff", ...}
+     source: TokenSource = TokenSource.UPGRADED
+
+
+ # =============================================================================
+ # TYPOGRAPHY TOKENS
+ # =============================================================================
+
+ class TypographyToken(BaseToken):
+     """Extracted typography token."""
+     font_family: str
+     font_size: str  # "16px" or "1rem"
+     font_size_px: Optional[float] = None  # Computed px value
+     font_weight: int = 400
+     line_height: str = "1.5"  # "1.5" or "24px"
+     line_height_computed: Optional[float] = None  # Computed ratio
+     letter_spacing: Optional[str] = None
+     text_transform: Optional[str] = None  # "uppercase", "lowercase", etc.
+
+     # Context
+     elements: list[str] = Field(default_factory=list)  # ["h1", "p", "button"]
+     css_selectors: list[str] = Field(default_factory=list)  # [".heading", ".body-text"]
+
+
+ class TypeScale(BaseModel):
+     """Typography scale configuration."""
+     name: str     # "Major Third", "Perfect Fourth"
+     ratio: float  # 1.25, 1.333
+     base_size: int = 16  # px
+     sizes: dict[str, str] = Field(default_factory=dict)  # {"xs": "12px", "sm": "14px", ...}
+     source: TokenSource = TokenSource.UPGRADED
+
+
+ class FontFamily(BaseModel):
+     """Font family information."""
+     name: str  # "Inter"
+     fallbacks: list[str] = Field(default_factory=list)  # ["system-ui", "sans-serif"]
+     category: str = "sans-serif"  # "serif", "sans-serif", "monospace"
+     frequency: int = 0
+     usage: str = "primary"  # "primary", "secondary", "accent", "monospace"
+
+
+ # =============================================================================
+ # SPACING TOKENS
+ # =============================================================================
+
+ class SpacingToken(BaseToken):
+     """Extracted spacing token."""
+     value: str     # "16px"
+     value_px: int  # 16
+
+     # Context
+     contexts: list[str] = Field(default_factory=list)  # ["margin", "padding", "gap"]
+     properties: list[str] = Field(default_factory=list)  # ["margin-top", "padding-left"]
+
+     # Analysis
+     fits_base_4: bool = False  # Divisible by 4
+     fits_base_8: bool = False  # Divisible by 8
+     is_outlier: bool = False   # Doesn't fit common patterns
+
+
+ class SpacingScale(BaseModel):
+     """Spacing scale configuration."""
+     name: str  # "8px base"
+     base: int  # 8
+     scale: list[int] = Field(default_factory=list)  # [4, 8, 16, 24, 32, 48, 64]
+     names: dict[int, str] = Field(default_factory=dict)  # {4: "xs", 8: "sm", 16: "md"}
+     source: TokenSource = TokenSource.UPGRADED
+
+
+ # =============================================================================
+ # BORDER RADIUS TOKENS
+ # =============================================================================
+
+ class RadiusToken(BaseToken):
+     """Extracted border radius token."""
+     value: str  # "8px" or "50%"
+     value_px: Optional[int] = None  # If px value
+
+     # Context
+     elements: list[str] = Field(default_factory=list)  # ["button", "card", "input"]
+
+     # Analysis
+     fits_base_4: bool = False
+     fits_base_8: bool = False
+
+
+ # =============================================================================
+ # SHADOW TOKENS
+ # =============================================================================
+
+ class ShadowToken(BaseToken):
+     """Extracted box shadow token."""
+     value: str  # Full CSS shadow value
+
+     # Parsed components
+     offset_x: Optional[str] = None
+     offset_y: Optional[str] = None
+     blur: Optional[str] = None
+     spread: Optional[str] = None
+     color: Optional[str] = None
+     inset: bool = False
+
+     # Context
+     elements: list[str] = Field(default_factory=list)
+
+
+ # =============================================================================
+ # PAGE & CRAWL MODELS
+ # =============================================================================
+
+ class DiscoveredPage(BaseModel):
+     """A page discovered during crawling."""
+     url: str
+     title: Optional[str] = None
+     page_type: PageType = PageType.OTHER
+     depth: int = 0          # Distance from homepage
+     selected: bool = True   # User can deselect pages
+
+     # Crawl status
+     crawled: bool = False
+     error: Optional[str] = None
+
+
+ class CrawlResult(BaseModel):
+     """Result of crawling a single page."""
+     url: str
+     viewport: Viewport
+     success: bool
+
+     # Timing
+     started_at: datetime
+     completed_at: Optional[datetime] = None
+     duration_ms: Optional[int] = None
+
+     # Results
+     colors_found: int = 0
+     typography_found: int = 0
+     spacing_found: int = 0
+
+     # Errors
+     error: Optional[str] = None
+     warnings: list[str] = Field(default_factory=list)
+
+
+ # =============================================================================
+ # EXTRACTION RESULT
+ # =============================================================================
+
+ class ExtractedTokens(BaseModel):
+     """Complete extraction result for one viewport."""
+     viewport: Viewport
+     source_url: str
+     pages_crawled: list[str] = Field(default_factory=list)
+
+     # Extracted tokens
+     colors: list[ColorToken] = Field(default_factory=list)
+     typography: list[TypographyToken] = Field(default_factory=list)
+     spacing: list[SpacingToken] = Field(default_factory=list)
+     radius: list[RadiusToken] = Field(default_factory=list)
+     shadows: list[ShadowToken] = Field(default_factory=list)
+
+     # Detected patterns
+     font_families: list[FontFamily] = Field(default_factory=list)
+     base_font_size: Optional[str] = None
+     spacing_base: Optional[int] = None  # Detected: 4 or 8
+     naming_convention: Optional[str] = None  # "bem", "utility", "none"
+
+     # Metadata
+     extraction_timestamp: datetime = Field(default_factory=datetime.now)
+     extraction_duration_ms: Optional[int] = None
+
+     # Quality indicators
+     total_elements_analyzed: int = 0
+     unique_colors: int = 0
+     unique_font_sizes: int = 0
+     unique_spacing_values: int = 0
+
+     # Issues
+     errors: list[str] = Field(default_factory=list)
+     warnings: list[str] = Field(default_factory=list)
+
+     def summary(self) -> dict:
+         """Get extraction summary."""
+         return {
+             "viewport": self.viewport.value,
+             "pages_crawled": len(self.pages_crawled),
+             "colors": len(self.colors),
+             "typography": len(self.typography),
+             "spacing": len(self.spacing),
+             "radius": len(self.radius),
+             "shadows": len(self.shadows),
+             "font_families": len(self.font_families),
+             "errors": len(self.errors),
+             "warnings": len(self.warnings),
+         }
+
+
+ # =============================================================================
+ # NORMALIZED TOKENS (Agent 2 Output)
+ # =============================================================================
+
+ class NormalizedTokens(BaseModel):
+     """Normalized and structured tokens from Agent 2."""
+     viewport: Viewport
+     source_url: str
+
+     # Normalized tokens with suggested names
+     colors: dict[str, ColorToken] = Field(default_factory=dict)  # {"primary-500": ColorToken, ...}
+     typography: dict[str, TypographyToken] = Field(default_factory=dict)
+     spacing: dict[str, SpacingToken] = Field(default_factory=dict)
+     radius: dict[str, RadiusToken] = Field(default_factory=dict)
+     shadows: dict[str, ShadowToken] = Field(default_factory=dict)
+
+     # Detected info
+     font_families: list[FontFamily] = Field(default_factory=list)
+     detected_spacing_base: Optional[int] = None
+     detected_naming_convention: Optional[str] = None
+
+     # Duplicates & conflicts
+     duplicate_colors: list[tuple[str, str]] = Field(default_factory=list)  # [("#1a1a1a", "#1b1b1b"), ...]
+     conflicting_tokens: list[str] = Field(default_factory=list)
+
+     # Metadata
+     normalized_at: datetime = Field(default_factory=datetime.now)
+
+
+ # =============================================================================
+ # UPGRADE OPTIONS (Agent 3 Output)
+ # =============================================================================
+
+ class UpgradeOption(BaseModel):
+     """A single upgrade option."""
+     id: str
+     name: str
+     description: str
+     category: str  # "typography", "spacing", "colors", "naming"
+
+     # The actual values
+     values: dict[str, Any] = Field(default_factory=dict)
+
+     # Metadata
+     pros: list[str] = Field(default_factory=list)
+     cons: list[str] = Field(default_factory=list)
+     effort: str = "low"  # "low", "medium", "high"
+     recommended: bool = False
+
+     # Selection state
+     selected: bool = False
+
+
+ class UpgradeRecommendations(BaseModel):
+     """All upgrade recommendations from Agent 3."""
+
+     # Options by category
+     typography_scales: list[UpgradeOption] = Field(default_factory=list)
+     spacing_systems: list[UpgradeOption] = Field(default_factory=list)
+     color_ramps: list[UpgradeOption] = Field(default_factory=list)
+     naming_conventions: list[UpgradeOption] = Field(default_factory=list)
+
+     # Accessibility fixes
+     accessibility_issues: list[dict] = Field(default_factory=list)
+     accessibility_fixes: list[UpgradeOption] = Field(default_factory=list)
+
+     # Metadata
+     generated_at: datetime = Field(default_factory=datetime.now)
+
+
+ # =============================================================================
+ # FINAL OUTPUT (Agent 4 Output)
+ # =============================================================================
+
+ class TokenMetadata(BaseModel):
+     """Metadata for exported tokens."""
+     source_url: str
+     extracted_at: datetime
+     version: str
+     viewport: Viewport
+     generator: str = "Design System Extractor v2"
+
+
+ class FinalTokens(BaseModel):
+     """Final exported token set."""
+     metadata: TokenMetadata
+
+     # Token collections
+     colors: dict[str, dict] = Field(default_factory=dict)
+     typography: dict[str, dict] = Field(default_factory=dict)
+     spacing: dict[str, dict] = Field(default_factory=dict)
+     radius: dict[str, dict] = Field(default_factory=dict)
+     shadows: dict[str, dict] = Field(default_factory=dict)
+
+     def to_tokens_studio_format(self) -> dict:
+         """Convert to Tokens Studio compatible format."""
+         return {
+             "$metadata": {
+                 "source": self.metadata.source_url,
+                 "version": self.metadata.version,
+             },
+             "color": self.colors,
+             "typography": self.typography,
+             "spacing": self.spacing,
+             "borderRadius": self.radius,
+             "boxShadow": self.shadows,
+         }
+
+     def to_css_variables(self) -> str:
+         """Convert to CSS custom properties."""
+         lines = [":root {"]
+
+         for name, data in self.colors.items():
+             value = data.get("value", data) if isinstance(data, dict) else data
+             lines.append(f"  --color-{name}: {value};")
+
+         for name, data in self.spacing.items():
+             value = data.get("value", data) if isinstance(data, dict) else data
+             lines.append(f"  --space-{name}: {value};")
+
+         lines.append("}")
+         return "\n".join(lines)
+
+
+ # =============================================================================
+ # LANGGRAPH STATE
+ # =============================================================================
+
+ class WorkflowState(BaseModel):
+     """LangGraph workflow state."""
+
+     # Input
+     base_url: str
+
+     # Discovery phase
+     discovered_pages: list[DiscoveredPage] = Field(default_factory=list)
+     confirmed_pages: list[str] = Field(default_factory=list)
+
+     # Extraction phase
+     desktop_tokens: Optional[ExtractedTokens] = None
+     mobile_tokens: Optional[ExtractedTokens] = None
+
+     # Normalization phase
+     desktop_normalized: Optional[NormalizedTokens] = None
+     mobile_normalized: Optional[NormalizedTokens] = None
+
+     # Upgrade phase
+     upgrade_recommendations: Optional[UpgradeRecommendations] = None
+     selected_upgrades: dict[str, str] = Field(default_factory=dict)  # {"typography_scale": "major_third", ...}
+
+     # Generation phase
+     desktop_final: Optional[FinalTokens] = None
+     mobile_final: Optional[FinalTokens] = None
+
+     # Workflow status
+     current_stage: str = "init"  # "init", "discover", "confirm", "extract", "normalize", "review", "upgrade", "generate", "export"
+     errors: list[str] = Field(default_factory=list)
+     warnings: list[str] = Field(default_factory=list)
+
+     # Timestamps
+     started_at: Optional[datetime] = None
+     completed_at: Optional[datetime] = None
+
+     model_config = ConfigDict(arbitrary_types_allowed=True)
docs/CONTEXT.md ADDED
@@ -0,0 +1,402 @@
+ # Design System Extractor v2 — Master Context File
+
+ > **Upload this file to refresh Claude's context when continuing work on this project.**
+
+ ---
+
+ ## 🎯 Project Goal
+
+ Build a **semi-automated, human-in-the-loop agentic system** that:
+ 1. Reverse-engineers a design system from a live website
+ 2. Reconstructs and upgrades it into a modern, scalable design system
+ 3. Outputs production-ready JSON tokens
+
+ **Philosophy:** This is a design-aware co-pilot, NOT a magic button. Humans decide, agents propose.
+
+ ---
+
+ ## 🏗️ Architecture Overview
+
+ ```
+ ┌─────────────────────────────────────────────────────────────────────────────┐
+ │                                 TECH STACK                                  │
+ ├─────────────────────────────────────────────────────────────────────────────┤
+ │ Frontend:      Gradio (long-scroll, sectioned UI with live preview)         │
+ │ Orchestration: LangGraph (agent state management & workflow)                │
+ │ Models:        HuggingFace Inference API (see model assignments below)      │
+ │ Hosting:       Hugging Face Spaces                                          │
+ │ Storage:       HF Spaces persistent storage                                 │
+ │ Output:        Platform-agnostic JSON tokens                                │
+ └─────────────────────────────────────────────────────────────────────────────┘
+ ```
+
+ ---
+
+ ## 🧠 Model Assignments
+
+ | Agent | Role | Model | Provider | Why |
+ |-------|------|-------|----------|-----|
+ | **Agent 1** | Crawler & Extractor | None (Rule-based) | — | Pure CSS extraction, no LLM needed |
+ | **Agent 2** | Normalizer | `microsoft/Phi-3.5-mini-instruct` | Microsoft | Fast, great structured output |
+ | **Agent 3** | Advisor | `meta-llama/Llama-3.1-70B-Instruct` | Meta | Excellent reasoning, design knowledge |
+ | **Agent 4** | Generator | `mistralai/Codestral-22B-v0.1` | Mistral | Code specialist, JSON formatting |
+
+ ### Model Presets
+
+ | Preset | Agent 2 | Agent 3 | Agent 4 |
+ |--------|---------|---------|---------|
+ | **Budget (Free)** | Phi-3.5-mini | Mixtral-8x7B | StarCoder2-15B |
+ | **Balanced (Pro)** | Gemma-2-9b | Llama-3.1-70B | Codestral-22B |
+ | **Quality (Pro+)** | Gemma-2-27b | Llama-3.1-405B | DeepSeek-Coder-33B |
+ | **Diverse** | Microsoft Phi | Cohere Command R+ | Mistral Codestral |
+
+ ### Available Providers
+ - **Meta**: Llama 3.1 family (8B, 70B, 405B)
+ - **Mistral**: Mixtral, Mistral, Codestral
+ - **Cohere**: Command R, Command R+
+ - **Google**: Gemma 2 family
+ - **Microsoft**: Phi 3.5 family
+ - **Alibaba**: Qwen 2.5 family
+ - **DeepSeek**: DeepSeek Coder, V2.5
+ - **BigCode**: StarCoder2
+
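As a rough illustration of how a provider index like the list above can be derived from a flat model registry — a standalone sketch only; the real registry keeps richer per-model metadata (context length, tier, strengths), and the model ids shown here are just the ones named in the tables:

```python
# Hypothetical mini-registry mapping model id -> provider. Illustrative only;
# the project's registry stores full ModelInfo records, not bare strings.
AVAILABLE_MODELS = {
    "meta-llama/Llama-3.1-70B-Instruct": "meta",
    "meta-llama/Llama-3.1-8B-Instruct": "meta",
    "mistralai/Codestral-22B-v0.1": "mistral",
    "microsoft/Phi-3.5-mini-instruct": "microsoft",
}


def models_by_provider(registry: dict[str, str]) -> dict[str, list[str]]:
    """Group model ids under their provider key, preserving insertion order."""
    grouped: dict[str, list[str]] = {}
    for model_id, provider in registry.items():
        grouped.setdefault(provider, []).append(model_id)
    return grouped


GROUPED = models_by_provider(AVAILABLE_MODELS)
```

The same grouping step backs a `get_models_by_provider()`-style helper; swapping the bare provider strings for structured records changes only the value extracted inside the loop.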
+ ---
+
+ ## 🤖 Agent Personas (4 Agents)
+
+ ### Agent 1: Website Crawler & Extractor
+ - **Persona:** Meticulous Design Archaeologist
+ - **Tool:** Playwright
+ - **Job:**
+   - Auto-discover 10+ pages from base URL
+   - Crawl Desktop (1440px) + Mobile (375px) separately
+   - Scroll to bottom + wait for network idle
+   - Extract: colors, typography, spacing, radius, shadows
+ - **Output:** Raw tokens with frequency, context, confidence
+
+ ### Agent 2: Token Normalizer & Structurer
+ - **Persona:** Design System Librarian
+ - **Job:**
+   - Clean noisy extraction, dedupe
+   - Infer naming patterns
+   - Tag tokens as: `detected` | `inferred` | `low-confidence`
+ - **Output:** Structured token sets with metadata
+
+ ### Agent 3: Design System Best Practices Advisor
+ - **Persona:** Senior Staff Design Systems Architect
+ - **Job:**
+   - Research modern DS patterns (Material, Polaris, Carbon, etc.)
+   - Propose upgrade OPTIONS (not decisions)
+   - Suggest: type scales (3 options), spacing (8px), color ramps (AA compliant), naming conventions
+ - **Output:** Option sets with rationale
+
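The three candidate type-scale ratios Agent 3 offers (1.25, 1.333, 1.414) are plain modular scales. A minimal sketch of how such an option can be generated from a base size — illustrative only; the project's actual generation lives in `core/typography_utils.py`:

```python
def modular_scale(base_px: float, ratio: float, steps: int) -> list[int]:
    """Generate `steps` ascending font sizes from a base size and a ratio,
    rounded to whole pixels."""
    return [round(base_px * ratio ** i) for i in range(steps)]


# Major Third (1.25) from a 16px base:
sizes = modular_scale(16, 1.25, 5)
```

Each of the three option cards in Stage 2 is then just a different `ratio` applied to the same detected base size.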
+ ### Agent 4: Plugin & JSON Generator
+ - **Persona:** Automation Engineer
+ - **Job:**
+   - Convert finalized tokens to Figma-compatible JSON
+   - Generate: typography, color (with tints/shades), spacing variables
+   - Maintain Desktop + Mobile + version metadata
+ - **Output:** Production-ready JSON
+
+ ---
+
+ ## 🖥️ UI Stages (3 Stages)
+
+ ### Stage 1: Extraction Review
+ - **Purpose:** Trust building
+ - **Shows:** Token tables, color swatches, font previews, confidence indicators
+ - **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile
+
+ ### Stage 2: Upgrade Playground (MOST IMPORTANT)
+ - **Purpose:** Decision-making through live visuals
+ - **Shows:** Side-by-side option selector + live preview
+ - **Human Actions:** Select type scale A/B/C, spacing system, color ramps — preview updates instantly
+
+ ### Stage 3: Final Review & Export
+ - **Purpose:** Confidence before export
+ - **Shows:** Token preview, JSON tree, diff view (original vs final)
+ - **Human Actions:** Download JSON, save version, label version
+
+ ---
+
+ ## 📁 Project Structure
+
+ ```
+ design-system-extractor/
+ ├── app.py                      # Gradio main entry point
+ ├── requirements.txt
+ ├── README.md
+ │
+ ├── config/
+ │   ├── .env.example            # Environment variables template
+ │   ├── agents.yaml             # Agent personas & configurations
+ │   └── settings.py             # Application settings
+ │
+ ├── agents/
+ │   ├── __init__.py
+ │   ├── state.py                # LangGraph state definitions
+ │   ├── graph.py                # LangGraph workflow orchestration
+ │   ├── crawler.py              # Agent 1: Website crawler
+ │   ├── extractor.py            # Agent 1: Token extraction
+ │   ├── normalizer.py           # Agent 2: Token normalization
+ │   ├── advisor.py              # Agent 3: Best practices
+ │   └── generator.py            # Agent 4: JSON generator
+ │
+ ├── core/
+ │   ├── __init__.py
+ │   ├── browser.py              # Playwright browser management
+ │   ├── css_parser.py           # CSS/computed style extraction
+ │   ├── color_utils.py          # Color analysis, contrast, ramps
+ │   ├── typography_utils.py     # Type scale detection & generation
+ │   ├── spacing_utils.py        # Spacing pattern detection
+ │   └── token_schema.py         # Token data structures (Pydantic)
+ │
+ ├── ui/
+ │   ├── __init__.py
+ │   ├── components.py           # Reusable Gradio components
+ │   ├── stage1_extraction.py    # Stage 1 UI
+ │   ├── stage2_upgrade.py       # Stage 2 UI
+ │   ├── stage3_export.py        # Stage 3 UI
+ │   └── preview_generator.py    # HTML preview generation
+ │
+ ├── templates/
+ │   ├── preview.html            # Live preview base template
+ │   └── specimen.html           # Design system specimen template
+ │
+ ├── storage/
+ │   └── persistence.py          # HF Spaces storage management
+ │
+ ├── tests/
+ │   ├── test_crawler.py
+ │   ├── test_extractor.py
+ │   └── test_normalizer.py
+ │
+ └── docs/
+     ├── CONTEXT.md              # THIS FILE - upload for context refresh
+     └── API.md                  # API documentation
+ ```
+
+ ---
+
+ ## 🔧 Key Technical Decisions
+
+ | Decision | Choice | Rationale |
+ |----------|--------|-----------|
+ | Viewports | Fixed 1440px + 375px | Simplicity, covers main use cases |
+ | Scrolling | Bottom + network idle | Captures lazy-loaded content |
+ | Infinite scroll | Skip | Avoid complexity |
+ | Modals | Manual trigger | User decides what to capture |
+ | Color ramps | 5-10 shades, AA compliant | Industry standard |
+ | Type scales | 3 options (1.25, 1.333, 1.414) | User selects |
+ | Spacing | 8px base system | Modern standard |
+ | ML models | Minimal, rule-based preferred | Simplicity, reliability |
+ | Versioning | HF Spaces persistent storage | Built-in, free |
+ | Preview | Gradio + iframe (best for dynamic) | Smooth updates |
+
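The "AA compliant" requirement on color ramps reduces to the WCAG 2.x contrast-ratio formula (4.5:1 for normal text, 3:1 for large text). A self-contained sketch of that standard computation — the project's own version would live in `core/color_utils.py` and feed the `contrast_white` / `wcag_aa_*` fields on `ColorToken`:

```python
def _linearize(c: float) -> float:
    """Linearize one sRGB channel (0-1) per the WCAG relative-luminance definition."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio between two hex colors, in the range 1.0-21.0."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)
```

A ramp shade passes AA for normal text when `contrast_ratio(shade, background) >= 4.5`.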
+ ---
+
+ ## 📊 Token Schema (Core Data Structures)
+
+ ```python
+ class TokenSource(Enum):
+     DETECTED = "detected"  # Directly found in CSS
+     INFERRED = "inferred"  # Derived from patterns
+     UPGRADED = "upgraded"  # User-selected improvement
+
+ class Confidence(Enum):
+     HIGH = "high"      # 10+ occurrences
+     MEDIUM = "medium"  # 3-9 occurrences
+     LOW = "low"        # 1-2 occurrences
+
+ class Viewport(Enum):
+     DESKTOP = "desktop"  # 1440px
+     MOBILE = "mobile"    # 375px
+ ```
+
+ ### Token Types:
+ - **ColorToken:** value, frequency, contexts, elements, contrast ratios
+ - **TypographyToken:** family, size, weight, line-height, elements
+ - **SpacingToken:** value, frequency, contexts, fits_base_8
+ - **RadiusToken:** value, frequency, elements
+ - **ShadowToken:** value, frequency, elements
+
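For example, `ColorToken` canonicalizes its hex `value` on construction (strip, lowercase, prepend `#`, expand 3-digit shorthand), so that near-duplicate colors compare equal during normalization. The same logic as a standalone function, mirroring `ColorToken.validate_hex` in `core/token_schema.py`:

```python
def normalize_hex(v: str) -> str:
    """Canonical 6-digit lowercase hex, mirroring ColorToken.validate_hex."""
    v = v.strip().lower()
    if not v.startswith("#"):
        v = f"#{v}"
    # Convert 3-digit shorthand ("#abc") to 6-digit ("#aabbcc")
    if len(v) == 4:
        v = f"#{v[1]}{v[1]}{v[2]}{v[2]}{v[3]}{v[3]}"
    return v
```

With this in place, `"FFF"`, `"#fff"`, and `"#FFFFFF"` all collapse to the same token key.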
+ ---
+
+ ## 🔄 LangGraph Workflow
+
+ ```
+ ┌─────────────┐
+ │    START    │
+ └──────┬──────┘
+        │
+        ▼
+ ┌─────────────┐
+ │  URL Input  │
+ └──────┬──────┘
+        │
+        ▼
+ ┌────────────────────────┐
+ │   Agent 1: Discover    │
+ │      (find pages)      │
+ └───────────┬────────────┘
+             │
+             ▼
+ ┌────────────────────────┐
+ │  HUMAN: Confirm pages  │◄─── Checkpoint 1
+ └───────────┬────────────┘
+             │
+             ▼
+ ┌────────────────────────┐
+ │    Agent 1: Extract    │
+ │   (crawl & extract)    │
+ └───────────┬────────────┘
+             │
+             ▼
+ ┌────────────────────────┐
+ │   Agent 2: Normalize   │
+ └───────────┬────────────┘
+             │
+             ▼
+ ┌────────────────────────┐
+ │  HUMAN: Review tokens  │◄─── Checkpoint 2 (Stage 1 UI)
+ └───────────┬────────────┘
+             │
+ ┌───────────┴───────────────────┐
+ │                               │
+ ▼                               ▼
+ ┌──────────────────┐   ┌──────────────────┐
+ │ Agent 3: Advise  │   │    (parallel)    │
+ │ (best practices) │   │                  │
+ └────────┬─────────┘   └──────────────────┘
+          │
+          ▼
+ ┌────────────────────────┐
+ │ HUMAN: Select options  │◄─── Checkpoint 3 (Stage 2 UI)
+ └───────────┬────────────┘
+             │
+             ▼
+ ┌────────────────────────┐
+ │   Agent 4: Generate    │
+ │      (final JSON)      │
+ └───────────┬────────────┘
+             │
+             ▼
+ ┌────────────────────────┐
+ │     HUMAN: Export      │◄─── Checkpoint 4 (Stage 3 UI)
+ └───────────┬────────────┘
+             │
+             ▼
+       ┌─────────┐
+       │   END   │
+       └─────────┘
+ ```
+
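The checkpoints above correspond to values of `WorkflowState.current_stage`. A toy sketch of the linear stage progression — the real transitions are LangGraph edges with human-in-the-loop interrupts, but the stage names below come straight from the state model:

```python
# Stage names as documented on WorkflowState.current_stage.
STAGES = [
    "init", "discover", "confirm", "extract", "normalize",
    "review", "upgrade", "generate", "export",
]


def next_stage(current: str) -> str:
    """Advance to the next workflow stage; the final stage is terminal."""
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

In the actual graph, "confirm", "review", "upgrade", and "export" are the four human checkpoints where the workflow pauses instead of advancing automatically.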
+ ---
+
+ ## 🚦 Human-in-the-Loop Rules
+
+ 1. **No irreversible automation**
+ 2. **Agents propose → Humans decide**
+ 3. **Every auto action must be:**
+    - Visible
+    - Reversible
+    - Previewed
+
+ ---
+
+ ## 📦 Output JSON Format
+
+ ```json
+ {
+   "metadata": {
+     "source_url": "https://example.com",
+     "extracted_at": "2025-01-23T10:00:00Z",
+     "version": "v1-recovered",
+     "viewport": "desktop"
+   },
+   "colors": {
+     "primary": {
+       "50": { "value": "#e6f2ff", "source": "upgraded" },
+       "500": { "value": "#007bff", "source": "detected" },
+       "900": { "value": "#001a33", "source": "upgraded" }
+     }
+   },
+   "typography": {
+     "heading-xl": {
+       "fontFamily": "Inter",
+       "fontSize": "32px",
+       "fontWeight": 700,
+       "lineHeight": "1.2",
+       "source": "detected"
+     }
+   },
+   "spacing": {
+     "xs": { "value": "4px", "source": "upgraded" },
+     "sm": { "value": "8px", "source": "detected" },
+     "md": { "value": "16px", "source": "detected" }
+   }
+ }
+ ```
+
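The same token set can also be emitted as CSS custom properties. This mirrors the `FinalTokens.to_css_variables` helper in `core/token_schema.py`, reproduced standalone here (taking plain dicts instead of the Pydantic model) for illustration:

```python
def to_css_variables(colors: dict, spacing: dict) -> str:
    """Render color and spacing tokens as a :root CSS custom-property block."""
    lines = [":root {"]
    for name, data in colors.items():
        value = data.get("value", data) if isinstance(data, dict) else data
        lines.append(f"  --color-{name}: {value};")
    for name, data in spacing.items():
        value = data.get("value", data) if isinstance(data, dict) else data
        lines.append(f"  --space-{name}: {value};")
    lines.append("}")
    return "\n".join(lines)


css = to_css_variables(
    {"primary-500": {"value": "#007bff"}},
    {"md": {"value": "16px"}},
)
```

Note the tolerant value lookup: each entry may be a `{"value": ..., "source": ...}` dict as in the JSON above, or a bare string.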
341
+ ---
342
+
343
+ ## 🛠️ Implementation Phases
344
+
345
+ ### Phase 1 (Current)
346
+ - [x] Project structure
347
+ - [x] Configuration files
348
+ - [ ] Token schema (Pydantic models)
349
+ - [ ] Agent 1: Crawler
350
+ - [ ] Agent 1: Extractor
351
+ - [ ] Agent 2: Normalizer
352
+ - [ ] Stage 1 UI
353
+ - [ ] LangGraph basic workflow
354
+
355
+ ### Phase 2
356
+ - [ ] Agent 3: Advisor
357
+ - [ ] Stage 2 UI (Upgrade Playground)
358
+ - [ ] Live preview system
359
+
360
+ ### Phase 3
361
+ - [ ] Agent 4: Generator
362
+ - [ ] Stage 3 UI
363
+ - [ ] Export functionality
364
+
365
+ ### Phase 4
366
+ - [ ] Full LangGraph orchestration
367
+ - [ ] HF Spaces deployment
368
+ - [ ] Persistent storage
369
+
370
+ ---

## 🔑 Environment Variables

```env
# Required
HF_TOKEN=your_huggingface_token

# Model Configuration (defaults shown — diverse providers)
AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct    # Microsoft - Fast naming
AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct  # Meta - Strong reasoning
AGENT4_MODEL=mistralai/Codestral-22B-v0.1       # Mistral - Code/JSON

# Optional
DEBUG=true
LOG_LEVEL=INFO
```
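
A minimal sketch of how these variables might be read with defaults applied, using only the standard library. The function name and defaults below are illustrative; the project's real settings presumably load through pydantic-settings and python-dotenv (both listed in requirements.txt):

```python
import os

# Fallback defaults mirroring the .env example above (illustrative only).
DEFAULTS = {
    "AGENT2_MODEL": "microsoft/Phi-3.5-mini-instruct",
    "AGENT3_MODEL": "meta-llama/Llama-3.1-70B-Instruct",
    "AGENT4_MODEL": "mistralai/Codestral-22B-v0.1",
    "DEBUG": "false",
    "LOG_LEVEL": "INFO",
}

def load_settings() -> dict:
    """Read settings from the environment, falling back to defaults.

    HF_TOKEN is required and has no default.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required (set it in .env or the Space secrets)")
    settings = {key: os.environ.get(key, default) for key, default in DEFAULTS.items()}
    settings["HF_TOKEN"] = token
    # Normalize the DEBUG flag to a real boolean.
    settings["DEBUG"] = str(settings["DEBUG"]).lower() in ("1", "true", "yes")
    return settings

os.environ.setdefault("HF_TOKEN", "hf_example")  # demo value only
print(load_settings()["AGENT2_MODEL"])
```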

---

## 📝 Notes for Claude

When continuing this project:
1. **Check the current phase** in the Implementation Phases section
2. **Review agent personas** in agents.yaml for consistent behavior
3. **Follow the token schema** defined in core/token_schema.py
4. **Maintain LangGraph state** consistency across agents
5. **Use Gradio components** from ui/components.py for consistency
6. **Test with real websites** before deployment

---

*Last updated: 2025-01-23*
requirements.txt ADDED
@@ -0,0 +1,71 @@
# =============================================================================
# Design System Extractor v2 — Dependencies
# =============================================================================

# -----------------------------------------------------------------------------
# Core Framework
# -----------------------------------------------------------------------------
gradio>=4.44.0
langgraph>=0.2.0
langchain>=0.3.0

# -----------------------------------------------------------------------------
# HuggingFace (Primary LLM Provider)
# -----------------------------------------------------------------------------
huggingface-hub>=0.25.0
transformers>=4.40.0

# -----------------------------------------------------------------------------
# Data Validation & Configuration
# -----------------------------------------------------------------------------
pydantic>=2.0.0
pydantic-settings>=2.0.0
python-dotenv>=1.0.0
PyYAML>=6.0.0

# -----------------------------------------------------------------------------
# Web Crawling & Browser Automation
# -----------------------------------------------------------------------------
playwright>=1.40.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
httpx>=0.25.0

# -----------------------------------------------------------------------------
# CSS & Color Processing
# -----------------------------------------------------------------------------
cssutils>=2.9.0
colormath>=3.0.0
colour>=0.1.5

# -----------------------------------------------------------------------------
# Data Processing
# -----------------------------------------------------------------------------
numpy>=1.24.0
pandas>=2.0.0

# -----------------------------------------------------------------------------
# Async Support
# -----------------------------------------------------------------------------
aiofiles>=23.0.0

# -----------------------------------------------------------------------------
# Utilities
# -----------------------------------------------------------------------------
rich>=13.0.0
tqdm>=4.66.0
python-slugify>=8.0.0

# -----------------------------------------------------------------------------
# Testing (development only)
# -----------------------------------------------------------------------------
pytest>=7.4.0
pytest-asyncio>=0.21.0
pytest-cov>=4.1.0

# -----------------------------------------------------------------------------
# Type Checking (development only)
# -----------------------------------------------------------------------------
mypy>=1.5.0
types-PyYAML>=6.0.0
types-beautifulsoup4>=4.12.0
storage/__init__.py ADDED
File without changes
tests/__init__.py ADDED
File without changes
ui/__init__.py ADDED
File without changes