riazmo committed on
Commit d2d4fa3 · verified · 1 Parent(s): 2627211

Delete docs/CONTEXT.md

Files changed (1)
  1. docs/CONTEXT.md +0 -402
docs/CONTEXT.md DELETED
@@ -1,402 +0,0 @@
- # Design System Extractor v2 — Master Context File
-
- > **Upload this file to refresh Claude's context when continuing work on this project.**
-
- ---
-
- ## 🎯 Project Goal
-
- Build a **semi-automated, human-in-the-loop agentic system** that:
- 1. Reverse-engineers a design system from a live website
- 2. Reconstructs and upgrades it into a modern, scalable design system
- 3. Outputs production-ready JSON tokens
-
- **Philosophy:** This is a design-aware co-pilot, NOT a magic button. Humans decide, agents propose.
-
- ---
-
- ## 🏗️ Architecture Overview
-
- ```
- ┌───────────────────────────────────────────────────────────────────────────┐
- │                                TECH STACK                                 │
- ├───────────────────────────────────────────────────────────────────────────┤
- │  Frontend:       Gradio (long-scroll, sectioned UI with live preview)     │
- │  Orchestration:  LangGraph (agent state management & workflow)            │
- │  Models:         HuggingFace Inference API (see model assignments below)  │
- │  Hosting:        Hugging Face Spaces                                      │
- │  Storage:        HF Spaces persistent storage                             │
- │  Output:         Platform-agnostic JSON tokens                            │
- └───────────────────────────────────────────────────────────────────────────┘
- ```
-
- ---
-
- ## 🧠 Model Assignments
-
- | Agent | Role | Model | Provider | Why |
- |-------|------|-------|----------|-----|
- | **Agent 1** | Crawler & Extractor | None (Rule-based) | — | Pure CSS extraction, no LLM needed |
- | **Agent 2** | Normalizer | `microsoft/Phi-3.5-mini-instruct` | Microsoft | Fast, great structured output |
- | **Agent 3** | Advisor | `meta-llama/Llama-3.1-70B-Instruct` | Meta | Excellent reasoning, design knowledge |
- | **Agent 4** | Generator | `mistralai/Codestral-22B-v0.1` | Mistral | Code specialist, JSON formatting |
-
- ### Model Presets
-
- | Preset | Agent 2 | Agent 3 | Agent 4 |
- |--------|---------|---------|---------|
- | **Budget (Free)** | Phi-3.5-mini | Mixtral-8x7B | StarCoder2-15B |
- | **Balanced (Pro)** | Gemma-2-9b | Llama-3.1-70B | Codestral-22B |
- | **Quality (Pro+)** | Gemma-2-27b | Llama-3.1-405B | DeepSeek-Coder-33B |
- | **Diverse** | Microsoft Phi | Cohere Command R+ | Mistral Codestral |
-
- ### Available Providers
- - **Meta**: Llama 3.1 family (8B, 70B, 405B)
- - **Mistral**: Mixtral, Mistral, Codestral
- - **Cohere**: Command R, Command R+
- - **Google**: Gemma 2 family
- - **Microsoft**: Phi 3.5 family
- - **Alibaba**: Qwen 2.5 family
- - **DeepSeek**: DeepSeek Coder, V2.5
- - **BigCode**: StarCoder2
-
- ---
-
- ## 🤖 Agent Personas (4 Agents)
-
- ### Agent 1: Website Crawler & Extractor
- - **Persona:** Meticulous Design Archaeologist
- - **Tool:** Playwright
- - **Job:**
-   - Auto-discover 10+ pages from base URL
-   - Crawl Desktop (1440px) + Mobile (375px) separately
-   - Scroll to bottom + wait for network idle
-   - Extract: colors, typography, spacing, radius, shadows
- - **Output:** Raw tokens with frequency, context, confidence
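The page auto-discovery step in Agent 1's job list can be approximated without a browser by collecting same-origin links from a page's HTML. The sketch below is only an illustration using the Python standard library; the file names Playwright as the actual tool, and `LinkCollector`, `discover_pages`, and the `limit` cap are hypothetical names that do not appear in the project.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_pages(base_url, html, limit=10):
    """Return up to `limit` unique same-origin URLs, starting with base_url."""
    collector = LinkCollector()
    collector.feed(html)
    origin = urlparse(base_url).netloc
    pages = [base_url]
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == origin and absolute not in pages:
            pages.append(absolute)
        if len(pages) >= limit:
            break
    return pages
```

The resulting list is the kind of candidate set a human would confirm at Checkpoint 1 before any crawling starts.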
-
- ### Agent 2: Token Normalizer & Structurer
- - **Persona:** Design System Librarian
- - **Job:**
-   - Clean noisy extraction, dedupe
-   - Infer naming patterns
-   - Tag tokens as: `detected` | `inferred` | `low-confidence`
- - **Output:** Structured token sets with metadata
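As a rough illustration of the "clean noisy extraction, dedupe" job, the sketch below merges hex colors that differ only in case, surrounding whitespace, or `#rgb` shorthand, and counts how often each occurs. It is an assumption about one possible cleaning rule, not the normalizer's actual logic; `normalize_hex` and `dedupe_colors` are hypothetical helpers.

```python
from collections import Counter


def normalize_hex(color):
    """Lowercase a hex color and expand #rgb shorthand to #rrggbb."""
    value = color.strip().lower()
    if len(value) == 4 and value.startswith("#"):
        value = "#" + "".join(ch * 2 for ch in value[1:])
    return value


def dedupe_colors(raw_colors):
    """Merge equivalent hex colors and count occurrences of each."""
    return Counter(normalize_hex(c) for c in raw_colors)
```

The merged counts could then feed the frequency field that the raw tokens carry.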
-
- ### Agent 3: Design System Best Practices Advisor
- - **Persona:** Senior Staff Design Systems Architect
- - **Job:**
-   - Research modern DS patterns (Material, Polaris, Carbon, etc.)
-   - Propose upgrade OPTIONS (not decisions)
-   - Suggest: type scales (3 options), spacing (8px), color ramps (AA compliant), naming conventions
- - **Output:** Option sets with rationale
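"AA compliant" color ramps imply a contrast check. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical, and the file does not say how `core/color_utils.py` performs this check.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a #rrggbb color, between 0.0 and 1.0."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """AA requires 4.5:1 for normal text and 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

For example, `#007bff` on white falls just under 4:1, so it would pass AA only for large text; that is the kind of finding the advisor could surface as a rationale for a ramp option.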
-
- ### Agent 4: Plugin & JSON Generator
- - **Persona:** Automation Engineer
- - **Job:**
-   - Convert finalized tokens to Figma-compatible JSON
-   - Generate: typography, color (with tints/shades), spacing variables
-   - Maintain Desktop + Mobile + version metadata
- - **Output:** Production-ready JSON
-
- ---
-
- ## 🖥️ UI Stages (3 Stages)
-
- ### Stage 1: Extraction Review
- - **Purpose:** Trust building
- - **Shows:** Token tables, color swatches, font previews, confidence indicators
- - **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile
-
- ### Stage 2: Upgrade Playground (MOST IMPORTANT)
- - **Purpose:** Decision-making through live visuals
- - **Shows:** Side-by-side option selector + live preview
- - **Human Actions:** Select type scale A/B/C, spacing system, color ramps — preview updates instantly
-
- ### Stage 3: Final Review & Export
- - **Purpose:** Confidence before export
- - **Shows:** Token preview, JSON tree, diff view (original vs final)
- - **Human Actions:** Download JSON, save version, label version
-
- ---
-
- ## 📁 Project Structure
-
- ```
- design-system-extractor/
- ├── app.py                      # Gradio main entry point
- ├── requirements.txt
- ├── README.md
- │
- ├── config/
- │   ├── .env.example            # Environment variables template
- │   ├── agents.yaml             # Agent personas & configurations
- │   └── settings.py             # Application settings
- │
- ├── agents/
- │   ├── __init__.py
- │   ├── state.py                # LangGraph state definitions
- │   ├── graph.py                # LangGraph workflow orchestration
- │   ├── crawler.py              # Agent 1: Website crawler
- │   ├── extractor.py            # Agent 1: Token extraction
- │   ├── normalizer.py           # Agent 2: Token normalization
- │   ├── advisor.py              # Agent 3: Best practices
- │   └── generator.py            # Agent 4: JSON generator
- │
- ├── core/
- │   ├── __init__.py
- │   ├── browser.py              # Playwright browser management
- │   ├── css_parser.py           # CSS/computed style extraction
- │   ├── color_utils.py          # Color analysis, contrast, ramps
- │   ├── typography_utils.py     # Type scale detection & generation
- │   ├── spacing_utils.py        # Spacing pattern detection
- │   └── token_schema.py         # Token data structures (Pydantic)
- │
- ├── ui/
- │   ├── __init__.py
- │   ├── components.py           # Reusable Gradio components
- │   ├── stage1_extraction.py    # Stage 1 UI
- │   ├── stage2_upgrade.py       # Stage 2 UI
- │   ├── stage3_export.py        # Stage 3 UI
- │   └── preview_generator.py    # HTML preview generation
- │
- ├── templates/
- │   ├── preview.html            # Live preview base template
- │   └── specimen.html           # Design system specimen template
- │
- ├── storage/
- │   └── persistence.py          # HF Spaces storage management
- │
- ├── tests/
- │   ├── test_crawler.py
- │   ├── test_extractor.py
- │   └── test_normalizer.py
- │
- └── docs/
-     ├── CONTEXT.md              # THIS FILE - upload for context refresh
-     └── API.md                  # API documentation
- ```
-
- ---
-
- ## 🔧 Key Technical Decisions
-
- | Decision | Choice | Rationale |
- |----------|--------|-----------|
- | Viewports | Fixed 1440px + 375px | Simplicity, covers main use cases |
- | Scrolling | Bottom + network idle | Captures lazy-loaded content |
- | Infinite scroll | Skip | Avoid complexity |
- | Modals | Manual trigger | User decides what to capture |
- | Color ramps | 5-10 shades, AA compliant | Industry standard |
- | Type scales | 3 options (1.25, 1.333, 1.414) | User selects |
- | Spacing | 8px base system | Modern standard |
- | ML models | Minimal, rule-based preferred | Simplicity, reliability |
- | Versioning | HF Spaces persistent storage | Built-in, free |
- | Preview | Gradio + iframe (best for dynamic) | Smooth updates |
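The three type-scale ratios in the table are modular-scale multipliers: each step is the previous size times the ratio. The sketch below shows one way the candidate sizes could be generated. The 16px base, the step counts, and the mapping of ratios to the A/B/C labels from Stage 2 are assumptions for illustration; the file does not state them.

```python
# Hypothetical mapping of the Stage 2 "A/B/C" choices to the listed ratios.
SCALE_OPTIONS = {"A": 1.25, "B": 1.333, "C": 1.414}


def type_scale(base_px=16.0, ratio=1.25, steps_up=4, steps_down=1):
    """Return font sizes in px, smallest to largest, rounded to 2 decimals."""
    return [
        round(base_px * ratio ** step, 2)
        for step in range(-steps_down, steps_up + 1)
    ]
```

Calling `type_scale(ratio=SCALE_OPTIONS["B"])` would give the sizes for the second option, which the live preview could render next to the other two.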
-
- ---
-
- ## 📊 Token Schema (Core Data Structures)
-
- ```python
- from enum import Enum
-
- class TokenSource(Enum):
-     DETECTED = "detected"  # Directly found in CSS
-     INFERRED = "inferred"  # Derived from patterns
-     UPGRADED = "upgraded"  # User-selected improvement
-
- class Confidence(Enum):
-     HIGH = "high"      # 10+ occurrences
-     MEDIUM = "medium"  # 3-9 occurrences
-     LOW = "low"        # 1-2 occurrences
-
- class Viewport(Enum):
-     DESKTOP = "desktop"  # 1440px
-     MOBILE = "mobile"    # 375px
- ```
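The occurrence thresholds in the `Confidence` comments translate directly into a lookup. The sketch below repeats the enum so it runs on its own; `confidence_for` is a hypothetical helper name, and rejecting counts below 1 is an assumption, since the schema does not cover that case.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"      # 10+ occurrences
    MEDIUM = "medium"  # 3-9 occurrences
    LOW = "low"        # 1-2 occurrences


def confidence_for(occurrences):
    """Map how often a raw value was seen to a Confidence level."""
    if occurrences < 1:
        raise ValueError("a token must occur at least once")
    if occurrences >= 10:
        return Confidence.HIGH
    if occurrences >= 3:
        return Confidence.MEDIUM
    return Confidence.LOW
```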
-
- ### Token Types:
- - **ColorToken:** value, frequency, contexts, elements, contrast ratios
- - **TypographyToken:** family, size, weight, line-height, elements
- - **SpacingToken:** value, frequency, contexts, fits_base_8
- - **RadiusToken:** value, frequency, elements
- - **ShadowToken:** value, frequency, elements
-
- ---
-
- ## 🔄 LangGraph Workflow
-
- ```
-                      ┌─────────────┐
-                      │    START    │
-                      └──────┬──────┘
-                             │
-                             ▼
-                      ┌─────────────┐
-                      │  URL Input  │
-                      └──────┬──────┘
-                             │
-                             ▼
-                 ┌────────────────────────┐
-                 │   Agent 1: Discover    │
-                 │      (find pages)      │
-                 └───────────┬────────────┘
-                             │
-                             ▼
-                 ┌────────────────────────┐
-                 │  HUMAN: Confirm pages  │◄─── Checkpoint 1
-                 └───────────┬────────────┘
-                             │
-                             ▼
-                 ┌────────────────────────┐
-                 │    Agent 1: Extract    │
-                 │   (crawl & extract)    │
-                 └───────────┬────────────┘
-                             │
-                             ▼
-                 ┌────────────────────────┐
-                 │   Agent 2: Normalize   │
-                 └───────────┬────────────┘
-                             │
-                             ▼
-                 ┌────────────────────────┐
-                 │  HUMAN: Review tokens  │◄─── Checkpoint 2 (Stage 1 UI)
-                 └───────────┬────────────┘
-                             │
-             ┌───────────────┴───────────────┐
-             │                               │
-             ▼                               ▼
-    ┌──────────────────┐            ┌──────────────────┐
-    │ Agent 3: Advise  │            │    (parallel)    │
-    │ (best practices) │            │                  │
-    └────────┬─────────┘            └──────────────────┘
-             │
-             ▼
- ┌────────────────────────┐
- │ HUMAN: Select options  │◄─── Checkpoint 3 (Stage 2 UI)
- └───────────┬────────────┘
-             │
-             ▼
- ┌────────────────────────┐
- │   Agent 4: Generate    │
- │      (final JSON)      │
- └───────────┬────────────┘
-             │
-             ▼
- ┌────────────────────────┐
- │     HUMAN: Export      │◄─── Checkpoint 4 (Stage 3 UI)
- └───────────┬────────────┘
-             │
-             ▼
-        ┌─────────┐
-        │   END   │
-        └─────────┘
- ```
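The diagram alternates agent steps with human checkpoints, and the flow pauses at each checkpoint until a person acts. The sketch below models only that ordering in plain Python. It deliberately does not use LangGraph, whose graph and interrupt APIs are not shown in this file, and `WORKFLOW` and `run_until_checkpoint` are hypothetical names. The unlabeled "(parallel)" branch is omitted.

```python
# Step order taken from the diagram; "human" entries are the four checkpoints.
WORKFLOW = [
    ("agent", "Agent 1: Discover"),
    ("human", "Confirm pages"),
    ("agent", "Agent 1: Extract"),
    ("agent", "Agent 2: Normalize"),
    ("human", "Review tokens"),
    ("agent", "Agent 3: Advise"),
    ("human", "Select options"),
    ("agent", "Agent 4: Generate"),
    ("human", "Export"),
]


def run_until_checkpoint(start_index=0):
    """Walk agent steps from start_index and stop at the next checkpoint.

    Returns (names of the agent steps passed, index of the checkpoint),
    or (names, None) when the workflow has no further checkpoint.
    """
    executed = []
    for index in range(start_index, len(WORKFLOW)):
        kind, name = WORKFLOW[index]
        if kind == "human":
            return executed, index
        executed.append(name)
    return executed, None
```

Resuming is then a matter of calling the function again with the checkpoint index plus one, after the human decision has been recorded.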
-
- ---
-
- ## 🚦 Human-in-the-Loop Rules
-
- 1. **No irreversible automation**
- 2. **Agents propose → Humans decide**
- 3. **Every auto action must be:**
-    - Visible
-    - Reversible
-    - Previewed
-
- ---
-
- ## 📦 Output JSON Format
-
- ```json
- {
-   "metadata": {
-     "source_url": "https://example.com",
-     "extracted_at": "2025-01-23T10:00:00Z",
-     "version": "v1-recovered",
-     "viewport": "desktop"
-   },
-   "colors": {
-     "primary": {
-       "50": { "value": "#e6f2ff", "source": "upgraded" },
-       "500": { "value": "#007bff", "source": "detected" },
-       "900": { "value": "#001a33", "source": "upgraded" }
-     }
-   },
-   "typography": {
-     "heading-xl": {
-       "fontFamily": "Inter",
-       "fontSize": "32px",
-       "fontWeight": 700,
-       "lineHeight": "1.2",
-       "source": "detected"
-     }
-   },
-   "spacing": {
-     "xs": { "value": "4px", "source": "upgraded" },
-     "sm": { "value": "8px", "source": "detected" },
-     "md": { "value": "16px", "source": "detected" }
-   }
- }
- ```
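Every token in the export carries a `source` field, so a simple pre-export check is to confirm that each one uses a value from the `TokenSource` enum (`detected`, `inferred`, `upgraded`). The sketch below walks a parsed export and reports offending paths. `invalid_sources` is a hypothetical helper, and treating any other value as an error is an assumption.

```python
import json

ALLOWED_SOURCES = {"detected", "inferred", "upgraded"}


def invalid_sources(tokens, path=""):
    """Yield dotted paths of entries whose "source" is not an allowed value."""
    if isinstance(tokens, dict):
        if "source" in tokens and tokens["source"] not in ALLOWED_SOURCES:
            yield path
        for key, value in tokens.items():
            child = f"{path}.{key}" if path else key
            yield from invalid_sources(value, child)


if __name__ == "__main__":
    export = json.loads(
        '{"spacing": {"xs": {"value": "4px", "source": "upgraded"},'
        ' "sm": {"value": "8px", "source": "guessed"}}}'
    )
    print(list(invalid_sources(export)))
```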
-
- ---
-
- ## 🛠️ Implementation Phases
-
- ### Phase 1 (Current)
- - [x] Project structure
- - [x] Configuration files
- - [ ] Token schema (Pydantic models)
- - [ ] Agent 1: Crawler
- - [ ] Agent 1: Extractor
- - [ ] Agent 2: Normalizer
- - [ ] Stage 1 UI
- - [ ] LangGraph basic workflow
-
- ### Phase 2
- - [ ] Agent 3: Advisor
- - [ ] Stage 2 UI (Upgrade Playground)
- - [ ] Live preview system
-
- ### Phase 3
- - [ ] Agent 4: Generator
- - [ ] Stage 3 UI
- - [ ] Export functionality
-
- ### Phase 4
- - [ ] Full LangGraph orchestration
- - [ ] HF Spaces deployment
- - [ ] Persistent storage
-
- ---
-
- ## 🔑 Environment Variables
-
- ```env
- # Required
- HF_TOKEN=your_huggingface_token
-
- # Model Configuration (defaults shown — diverse providers)
- AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct # Microsoft - Fast naming
- AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct # Meta - Strong reasoning
- AGENT4_MODEL=mistralai/Codestral-22B-v0.1 # Mistral - Code/JSON
-
- # Optional
- DEBUG=true
- LOG_LEVEL=INFO
- ```
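One way `config/settings.py` might read these variables is sketched below, using only `os.environ`. The model defaults are the ones listed above; the `load_settings` name, the `false` fallback for `DEBUG`, and the `INFO` fallback for `LOG_LEVEL` are assumptions, since the file does not say what happens when the optional variables are unset.

```python
import os

DEFAULT_MODELS = {
    "AGENT2_MODEL": "microsoft/Phi-3.5-mini-instruct",
    "AGENT3_MODEL": "meta-llama/Llama-3.1-70B-Instruct",
    "AGENT4_MODEL": "mistralai/Codestral-22B-v0.1",
}


def load_settings(env=None):
    """Build a settings dict from environment variables (or a test mapping)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required")
    settings = {name: env.get(name, default) for name, default in DEFAULT_MODELS.items()}
    settings["HF_TOKEN"] = token
    # Assumed fallbacks for the optional variables.
    settings["DEBUG"] = env.get("DEBUG", "false").lower() == "true"
    settings["LOG_LEVEL"] = env.get("LOG_LEVEL", "INFO")
    return settings
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.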
-
- ---
-
- ## 📝 Notes for Claude
-
- When continuing this project:
- 1. **Check current phase** in Implementation Phases section
- 2. **Review agent personas** in agents.yaml for consistent behavior
- 3. **Follow token schema** defined in core/token_schema.py
- 4. **Maintain LangGraph state** consistency across agents
- 5. **Use Gradio components** from ui/components.py for consistency
- 6. **Test with** real websites before deployment
-
- ---
-
- *Last updated: 2025-01-23*