---
title: Design System Extractor v2
emoji: 🎨
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# Design System Extractor v2

> 🎨 A semi-automated, human-in-the-loop agentic system that reverse-engineers design systems from live websites.

## 🎯 What It Does

When you have a website but no design system documentation (common when the original Sketch/Figma files are lost), this tool helps you:

1. **Crawl** your website to discover pages
2. **Extract** design tokens (colors, typography, spacing, shadows)
3. **Review** and validate extracted tokens with visual previews
4. **Upgrade** your system with modern best practices (optional)
5. **Export** production-ready JSON tokens for Figma/code

## 🧠 Philosophy

This is **not a magic button**; it's a design-aware co-pilot.

- **Agents propose → Humans decide**
- **Every action is visible, reversible, and previewed**
- **No irreversible automation**

## 🏗️ Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        TECH STACK                            │
├──────────────────────────────────────────────────────────────┤
│  Frontend:       Gradio (interactive UI with live preview)   │
│  Orchestration:  LangGraph (agent workflow management)       │
│  Models:         Claude API (reasoning) + Rule-based         │
│  Browser:        Playwright (crawling & extraction)          │
│  Hosting:        Hugging Face Spaces                         │
└──────────────────────────────────────────────────────────────┘
```

### Agent Personas

| Agent | Persona | Job |
|-------|---------|-----|
| **Agent 1** | Design Archaeologist | Discover pages, extract raw tokens |
| **Agent 2** | Design System Librarian | Normalize, dedupe, structure tokens |
| **Agent 3** | Senior DS Architect | Recommend upgrades (type scales, spacing, a11y) |
| **Agent 4** | Automation Engineer | Generate final JSON for Figma/code |
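
The "agents propose, humans decide" handoff between these personas can be illustrated with a small sketch. This is not the project's LangGraph workflow; `Proposal`, `ReviewState`, and `review` are hypothetical names used only to show the idea that every agent suggestion passes through an explicit approval step and leaves a visible log entry.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Proposal:
    """A change suggested by one agent, waiting for a human decision (hypothetical)."""
    agent: str
    description: str
    tokens: dict[str, str]


@dataclass
class ReviewState:
    """Accepted tokens plus an audit log of every decision (hypothetical)."""
    accepted: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


def review(state: ReviewState, proposal: Proposal,
           approve: Callable[[Proposal], bool]) -> ReviewState:
    """Apply a proposal only if the reviewer approves it; log the outcome either way."""
    approved = approve(proposal)
    verdict = "accepted" if approved else "rejected"
    state.log.append(f"{proposal.agent}: {proposal.description} ({verdict})")
    if approved:
        state.accepted.update(proposal.tokens)
    return state


proposals = [
    Proposal("Design Archaeologist", "extracted primary color", {"primary-500": "#007bff"}),
    Proposal("Senior DS Architect", "snap spacing to 8px grid", {"md": "16px"}),
]

state = ReviewState()
for p in proposals:
    # Stand-in for a click in the UI: approve everything except upgrade suggestions.
    state = review(state, p, approve=lambda prop: prop.agent != "Senior DS Architect")

print(state.accepted)
print("\n".join(state.log))
```

Because rejected proposals never touch `accepted`, nothing an agent suggests becomes part of the token set without a recorded decision.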

## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Node.js (for some dependencies)

### Installation

```bash
# Clone the repository
git clone <repo-url>
cd design-system-extractor

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Copy environment file
cp config/.env.example config/.env
# Edit config/.env and add your ANTHROPIC_API_KEY
```

### Running

```bash
python app.py
```

Open `http://localhost:7860` in your browser.

## 📖 Usage Guide

### Stage 1: Discovery

1. Enter your website URL (e.g., `https://example.com`)
2. Click "Discover Pages"
3. Review discovered pages and select which to extract from
4. Ensure you have a mix of page types (homepage, listing, detail, etc.)
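
To give a feel for what page discovery involves, here is a minimal sketch that collects same-site links from one page's HTML. The tool itself crawls with Playwright; this example uses only the standard library on a static HTML string, and `LinkCollector` and `discover_pages` are hypothetical names, not the project's API.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_pages(base_url: str, html: str) -> list[str]:
    """Return unique same-host http(s) URLs linked from the page, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    pages: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop "#fragment" so anchors don't create duplicates.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, ...
        if parsed.netloc == host and absolute not in pages:
            pages.append(absolute)
    return pages


sample_html = (
    '<a href="/products">Products</a>'
    '<a href="/products#top">Back to top</a>'
    '<a href="https://other.example/x">External</a>'
    '<a href="mailto:hello@example.com">Mail</a>'
    '<a href="about">About</a>'
)
print(discover_pages("https://example.com/", sample_html))
```

A real crawl would also need to render JavaScript-driven navigation, which is where a browser such as Playwright comes in.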

### Stage 2: Extraction

1. Choose viewport (Desktop 1440px or Mobile 375px)
2. Click "Extract Tokens"
3. Review the extracted tokens:
   - **Colors**: With frequency, context, and AA compliance
   - **Typography**: Font families, sizes, weights
   - **Spacing**: Values with 8px grid fit indicators
4. Accept or reject individual tokens
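
The two checks mentioned above, AA compliance and 8px grid fit, can be sketched as follows. The contrast math follows the WCAG 2.x relative luminance and contrast ratio definitions; the function names are hypothetical and are not taken from `core/color_utils.py`.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (threshold as written in WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#007bff'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1 (identical colors) up to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


def fits_grid(px: int, base: int = 8) -> bool:
    """True when a spacing value is a whole multiple of the grid base."""
    return px % base == 0


print(round(contrast_ratio("#007bff", "#ffffff"), 2))
print(passes_aa("#007bff", "#ffffff"), passes_aa("#007bff", "#ffffff", large_text=True))
print(fits_grid(16), fits_grid(12))
```

Keeping the large-text threshold as a separate flag matters in practice: a color can be acceptable for headings while failing for body copy.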

### Stage 3: Export

1. Review final token set
2. Export as JSON
3. Import into Figma via Tokens Studio or a plugin of your choice

## 📁 Project Structure

```
design-system-extractor/
├── app.py                          # Main Gradio application
├── requirements.txt
├── README.md
│
├── config/
│   ├── .env.example                # Environment template
│   ├── agents.yaml                 # Agent personas & settings
│   └── settings.py                 # Configuration loader
│
├── agents/
│   ├── state.py                    # LangGraph state definitions
│   ├── graph.py                    # Workflow orchestration
│   ├── crawler.py                  # Agent 1: Page discovery
│   ├── extractor.py                # Agent 1: Token extraction
│   ├── normalizer.py               # Agent 2: Normalization
│   ├── advisor.py                  # Agent 3: Best practices
│   └── generator.py                # Agent 4: JSON generation
│
├── core/
│   ├── token_schema.py             # Pydantic data models
│   └── color_utils.py              # Color analysis utilities
│
├── ui/
│   └── (Gradio components)
│
└── docs/
    └── CONTEXT.md                  # Context file for AI assistance
```

## 🔧 Configuration

### Environment Variables

```env
# Required
ANTHROPIC_API_KEY=your_key_here

# Optional
DEBUG=false
LOG_LEVEL=INFO
BROWSER_HEADLESS=true
```
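
As an illustration of how these variables might be read at startup, here is a minimal sketch using only the standard library. It is not the contents of `config/settings.py`; the `Settings` class, `load_settings` function, and the set of accepted boolean spellings are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the variables listed above."""
    anthropic_api_key: str
    debug: bool = False
    log_level: str = "INFO"
    browser_headless: bool = True


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), failing fast on a missing key."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "")
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    return Settings(
        anthropic_api_key=api_key,
        debug=_as_bool(env.get("DEBUG", "false")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        browser_headless=_as_bool(env.get("BROWSER_HEADLESS", "true")),
    )


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"ANTHROPIC_API_KEY": "your_key_here", "DEBUG": "true"})
print(settings.debug, settings.log_level, settings.browser_headless)
```

Accepting a mapping argument instead of reading `os.environ` directly makes the loader easy to exercise in tests.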

### Agent Configuration

Agent personas and behavior are defined in `config/agents.yaml`. This includes:

- Extraction targets (colors, typography, spacing)
- Naming conventions
- Confidence thresholds
- Upgrade options

## 🛠️ Development

### Running Tests

```bash
pytest tests/
```

### Adding New Features

1. Update token schema in `core/token_schema.py`
2. Add agent logic in `agents/`
3. Update UI in `app.py`
4. Update `docs/CONTEXT.md` for AI assistance

## 📦 Output Format

Tokens are exported in a platform-agnostic JSON format:

```json
{
  "metadata": {
    "source_url": "https://example.com",
    "version": "v1-recovered",
    "viewport": "desktop"
  },
  "colors": {
    "primary-500": {
      "value": "#007bff",
      "source": "detected",
      "contrast_white": 3.98
    }
  },
  "typography": {
    "heading-lg": {
      "fontFamily": "Inter",
      "fontSize": "24px",
      "fontWeight": 700
    }
  },
  "spacing": {
    "md": {
      "value": "16px",
      "source": "detected"
    }
  }
}
```
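
One way to consume this export in code is to turn it into CSS custom properties. The sketch below assumes the key layout shown in the example above; `tokens_to_css` and the `--color-`, `--spacing-`, and `--font-` prefixes are hypothetical choices for the illustration, not part of the tool.

```python
import json

# A trimmed copy of the example export, kept inline so the sketch is self-contained.
EXPORT = """
{
  "colors": {"primary-500": {"value": "#007bff", "source": "detected"}},
  "typography": {
    "heading-lg": {"fontFamily": "Inter", "fontSize": "24px", "fontWeight": 700}
  },
  "spacing": {"md": {"value": "16px", "source": "detected"}}
}
"""


def tokens_to_css(tokens: dict) -> str:
    """Render color, spacing, and typography tokens as a :root block of CSS variables."""
    lines = [":root {"]
    for name, token in tokens.get("colors", {}).items():
        lines.append(f"  --color-{name}: {token['value']};")
    for name, token in tokens.get("spacing", {}).items():
        lines.append(f"  --spacing-{name}: {token['value']};")
    for name, token in tokens.get("typography", {}).items():
        lines.append(f"  --font-{name}-family: {token['fontFamily']};")
        lines.append(f"  --font-{name}-size: {token['fontSize']};")
        lines.append(f"  --font-{name}-weight: {token['fontWeight']};")
    lines.append("}")
    return "\n".join(lines)


css = tokens_to_css(json.loads(EXPORT))
print(css)
```

Missing top-level sections are simply skipped, so a partial export (for example, colors only) still produces valid CSS.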

## 🤝 Contributing

Contributions are welcome! Please read the contribution guidelines first.

## 📄 License

MIT

---

Built with ❤️ for designers who've lost their source files.