---
title: Design System Extractor v2
emoji: 🎨
colorFrom: purple
colorTo: blue
sdk: docker
pinned: false
license: mit
---

# Design System Extractor v2

> 🎨 A semi-automated, human-in-the-loop agentic system that reverse-engineers design systems from live websites.

## 🎯 What It Does

When you have a website but no design system documentation (common when the original Sketch/Figma files are lost), this tool helps you:

1. **Crawl** your website to discover pages
2. **Extract** design tokens (colors, typography, spacing, shadows)
3. **Review** and validate extracted tokens with visual previews
4. **Upgrade** your system with modern best practices (optional)
5. **Export** production-ready JSON tokens for Figma/code

## 🧠 Philosophy

This is **not a magic button** — it's a design-aware co-pilot.

- **Agents propose → Humans decide**
- **Every action is visible, reversible, and previewed**
- **No irreversible automation**

## 🏗️ Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        TECH STACK                            │
├──────────────────────────────────────────────────────────────┤
│  Frontend:       Gradio (interactive UI with live preview)   │
│  Orchestration:  LangGraph (agent workflow management)       │
│  Models:         Claude API (reasoning) + Rule-based         │
│  Browser:        Playwright (crawling & extraction)          │
│  Hosting:        Hugging Face Spaces                         │
└──────────────────────────────────────────────────────────────┘
```

### Agent Personas

| Agent | Persona | Job |
|-------|---------|-----|
| **Agent 1** | Design Archaeologist | Discover pages, extract raw tokens |
| **Agent 2** | Design System Librarian | Normalize, dedupe, structure tokens |
| **Agent 3** | Senior DS Architect | Recommend upgrades (type scales, spacing, a11y) |
| **Agent 4** | Automation Engineer | Generate final JSON for Figma/code |
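
The hand-off between the four agents can be pictured as a small pipeline with a human review gate. The sketch below is a plain-Python illustration of that idea, not the project's LangGraph code: every function name, the `approve` callback, and the `color-N` naming are hypothetical.

```python
from collections import Counter
from typing import Callable

# Hypothetical stand-ins for the four agents. The real project wires
# its agents together with LangGraph; this only illustrates the flow.

def archaeologist(raw_values: list[str]) -> list[str]:
    """Agent 1: return raw tokens exactly as found on the pages."""
    return list(raw_values)

def librarian(raw: list[str]) -> dict[str, int]:
    """Agent 2: normalize case/whitespace and dedupe, keeping frequency."""
    return dict(Counter(v.strip().lower() for v in raw))

def architect(tokens: dict[str, int]) -> list[str]:
    """Agent 3: propose (never apply) changes, here flagging one-off values."""
    return [value for value, count in tokens.items() if count == 1]

def engineer(tokens: dict[str, int]) -> dict[str, dict]:
    """Agent 4: shape the accepted tokens into an exportable mapping."""
    return {
        f"color-{i + 1}": {"value": value, "source": "detected"}
        for i, value in enumerate(sorted(tokens))
    }

def run_pipeline(raw_values: list[str], approve: Callable[[str], bool]) -> dict:
    tokens = librarian(archaeologist(raw_values))
    # Human-in-the-loop gate: a proposal only takes effect if approved.
    for value in architect(tokens):
        if approve(value):
            del tokens[value]
    return engineer(tokens)

accepted = run_pipeline(["#007BFF", "#007bff ", "#FF0000"], approve=lambda v: True)
rejected = run_pipeline(["#007BFF", "#007bff ", "#FF0000"], approve=lambda v: False)
print(accepted)
print(len(rejected))
```

Passing a callback for `approve` keeps the decision outside the agents, which mirrors the "Agents propose → Humans decide" rule: rejecting every proposal leaves the normalized token set untouched.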

## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Node.js (for some dependencies)

### Installation

```bash
# Clone the repository
git clone <repo-url>
cd design-system-extractor

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Copy environment file
cp config/.env.example config/.env
# Edit config/.env and add your ANTHROPIC_API_KEY
```

### Running

```bash
python app.py
```

Open `http://localhost:7860` in your browser.

## 📖 Usage Guide

### Stage 1: Discovery

1. Enter your website URL (e.g., `https://example.com`)
2. Click "Discover Pages"
3. Review discovered pages and select which to extract from
4. Make sure your selection covers a mix of page types (homepage, listing, detail, etc.)
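
Page discovery boils down to collecting same-origin links from a starting page. The following is a minimal standard-library sketch of that step, assuming the HTML has already been fetched; the tool itself crawls with Playwright, and `LinkCollector` and `discover_pages` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def discover_pages(base_url: str, html: str) -> list[str]:
    """Return sorted, de-duplicated same-origin page URLs found in html."""
    parser = LinkCollector()
    parser.feed(html)
    origin = urlparse(base_url).netloc
    pages = set()
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so that
        # /products and /products#top count as one page.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == origin:
            pages.add(absolute)
    return sorted(pages)

sample_html = (
    '<a href="/products">Products</a>'
    '<a href="/products#top">Back to top</a>'
    '<a href="https://other.com/x">External</a>'
    '<a href="mailto:team@example.com">Mail</a>'
)
found = discover_pages("https://example.com/", sample_html)
print(found)
```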

### Stage 2: Extraction

1. Choose viewport (Desktop 1440px or Mobile 375px)
2. Click "Extract Tokens"
3. Review the extracted tokens:
   - **Colors**: With frequency, context, and WCAG AA contrast compliance
   - **Typography**: Font families, sizes, weights
   - **Spacing**: Values with 8px grid fit indicators
4. Accept or reject individual tokens
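
The two indicators mentioned above can be reproduced by hand. This sketch uses the WCAG 2.x relative-luminance and contrast-ratio formulas, plus a simple modulo test for the 8px grid. The function names are hypothetical and the project's own `core/color_utils.py` may differ.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of a 6-digit sRGB hex color."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB transfer curve to get linear-light channel values.
    linear = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """WCAG AA requires 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)

def fits_grid(px: int, base: int = 8) -> bool:
    """True when a spacing value is a whole multiple of the grid base."""
    return px % base == 0

print(round(contrast_ratio("#000000", "#ffffff"), 2))
print(passes_aa("#007bff", "#ffffff"))
print(fits_grid(16), fits_grid(12))
```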

### Stage 3: Export

1. Review final token set
2. Export as JSON
3. Import into Figma via Tokens Studio or your own plugin

## 📁 Project Structure

```
design-system-extractor/
├── app.py                          # Main Gradio application
├── requirements.txt
├── README.md
│
├── config/
│   ├── .env.example                # Environment template
│   ├── agents.yaml                 # Agent personas & settings
│   └── settings.py                 # Configuration loader
│
├── agents/
│   ├── state.py                    # LangGraph state definitions
│   ├── graph.py                    # Workflow orchestration
│   ├── crawler.py                  # Agent 1: Page discovery
│   ├── extractor.py                # Agent 1: Token extraction
│   ├── normalizer.py               # Agent 2: Normalization
│   ├── advisor.py                  # Agent 3: Best practices
│   └── generator.py                # Agent 4: JSON generation
│
├── core/
│   ├── token_schema.py             # Pydantic data models
│   └── color_utils.py              # Color analysis utilities
│
├── ui/
│   └── (Gradio components)
│
└── docs/
    └── CONTEXT.md                  # Context file for AI assistance
```

## 🔧 Configuration

### Environment Variables

```env
# Required
ANTHROPIC_API_KEY=your_key_here

# Optional
DEBUG=false
LOG_LEVEL=INFO
BROWSER_HEADLESS=true
```
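
One way to read these variables is shown below. This is a sketch only: the `Settings` dataclass, `load_settings`, and the set of accepted truthy strings are assumptions for illustration, not the contents of `config/settings.py`.

```python
import os
from dataclasses import dataclass

# Assumed convention: these strings (case-insensitive) mean "true".
TRUTHY = {"1", "true", "yes", "on"}

@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    debug: bool = False
    log_level: str = "INFO"
    browser_headless: bool = True

def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        # Fail early: the only required variable is the API key.
        raise RuntimeError("ANTHROPIC_API_KEY is required")

    def flag(name: str, default: bool) -> bool:
        raw = env.get(name)
        return default if raw is None else raw.strip().lower() in TRUTHY

    return Settings(
        anthropic_api_key=key,
        debug=flag("DEBUG", False),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        browser_headless=flag("BROWSER_HEADLESS", True),
    )

example = load_settings({"ANTHROPIC_API_KEY": "your_key_here", "DEBUG": "true"})
print(example.debug, example.log_level, example.browser_headless)
```

Accepting a mapping argument keeps the loader easy to test without touching the real environment.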

### Agent Configuration

Agent personas and behavior are defined in `config/agents.yaml`. This includes:

- Extraction targets (colors, typography, spacing)
- Naming conventions
- Confidence thresholds
- Upgrade options

## 🛠️ Development

### Running Tests

```bash
pytest tests/
```

### Adding New Features

1. Update token schema in `core/token_schema.py`
2. Add agent logic in `agents/`
3. Update UI in `app.py`
4. Update `docs/CONTEXT.md` for AI assistance

## 📦 Output Format

Tokens are exported in a platform-agnostic JSON format:

```json
{
  "metadata": {
    "source_url": "https://example.com",
    "version": "v1-recovered",
    "viewport": "desktop"
  },
  "colors": {
    "primary-500": {
      "value": "#007bff",
      "source": "detected",
      "contrast_white": 3.98
    }
  },
  "typography": {
    "heading-lg": {
      "fontFamily": "Inter",
      "fontSize": "24px",
      "fontWeight": 700
    }
  },
  "spacing": {
    "md": {
      "value": "16px",
      "source": "detected"
    }
  }
}
```
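
Because the export is plain JSON, it is straightforward to consume from code. As an illustration, the sketch below turns a token file shaped like the one above into CSS custom properties. `tokens_to_css` and the `--color-*`, `--spacing-*`, and `--font-*` variable names are a hypothetical convention, not something the tool emits.

```python
import json

# A trimmed copy of the export format shown above.
EXPORTED = json.loads("""
{
  "colors": {"primary-500": {"value": "#007bff", "source": "detected"}},
  "typography": {
    "heading-lg": {"fontFamily": "Inter", "fontSize": "24px", "fontWeight": 700}
  },
  "spacing": {"md": {"value": "16px", "source": "detected"}}
}
""")

def tokens_to_css(tokens: dict) -> str:
    """Render colors, spacing, and typography as a :root rule."""
    lines = [":root {"]
    for name, token in tokens.get("colors", {}).items():
        lines.append(f"  --color-{name}: {token['value']};")
    for name, token in tokens.get("spacing", {}).items():
        lines.append(f"  --spacing-{name}: {token['value']};")
    for name, token in tokens.get("typography", {}).items():
        # Each typography token expands into one variable per property.
        lines.append(f"  --font-{name}-family: {token['fontFamily']};")
        lines.append(f"  --font-{name}-size: {token['fontSize']};")
        lines.append(f"  --font-{name}-weight: {token['fontWeight']};")
    lines.append("}")
    return "\n".join(lines)

css = tokens_to_css(EXPORTED)
print(css)
```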

## 🤝 Contributing

Contributions are welcome! Please read the contribution guidelines first.

## 📄 License

MIT

---

Built with ❤️ for designers who've lost their source files.