# Design System Extractor v2 — Project Context

## Architecture Overview

```
Stage 0: Configuration        Stage 1: Discovery & Extraction        Stage 2: AI Analysis            Stage 3: Export
┌──────────────────┐          ┌───────────────────────────┐          ┌──────────────────────────┐     ┌──────────────┐
│ HF Token Setup   │ ──────>  │ URL Discovery (sitemap/   │ ──────>  │ Layer 1: Rule Engine     │ ──> │ Figma Tokens │
│ Benchmark Select │          │ crawl) + Token Extraction │          │ Layer 2: Benchmarks      │     │ JSON Export  │
└──────────────────┘          │ (Desktop + Mobile CSS)    │          │ Layer 3: LLM Agents (x3) │     └──────────────┘
                              └───────────────────────────┘          │ Layer 4: HEAD Synthesizer│
                                                                     └──────────────────────────┘
```

### Stage 1: Discovery & Extraction (Rule-Based, Free)
- **Discover Pages**: Fetches sitemap.xml or crawls site to find pages
- **Extract Tokens**: Playwright visits each page at 2 viewports (Desktop 1440px, Mobile 375px), extracts computed CSS for colors, typography, spacing, radius, shadows
- **User Review**: Interactive tables with Accept/Reject checkboxes + visual previews
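
Computed styles come back from the browser as `rgb()`/`rgba()` strings rather than reusable tokens, so extraction includes a normalization and deduplication step. A minimal sketch, assuming helpers like these (the names are hypothetical, not the project's actual API):

```python
import re

def rgb_to_hex(css_color: str) -> str:
    """Normalize a computed `rgb()`/`rgba()` string to a hex color token."""
    nums = re.findall(r"[\d.]+", css_color)
    r, g, b = (int(float(n)) for n in nums[:3])  # ignore the alpha channel
    return f"#{r:02x}{g:02x}{b:02x}"

def collect_color_tokens(computed_colors: list[str]) -> list[str]:
    """Deduplicate normalized colors while preserving first-seen order."""
    seen: dict[str, None] = {}
    for color in computed_colors:
        seen.setdefault(rgb_to_hex(color), None)
    return list(seen)

tokens = collect_color_tokens(
    ["rgb(255, 0, 0)", "rgba(255, 0, 0, 0.5)", "rgb(0, 128, 255)"]
)
# tokens -> ['#ff0000', '#0080ff']
```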

### Stage 2: AI-Powered Analysis (4 Layers)

| Layer | Type | What It Does | Cost |
|-------|------|--------------|------|
| **Layer 1** | Rule Engine | Type scale detection, AA contrast checking, spacing grid analysis, color statistics | FREE |
| **Layer 2** | Benchmark Research | Compare against Material Design 3, Apple HIG, Tailwind, etc. | ~$0.001 |
| **Layer 3** | LLM Agents (x3) | AURORA (Brand ID) + ATLAS (Benchmark) + SENTINEL (Best Practices) | ~$0.002 |
| **Layer 4** | HEAD Synthesizer | NEXUS combines all outputs into final recommendations | ~$0.001 |
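
In code, the four layers chain sequentially: the three Layer 3 agents consume the same Layer 1 facts and Layer 2 comparison data, and Layer 4 synthesizes their outputs. A minimal orchestration sketch (the function names are hypothetical; the real entry points live in `core/rule_engine.py`, `core/benchmarks.py`, and `agents/llm_agents.py`):

```python
# Hypothetical orchestration of the four analysis layers.
def run_analysis(tokens, run_rules, run_benchmarks, run_agent, synthesize):
    facts = run_rules(tokens)                # Layer 1: deterministic rule engine (free)
    comparison = run_benchmarks(tokens)      # Layer 2: benchmark comparison data
    agent_outputs = {                        # Layer 3: the three specialist agents
        name: run_agent(name, tokens, facts, comparison)
        for name in ("AURORA", "ATLAS", "SENTINEL")
    }
    return synthesize(agent_outputs, facts)  # Layer 4: NEXUS combines everything
```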

### Stage 3: Export
- Apply/reject individual color, typography, spacing recommendations
- Export Figma Tokens Studio-compatible JSON
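
Tokens Studio reads a nested JSON document in which every leaf token carries `value` and `type` fields. A minimal sketch of the exported shape (token names and grouping here are illustrative, not what the app actually emits):

```python
import json

# Illustrative export shape; real names come from the tokens the user accepted.
tokens = {
    "global": {
        "color": {"brand-primary": {"value": "#0066ff", "type": "color"}},
        "fontSize": {"body": {"value": "16px", "type": "fontSizes"}},
        "spacing": {"md": {"value": "16px", "type": "spacing"}},
    }
}
figma_json = json.dumps(tokens, indent=2)  # downloadable Stage 3 artifact
```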

---

## Agent Roster

| Agent | Codename | Model | Temp | Input | Output | Specialty |
|-------|----------|-------|------|-------|--------|-----------|
| Brand Identifier | **AURORA** | Qwen/Qwen2.5-72B-Instruct | 0.4 | Color tokens + semantic CSS analysis | Brand primary/secondary/accent, palette strategy, cohesion score, semantic names | Creative/visual reasoning, color harmony assessment |
| Benchmark Advisor | **ATLAS** | meta-llama/Llama-3.3-70B-Instruct | 0.25 | User's type scale, spacing, font sizes + benchmark comparison data | Recommended benchmark, alignment changes, pros/cons | 128K context for large benchmark data, comparative reasoning |
| Best Practices Validator | **SENTINEL** | Qwen/Qwen2.5-72B-Instruct | 0.2 | Rule Engine results (typography, accessibility, spacing, color stats) | Overall score (0-100), check results, prioritized fix list | Methodical rule-following, precise judgment |
| HEAD Synthesizer | **NEXUS** | meta-llama/Llama-3.3-70B-Instruct | 0.3 | All 3 agent outputs + Rule Engine facts | Executive summary, scores, top 3 actions, color/type/spacing recs | 128K context for combined inputs, synthesis capability |

### Why These Models

- **Qwen 72B** (AURORA, SENTINEL): Strong creative reasoning for brand analysis; methodical structured output for best practices. Available on HF serverless without gated access.
- **Llama 3.3 70B** (ATLAS, NEXUS): 128K context window handles large combined inputs from multiple agents. Excellent comparative and synthesis reasoning.
- **Fallback**: Qwen/Qwen2.5-7B-Instruct (free tier; used when a primary model fails)

### Temperature Rationale

- **0.4** (AURORA): Allows creative interpretation of color stories and palette harmony
- **0.25** (ATLAS): Analytical comparison needs consistency but some flexibility for trade-off reasoning
- **0.2** (SENTINEL): Strict rule evaluation — consistency is critical for compliance scoring
- **0.3** (NEXUS): Balanced — needs to synthesize creatively but stay grounded in agent data
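
Combined with the agent roster, this gives a simple per-agent mapping. A sketch of how it might be wired (the dict and helper are hypothetical; the real mapping lives in `core/hf_inference.py`):

```python
# Hypothetical agent -> temperature mapping; values match the rationale above.
AGENT_TEMPERATURE = {
    "AURORA": 0.4,    # creative palette interpretation
    "ATLAS": 0.25,    # consistent comparisons, some trade-off flexibility
    "SENTINEL": 0.2,  # strict, repeatable rule evaluation
    "NEXUS": 0.3,     # grounded synthesis
}

def temperature_for(agent: str, default: float = 0.3) -> float:
    """Fall back to the global default (HF_TEMPERATURE) for unknown agents."""
    return AGENT_TEMPERATURE.get(agent, default)
```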

---

## Evaluation & Scoring

### Self-Evaluation (All Agents)
Each agent includes a `self_evaluation` block in its JSON output:
```json
{
  "confidence": 8,          // 1-10: How confident the agent is
  "reasoning": "Clear usage patterns with 20+ colors",
  "data_quality": "good",   // good | fair | poor
  "flags": []               // e.g., ["insufficient_context", "ambiguous_data"]
}
```
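
Since `agents/llm_agents.py` models agent output with dataclasses, the block above could be parsed into something like the following (the dataclass itself is a hypothetical sketch, not the project's actual class):

```python
from dataclasses import dataclass, field

# Hypothetical mirror of the self_evaluation block shown above.
@dataclass
class SelfEvaluation:
    confidence: int                 # 1-10
    reasoning: str
    data_quality: str               # "good" | "fair" | "poor"
    flags: list = field(default_factory=list)

    def needs_review(self) -> bool:
        """Flag low-confidence or degraded-data outputs for a closer look."""
        return self.confidence < 5 or self.data_quality == "poor" or bool(self.flags)

ev = SelfEvaluation(8, "Clear usage patterns with 20+ colors", "good")
```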

### AURORA Scoring Rubric (Cohesion 1-10)
- **9-10**: Clear harmony rule, distinct brand colors, consistent palette
- **7-8**: Mostly harmonious, clear brand identity
- **5-6**: Some relationships visible but not systematic
- **3-4**: Random palette, no clear strategy
- **1-2**: Conflicting colors, no brand identity

### SENTINEL Scoring Rubric (Overall 0-100)
Weighted checks:
- AA Compliance: 25 points
- Type Scale Consistency: 15 points
- Base Size Accessible: 15 points
- Spacing Grid: 15 points
- Type Scale Standard Ratio: 10 points
- Color Count: 10 points
- No Near-Duplicates: 10 points
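
The weights sum to exactly 100, so the overall score is just the sum of points for the checks that pass. A sketch (the check keys are illustrative):

```python
# Weights from the rubric above; they sum to 100.
SENTINEL_WEIGHTS = {
    "aa_compliance": 25,
    "type_scale_consistency": 15,
    "base_size_accessible": 15,
    "spacing_grid": 15,
    "type_scale_standard_ratio": 10,
    "color_count": 10,
    "no_near_duplicates": 10,
}

def sentinel_score(check_results: dict[str, bool]) -> int:
    """Sum the weights of every passed check (0-100)."""
    return sum(w for name, w in SENTINEL_WEIGHTS.items() if check_results.get(name))

# Example: everything passes except AA compliance -> 100 - 25 = 75.
score = sentinel_score({name: name != "aa_compliance" for name in SENTINEL_WEIGHTS})
```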

### NEXUS Scoring Rubric (Overall 0-100)
- **90-100**: Production-ready, minor polishing only
- **75-89**: Solid foundation, 2-3 targeted improvements
- **60-74**: Functional but needs focused attention
- **40-59**: Significant gaps requiring systematic improvement
- **20-39**: Major rework needed
- **0-19**: Fundamental redesign recommended

### Evaluation Summary (Logged After Analysis)
```
═══════════════════════════════════════════════════
πŸ” AGENT EVALUATION SUMMARY
═══════════════════════════════════════════════════
   🎨 AURORA  (Brand ID):    confidence=8/10, data=good
   🏢 ATLAS   (Benchmark):   confidence=7/10, data=good
   ✅ SENTINEL (Practices):  confidence=9/10, data=good, score=72/100
   🧠 NEXUS   (Synthesis):   confidence=8/10, data=good, overall=65/100
═══════════════════════════════════════════════════
```

---

## User Journey

1. **Enter HF Token** — Required for LLM inference (free tier works)
2. **Enter Website URL** — The site to extract design tokens from
3. **Discover Pages** — Auto-finds pages via sitemap or crawling
4. **Select Pages** — Check/uncheck pages to include (max 10)
5. **Extract Tokens** — Scans selected pages at Desktop + Mobile viewports
6. **Review Stage 1** — Interactive tables: Colors, Typography, Spacing, Radius, Shadows, Semantic Colors. Each tab has a data table + visual preview accordion. Accept/reject individual tokens.
7. **Proceed to Stage 2** — Select benchmarks to compare against
8. **Run AI Analysis** — 4-layer pipeline executes (Rule Engine -> Benchmarks -> LLM Agents -> Synthesis)
9. **Review Analysis** — Dashboard with scores, recommendations, benchmark comparison, color recs
10. **Apply Upgrades** — Accept/reject individual recommendations
11. **Export JSON** — Download Figma Tokens Studio-compatible JSON

---

## File Structure

| File | Responsibility |
|------|----------------|
| `app.py` | Main Gradio UI — all stages, CSS, event bindings, formatting functions |
| `agents/llm_agents.py` | 4 LLM agent classes (AURORA, ATLAS, SENTINEL, NEXUS) + dataclasses |
| `agents/semantic_analyzer.py` | Semantic color categorization (brand, text, background, etc.) |
| `config/settings.py` | Model routing, env var loading, agent-to-model mapping |
| `core/hf_inference.py` | HF Inference API client, model registry, temperature mapping |
| `core/preview_generator.py` | HTML preview generators for Stage 1 visual previews |
| `core/rule_engine.py` | Layer 1: Type scale, AA contrast, spacing grid, color stats |
| `core/benchmarks.py` | Benchmark definitions (Material Design 3, Apple HIG, etc.) |
| `core/extractor.py` | Playwright-based CSS token extraction |
| `core/discovery.py` | Page discovery via sitemap.xml / crawling |

---

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `HF_TOKEN` | (required) | HuggingFace API token |
| `BRAND_IDENTIFIER_MODEL` | `Qwen/Qwen2.5-72B-Instruct` | Model for AURORA |
| `BENCHMARK_ADVISOR_MODEL` | `meta-llama/Llama-3.3-70B-Instruct` | Model for ATLAS |
| `BEST_PRACTICES_MODEL` | `Qwen/Qwen2.5-72B-Instruct` | Model for SENTINEL |
| `HEAD_SYNTHESIZER_MODEL` | `meta-llama/Llama-3.3-70B-Instruct` | Model for NEXUS |
| `FALLBACK_MODEL` | `Qwen/Qwen2.5-7B-Instruct` | Fallback when primary fails |
| `HF_MAX_NEW_TOKENS` | `2048` | Max tokens per LLM response |
| `HF_TEMPERATURE` | `0.3` | Global default temperature |
| `MAX_PAGES` | `20` | Max pages to discover |
| `BROWSER_TIMEOUT` | `30000` | Playwright timeout (ms) |

### Model Override Examples
```bash
# Use Llama for all agents
export BRAND_IDENTIFIER_MODEL="meta-llama/Llama-3.3-70B-Instruct"
export BEST_PRACTICES_MODEL="meta-llama/Llama-3.3-70B-Instruct"

# Use budget models
export BRAND_IDENTIFIER_MODEL="Qwen/Qwen2.5-7B-Instruct"
export BENCHMARK_ADVISOR_MODEL="mistralai/Mixtral-8x7B-Instruct-v0.1"
```
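
Presumably `config/settings.py` reads these variables with plain environment fallbacks; a minimal loader sketch (the variable names match the table above, but the helper itself is illustrative):

```python
import os

def setting(name: str, default: str) -> str:
    """Read an env var, falling back to the documented default."""
    return os.environ.get(name, default)

# Defaults mirror the configuration table above.
BRAND_IDENTIFIER_MODEL = setting("BRAND_IDENTIFIER_MODEL", "Qwen/Qwen2.5-72B-Instruct")
FALLBACK_MODEL = setting("FALLBACK_MODEL", "Qwen/Qwen2.5-7B-Instruct")
HF_MAX_NEW_TOKENS = int(setting("HF_MAX_NEW_TOKENS", "2048"))
HF_TEMPERATURE = float(setting("HF_TEMPERATURE", "0.3"))
MAX_PAGES = int(setting("MAX_PAGES", "20"))
BROWSER_TIMEOUT = int(setting("BROWSER_TIMEOUT", "30000"))
```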