# AI in My Daily Work — Episode 6: Reverse-Engineering Design Systems with 4 AI Agents, a Rule-Based Color Classifier & a Free Rule Engine

## A Semi-Automated Workflow: From Website URL to Figma-Ready Design System (v3.2)

*How I built a system that extracts any website's design tokens, classifies colors deterministically, audits them like a senior design team, and generates a visual spec in Figma — for ~$0.003 per run.*

[IMAGE: Hero - Complete workflow showing Website -> AI Agents -> Figma Visual Spec]

---

## The Problem Every Designer Knows

I've been managing design systems for consumer-facing apps for over 10 years. And there's one task that never gets easier: **auditing existing websites to extract their design tokens.**

Whether it's analyzing a competitor, inheriting a legacy project, or bringing consistency to a sprawling multi-brand portfolio, the process is always the same:

1. Open DevTools
2. Inspect elements one by one
3. Copy hex codes to a spreadsheet
4. Manually check contrast ratios
5. Try to identify the type scale (is it 1.2? 1.25? Random?)
6. Repeat for spacing, shadows, border radius...
7. Spend days organizing into a coherent system
8. Manually recreate in Figma as variables
9. Manually build a visual spec page

I've done this dozens of times. It takes **3-5 days** for a single website. And by the time you're done, something has already changed.

I wanted a system that could think like a design team:

- a **data engineer** extracting and normalizing every token
- a **color scientist** classifying colors by actual CSS usage (not guessing)
- an **analyst** identifying brand colors and patterns
- a **senior reviewer** benchmarking against industry standards
- and a **chief architect** synthesizing everything into action

So I built one. Three versions later, here's what works.

---

## The Solution (In One Sentence)

I built a 3-layer system — deterministic extraction + rule-based color classification + 4 AI agents — that acts like an entire design audit team. It outputs W3C DTCG-compliant JSON that feeds directly into Figma via a custom plugin that auto-generates a visual spec page. Cost: ~$0.003 per analysis.

---

## The Complete Workflow

[IMAGE: Full workflow diagram showing all 8 steps]

Here's the end-to-end process I now use:

```
+--------------------------------------------------------------+
|                    MY DESIGN SYSTEM WORKFLOW                    |
+--------------------------------------------------------------+
|                                                                |
|  STEP 1: Extract AS-IS (AI Agent App)                         |
|  ----------------------------------------                     |
|  * Enter website URL                                          |
|  * AI auto-discovers pages                                    |
|  * Extracts colors, typography, spacing, shadows, radius      |
|  * Normalizes: dedup, sort, name (radius, shadows, colors)    |
|  * Color Classifier: deterministic role assignment             |
|  * Rule Engine: WCAG + type scale + spacing grid              |
|  * Download AS-IS JSON (W3C DTCG v1 format)                  |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 2: Import to Figma (My Plugin)                          |
|  ----------------------------------------                     |
|  * Open Figma                                                 |
|  * Upload AS-IS JSON via custom plugin                        |
|  * Plugin auto-detects DTCG format                            |
|  * Creates Variables + Paint/Text/Effect Styles                |
|  * Auto-generates Visual Spec Page                            |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 3: View AS-IS Visual Spec (Figma)                       |
|  ----------------------------------------                     |
|  * Typography (Desktop + Mobile) with AA badges               |
|  * Colors organized by semantic role                           |
|  * Spacing scale, Radius display, Shadow elevation            |
|  * Review what exists before modernizing                      |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 4: AI Analysis (AI Agent App - Stage 2)                 |
|  ----------------------------------------                     |
|  * Free Rule Engine: WCAG, type scale, spacing grid           |
|  * AURORA: Brand color identification (advisory)              |
|  * ATLAS: Industry benchmark comparison (8 systems)           |
|  * SENTINEL: Best practices audit with priorities             |
|  * NEXUS: Final synthesis resolving all contradictions         |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 5: Accept/Reject Suggestions (AI Agent App)             |
|  ----------------------------------------                     |
|  * Review each recommendation                                 |
|  * Accept or Reject individually                              |
|  * I stay in control of what changes                          |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 6: Export TO-BE (AI Agent App - Stage 3)                |
|  ----------------------------------------                     |
|  * Generate modernized TO-BE JSON (DTCG compliant)            |
|  * Contains accepted improvements                             |
|  * Download new JSON file                                     |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 7: Import TO-BE to Figma (My Plugin)                    |
|  ----------------------------------------                     |
|  * Upload TO-BE JSON via same plugin                          |
|  * Figma Variables update with new values                     |
|  * New Visual Spec generated for comparison                   |
|                                                                |
|                           |                                    |
|                           v                                    |
|                                                                |
|  STEP 8: Compare AS-IS vs TO-BE (Figma)                       |
|  ----------------------------------------                     |
|  * Side-by-side visual spec pages                             |
|  * See exactly what changed and why                           |
|  * Ready to use in production                                 |
|                                                                |
+--------------------------------------------------------------+
```

**Total time:** ~15 minutes (vs 3-5 days manual)

---

## Architecture Overview: Three Layers, One Clear Authority Chain

My first attempt (V1) made a classic mistake:
**I used a large language model for everything.**

V1 cost $0.50-1.00 per run, took 15+ seconds for basic math, and LLMs hallucinated contrast ratios.

V2 split the work into rules vs AI. Better, but a new problem emerged: **three competing naming systems** for colors. The normalizer used word-based shades ("blue.light"), the export layer used numeric shades ("blue.500"), and the LLM agent used whatever it felt like ("brand.primary"). The output in Figma was chaos.

V3 fixed this with a clear authority chain and a dedicated color classifier:

> **Rule-based code handles certainty. LLMs handle ambiguity. And there's ONE naming authority.**

[IMAGE: Architecture diagram - Layer 1 (Extraction) -> Layer 2 (Classification + Analysis) -> Layer 3 (4 Named Agents)]

```
+--------------------------------------------------+
|  LAYER 1: EXTRACTION + NORMALIZATION (Free)       |
|  +- Crawler + 7-Source Extractor (Playwright)     |
|  +- Normalizer: colors, radius, shadows, typo     |
|  |   +- Radius: parse, deduplicate, sort, name   |
|  |   +- Shadows: parse, sort by blur, name        |
|  |   +- Colors: hue + numeric shade (50-900)      |
|  +- Firecrawl: deep CSS parsing (bypass CORS)     |
+--------------------------------------------------+
|  LAYER 2: CLASSIFICATION + RULE ENGINE (Free)     |
|  +- Color Classifier (815 lines, deterministic)   |
|  |   +- CSS evidence -> category -> token name    |
|  |   +- Capped: brand(3), text(3), bg(3), etc.   |
|  |   +- Every decision logged with evidence       |
|  +- WCAG Contrast Checker (actual FG/BG pairs)    |
|  +- Type Scale Detection (ratio math)             |
|  +- Spacing Grid Analysis (GCD math)              |
|  +- Color Statistics (deduplication)               |
+--------------------------------------------------+
|  LAYER 3: 4 AI AGENTS (~$0.003)                   |
|  +- AURORA   - Brand Advisor        (Qwen 72B)   |
|  +- ATLAS    - Benchmark Advisor    (Llama 70B)   |
|  +- SENTINEL - Best Practices Audit (Qwen 72B)   |
|  +- NEXUS    - Head Synthesizer     (Llama 70B)   |
+--------------------------------------------------+
```

### The Naming Authority Chain (V3's Key Innovation)

This was the single hardest problem to solve. In V2, three systems produced color names:

| System | Convention | Example | Problem |
|--------|-----------|---------|---------|
| Normalizer | Word shades | `color.blue.light` | Inconsistent |
| Export function | Numeric shades | `color.blue.500` | Conflicts |
| AURORA LLM | Whatever it wants | `brand.primary` | Unpredictable |

**Result in Figma: `blue.300`, `blue.dark`, `blue.light`, `blue.base` in the same export. Unusable.**

V3 established a clear chain:

```
1. Color Classifier (PRIMARY) - deterministic, covers ALL colors
   +- Rule-based: CSS evidence -> category -> token name
   +- 100% reproducible, logged with evidence

2. AURORA LLM (SECONDARY) - semantic role enhancer ONLY
   +- Can promote "color.blue.500" -> "color.brand.primary"
   +- CANNOT rename palette colors
   +- Only brand/text/bg/border/feedback roles accepted

3. Normalizer (FALLBACK) - preliminary hue+shade names
   +- Only used if classifier hasn't run yet
```

One naming authority. No conflicts. Clean Figma output every time.
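The chain above can be sketched as a small resolver. This is a hypothetical illustration, not the actual implementation; the function and dictionary names (`resolve_token_name`, `classifier_names`, etc.) are mine, and only the precedence order comes from the article.

```python
# Illustrative sketch of the V3 naming authority chain (names are hypothetical).

SEMANTIC_ROLES = ("brand", "text", "bg", "border", "feedback")
SEMANTIC_PREFIXES = tuple(f"color.{role}." for role in SEMANTIC_ROLES)

def resolve_token_name(hex_value, classifier_names, aurora_names, normalizer_names):
    """Return the single authoritative name for a color token."""
    # 1. Color Classifier (PRIMARY): deterministic, covers all colors.
    if hex_value in classifier_names:
        promoted = aurora_names.get(hex_value, "")
        # 2. AURORA (SECONDARY): may only *promote* a color to a semantic role.
        if promoted.startswith(SEMANTIC_PREFIXES):
            return promoted
        return classifier_names[hex_value]
    # 3. Normalizer (FALLBACK): preliminary hue+shade name, used only when
    #    the classifier hasn't seen the color.
    return normalizer_names.get(hex_value, hex_value)
```

A non-semantic rename from AURORA (say, `color.ocean.primary`) simply falls through to the classifier's name, which is what keeps the Figma output consistent.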

---

## Layer 1: Extraction + Normalization (No LLM)

### Extraction: 8 Sources

A Playwright-powered browser visits each page at **two viewports** (1440px desktop + 375px mobile) and extracts every design token from **8 sources**:

[IMAGE: 8 Extraction Sources diagram]

```
--- Playwright (7 internal sources) ---
Source 1: Computed Styles   -> What the browser actually renders
Source 2: CSS Variables     -> --primary-color, --spacing-md
Source 3: Inline Styles     -> style="color: #06b2c4"
Source 4: SVG Attributes    -> fill, stroke colors
Source 5: Stylesheets       -> CSS rules, hover states, pseudo-elements
Source 6: External CSS      -> Fetched & parsed CSS files
Source 7: Page Scan         -> Brute-force regex on style blocks

--- Separate deep extraction ---
Source 8: Firecrawl         -> Deep CSS parsing (bypasses CORS)
```

### Normalization: Not Just Dedup

The normalizer in V2 was a major pain point. Colors got named, but radius and shadows were passed through raw. Multi-value CSS like `"0px 0px 16px 16px"` became garbage tokens. Percentage values like `"50%"` couldn't be used in Figma.

V3's normalizer actually processes everything:

**Colors:** Deduplicate by exact hex + RGB distance < 30. Assign hue family + numeric shade (50-900). Never use words like "light" or "dark" for shades. Add role hints from CSS context for the classifier.

**Radius:** Parse multi-value shorthand (take max), convert rem/em/% to px, deduplicate by resolved value, sort by size, name semantically (none/sm/md/lg/xl/2xl/full). A raw extraction of `["8px", "0px 0px 16px 16px", "50%", "1rem"]` becomes:
```
radius.sm   = 4px    (from 0.25rem context)
radius.md   = 8px
radius.xl   = 16px   (max of 0 0 16 16)
radius.full = 9999px (from 50%)
```
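The radius pipeline (parse multi-value shorthand taking the max, convert units to px, dedupe, sort, name) can be sketched like this. It's a simplified assumption of the real normalizer: the semantic names here are assigned purely by rank, so the exact labels can differ from the production output above.

```python
# Minimal sketch of radius normalization (assumed thresholds and naming ladder).

def _to_px(value, root_px=16):
    v = value.strip()
    if v.endswith("rem") or v.endswith("em"):
        return float(v[:-3] if v.endswith("rem") else v[:-2]) * root_px
    if v.endswith("%"):      # e.g. border-radius: 50% -> treat as "full" (pill)
        return 9999.0
    if v.endswith("px"):
        return float(v[:-2])
    return float(v)

def normalize_radius(raw_values):
    resolved = set()
    for raw in raw_values:
        parts = raw.split()                  # "0px 0px 16px 16px" -> 4 values
        resolved.add(round(max(_to_px(p) for p in parts)))  # take the max
    out = {}
    ladder = iter(["sm", "md", "lg", "xl", "2xl"])
    for px in sorted(resolved):              # sort by resolved size
        if px == 0:
            key = "none"
        elif px >= 9999:
            key = "full"
        else:
            key = next(ladder, "2xl")
        out[f"radius.{key}"] = f"{int(px)}px"
    return out
```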

**Shadows:** Parse CSS shadow strings into components (offset, blur, spread, color). Filter out spread-only (border simulation) and inset shadows. Sort by blur radius. Deduplicate by blur bucket. Name by elevation (xs/sm/md/lg/xl). If fewer than 5 shadows extracted, interpolate to always produce 5 elevation levels.

**Cost: $0.00 | Runtime: ~90 seconds**

---

## Layer 2: Color Classification + Rule Engine (No LLM)

### The Color Classifier (V3's Biggest Addition)

This is 815 lines of deterministic code that replaced what AURORA used to do badly.

**The problem it solves:** Given 30+ extracted colors, which is the brand primary? Which are text colors? Which are backgrounds?

An LLM can reason about this, but inconsistently. The same color might be called "brand.primary" in one run and "accent.main" in the next. And it only named 10 colors, leaving the rest in chaos.

The classifier uses CSS evidence:

```
CSS Evidence -> Category:
  background-color on <button> + saturated + freq>5 -> BRAND
  color on <p>/<span> + low saturation              -> TEXT
  background-color on <div>/<body> + neutral         -> BG
  border-color + low saturation                      -> BORDER
  red hue + sat>0.6 + low freq                       -> FEEDBACK (error)
  everything else                                    -> PALETTE (by hue.shade)
```

**Key features:**
- **Aggressive deduplication**: Colors within RGB distance < 30 AND same category get merged (13 text grays become 3)
- **Capped categories**: brand (max 3), text (max 3), bg (max 3), border (max 3), feedback (max 4), palette (rest)
- **User-selectable naming convention**: semantic, tailwind, or material
- **Every decision logged with evidence**: `[DEDUP] merged #1a1a1a with #1b1b1b (dist=1.7)`, `[CLASSIFY] #06b2c4 -> brand (background-color on <button>, freq=33)`

**Cost: $0.00 | Reproducible: 100% | Runtime: <100ms**
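The evidence table above can be condensed into a rule function like the following. The saturation and frequency thresholds are illustrative assumptions, not the classifier's exact values, and the real code also handles dedup, caps, and logging.

```python
# Condensed, hypothetical sketch of the evidence -> category rules.

def classify_color(evidence):
    """evidence: dict with property, element, saturation (0-1), hue (deg), frequency."""
    prop, el = evidence["property"], evidence["element"]
    sat, hue, freq = evidence["saturation"], evidence["hue"], evidence["frequency"]
    if prop == "background-color" and el == "button" and sat > 0.5 and freq > 5:
        return "brand"        # saturated, frequent, on interactive elements
    if prop == "color" and el in ("p", "span") and sat < 0.2:
        return "text"         # low-saturation body copy
    if prop == "background-color" and el in ("div", "body") and sat < 0.2:
        return "bg"           # neutral surfaces
    if prop == "border-color" and sat < 0.2:
        return "border"
    if (hue >= 345 or hue <= 15) and sat > 0.6 and freq <= 5:
        return "feedback"     # rare, saturated red -> likely error state
    return "palette"          # everything else keeps its hue.shade name
```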

### The Rule Engine

After classification, the rule engine runs every check that can be done with pure math:

```
TYPE SCALE ANALYSIS
+- Detected Ratio: 1.167
+- Closest Standard: Minor Third (1.2)
+- Consistent: Warning (variance: 0.24)
+- Recommendation: 1.25 (Major Third)

ACCESSIBILITY CHECK (WCAG AA/AAA)
+- Colors Analyzed: 210
+- FG/BG Pairs Checked: 220
+- AA Pass: 153
+- AA Fail (real FG/BG pairs): 67
|  +- fg:#06b2c4 on bg:#ffffff -> Fix: #048391 (4.5:1)
|  +- fg:#999999 on bg:#ffffff -> Fix: #757575 (4.6:1)
|  +- ... and 62 more

SPACING GRID
+- Detected Base: 1px (GCD)
+- Grid Aligned: Warning 0%
+- Recommendation: 8px grid

CONSISTENCY SCORE: 52/100
```
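The type-scale and spacing-grid checks are pure math, as claimed. A minimal sketch, assuming a median-of-ratios heuristic for the scale and a plain GCD for the grid (the production heuristics may be more elaborate):

```python
from math import gcd
from statistics import median

def detect_type_scale(font_sizes_px):
    """Median ratio between consecutive sorted font sizes."""
    sizes = sorted(set(font_sizes_px))
    ratios = [b / a for a, b in zip(sizes, sizes[1:])]
    return round(median(ratios), 3)

def detect_spacing_base(spacings_px):
    """GCD of all spacing values hints at the underlying grid (1 = no grid)."""
    base = 0
    for s in spacings_px:
        base = gcd(base, s)
    return base
```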

Not just "color vs white" — it tests **actual foreground/background pairs** found on the page. And algorithmically generates AA-compliant alternatives.

This entire layer runs **in under 1 second** and costs nothing — the single biggest cost optimization in the system.
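The contrast check itself is just the standard WCAG 2.x formula: linearize each sRGB channel, compute relative luminance, then take the ratio of the lighter to the darker luminance (offset by 0.05). A minimal sketch, not the project's actual checker:

```python
# WCAG 2.x contrast math: relative luminance + contrast ratio + AA threshold.

def _srgb_to_linear(c):
    c = c / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color):
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _srgb_to_linear(r)
            + 0.7152 * _srgb_to_linear(g)
            + 0.0722 * _srgb_to_linear(b))

def contrast_ratio(fg, bg):
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def passes_aa(fg, bg, large_text=False):
    # WCAG AA: 4.5:1 for normal text, 3:1 for large text.
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

This reproduces the example fixes above: `#999999` on white fails AA (~2.8:1), while the suggested `#757575` passes at ~4.6:1.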

---

## Layer 3: AI Analysis & Interpretation (4 Named Agents)

This is where language models actually add value — tasks that require **context, reasoning, and judgment**. But in V3, they're advisory only. They don't control naming.

[IMAGE: Agent pipeline diagram - AURORA -> ATLAS -> SENTINEL -> NEXUS]

---

### Agent 1: AURORA — Brand Color Advisor
**Model:** Qwen 72B (HuggingFace PRO)
**Role change in V3:** Advisory only. Cannot rename colors. Can promote palette colors to semantic roles.

**What AURORA does now:**

The color classifier handles the naming. AURORA's job shifted to:
- Identify brand strategy (complementary? analogous? monochrome?)
- Suggest which palette colors deserve semantic roles (e.g., "color.blue.500 should be color.brand.primary")
- Assess palette cohesion (score 1-10)
- Provide reasoning that helps designers understand the brand's color story

**The key constraint:** `filter_aurora_naming_map()` strips any non-semantic names from AURORA's output. If AURORA tries to rename `color.blue.500` to `color.ocean.primary`, it's rejected. Only `brand.`, `text.`, `bg.`, `border.`, `feedback.` role assignments pass through.

```
AURORA's Analysis:
------------------------------------------
Brand Primary:  #06b2c4 (confidence: HIGH)
  +- 33 buttons, 12 CTAs, dominant accent
  +- Classifier already tagged as brand

Brand Secondary: #c1df1f (confidence: MEDIUM)
  +- 15 accent elements, secondary CTA

Palette Strategy: Complementary
Cohesion Score: 7/10
  +- "Clear hierarchy, accent colors differentiated"
```
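The guard described above could look roughly like this; the real `filter_aurora_naming_map()` signature and map shape are assumptions on my part, only the allowed-role behavior is from the article.

```python
# Hypothetical sketch of the filter_aurora_naming_map() guard: only
# semantic role assignments survive, everything else is dropped.

ALLOWED_ROLES = ("brand", "text", "bg", "border", "feedback")

def filter_aurora_naming_map(naming_map):
    """Keep only 'color.<role>.<name>' entries whose role is semantic."""
    kept = {}
    for hex_value, proposed in naming_map.items():
        parts = proposed.split(".")
        if len(parts) >= 3 and parts[0] == "color" and parts[1] in ALLOWED_ROLES:
            kept[hex_value] = proposed
    return kept
```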

---

### Agent 2: ATLAS — Benchmark Advisor
**Model:** Llama 3.3 70B (128K context)

**Unique Capability:** Industry benchmarking against **8 design systems** (Material 3, Polaris, Atlassian, Carbon, Apple HIG, Tailwind, Ant, Chakra).

[IMAGE: Benchmark comparison table from the UI]

This agent reasons about **effort vs. value**:

```
ATLAS's Recommendation:
------------------------------------------
1st: Shopify Polaris: 87% match

Alignment Changes:
  +- Type scale: 1.17 -> 1.25 (effort: medium)
  +- Spacing grid: mixed -> 4px (effort: high)
  +- Base size: 16px -> 16px (already aligned)

Pros: Closest match, e-commerce proven, well-documented
Cons: Spacing migration is significant effort

2nd: Material 3 (77% match)
  +- "Stronger mobile patterns, but 8px grid
       requires more restructuring"
```

ATLAS adds the context that turns analysis into action:

> "You're 87% aligned to Polaris already. Closing the gap on type scale takes ~1 hour and makes your system industry-standard."

---

### Agent 3: SENTINEL — Best Practices Auditor
**Model:** Qwen 72B
**V3 improvement:** Must cite specific data from rule engine. Cross-reference critic validates that scores match actual data.

SENTINEL prioritizes by **business impact** — not just severity:

```
SENTINEL's Audit:
------------------------------------------
Overall Score: 68/100

Checks:
  +- PASS:    Type Scale Standard (1.25 ratio)
  +- WARNING: Type Scale Consistency (variance 0.18)
  +- PASS:    Base Size Accessible (16px)
  +- FAIL:    AA Compliance (67 failures)
  +- WARNING: Spacing Grid (0% aligned)
  +- FAIL:    Near-Duplicates (351 pairs)

Priority Fixes:
  #1 Fix brand color AA compliance
     Impact: HIGH | Effort: 5 min
     -> "Affects 40% of interactive elements"

  #2 Consolidate near-duplicate colors
     Impact: MEDIUM | Effort: 2 hours

  #3 Align spacing to 8px grid
     Impact: MEDIUM | Effort: 1 hour
```

**V3's grounding rule:** If the rule engine says 67 AA failures, SENTINEL's AA check **must** be "fail." A cross-reference critic catches contradictions.

---

### Agent 4: NEXUS — Head Synthesizer
**Model:** Llama 3.3 70B (128K context)

NEXUS takes outputs from **all three agents + the rule engine** and synthesizes a final recommendation using a two-perspective evaluation:

- **Perspective A (Accessibility-First):** Weights AA compliance at 40%
- **Perspective B (Balanced):** Equal weights across dimensions

It evaluates both, then picks the perspective that best reflects the actual data.
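Mechanically, the two perspectives are just different weight vectors over the same dimension scores. A toy sketch using the dimension scores from the synthesis below; the weight split and `weighted_score` helper are my illustrative assumptions, and these toy totals are not the article's 68/100 overall (which folds in more inputs):

```python
# Two-perspective scoring: same scores, different weights (integer weights
# chosen to keep the arithmetic exact).

def weighted_score(scores, weights):
    total = sum(weights.values())
    return round(sum(scores[k] * w for k, w in weights.items()) / total)

scores = {"accessibility": 45, "consistency": 75, "organization": 70}

# Perspective A (Accessibility-First): AA compliance weighted at 40%.
perspective_a = weighted_score(scores, {"accessibility": 4, "consistency": 3, "organization": 3})

# Perspective B (Balanced): equal weights across dimensions.
perspective_b = weighted_score(scores, {"accessibility": 1, "consistency": 1, "organization": 1})
```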

```
NEXUS Final Synthesis:
------------------------------------------
Executive Summary:
"Your design system scores 68/100. Critical:
67 color pairs fail AA. Top action: fix brand
primary contrast (5 min, high impact)."

Scores:
  +- Overall:       68/100
  +- Accessibility:  45/100
  +- Consistency:    75/100
  +- Organization:   70/100

Top 3 Actions:
  1. Fix brand color AA (#06b2c4 -> #048391)
     Impact: HIGH | Effort: 5 min
  2. Align type scale to 1.25
     Impact: MEDIUM | Effort: 1 hour
  3. Consolidate 143 -> ~20 semantic colors
     Impact: MEDIUM | Effort: 2 hours

Color Recommendations:
  +- PASS:   brand.primary: #06b2c4 -> #048391 (auto-accept)
  +- PASS:   text.secondary: #999999 -> #757575 (auto-accept)
  +- REJECT: brand.accent: #FF6B35 -> #E65100 (user decides)
```

---

## The Figma Bridge: DTCG JSON -> Variables -> Visual Spec

[IMAGE: Figma plugin UI showing import options]

### W3C DTCG v1 Compliance

V3's export follows the W3C Design Tokens Community Group specification (stable October 2025):

```json
{
  "color": {
    "brand": {
      "primary": {
        "$type": "color",
        "$value": "#005aa3",
        "$description": "[classifier] brand: primary_action",
        "$extensions": {
          "com.design-system-automation": {
            "frequency": 47,
            "confidence": "high",
            "category": "brand",
            "evidence": ["background-color on <a>", "background-color on <button>"]
          }
        }
      }
    }
  },
  "radius": {
    "md": { "$type": "dimension", "$value": "8px" }
  },
  "shadow": {
    "sm": {
      "$type": "shadow",
      "$value": {
        "offsetX": "0px", "offsetY": "2px",
        "blur": "8px", "spread": "0px",
        "color": "#00000026"
      }
    }
  }
}
```

Every token includes `$type`, `$value`, and `$description`. Colors include `$extensions` with extraction metadata (frequency, confidence, category, evidence). This means any DTCG-compatible tool can consume our output.
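A DTCG consumer only needs one structural rule to walk files like the one above: a group is an object without `$value`, a token is an object with one (keys starting with `$` are metadata). A minimal flattener sketch, with the dotted-path output being my own convention:

```python
# Flatten a DTCG token tree into {"dotted.path": ($type, $value)} pairs.

def flatten_dtcg(node, prefix=""):
    tokens = {}
    for key, child in node.items():
        if key.startswith("$"):          # $description, $extensions, ... skip
            continue
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict) and "$value" in child:
            tokens[path] = (child.get("$type"), child["$value"])   # a token
        elif isinstance(child, dict):
            tokens.update(flatten_dtcg(child, path))               # a group
    return tokens
```

Run against the JSON above, this yields entries like `color.brand.primary -> ("color", "#005aa3")` and `radius.md -> ("dimension", "8px")`.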

### The Custom Figma Plugin

The plugin closes the loop:

1. **Auto-detects DTCG format** (vs legacy JSON)
2. **Creates Figma Variables** β€” Color, Number, and String variable collections
3. **Creates Styles** β€” Paint styles, Text styles, Effect styles
4. **Generates Visual Spec Page** β€” Separate frames for typography, colors, spacing, radius, shadows

[IMAGE: Figma visual spec page showing all tokens]

```
+-------------------------------------------------------------+
|  BRAND        TEXT           BACKGROUND      FEEDBACK        |
+-------------------------------------------------------------+
|  +----+ +----+   +----+ +----+   +----+ +----+   +----+     |
|  |Prim| |Sec |   |Prim| |Sec |   |Prim| |Sec |   |Err |     |
|  +----+ +----+   +----+ +----+   +----+ +----+   +----+     |
|  #005aa3 #c1df1f #373737 #666666 #fff   #f5f5f5  #dc2626    |
|  AA:Pass AA:Warn AA:Pass AA:Pass                  AA:Pass    |
+-------------------------------------------------------------+
```

The visual spec uses horizontal auto-layout with AA compliance badges on every color swatch. Typography renders in the actual detected font family with size, weight, and line-height metadata.

---

## Comparing AS-IS vs TO-BE

[IMAGE: Side-by-side comparison of AS-IS and TO-BE specimens]

| Token | AS-IS | TO-BE | Change |
|-------|-------|-------|--------|
| Type Scale | ~1.17 (random) | 1.25 (Major Third) | Consistent |
| brand.primary | #06b2c4 | #048391 | AA: 3.2 -> 4.5 |
| Spacing Grid | Mixed | 8px base | Standardized |
| Color Ramps | None | 50-950 | Generated |
| Unique Colors | 143 | ~20 semantic | Consolidated |
| Radius | Raw CSS garbage | none/sm/md/lg/xl/full | Normalized |
| Shadows | Unsorted, unnamed | xs/sm/md/lg/xl (5 levels) | Progressive |

---

## The Numbers

| Metric | Manual Process | My Workflow |
|--------|---------------|-------------|
| Time | 3-5 days | ~15 minutes |
| Cost | Designer salary | ~$0.003 |
| Coverage | ~50 colors | 143 colors (8 sources) |
| Accuracy | Human error | Computed styles (exact) |
| Accessibility | Manual spot checks | Full AA/AAA (all 220 pairs) |
| Benchmarking | Subjective | 8 industry systems compared |
| Color naming | Manual | Deterministic classifier (100% reproducible) |
| Radius/shadows | Copy raw CSS | Normalized, sorted, named |
| Figma ready | Hours more | Instant (DTCG plugin + visual spec) |
| Format | Proprietary | W3C DTCG v1 standard |

---

## Cost & Model Strategy

Different agents use different models — intentionally.

[IMAGE: Cost comparison table]

| Agent | Model | Why This Model | Cost |
|-------|-------|---------------|------|
| Normalizer | None | Math doesn't need AI | $0.00 |
| Color Classifier | None (815 lines) | Deterministic, reproducible | $0.00 |
| Rule Engine | None | Math doesn't need AI | $0.00 |
| AURORA | Qwen 72B | Creative brand reasoning | ~Free (HF PRO) |
| ATLAS | Llama 3.3 70B | 128K context for benchmarks | ~Free (HF PRO) |
| SENTINEL | Qwen 72B | Strict, consistent evaluation | ~Free (HF PRO) |
| NEXUS | Llama 3.3 70B | 128K context for synthesis | ~$0.001 |
| **Total** | | | **~$0.003** |

For designer-scale usage (weekly runs), inference costs are effectively negligible, with HuggingFace PRO ($9/month) covering most models.

The V1-to-V3 journey:
- **V1:** LLM for everything. $0.50-1.00/run. Hallucinated contrast ratios.
- **V2:** Rules + LLM split. $0.003/run. But 3 naming systems fighting.
- **V3:** Rules + Classifier + Advisory LLM. $0.003/run. One naming authority. Clean output.

---

## Graceful Degradation

The system **always produces output**, even when components fail:

| If This Fails... | What Happens |
|-------------------|-------------|
| LLM agents down | Color classifier + rule engine still work (free) |
| Firecrawl unavailable | DOM-only extraction (slightly fewer tokens) |
| Benchmark fetch fails | Hardcoded fallback data from 8 systems |
| NEXUS synthesis fails | `create_fallback_synthesis()` from rule engine |
| AURORA returns garbage | `filter_aurora_naming_map()` strips invalid names |
| **Entire AI layer** | **Full classifier + rule-engine-only report - still useful** |
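The table boils down to one wrapper pattern around every non-deterministic stage; this is a hedged sketch of that idea, with `run_with_fallback` and its arguments being illustrative names rather than the actual API:

```python
# Degradation wrapper: any failing LLM stage falls back to the free
# deterministic layer, and the downgrade is logged.

def run_with_fallback(stage_name, llm_call, fallback, log):
    """Run an LLM stage; on any failure, log it and use the rule-based fallback."""
    try:
        return llm_call()
    except Exception as exc:  # model down, timeout, unparseable output...
        log.append(f"[DEGRADE] {stage_name} failed ({exc}); using fallback")
        return fallback()
```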

---

## Tech Stack

[IMAGE: Tech stack diagram with logos]

**AI Agent App:**
- Playwright (browser automation, 8-source extraction)
- Firecrawl (deep CSS parsing)
- Gradio (UI framework)
- Qwen/Qwen2.5-72B-Instruct (AURORA + SENTINEL)
- meta-llama/Llama-3.3-70B-Instruct (ATLAS + NEXUS)
- HuggingFace Spaces (hosting) + HF Inference API
- Docker (containerized deployment)
- 148 tests (82 deterministic + 27 agent evals + 35 live evals + 4 pipeline)

**Figma Integration:**
- Custom Figma Plugin (v7)
- W3C DTCG v1 compliant JSON
- Variables API + Paint/Text/Effect Styles
- Auto-generated Visual Spec pages
- Tokens Studio compatible

---

## What I Learned

### 1. Overusing LLMs Is a Design Failure

If rules can do it faster and cheaper, use rules. My WCAG checker is 100% accurate. An LLM's contrast ratio calculation? Maybe 85% accurate, and 100x slower.

The rule engine + color classifier do 90% of the work for $0.
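The determinism claim is easy to demonstrate: WCAG 2.x defines contrast as a closed-form formula over relative luminance, so a few lines of Python compute it exactly, every time:

```python
# WCAG 2.x contrast ratio, computed from the spec's formulas.

def _linear(c8):
    # sRGB channel (0-255) to linear light, per WCAG relative luminance.
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    r, g, b = (_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))    # 21.0
# #767676 on white is the well-known AA boundary for normal text:
print(contrast_ratio((118, 118, 118), (255, 255, 255)) >= 4.5)  # True
```

Same inputs, same ratio, forever. That's the bar an LLM can't meet for this task.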

### 2. The Naming Authority Problem Is Real

V2's biggest failure wasn't technical; it was organizational. Three systems produced color names with no clear hierarchy. The fix wasn't better AI; it was a clear authority chain: classifier is PRIMARY, LLM is SECONDARY (advisory only), normalizer is FALLBACK.

**Lesson:** When multiple systems touch the same data, establish ONE authority. Don't merge competing outputs.
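As a sketch, that authority chain reduces to an ordered resolver: take the classifier's name if it exists, accept the LLM's suggestion only if it passes validation, and otherwise fall back to the normalizer. The names and validation regex below are illustrative, not the project's actual code:

```python
import re

# Assumed naming convention for validating LLM suggestions (illustrative).
VALID = re.compile(r"^[a-z][a-z0-9-]*$")

def resolve_name(classifier_name, aurora_suggestion, normalized_name):
    if classifier_name:                       # PRIMARY: deterministic classifier
        return classifier_name
    if aurora_suggestion and VALID.match(aurora_suggestion):
        return aurora_suggestion              # SECONDARY: validated LLM advisory
    return normalized_name                    # FALLBACK: normalizer, always present

print(resolve_name("brand-primary", "ocean-blue", "blue-600"))  # brand-primary
print(resolve_name(None, "Ocean Blue!", "blue-600"))            # blue-600
```

Note that the chain never merges outputs; a lower authority only speaks when the higher one is silent.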

### 3. Industry Benchmarks Are Gold

Without benchmarks: "Your type scale is inconsistent." *PM nods.*
With benchmarks: "You're 87% aligned to Shopify Polaris. Closing the gap takes 1 hour and makes your system industry-standard." *PM schedules a meeting.*

Time to build benchmark database: 1 day.
Value: Transforms analysis into prioritized action.
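One way to compute an alignment score like this (a sketch, not the tool's actual metric): the share of your scale steps that land within a tolerance of some benchmark step. Both scales below are invented for illustration:

```python
# Hypothetical alignment metric; the site and benchmark scales are made up.

def alignment(yours, benchmark, tol=0.05):
    # A step "aligns" if it's within tol (relative) of any benchmark step.
    hits = sum(any(abs(v - b) / b <= tol for b in benchmark) for v in yours)
    return hits / len(yours)

site_scale = [12, 14, 17, 20, 24, 31, 40]          # extracted from a site
benchmark_scale = [12, 14, 16, 20, 24, 28, 32, 40]  # illustrative reference

print(f"{alignment(site_scale, benchmark_scale):.0%}")  # 86%
```

The misses (17 vs 16 here) become the prioritized fix list, which is exactly what turns an audit into an action plan.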

### 4. Semi-Automation > Full Automation

I don't want AI to make all decisions. The workflow has human checkpoints:
- Review AS-IS in Figma before modernizing
- Accept/reject each agent suggestion
- Review TO-BE before using in production

AI as **copilot**, not autopilot.

### 5. Specialized Agents > One Big Prompt

One mega-prompt doing brand analysis + benchmark comparison + accessibility audit + synthesis = confused, unfocused output. Four agents, each with a single responsibility = sharp, reliable analysis.

### 6. W3C Standards Matter

Adopting the DTCG v1 spec (October 2025) means our JSON output works with Tokens Studio, Style Dictionary v4, and any tool that follows the standard. Custom formats create lock-in. Standards create ecosystems.

### 7. Deterministic Classification Beats LLM Classification

AURORA (LLM) named 10 colors per run, inconsistently. The color classifier names ALL colors, every time, with logged evidence. For categorization tasks where you have structured input data (CSS properties, element types, frequency), rules beat LLMs on accuracy, speed, cost, and reproducibility.
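A toy version of such a classifier shows why it's reproducible: the role follows mechanically from CSS properties, element types, and frequency. The thresholds and categories here are illustrative, not the 815-line implementation:

```python
# Toy deterministic color-role classifier; thresholds are illustrative.

def classify_role(usages):
    # `usages` is structured evidence: where and how the color appears.
    props = {u["property"] for u in usages}
    if "background-color" in props and len(usages) > 50:
        return "surface"       # dominant background color
    if "color" in props and len(usages) > 50:
        return "text"          # dominant foreground color
    if any(u["element"] in ("button", "a") for u in usages):
        return "interactive"   # appears on actionable elements
    return "decorative"

print(classify_role([{"property": "background-color", "element": "button"}]))
# -> interactive
```

Every branch is a logged, inspectable rule, so the same evidence always yields the same name.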

---

## A Note on the Tech Stack

**On HuggingFace Spaces:** I'm using HF Spaces as the hosting platform with a Gradio frontend running in Docker. The LLM models (Qwen 72B, Llama 3.3 70B) are called via HuggingFace Inference API. Browser automation (Playwright + Chromium) runs inside the container.

**On the Data:** This system works on **live websites**: point it at any URL and it extracts real design tokens from the actual DOM. No synthetic data. The architecture, LLM integrations, and rule engine are production-ready with 148 passing tests.

**On the Standard:** The W3C DTCG specification reached stable v1 in October 2025. Our output includes `$type`, `$value`, `$description`, and `$extensions` with namespaced metadata. Any DTCG-compatible tool can consume it.
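For a sense of the shape, here is a minimal DTCG-style token group with those four `$`-prefixed fields. The values and the `com.example.extractor` extension namespace are made up for illustration, not the tool's actual output:

```python
import json

# Minimal DTCG-shaped token group; all values here are illustrative.
tokens = {
    "color": {
        "brand": {
            "primary": {
                "$type": "color",
                "$value": "#0057FF",
                "$description": "Primary brand color extracted from the live site",
                "$extensions": {
                    # Namespaced, tool-specific metadata (hypothetical namespace).
                    "com.example.extractor": {"confidence": 0.97, "source": "button"}
                },
            }
        }
    }
}

print(json.dumps(tokens, indent=2))
```

Because the `$`-prefixed keys follow the spec, a consumer like Tokens Studio or Style Dictionary can read the group without any custom adapter.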

---

## Try It Yourself

**AI Agent App:**
- Live Demo: [HuggingFace Space link]
- GitHub: [Repository link]

**Workflow:**
1. Enter website URL -> Extract AS-IS
2. Download DTCG JSON -> Import to Figma
3. Review visual spec -> Run AI analysis
4. Accept suggestions -> Export TO-BE
5. Import to Figma -> Compare visual specs

---

## Closing Thought

AI engineering isn't about fancy models or complex architecture. It's about knowing which problems need AI and which need good engineering.

It's **compression**: compressing days of manual audit, multiple expert perspectives, and industry benchmarking into something a team can act on Monday morning.

Instead of spending 3-5 days in DevTools, your team gets:
> "Top 3 issues, ranked by impact, with specific fixes, benchmark alignment, and a Figma-ready visual spec to compare before and after."

That's AI amplifying design systems impact.

Full code on GitHub: [link]

---

## What's Next: Automated Component Generation (Part 2)

The token extraction and analysis story is complete. But design systems aren't just tokens; they're **components**.

After exhaustive research into 30+ tools (Tokens Studio, Figr Identity, Figma Make, MCP bridges, story.to.design, and more), I found a genuine market gap:

**No production tool takes DTCG JSON and outputs Figma components with proper variants.**

Every tool either:
- Imports tokens as variables (but doesn't create components)
- Creates components from brand config (but can't consume YOUR tokens)
- Uses AI to write to Figma (but is non-deterministic)
- Needs a full Storybook pipeline as intermediary

So I'm building it. The Figma Plugin API supports everything needed: `createComponent()`, `combineAsVariants()`, `setBoundVariable()`. Our existing plugin already imports tokens and creates variables.

**Coming in Episode 7:**
- Auto-generating Figma components from extracted tokens
- Button (60 variants), TextInput (8), Card, Toast, Checkbox/Radio
- Token-to-component binding: `color.brand.primary` -> Button fill, `radius.md` -> Button corners
- Fully deterministic: same tokens in = same components out

---

*This is Episode 6 of "AI in My Daily Work."*

*If you missed the previous episodes:*
- *Episode 5: Building a 7-Agent UX Friction Analysis System in Databricks*
- *Episode 4: Automating UI Regression Testing with AI Agents (Part-1)*
- *Episode 3: Building a Multi-Agent Review Intelligence System*
- *Episode 2: How I Use a Team of AI Agents to Automate Secondary Research*

*What problems are you automating with AI? Drop a comment; I'd love to discuss what you're building.*

---

**About the Author**

I'm Riaz, a UX Design Manager with 10+ years of experience in consumer apps. I combine design thinking with AI engineering to build tools that make design decisions faster and more data-driven.

**Connect:**
- LinkedIn: [link]
- Medium: @designwithriaz
- GitHub: [link]

---

#AIAgents #DesignSystems #UXDesign #Figma #MultiAgentSystems #DesignTokens #Automation #AIEngineering #HuggingFace #WCAG #W3CDTCG

---

*Published on Medium - ~12 min read*