riazmo Claude Opus 4.6 commited on
Commit
6b43e51
·
1 Parent(s): 03b215f

feat: W3C DTCG v1 compliance + single naming authority (v3.2)

Browse files

Fix 3 — DTCG strict compliance:
- _to_dtcg_token() now supports $extensions with namespaced metadata
(com.design-system-extractor: {frequency, confidence, category, evidence})
- Color, radius, shadow exports include rich metadata
- Spec-compliant: $type, $value, $description, $extensions

Fix 4 — Resolve naming authority contradiction:
- Color classifier is PRIMARY naming authority (deterministic)
- AURORA is SECONDARY: can only assign semantic roles
(brand/text/bg/border/feedback), cannot override palette names
- _get_semantic_color_overrides() rewritten with clear authority chain
- filter_aurora_naming_map() added to llm_agents.py
- _generate_color_name_from_hex() deprecated to thin wrapper
- semantic_analyzer.py marked deprecated (absorbed elsewhere)
- CLAUDE.md updated to v3.2 with current status and future roadmap

All 113 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (4) hide show
  1. CLAUDE.md +152 -44
  2. agents/llm_agents.py +28 -0
  3. agents/semantic_analyzer.py +9 -0
  4. app.py +86 -80
CLAUDE.md CHANGED
@@ -1,11 +1,77 @@
1
- # Design System Extractor v3.1 — Project Context
2
 
3
  ## Overview
4
 
5
  A multi-agent system that extracts, analyzes, and recommends improvements for design systems from websites. The system operates in two stages:
6
 
7
- 1. **Stage 1 (Deterministic)**: Extract CSS values → Normalize → Rule Engine analysis → **Rule-Based Color Classification** (free, no LLM)
8
- 2. **Stage 2 (LLM-powered, advisory only)**: Brand insights → Benchmark comparison → Best practices → Final synthesis
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9
 
10
  ---
11
 
@@ -30,10 +96,11 @@ CSS Evidence → Category:
30
  everything else → PALETTE (named by hue.shade)
31
  ```
32
 
33
- ### What AURORA Does Now (Advisory Only)
34
- - Does NOT output naming_map
35
  - Provides brand insights, palette strategy, cohesion score
36
- - LLM reasoning is shown in logs but doesn't affect token names
 
 
37
 
38
  ### Files Changed in v3.1
39
  - `core/color_classifier.py` — NEW: Rule-based classifier with dedup, caps, naming conventions
@@ -43,6 +110,24 @@ CSS Evidence → Category:
43
 
44
  ---
45
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
46
  ## PREVIOUS STATUS (v3.0 and earlier): BROKEN — RETHINK COMPLETED
47
 
48
  ### What's Wrong (observed from real site tests)
@@ -1008,54 +1093,77 @@ NormalizedTokens:
1008
 
1009
  ---
1010
 
1011
- ## REVISED EXECUTION ORDER (Stage 1 fixes interleaved, not deferred)
1012
-
1013
- The original plan was "fix Stage 2 first, Stage 1 later." But the audit reveals:
1014
- **If normalizer sends word-based shade names to AURORA, AURORA's ReAct naming will STILL conflict with normalizer names in the export merge.**
1015
-
1016
- The pre-processing layer (Step 2 in the old plan) was supposed to fix this. But that's a bandaid — it re-normalizes what the normalizer already normalized. It's cleaner to fix the normalizer itself so it produces correct output from the start.
1017
 
1018
- ### New Execution Order:
1019
 
1020
  ```
1021
- PHASE 1: FIX NORMALIZER (makes Stage 1 output clean)
1022
- 1a. Unify color naming → numeric shades only
1023
- 1b. Add radius normalization (parse, deduplicate, sort, name)
1024
- 1c. Add shadow normalization (parse, sort by blur, name)
1025
- 1d. Feed semantic_analyzer role hints into normalizer
1026
-
1027
- PHASE 2: FIX STAGE 2 (agents can now trust their input)
1028
- 2a. Consolidate two Stage 2 systems into one
1029
- 2b. Rewrite AURORA with ReAct + critic (names ALL colors, not 10)
1030
- 2c. Rewrite SENTINEL with grounded scoring + critic
1031
- 2d. Rewrite NEXUS with ToT
1032
- 2e. Add post-validation layer
1033
-
1034
- PHASE 3: FIX EXPORT (single naming authority)
1035
- 3a. AURORA naming_map is THE authority (not 3-way merge)
1036
- 3b. Radius/shadow export uses normalizer output directly
1037
- 3c. Validation before JSON write
1038
-
1039
- PHASE 4: FIX EXTRACTION (nice-to-have, not blocking)
1040
- 4a. Font family detection improvement
1041
- 4b. Rule engine: radius grid analysis
1042
- 4c. Rule engine: shadow elevation analysis
 
1043
  ```
1044
 
1045
- ### Why this order is better:
1046
 
1047
- 1. **Phase 1 first** because AURORA can't name colors well if the input names are garbage. The ReAct prompt says "observe your naming" but if the LLM sees `color.blue.light` in its input AND is asked to output `color.blue.300`, it gets confused.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1048
 
1049
- 2. **Phase 2 after Phase 1** because now the LLM agents receive clean, consistently-named input. AURORA's job becomes "confirm or improve these names" rather than "fix the mess from normalizer."
1050
 
1051
- 3. **Phase 3 after Phase 2** because the export layer just needs to respect one naming authority (AURORA), not reconcile three.
1052
 
1053
- 4. **Phase 4 last** because font family and enhanced rule engine analysis are improvements, not blockers.
 
 
 
 
 
 
1054
 
1055
- ### Deploy Plan:
1056
- - **Deploy 1**: After Phase 1 (normalizer fixes) — even without Stage 2 improvements, the export will be cleaner
1057
- - **Deploy 2**: After Phase 2 + 3 (full Stage 2 rework + export) the big quality jump
1058
- - **Deploy 3**: After Phase 4 (font family, enhanced analysis) — polish
 
1059
 
1060
  ---
1061
 
 
1
+ # Design System Extractor v3.2 — Project Context
2
 
3
  ## Overview
4
 
5
  A multi-agent system that extracts, analyzes, and recommends improvements for design systems from websites. The system operates in two stages:
6
 
7
+ 1. **Stage 1 (Deterministic)**: Extract CSS values → Normalize (colors, radius, shadows, typography, spacing) → Rule Engine analysis → **Rule-Based Color Classification** (free, no LLM)
8
+ 2. **Stage 2 (LLM-powered)**: Brand identification (AURORA) → Benchmark comparison (ATLAS) → Best practices (SENTINEL) Synthesis (NEXUS)
9
+ 3. **Export**: W3C DTCG v1 compliant JSON → Figma Plugin (visual spec + styles/variables)
10
+
11
+ ---
12
+
13
+ ## CURRENT STATUS: v3.2 (Feb 2026)
14
+
15
+ ### What's Working
16
+
17
+ | Component | Status | Notes |
18
+ |-----------|--------|-------|
19
+ | CSS Extraction (Playwright) | ✅ Working | Desktop + mobile viewports |
20
+ | Color normalization | ✅ Working | Single numeric shade system (50-900) |
21
+ | Color classification | ✅ Working | `core/color_classifier.py` (815 lines, 100% deterministic) |
22
+ | Radius normalization | ✅ Working | Parse, deduplicate, sort, name (none/sm/md/lg/xl/2xl/full) |
23
+ | Shadow normalization | ✅ Working | Parse, sort by blur, deduplicate, name (xs/sm/md/lg/xl) |
24
+ | Typography normalization | ✅ Working | Desktop/mobile split, weight suffix |
25
+ | Spacing normalization | ✅ Working | GCD-based grid detection, base-8 alignment |
26
+ | Rule engine | ✅ Working | Type scale, WCAG AA, spacing grid, color statistics |
27
+ | LLM agents (ReAct) | ✅ Working | AURORA, ATLAS, SENTINEL, NEXUS with critic/retry |
28
+ | W3C DTCG export | ✅ Working | $value, $type, $description, $extensions |
29
+ | Figma plugin - visual spec | ✅ Working | Separate frames, AA badges, horizontal layout |
30
+ | Figma plugin - styles/variables | ✅ Working | Paint, text, effect styles + variable collections |
31
+ | Shadow interpolation | ✅ Working | Always produces 5 levels (xs→xl), interpolates if fewer extracted |
32
+
33
+ ### Architecture Decisions (v3.2)
34
+
35
+ #### Naming Authority Chain (RESOLVED)
36
+ The three-naming-system conflict from v2/v3.0 is resolved:
37
+
38
+ ```
39
+ 1. Color Classifier (PRIMARY) — deterministic, covers ALL colors
40
+ └── Rule-based: CSS evidence → category → token name
41
+ └── 100% reproducible, logged with evidence
42
+
43
+ 2. AURORA LLM (SECONDARY) — semantic role enhancer ONLY
44
+ └── Can promote "color.blue.500" → "color.brand.primary"
45
+ └── CANNOT rename palette colors
46
+ └── Only brand/text/bg/border/feedback roles accepted
47
+ └── filter_aurora_naming_map() enforces this boundary
48
+
49
+ 3. Normalizer (FALLBACK) — preliminary hue+shade names
50
+ └── Only used if classifier hasn't run yet
51
+ └── _generate_preliminary_name() → "color.blue.500"
52
+ ```
53
+
54
+ **app.py `_get_semantic_color_overrides()`** implements this chain:
55
+ - PRIMARY: `state.color_classification.colors` (from color_classifier)
56
+ - SECONDARY: `state.brand_result.naming_map` (from AURORA, filtered to semantic roles only)
57
+
58
+ **`_generate_color_name_from_hex()`** is DEPRECATED — kept as thin wrapper for edge cases.
59
+
60
+ #### W3C DTCG v1 Compliance (2025.10 Spec)
61
+ - `$type` values: `color`, `dimension`, `typography`, `shadow`
62
+ - `$value` for all token values
63
+ - `$description` for human-readable descriptions
64
+ - `$extensions` with namespaced metadata: `com.design-system-extractor`
65
+ - Colors: `{frequency, confidence, category, evidence}`
66
+ - Radius: `{frequency, fitsBase4, fitsBase8}`
67
+ - Shadows: `{frequency, rawCSS, blurPx}`
68
+ - Nested structure (not flat)
69
+ - `_flat_key_to_nested()` prevents nesting inside DTCG leaf nodes
70
+
71
+ #### Deprecated Components
72
+ - `agents/semantic_analyzer.py` — superseded by color_classifier + normalizer._infer_role_hint()
73
+ - `agents/stage2_graph.py` — old LangGraph parallel system, replaced by direct async in app.py
74
+ - `app.py _generate_color_name_from_hex()` — third naming system, now thin wrapper
75
 
76
  ---
77
 
 
96
  everything else → PALETTE (named by hue.shade)
97
  ```
98
 
99
+ ### What AURORA Does Now
 
100
  - Provides brand insights, palette strategy, cohesion score
101
+ - naming_map is filtered to semantic roles only (brand/text/bg/border/feedback)
102
+ - LLM reasoning is shown in logs
103
+ - `filter_aurora_naming_map()` in llm_agents.py enforces the boundary
104
 
105
  ### Files Changed in v3.1
106
  - `core/color_classifier.py` — NEW: Rule-based classifier with dedup, caps, naming conventions
 
110
 
111
  ---
112
 
113
+ ## v3.2 FIX: DTCG COMPLIANCE + NAMING AUTHORITY (Feb 2026)
114
+
115
+ ### What Changed
116
+ 1. **W3C DTCG v1 strict compliance**: `_to_dtcg_token()` now supports `$extensions` with namespaced metadata
117
+ 2. **Single naming authority resolved**: Color classifier is PRIMARY, AURORA is SECONDARY (semantic roles only)
118
+ 3. **`_get_semantic_color_overrides()` rewritten**: Uses classifier as primary, AURORA filtered to role-only names
119
+ 4. **`filter_aurora_naming_map()` added**: In llm_agents.py, strips non-semantic names from AURORA output
120
+ 5. **`_generate_color_name_from_hex()` deprecated**: Thin wrapper using `categorize_color()` from color_utils
121
+ 6. **`semantic_analyzer.py` deprecated**: Marked with deprecation notice, functionality absorbed elsewhere
122
+
123
+ ### Files Changed in v3.2
124
+ - `app.py` — DTCG helpers enhanced, `_get_semantic_color_overrides()` rewritten, hex-name function deprecated
125
+ - `agents/llm_agents.py` — Added `filter_aurora_naming_map()` function
126
+ - `agents/semantic_analyzer.py` — Deprecated with notice
127
+ - `CLAUDE.md` — Updated to current status
128
+
129
+ ---
130
+
131
  ## PREVIOUS STATUS (v3.0 and earlier): BROKEN — RETHINK COMPLETED
132
 
133
  ### What's Wrong (observed from real site tests)
 
1093
 
1094
  ---
1095
 
1096
+ ## EXECUTION STATUS (Updated Feb 2026)
 
 
 
 
 
1097
 
1098
+ ### Phases 1-3: COMPLETED
1099
 
1100
  ```
1101
+ PHASE 1: FIX NORMALIZER DONE
1102
+ 1a. Unify color naming → numeric shades only (_generate_preliminary_name)
1103
+ 1b. Add radius normalization (parse, deduplicate, sort, name) — normalizer.py:626-778
1104
+ 1c. Add shadow normalization (parse, sort by blur, name) — normalizer.py:784-940
1105
+ 1d. Feed role hints into normalizer — normalizer._infer_role_hint()
1106
+
1107
+ PHASE 2: FIX STAGE 2 DONE
1108
+ 2a. Consolidated llm_agents.py is primary, stage2_graph.py deprecated
1109
+ 2b. AURORA with ReAct + critic + retry llm_agents.py:420-470
1110
+ 2c. SENTINEL with grounded scoring + cross-reference critic
1111
+ 2d. NEXUS with ToT (two-perspective evaluation)
1112
+ 2e. Post-validation layer — post_validate_stage2()
1113
+
1114
+ PHASE 3: FIX EXPORT DONE (v3.2)
1115
+ 3a. Color classifier = PRIMARY authority, AURORA = semantic roles only
1116
+ 3b. Radius/shadow export uses normalizer output directly
1117
+ 3c. W3C DTCG v1 compliance with $extensions metadata
1118
+ 3d. ✅ filter_aurora_naming_map() enforces role-only boundary
1119
+
1120
+ PHASE 4: EXTRACTION IMPROVEMENTS (NOT STARTED)
1121
+ 4a. Font family detection — still returns "sans-serif" fallback
1122
+ 4b. Rule engine: radius grid analysis
1123
+ 4c. ❌ Rule engine: shadow elevation analysis
1124
  ```
1125
 
1126
+ ### PHASE 5: COMPONENT GENERATION (FUTURE — NOT STARTED)
1127
 
1128
+ Based on strategic research (Feb 2026), the next major feature is automated component generation in Figma:
1129
+
1130
+ ```
1131
+ PHASE 5: FIGMA COMPONENT GENERATION
1132
+ 5a. Component Definition Schema (JSON defining anatomy + token bindings + variants)
1133
+ 5b. Token-to-Component binding engine
1134
+ 5c. Figma Plugin: createComponent() + combineAsVariants() + setBoundVariable()
1135
+ 5d. MVP Components: Button (60 variants), TextInput (8), Card (2), Toast (4), Checkbox+Radio (12)
1136
+ 5e. Variable Collections: Primitives, Semantic, Spacing, Radius, Typography
1137
+
1138
+ PHASE 6: ECOSYSTEM INTEGRATION
1139
+ 6a. Style Dictionary v4 compatible output (50+ platform formats for free)
1140
+ 6b. Tokens Studio compatible JSON import
1141
+ 6c. Dembrandt JSON as alternative input source
1142
+ 6d. CI/CD GitHub Action for design system regression checks
1143
+
1144
+ PHASE 7: MCP INTEGRATION
1145
+ 7a. Expose extractor as MCP tool server
1146
+ 7b. Claude Desktop: "Extract design system from example.com"
1147
+ 7c. Community Figma MCP bridge for push-to-Figma
1148
+ ```
1149
 
1150
+ ### Strategic Positioning
1151
 
1152
+ **"Lighthouse for Design Systems"** We are NOT a token management platform (Tokens Studio), NOT a documentation platform (Zeroheight), NOT an extraction tool (Dembrandt). We are the **automated audit + bootstrap tool** that sits upstream of all of those.
1153
 
1154
+ **Unique differentiators no competitor has:**
1155
+ - Type scale ratio detection + standard scale matching
1156
+ - Spacing grid detection (GCD-based, base-8 alignment scoring)
1157
+ - LLM brand identification from CSS usage patterns
1158
+ - Holistic design system quality score (0-100)
1159
+ - Visual spec page auto-generated in Figma
1160
+ - Benchmark comparison against established design systems
1161
 
1162
+ **Key competitors to watch:**
1163
+ - Dembrandt (1,300★) — does extraction better, but no analysis
1164
+ - Tokens Studio (264K users) does Figma management better, but no extraction
1165
+ - Knapsack ($10M funding) building ingestion engine, biggest strategic threat
1166
+ - html.to.design — captures layouts but not tokens/variables
1167
 
1168
  ---
1169
 
agents/llm_agents.py CHANGED
@@ -1214,6 +1214,34 @@ def _apply_sentinel_fixes(result: BestPracticesResult, rule_engine, errors: list
1214
  return result
1215
 
1216
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1217
  def post_validate_stage2(
1218
  aurora: BrandIdentification,
1219
  sentinel: BestPracticesResult,
 
1214
  return result
1215
 
1216
 
1217
+ def filter_aurora_naming_map(aurora: BrandIdentification) -> dict:
1218
+ """Filter AURORA naming_map to only keep semantic role assignments.
1219
+
1220
+ AURORA is a secondary naming authority — it can assign semantic roles
1221
+ (brand.primary, text.secondary, bg.primary, feedback.error, etc.)
1222
+ but cannot override palette names (blue.500, neutral.700, etc.).
1223
+
1224
+ The color_classifier is the primary naming authority.
1225
+
1226
+ Returns:
1227
+ Dict of hex -> semantic_name (only role-based names).
1228
+ """
1229
+ SEMANTIC_PREFIXES = ('brand.', 'text.', 'bg.', 'border.', 'feedback.')
1230
+ filtered = {}
1231
+
1232
+ for hex_val, name in (aurora.naming_map or {}).items():
1233
+ hex_clean = str(hex_val).strip().lower()
1234
+ if not hex_clean.startswith('#') or not name:
1235
+ continue
1236
+ clean_name = name if name.startswith('color.') else f'color.{name}'
1237
+ # Extract the part after "color."
1238
+ after_prefix = clean_name[6:] # "brand.primary", "blue.500", etc.
1239
+ if any(after_prefix.startswith(sp) for sp in SEMANTIC_PREFIXES):
1240
+ filtered[hex_clean] = clean_name
1241
+
1242
+ return filtered
1243
+
1244
+
1245
  def post_validate_stage2(
1246
  aurora: BrandIdentification,
1247
  sentinel: BestPracticesResult,
agents/semantic_analyzer.py CHANGED
@@ -2,6 +2,15 @@
2
  Agent 1C: Semantic Color Analyzer
3
  Design System Extractor v2
4
 
 
 
 
 
 
 
 
 
 
5
  Persona: Design System Semanticist
6
 
7
  Responsibilities:
 
2
  Agent 1C: Semantic Color Analyzer
3
  Design System Extractor v2
4
 
5
+ ⚠️ DEPRECATED in v3.2 — Superseded by:
6
+ - core/color_classifier.py (rule-based, primary naming authority)
7
+ - agents/normalizer.py._infer_role_hint() (role hints for AURORA)
8
+ - AURORA agent in llm_agents.py (semantic role enhancement only)
9
+
10
+ This module is kept for backward compatibility but should not be called
11
+ in the main pipeline. Its heuristics have been absorbed into
12
+ normalizer._infer_role_hint() and color_classifier.classify_colors().
13
+
14
  Persona: Design System Semanticist
15
 
16
  Responsibilities:
app.py CHANGED
@@ -2947,20 +2947,29 @@ def _flat_key_to_nested(flat_key: str, value: dict, root: dict):
2947
  current[parts[-1]] = value
2948
 
2949
 
2950
- def _to_dtcg_token(value, token_type: str, description: str = None, source: str = None) -> dict:
2951
- """Wrap value in W3C DTCG format with $value, $type, $description.
 
 
 
2952
 
2953
  Args:
2954
  value: The token value
2955
- token_type: W3C DTCG type (color, typography, dimension, shadow)
 
 
 
2956
  description: Optional human-readable description
2957
  source: Optional source indicator (extracted, recommended, semantic)
 
2958
  """
2959
  token = {"$type": token_type, "$value": value}
2960
- if description:
 
 
2961
  token["$description"] = description
2962
- if source:
2963
- token["$description"] = f"[{source}] {description or ''}"
2964
  return token
2965
 
2966
 
@@ -2982,21 +2991,42 @@ def _shadow_to_dtcg(shadow_dict: dict) -> dict:
2982
  def _get_semantic_color_overrides() -> dict:
2983
  """Build color hex->semantic name map.
2984
 
2985
- v3: AURORA naming_map is the SINGLE naming authority.
2986
- Falls back to normalizer suggested_name, then _generate_color_name_from_hex.
 
 
 
 
 
 
2987
  """
2988
  overrides = {} # hex -> semantic_name
2989
 
2990
- # PRIMARY: AURORA's naming_map (covers ALL colors if critic passed)
 
 
 
 
 
 
 
 
 
 
 
2991
  brand_result = getattr(state, 'brand_result', None)
2992
  if brand_result:
2993
  naming_map = getattr(brand_result, 'naming_map', None)
2994
  if isinstance(naming_map, dict) and naming_map:
2995
  for hex_val, name in naming_map.items():
2996
  hex_clean = str(hex_val).strip().lower()
2997
- if hex_clean.startswith('#') and name:
2998
- # Ensure color. prefix
2999
- clean_name = name if name.startswith('color.') else f'color.{name}'
 
 
 
 
3000
  overrides[hex_clean] = clean_name
3001
 
3002
  return overrides
@@ -3013,90 +3043,48 @@ def _is_valid_hex_color(value: str) -> bool:
3013
 
3014
 
3015
  def _generate_color_name_from_hex(hex_val: str, used_names: set = None) -> str:
3016
- """Generate a semantic color name based on the color's HSL characteristics.
3017
 
3018
- Returns names like: color.neutral.400, color.blue.500, color.red.300
3019
- Uses standard design system naming conventions.
3020
  """
 
3021
  import colorsys
3022
 
3023
  used_names = used_names or set()
3024
 
3025
- # Parse hex to RGB
3026
  hex_clean = hex_val.lstrip('#').lower()
3027
  if len(hex_clean) == 3:
3028
- hex_clean = ''.join([c*2 for c in hex_clean])
3029
 
3030
  try:
3031
  r = int(hex_clean[0:2], 16) / 255
3032
  g = int(hex_clean[2:4], 16) / 255
3033
  b = int(hex_clean[4:6], 16) / 255
3034
  except (ValueError, IndexError):
3035
- return "color.other.base"
3036
 
3037
- # Convert to HSL
3038
  h, l, s = colorsys.rgb_to_hls(r, g, b)
3039
- hue = h * 360
3040
- saturation = s
3041
- lightness = l
3042
-
3043
- # Determine color family based on hue (for saturated colors)
3044
- if saturation < 0.1:
3045
- # Grayscale / neutral
3046
- color_family = "neutral"
3047
- else:
3048
- # Map hue to color name
3049
- if hue < 15 or hue >= 345:
3050
- color_family = "red"
3051
- elif hue < 45:
3052
- color_family = "orange"
3053
- elif hue < 75:
3054
- color_family = "yellow"
3055
- elif hue < 150:
3056
- color_family = "green"
3057
- elif hue < 195:
3058
- color_family = "teal"
3059
- elif hue < 255:
3060
- color_family = "blue"
3061
- elif hue < 285:
3062
- color_family = "purple"
3063
- elif hue < 345:
3064
- color_family = "pink"
3065
- else:
3066
- color_family = "red"
3067
-
3068
- # Determine shade based on lightness (100-900 scale)
3069
- if lightness >= 0.95:
3070
- shade = "50"
3071
- elif lightness >= 0.85:
3072
- shade = "100"
3073
- elif lightness >= 0.75:
3074
- shade = "200"
3075
- elif lightness >= 0.65:
3076
- shade = "300"
3077
- elif lightness >= 0.50:
3078
- shade = "400"
3079
- elif lightness >= 0.40:
3080
- shade = "500"
3081
- elif lightness >= 0.30:
3082
- shade = "600"
3083
- elif lightness >= 0.20:
3084
- shade = "700"
3085
- elif lightness >= 0.10:
3086
- shade = "800"
3087
- else:
3088
- shade = "900"
3089
 
3090
- # Generate base name
3091
  base_name = f"color.{color_family}.{shade}"
3092
-
3093
- # Handle conflicts by adding suffix
3094
  final_name = base_name
3095
  suffix = 1
3096
  while final_name in used_names:
3097
  suffix += 1
3098
  final_name = f"{base_name}_{suffix}"
3099
-
3100
  return final_name
3101
 
3102
 
@@ -3210,7 +3198,14 @@ def export_stage1_json(convention: str = "semantic"):
3210
  log_callback=state.log,
3211
  )
3212
  for c in classification.colors:
3213
- dtcg_token = _to_dtcg_token(c.hex, "color", description=f"Rule-based: {c.category}")
 
 
 
 
 
 
 
3214
  _flat_key_to_nested(c.token_name, dtcg_token, result)
3215
  token_count += 1
3216
 
@@ -3285,7 +3280,7 @@ def export_stage1_json(convention: str = "semantic"):
3285
  token_count += 1
3286
 
3287
  # =========================================================================
3288
- # BORDER RADIUS — Nested structure (DTCG uses "dimension" type for radii)
3289
  # =========================================================================
3290
  if state.desktop_normalized and state.desktop_normalized.radius:
3291
  seen_radius = {}
@@ -3294,7 +3289,14 @@ def export_stage1_json(convention: str = "semantic"):
3294
  if token_name is None:
3295
  continue # Duplicate radius — skip
3296
  flat_key = token_name
3297
- dtcg_token = _to_dtcg_token(r.value, "dimension", description="Extracted from site")
 
 
 
 
 
 
 
3298
  _flat_key_to_nested(flat_key, dtcg_token, result)
3299
  token_count += 1
3300
 
@@ -3302,18 +3304,22 @@ def export_stage1_json(convention: str = "semantic"):
3302
  # SHADOWS — W3C DTCG shadow format
3303
  # =========================================================================
3304
  if state.desktop_normalized and state.desktop_normalized.shadows:
3305
- shadow_names = ["xs", "sm", "md", "lg", "xl", "2xl"]
3306
  sorted_shadows = sorted(
3307
  state.desktop_normalized.shadows.items(),
3308
  key=lambda x: _get_shadow_blur(x[1].value),
3309
  )
3310
  for i, (name, s) in enumerate(sorted_shadows):
3311
- size_name = shadow_names[i] if i < len(shadow_names) else str(i + 1)
3312
  flat_key = f"shadow.{size_name}"
3313
- # Parse CSS shadow and convert to DTCG format
3314
  parsed = _parse_shadow_to_tokens_studio(s.value)
3315
  dtcg_shadow_value = _shadow_to_dtcg(parsed)
3316
- dtcg_token = _to_dtcg_token(dtcg_shadow_value, "shadow", description="Extracted from site")
 
 
 
 
 
3317
  _flat_key_to_nested(flat_key, dtcg_token, result)
3318
  token_count += 1
3319
 
 
2947
  current[parts[-1]] = value
2948
 
2949
 
2950
+ def _to_dtcg_token(value, token_type: str, description: str = None,
2951
+ source: str = None, extensions: dict = None) -> dict:
2952
+ """Wrap value in W3C DTCG v1 (2025.10) format.
2953
+
2954
+ Spec: https://www.designtokens.org/tr/drafts/format/
2955
 
2956
  Args:
2957
  value: The token value
2958
+ token_type: W3C DTCG type must be one of:
2959
+ color, dimension, fontFamily, fontWeight, number,
2960
+ duration, cubicBezier, shadow, strokeStyle, border,
2961
+ transition, gradient, typography
2962
  description: Optional human-readable description
2963
  source: Optional source indicator (extracted, recommended, semantic)
2964
+ extensions: Optional dict for $extensions (custom metadata like frequency, confidence)
2965
  """
2966
  token = {"$type": token_type, "$value": value}
2967
+ if description and source:
2968
+ token["$description"] = f"[{source}] {description}"
2969
+ elif description:
2970
  token["$description"] = description
2971
+ if extensions:
2972
+ token["$extensions"] = {"com.design-system-extractor": extensions}
2973
  return token
2974
 
2975
 
 
2991
  def _get_semantic_color_overrides() -> dict:
2992
  """Build color hex->semantic name map.
2993
 
2994
+ v3.2: Color classifier is the PRIMARY naming authority (deterministic, reproducible).
2995
+ AURORA is a SECONDARY enhancer it can only ADD semantic role names
2996
+ (brand.primary, text.secondary, etc.) but cannot override palette names.
2997
+
2998
+ Authority chain:
2999
+ 1. Color classifier (rule-based, covers ALL colors)
3000
+ 2. AURORA naming_map (LLM, only brand/text/bg/border/feedback roles accepted)
3001
+ 3. Normalizer suggested_name (fallback)
3002
  """
3003
  overrides = {} # hex -> semantic_name
3004
 
3005
+ # PRIMARY: Color classifier (deterministic, covers ALL colors)
3006
+ classified = getattr(state, 'color_classification', None)
3007
+ if classified and hasattr(classified, 'colors'):
3008
+ for cc in classified.colors:
3009
+ hex_clean = cc.hex.strip().lower()
3010
+ if hex_clean.startswith('#') and cc.token_name:
3011
+ overrides[hex_clean] = cc.token_name
3012
+
3013
+ # SECONDARY: AURORA naming_map — ONLY accept semantic role upgrades
3014
+ # AURORA can promote "color.blue.500" to "color.brand.primary"
3015
+ # but cannot rename palette colors to different palette names
3016
+ _SEMANTIC_ROLES = {'brand.', 'text.', 'bg.', 'border.', 'feedback.'}
3017
  brand_result = getattr(state, 'brand_result', None)
3018
  if brand_result:
3019
  naming_map = getattr(brand_result, 'naming_map', None)
3020
  if isinstance(naming_map, dict) and naming_map:
3021
  for hex_val, name in naming_map.items():
3022
  hex_clean = str(hex_val).strip().lower()
3023
+ if not hex_clean.startswith('#') or not name:
3024
+ continue
3025
+ clean_name = name if name.startswith('color.') else f'color.{name}'
3026
+ # Only accept semantic role names from AURORA
3027
+ name_after_color = clean_name[6:] # strip "color."
3028
+ is_semantic_role = any(name_after_color.startswith(r) for r in _SEMANTIC_ROLES)
3029
+ if is_semantic_role:
3030
  overrides[hex_clean] = clean_name
3031
 
3032
  return overrides
 
3043
 
3044
 
3045
  def _generate_color_name_from_hex(hex_val: str, used_names: set = None) -> str:
3046
+ """DEPRECATED: Use normalizer._generate_preliminary_name() instead.
3047
 
3048
+ Kept as thin wrapper for backward compatibility.
3049
+ Delegates to normalizer's naming logic via color_utils.categorize_color().
3050
  """
3051
+ from core.color_utils import categorize_color, parse_color
3052
  import colorsys
3053
 
3054
  used_names = used_names or set()
3055
 
 
3056
  hex_clean = hex_val.lstrip('#').lower()
3057
  if len(hex_clean) == 3:
3058
+ hex_clean = ''.join([c * 2 for c in hex_clean])
3059
 
3060
  try:
3061
  r = int(hex_clean[0:2], 16) / 255
3062
  g = int(hex_clean[2:4], 16) / 255
3063
  b = int(hex_clean[4:6], 16) / 255
3064
  except (ValueError, IndexError):
3065
+ return "color.other.500"
3066
 
 
3067
  h, l, s = colorsys.rgb_to_hls(r, g, b)
3068
+ color_family = categorize_color(hex_val) or "neutral"
3069
+
3070
+ # Numeric shade from lightness (matches normalizer._generate_preliminary_name)
3071
+ if l >= 0.95: shade = "50"
3072
+ elif l >= 0.85: shade = "100"
3073
+ elif l >= 0.75: shade = "200"
3074
+ elif l >= 0.65: shade = "300"
3075
+ elif l >= 0.50: shade = "400"
3076
+ elif l >= 0.40: shade = "500"
3077
+ elif l >= 0.30: shade = "600"
3078
+ elif l >= 0.20: shade = "700"
3079
+ elif l >= 0.10: shade = "800"
3080
+ else: shade = "900"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3081
 
 
3082
  base_name = f"color.{color_family}.{shade}"
 
 
3083
  final_name = base_name
3084
  suffix = 1
3085
  while final_name in used_names:
3086
  suffix += 1
3087
  final_name = f"{base_name}_{suffix}"
 
3088
  return final_name
3089
 
3090
 
 
3198
  log_callback=state.log,
3199
  )
3200
  for c in classification.colors:
3201
+ ext = {"frequency": c.frequency, "confidence": c.confidence, "category": c.category}
3202
+ if c.evidence:
3203
+ ext["evidence"] = c.evidence[:3] # Top 3 evidence items
3204
+ dtcg_token = _to_dtcg_token(
3205
+ c.hex, "color",
3206
+ description=f"{c.category}: {c.role}",
3207
+ extensions=ext,
3208
+ )
3209
  _flat_key_to_nested(c.token_name, dtcg_token, result)
3210
  token_count += 1
3211
 
 
3280
  token_count += 1
3281
 
3282
  # =========================================================================
3283
+ # BORDER RADIUS — W3C DTCG "dimension" type
3284
  # =========================================================================
3285
  if state.desktop_normalized and state.desktop_normalized.radius:
3286
  seen_radius = {}
 
3289
  if token_name is None:
3290
  continue # Duplicate radius — skip
3291
  flat_key = token_name
3292
+ ext = {"frequency": r.frequency}
3293
+ if hasattr(r, 'fits_base_4') and r.fits_base_4 is not None:
3294
+ ext["fitsBase4"] = r.fits_base_4
3295
+ if hasattr(r, 'fits_base_8') and r.fits_base_8 is not None:
3296
+ ext["fitsBase8"] = r.fits_base_8
3297
+ dtcg_token = _to_dtcg_token(r.value, "dimension",
3298
+ description=f"Border radius ({name})",
3299
+ extensions=ext)
3300
  _flat_key_to_nested(flat_key, dtcg_token, result)
3301
  token_count += 1
3302
 
 
3304
  # SHADOWS — W3C DTCG shadow format
3305
  # =========================================================================
3306
  if state.desktop_normalized and state.desktop_normalized.shadows:
3307
+ shadow_tier_names = ["xs", "sm", "md", "lg", "xl", "2xl"]
3308
  sorted_shadows = sorted(
3309
  state.desktop_normalized.shadows.items(),
3310
  key=lambda x: _get_shadow_blur(x[1].value),
3311
  )
3312
  for i, (name, s) in enumerate(sorted_shadows):
3313
+ size_name = shadow_tier_names[i] if i < len(shadow_tier_names) else str(i + 1)
3314
  flat_key = f"shadow.{size_name}"
 
3315
  parsed = _parse_shadow_to_tokens_studio(s.value)
3316
  dtcg_shadow_value = _shadow_to_dtcg(parsed)
3317
+ ext = {"frequency": s.frequency, "rawCSS": s.value}
3318
+ if hasattr(s, 'blur_px') and s.blur_px is not None:
3319
+ ext["blurPx"] = s.blur_px
3320
+ dtcg_token = _to_dtcg_token(dtcg_shadow_value, "shadow",
3321
+ description=f"Elevation {size_name}",
3322
+ extensions=ext)
3323
  _flat_key_to_nested(flat_key, dtcg_token, result)
3324
  token_count += 1
3325