# Design System Extractor v2 — Project Context

## Architecture Overview

```
Stage 0: Configuration      Stage 1: Discovery & Extraction    Stage 2: AI Analysis               Stage 3: Export
┌──────────────────┐        ┌───────────────────────────┐       ┌──────────────────────────┐     ┌──────────────┐
│ HF Token Setup   │ ─────> │ URL Discovery (sitemap/   │ ────> │ Layer 1: Rule Engine     │ ──> │ Figma Tokens │
│ Benchmark Select │        │ crawl) + Token Extraction │       │ Layer 2: Benchmarks      │     │ JSON Export  │
└──────────────────┘        │ (Desktop + Mobile CSS)    │       │ Layer 3: LLM Agents (x3) │     └──────────────┘
                            └───────────────────────────┘       │ Layer 4: HEAD Synthesizer│
                                                                └──────────────────────────┘
```

### Stage 1: Discovery & Extraction (Rule-Based, Free)
- **Discover Pages**: Fetches sitemap.xml or crawls the site to find pages
- **Extract Tokens**: Playwright visits each page at two viewports (Desktop 1440px, Mobile 375px) and extracts computed CSS for colors, typography, spacing, radius, and shadows
- **User Review**: Interactive tables with Accept/Reject checkboxes plus visual previews

### Stage 2: AI-Powered Analysis (4 Layers)

| Layer | Type | What It Does | Cost |
|-------|------|--------------|------|
| **Layer 1** | Rule Engine | Type scale detection, AA contrast checking, spacing grid analysis, color statistics | FREE |
| **Layer 2** | Benchmark Research | Compare against Material Design 3, Apple HIG, Tailwind, etc. | ~$0.001 |
| **Layer 3** | LLM Agents (x3) | AURORA (Brand ID) + AT​LAS (Benchmark) + SENTINEL (Best Practices) | ~$0.002 |
| **Layer 4** | HEAD Synthesizer | NEXUS combines all outputs into final recommendations | ~$0.001 |

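The four layers run strictly in sequence, each seeing the token set plus every earlier layer's output (which is what lets NEXUS synthesize everything). A minimal sequencing sketch, with lambda stand-ins for the real layers (the actual orchestration lives elsewhere in the codebase):

```python
# Illustrative 4-layer sequencing: each layer is a callable that receives
# the accumulated context dict and contributes its own output under its name.
def run_pipeline(tokens: dict, layers: list) -> dict:
    context = {"tokens": tokens}
    for name, layer in layers:
        context[name] = layer(context)  # later layers can read earlier outputs
    return context

# Hypothetical stand-ins for the documented layers:
layers = [
    ("rule_engine", lambda ctx: {"aa_failures": 2}),                      # Layer 1 (free)
    ("benchmarks",  lambda ctx: {"closest": "Material Design 3"}),        # Layer 2
    ("agents",      lambda ctx: {"aurora": {}, "atlas": {}, "sentinel": {}}),  # Layer 3
    ("nexus",       lambda ctx: {"overall": 65}),                         # Layer 4 sees all prior keys
]
result = run_pipeline({"colors": ["#223344"]}, layers)
```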
### Stage 3: Export
- Apply/reject individual color, typography, and spacing recommendations
- Export Figma Tokens Studio-compatible JSON

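For reference, Tokens Studio JSON nests tokens under a set name, with each leaf carrying a `value` and a `type`. A hedged sketch of the target shape (the group and token names the exporter actually emits may differ):

```json
{
  "global": {
    "colors": {
      "brand-primary": { "value": "#3366ff", "type": "color" }
    },
    "fontSizes": {
      "body": { "value": "16px", "type": "fontSizes" }
    },
    "spacing": {
      "md": { "value": "16px", "type": "spacing" }
    }
  }
}
```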
---

## Agent Roster

| Agent | Codename | Model | Temp | Input | Output | Specialty |
|-------|----------|-------|------|-------|--------|-----------|
| Brand Identifier | **AURORA** | Qwen/Qwen2.5-72B-Instruct | 0.4 | Color tokens + semantic CSS analysis | Brand primary/secondary/accent, palette strategy, cohesion score, semantic names | Creative/visual reasoning, color harmony assessment |
| Benchmark Advisor | **ATLAS** | meta-llama/Llama-3.3-70B-Instruct | 0.25 | User's type scale, spacing, font sizes + benchmark comparison data | Recommended benchmark, alignment changes, pros/cons | 128K context for large benchmark data, comparative reasoning |
| Best Practices Validator | **SENTINEL** | Qwen/Qwen2.5-72B-Instruct | 0.2 | Rule Engine results (typography, accessibility, spacing, color stats) | Overall score (0-100), check results, prioritized fix list | Methodical rule-following, precise judgment |
| HEAD Synthesizer | **NEXUS** | meta-llama/Llama-3.3-70B-Instruct | 0.3 | All 3 agent outputs + Rule Engine facts | Executive summary, scores, top 3 actions, color/type/spacing recs | 128K context for combined inputs, synthesis capability |

### Why These Models

- **Qwen 72B** (AURORA, SENTINEL): Strong creative reasoning for brand analysis; methodical structured output for best practices. Available on HF serverless without gated access.
- **Llama 3.3 70B** (ATLAS, NEXUS): 128K context window handles large combined inputs from multiple agents. Excellent comparative and synthesis reasoning.
- **Fallback**: Qwen/Qwen2.5-7B-Instruct (free tier, used when a primary model fails)

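The fallback path can be sketched as a thin wrapper around whatever inference call the client makes. Here `call_model` is a hypothetical stand-in for an HF Inference API call that raises on failure; the real routing lives in `core/hf_inference.py` and may differ:

```python
# Illustrative primary -> fallback routing for a single agent call.
FALLBACK_MODEL = "Qwen/Qwen2.5-7B-Instruct"

def generate_with_fallback(prompt: str, model: str, call_model) -> tuple[str, str]:
    """Try the agent's primary model; on any error, retry once on the fallback.

    Returns (response_text, model_actually_used).
    """
    try:
        return call_model(model, prompt), model
    except Exception:
        return call_model(FALLBACK_MODEL, prompt), FALLBACK_MODEL
```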
### Temperature Rationale

- **0.4** (AURORA): Allows creative interpretation of color stories and palette harmony
- **0.25** (ATLAS): Analytical comparison needs consistency, with some flexibility for trade-off reasoning
- **0.2** (SENTINEL): Strict rule evaluation — consistency is critical for compliance scoring
- **0.3** (NEXUS): Balanced — needs to synthesize creatively but stay grounded in agent data

---

## Evaluation & Scoring

### Self-Evaluation (All Agents)
Each agent includes a `self_evaluation` block in its JSON output:
```json
{
  "confidence": 8,           // 1-10: How confident the agent is
  "reasoning": "Clear usage patterns with 20+ colors",
  "data_quality": "good",    // good | fair | poor
  "flags": []                // e.g., ["insufficient_context", "ambiguous_data"]
}
```
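Once parsed, the block can gate whether a result is surfaced at all. A sketch of what that check might look like (field names mirror the block above; the project's actual dataclasses in `agents/llm_agents.py` may differ):

```python
from dataclasses import dataclass, field

@dataclass
class SelfEvaluation:
    confidence: int               # 1-10
    reasoning: str
    data_quality: str             # "good" | "fair" | "poor"
    flags: list = field(default_factory=list)

    def is_trustworthy(self, min_confidence: int = 5) -> bool:
        """Surface a result only if confidence and data quality clear the bar."""
        return (self.confidence >= min_confidence
                and self.data_quality != "poor"
                and "insufficient_context" not in self.flags)

ev = SelfEvaluation(confidence=8,
                    reasoning="Clear usage patterns with 20+ colors",
                    data_quality="good")
```

The `min_confidence=5` threshold is an illustrative choice, not a documented one.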

### AURORA Scoring Rubric (Cohesion 1-10)
- **9-10**: Clear harmony rule, distinct brand colors, consistent palette
- **7-8**: Mostly harmonious, clear brand identity
- **5-6**: Some relationships visible but not systematic
- **3-4**: Random palette, no clear strategy
- **1-2**: Conflicting colors, no brand identity

### SENTINEL Scoring Rubric (Overall 0-100)
Weighted checks:
- AA Compliance: 25 points
- Type Scale Consistency: 15 points
- Base Size Accessible: 15 points
- Spacing Grid: 15 points
- Type Scale Standard Ratio: 10 points
- Color Count: 10 points
- No Near-Duplicates: 10 points

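The weights above sum to exactly 100, so the overall score is simply the sum of points for checks that pass. A sketch of that computation (check keys are shorthand for the rubric, not SENTINEL's actual output fields):

```python
# SENTINEL-style weighted scoring: each passed check contributes its weight.
WEIGHTS = {
    "aa_compliance": 25,
    "type_scale_consistency": 15,
    "base_size_accessible": 15,
    "spacing_grid": 15,
    "type_scale_standard_ratio": 10,
    "color_count": 10,
    "no_near_duplicates": 10,
}
assert sum(WEIGHTS.values()) == 100  # weights partition the 0-100 scale

def sentinel_score(passed: set) -> int:
    """Overall 0-100 score: award each check's weight if it passed."""
    return sum(points for check, points in WEIGHTS.items() if check in passed)
```

Passing everything except AA compliance, for instance, yields 100 - 25 = 75.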
### NEXUS Scoring Rubric (Overall 0-100)
- **90-100**: Production-ready, minor polishing only
- **75-89**: Solid foundation, 2-3 targeted improvements
- **60-74**: Functional but needs focused attention
- **40-59**: Significant gaps requiring systematic improvement
- **20-39**: Major rework needed
- **0-19**: Fundamental redesign recommended

### Evaluation Summary (Logged After Analysis)
```
═══════════════════════════════════════════════════
🔍 AGENT EVALUATION SUMMARY
═══════════════════════════════════════════════════
🎨 AURORA   (Brand ID):  confidence=8/10, data=good
🏢 ATLAS    (Benchmark): confidence=7/10, data=good
✅ SENTINEL (Practices): confidence=9/10, data=good, score=72/100
🧠 NEXUS    (Synthesis): confidence=8/10, data=good, overall=65/100
═══════════════════════════════════════════════════
```

---

## User Journey

1. **Enter HF Token** — Required for LLM inference (free tier works)
2. **Enter Website URL** — The site to extract design tokens from
3. **Discover Pages** — Auto-finds pages via sitemap or crawling
4. **Select Pages** — Check/uncheck pages to include (max 10)
5. **Extract Tokens** — Scans selected pages at Desktop + Mobile viewports
6. **Review Stage 1** — Interactive tables: Colors, Typography, Spacing, Radius, Shadows, Semantic Colors. Each tab has a data table plus a visual preview accordion. Accept/reject individual tokens.
7. **Proceed to Stage 2** — Select benchmarks to compare against
8. **Run AI Analysis** — 4-layer pipeline executes (Rule Engine -> Benchmarks -> LLM Agents -> Synthesis)
9. **Review Analysis** — Dashboard with scores, recommendations, benchmark comparison, color recs
10. **Apply Upgrades** — Accept/reject individual recommendations
11. **Export JSON** — Download Figma Tokens Studio-compatible JSON

---

## File Structure

| File | Responsibility |
|------|----------------|
| `app.py` | Main Gradio UI — all stages, CSS, event bindings, formatting functions |
| `agents/llm_agents.py` | 4 LLM agent classes (AURORA, ATLAS, SENTINEL, NEXUS) + dataclasses |
| `agents/semantic_analyzer.py` | Semantic color categorization (brand, text, background, etc.) |
| `config/settings.py` | Model routing, env var loading, agent-to-model mapping |
| `core/hf_inference.py` | HF Inference API client, model registry, temperature mapping |
| `core/preview_generator.py` | HTML preview generators for Stage 1 visual previews |
| `core/rule_engine.py` | Layer 1: Type scale, AA contrast, spacing grid, color stats |
| `core/benchmarks.py` | Benchmark definitions (Material Design 3, Apple HIG, etc.) |
| `core/extractor.py` | Playwright-based CSS token extraction |
| `core/discovery.py` | Page discovery via sitemap.xml / crawling |

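The sitemap half of page discovery reduces to pulling `<loc>` entries out of sitemap.xml, capped by the `MAX_PAGES` setting. A sketch under that assumption (the real `core/discovery.py` also handles the crawling fallback and network fetching, which are omitted here):

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace per the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text: str, max_pages: int = 20) -> list:
    """Extract page URLs from a sitemap.xml document, capped at max_pages."""
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)]
    return urls[:max_pages]
```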
---

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `HF_TOKEN` | (required) | HuggingFace API token |
| `BRAND_IDENTIFIER_MODEL` | `Qwen/Qwen2.5-72B-Instruct` | Model for AURORA |
| `BENCHMARK_ADVISOR_MODEL` | `meta-llama/Llama-3.3-70B-Instruct` | Model for ATLAS |
| `BEST_PRACTICES_MODEL` | `Qwen/Qwen2.5-72B-Instruct` | Model for SENTINEL |
| `HEAD_SYNTHESIZER_MODEL` | `meta-llama/Llama-3.3-70B-Instruct` | Model for NEXUS |
| `FALLBACK_MODEL` | `Qwen/Qwen2.5-7B-Instruct` | Fallback when primary fails |
| `HF_MAX_NEW_TOKENS` | `2048` | Max tokens per LLM response |
| `HF_TEMPERATURE` | `0.3` | Global default temperature |
| `MAX_PAGES` | `20` | Max pages to discover |
| `BROWSER_TIMEOUT` | `30000` | Playwright timeout (ms) |

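Loading these reduces to "environment value if set, else the documented default". A sketch of how `config/settings.py`-style loading might look (variable names and defaults come from the table above; `HF_TOKEN` is excluded because it has no default):

```python
import os

# Documented defaults; HF_TOKEN is required and deliberately has no entry here.
DEFAULTS = {
    "BRAND_IDENTIFIER_MODEL": "Qwen/Qwen2.5-72B-Instruct",
    "BENCHMARK_ADVISOR_MODEL": "meta-llama/Llama-3.3-70B-Instruct",
    "BEST_PRACTICES_MODEL": "Qwen/Qwen2.5-72B-Instruct",
    "HEAD_SYNTHESIZER_MODEL": "meta-llama/Llama-3.3-70B-Instruct",
    "FALLBACK_MODEL": "Qwen/Qwen2.5-7B-Instruct",
    "HF_MAX_NEW_TOKENS": "2048",
    "HF_TEMPERATURE": "0.3",
    "MAX_PAGES": "20",
    "BROWSER_TIMEOUT": "30000",
}

def setting(name: str) -> str:
    """Environment value if set, else the documented default."""
    return os.environ.get(name, DEFAULTS[name])

# Numeric settings are cast by their callers:
max_pages = int(setting("MAX_PAGES"))
temperature = float(setting("HF_TEMPERATURE"))
```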
### Model Override Examples
```bash
# Use Llama for all agents
export BRAND_IDENTIFIER_MODEL="meta-llama/Llama-3.3-70B-Instruct"
export BEST_PRACTICES_MODEL="meta-llama/Llama-3.3-70B-Instruct"

# Use budget models
export BRAND_IDENTIFIER_MODEL="Qwen/Qwen2.5-7B-Instruct"
export BENCHMARK_ADVISOR_MODEL="mistralai/Mixtral-8x7B-Instruct-v0.1"
```