Space: riazmo/Design-System-Extractor-2

riazmo committed 13 days ago

Commit

d2d4fa3

1 Parent(s): 2627211

Delete docs/CONTEXT.md

Files changed (1)

docs/CONTEXT.md +0 -402

docs/CONTEXT.md DELETED

@@ -1,402 +0,0 @@
-# Design System Extractor v2 — Master Context File
-> **Upload this file to refresh Claude's context when continuing work on this project.**
----
-## 🎯 Project Goal
-Build a **semi-automated, human-in-the-loop agentic system** that:
-1. Reverse-engineers a design system from a live website
-2. Reconstructs and upgrades it into a modern, scalable design system
-3. Outputs production-ready JSON tokens
-**Philosophy:** This is a design-aware co-pilot, NOT a magic button. Humans decide, agents propose.
----
-## 🏗️ Architecture Overview
-```
-┌─────────────────────────────────────────────────────────────────────────────┐
-│                              TECH STACK                                     │
-├─────────────────────────────────────────────────────────────────────────────┤
-│  Frontend:        Gradio (long-scroll, sectioned UI with live preview)      │
-│  Orchestration:   LangGraph (agent state management & workflow)             │
-│  Models:          HuggingFace Inference API (see model assignments below)   │
-│  Hosting:         Hugging Face Spaces                                       │
-│  Storage:         HF Spaces persistent storage                              │
-│  Output:          Platform-agnostic JSON tokens                             │
-└─────────────────────────────────────────────────────────────────────────────┘
-```
----
-## 🧠 Model Assignments
-| Agent | Role | Model | Provider | Why |
-|-------|------|-------|----------|-----|
-| **Agent 1** | Crawler & Extractor | None (Rule-based) | — | Pure CSS extraction, no LLM needed |
-| **Agent 2** | Normalizer | `microsoft/Phi-3.5-mini-instruct` | Microsoft | Fast, great structured output |
-| **Agent 3** | Advisor | `meta-llama/Llama-3.1-70B-Instruct` | Meta | Excellent reasoning, design knowledge |
-| **Agent 4** | Generator | `mistralai/Codestral-22B-v0.1` | Mistral | Code specialist, JSON formatting |
-### Model Presets
-| Preset | Agent 2 | Agent 3 | Agent 4 |
-|--------|---------|---------|---------|
-| **Budget (Free)** | Phi-3.5-mini | Mixtral-8x7B | StarCoder2-15B |
-| **Balanced (Pro)** | Gemma-2-9b | Llama-3.1-70B | Codestral-22B |
-| **Quality (Pro+)** | Gemma-2-27b | Llama-3.1-405B | DeepSeek-Coder-33B |
-| **Diverse** | Microsoft Phi | Cohere Command R+ | Mistral Codestral |
-### Available Providers
-- **Meta**: Llama 3.1 family (8B, 70B, 405B)
-- **Mistral**: Mixtral, Mistral, Codestral
-- **Cohere**: Command R, Command R+
-- **Google**: Gemma 2 family
-- **Microsoft**: Phi 3.5 family
-- **Alibaba**: Qwen 2.5 family
-- **DeepSeek**: DeepSeek Coder, V2.5
-- **BigCode**: StarCoder2
----
-## 🤖 Agent Personas (4 Agents)
-### Agent 1: Website Crawler & Extractor
-- **Persona:** Meticulous Design Archaeologist
-- **Tool:** Playwright
-- **Job:**
-  - Auto-discover 10+ pages from base URL
-  - Crawl Desktop (1440px) + Mobile (375px) separately
-  - Scroll to bottom + wait for network idle
-  - Extract: colors, typography, spacing, radius, shadows
-- **Output:** Raw tokens with frequency, context, confidence
-### Agent 2: Token Normalizer & Structurer
-- **Persona:** Design System Librarian
-- **Job:**
-  - Clean noisy extraction, dedupe
-  - Infer naming patterns
-  - Tag tokens as: `detected` | `inferred` | `low-confidence`
-- **Output:** Structured token sets with metadata
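The cleanup and dedupe steps listed for Agent 2 could be approximated with plain rules before any model is involved. The sketch below is an illustration only, not the deleted file's normalizer: `normalize_hex` and `dedupe_colors` are hypothetical names, and only 3- and 6-digit hex colors are handled.

```python
from collections import Counter


def normalize_hex(color: str) -> str:
    """Lowercase a hex color and expand 3-digit shorthand (#FFF -> #ffffff)."""
    value = color.strip().lower()
    if len(value) == 4 and value.startswith("#"):
        value = "#" + "".join(ch * 2 for ch in value[1:])
    return value


def dedupe_colors(raw_colors: list[str]) -> dict[str, int]:
    """Merge equivalent spellings of a color and count how often each occurs."""
    return dict(Counter(normalize_hex(c) for c in raw_colors))
```

The resulting counts are the kind of frequency data that the token schema's confidence levels are based on.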
-### Agent 3: Design System Best Practices Advisor
-- **Persona:** Senior Staff Design Systems Architect
-- **Job:**
-  - Research modern DS patterns (Material, Polaris, Carbon, etc.)
-  - Propose upgrade OPTIONS (not decisions)
-  - Suggest: type scales (3 options), spacing (8px), color ramps (AA compliant), naming conventions
-- **Output:** Option sets with rationale
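For the "AA compliant" color ramps mentioned above, a contrast check is the core building block. The following is a minimal sketch of the WCAG 2.x contrast-ratio formula; the function names are hypothetical and it assumes 6-digit hex input.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#007bff'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA requires at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

An advisor step could use `passes_aa` to filter proposed ramp shades against the intended background before presenting them as options.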
-### Agent 4: Plugin & JSON Generator
-- **Persona:** Automation Engineer
-- **Job:**
-  - Convert finalized tokens to Figma-compatible JSON
-  - Generate: typography, color (with tints/shades), spacing variables
-  - Maintain Desktop + Mobile + version metadata
-- **Output:** Production-ready JSON
----
-## 🖥️ UI Stages (3 Stages)
-### Stage 1: Extraction Review
-- **Purpose:** Trust building
-- **Shows:** Token tables, color swatches, font previews, confidence indicators
-- **Human Actions:** Accept/reject tokens, flag anomalies, toggle Desktop↔Mobile
-### Stage 2: Upgrade Playground (MOST IMPORTANT)
-- **Purpose:** Decision-making through live visuals
-- **Shows:** Side-by-side option selector + live preview
-- **Human Actions:** Select type scale A/B/C, spacing system, color ramps — preview updates instantly
-### Stage 3: Final Review & Export
-- **Purpose:** Confidence before export
-- **Shows:** Token preview, JSON tree, diff view (original vs final)
-- **Human Actions:** Download JSON, save version, label version
----
-## 📁 Project Structure
-```
-design-system-extractor/
-├── app.py                          # Gradio main entry point
-├── requirements.txt
-├── README.md
-│
-├── config/
-│   ├── .env.example                # Environment variables template
-│   ├── agents.yaml                 # Agent personas & configurations
-│   └── settings.py                 # Application settings
-│
-├── agents/
-│   ├── __init__.py
-│   ├── state.py                    # LangGraph state definitions
-│   ├── graph.py                    # LangGraph workflow orchestration
-│   ├── crawler.py                  # Agent 1: Website crawler
-│   ├── extractor.py                # Agent 1: Token extraction
-│   ├── normalizer.py               # Agent 2: Token normalization
-│   ├── advisor.py                  # Agent 3: Best practices
-│   └── generator.py                # Agent 4: JSON generator
-│
-├── core/
-│   ├── __init__.py
-│   ├── browser.py                  # Playwright browser management
-│   ├── css_parser.py               # CSS/computed style extraction
-│   ├── color_utils.py              # Color analysis, contrast, ramps
-│   ├── typography_utils.py         # Type scale detection & generation
-│   ├── spacing_utils.py            # Spacing pattern detection
-│   └── token_schema.py             # Token data structures (Pydantic)
-│
-├── ui/
-│   ├── __init__.py
-│   ├── components.py               # Reusable Gradio components
-│   ├── stage1_extraction.py        # Stage 1 UI
-│   ├── stage2_upgrade.py           # Stage 2 UI
-│   ├── stage3_export.py            # Stage 3 UI
-│   └── preview_generator.py        # HTML preview generation
-│
-├── templates/
-│   ├── preview.html                # Live preview base template
-│   └── specimen.html               # Design system specimen template
-│
-├── storage/
-│   └── persistence.py              # HF Spaces storage management
-│
-├── tests/
-│   ├── test_crawler.py
-│   ├── test_extractor.py
-│   └── test_normalizer.py
-│
-└── docs/
-    ├── CONTEXT.md                  # THIS FILE - upload for context refresh
-    └── API.md                      # API documentation
-```
----
-## 🔧 Key Technical Decisions
-| Decision | Choice | Rationale |
-|----------|--------|-----------|
-| Viewports | Fixed 1440px + 375px | Simplicity, covers main use cases |
-| Scrolling | Bottom + network idle | Captures lazy-loaded content |
-| Infinite scroll | Skip | Avoid complexity |
-| Modals | Manual trigger | User decides what to capture |
-| Color ramps | 5-10 shades, AA compliant | Industry standard |
-| Type scales | 3 options (1.25, 1.333, 1.414) | User selects |
-| Spacing | 8px base system | Modern standard |
-| ML models | Minimal, rule-based preferred | Simplicity, reliability |
-| Versioning | HF Spaces persistent storage | Built-in, free |
-| Preview | Gradio + iframe (best for dynamic content) | Smooth updates |
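The three type-scale ratios in the table are modular scales: each step multiplies the previous size by a fixed ratio. A minimal sketch follows, assuming a 16px base and a hypothetical `type_scale` helper; the step counts are illustrative defaults, not values from the file.

```python
# The three ratios named in the decisions table.
RATIOS = (1.25, 1.333, 1.414)


def type_scale(base_px: float, ratio: float, steps_up: int = 4, steps_down: int = 1) -> list[float]:
    """Font sizes at base * ratio**n for n in [-steps_down, steps_up], rounded to 2 decimals."""
    return [round(base_px * ratio ** n, 2) for n in range(-steps_down, steps_up + 1)]


def scale_options(base_px: float = 16.0) -> dict[float, list[float]]:
    """One candidate scale per ratio, for the user to choose between."""
    return {ratio: type_scale(base_px, ratio) for ratio in RATIOS}
```

Calling `scale_options()` yields one list of sizes per ratio, which is the shape a side-by-side selector would need.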
----
-## 📊 Token Schema (Core Data Structures)
-```python
-from enum import Enum
-class TokenSource(Enum):
-    DETECTED = "detected"       # Directly found in CSS
-    INFERRED = "inferred"       # Derived from patterns
-    UPGRADED = "upgraded"       # User-selected improvement
-class Confidence(Enum):
-    HIGH = "high"               # 10+ occurrences
-    MEDIUM = "medium"           # 3-9 occurrences
-    LOW = "low"                 # 1-2 occurrences
-class Viewport(Enum):
-    DESKTOP = "desktop"         # 1440px
-    MOBILE = "mobile"           # 375px
-```
-### Token Types:
-- **ColorToken:** value, frequency, contexts, elements, contrast ratios
-- **TypographyToken:** family, size, weight, line-height, elements
-- **SpacingToken:** value, frequency, contexts, fits_base_8
-- **RadiusToken:** value, frequency, elements
-- **ShadowToken:** value, frequency, elements
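The occurrence thresholds in the `Confidence` comments and the `SpacingToken` fields can be tied together as below. This is a sketch under assumptions: the file plans Pydantic models, while this version uses standard-library dataclasses to stay dependency-free; `confidence_from_frequency` is a hypothetical helper; and the strict multiple-of-8 check is one possible reading of `fits_base_8` (it rejects half-steps such as 4px).

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def confidence_from_frequency(frequency: int) -> Confidence:
    """Map an occurrence count to a level: 10+ high, 3-9 medium, otherwise low."""
    if frequency >= 10:
        return Confidence.HIGH
    if frequency >= 3:
        return Confidence.MEDIUM
    return Confidence.LOW


@dataclass
class SpacingToken:
    value: str                                   # pixel value, e.g. "16px"
    frequency: int
    contexts: list[str] = field(default_factory=list)

    @property
    def confidence(self) -> Confidence:
        return confidence_from_frequency(self.frequency)

    @property
    def fits_base_8(self) -> bool:
        """Assumption: strict check that the pixel value is a multiple of 8."""
        return int(self.value.removesuffix("px")) % 8 == 0
```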
----
-## 🔄 LangGraph Workflow
-```
-                    ┌─────────────┐
-                    │   START     │
-                    └──────┬──────┘
-                           │
-                           ▼
-                    ┌─────────────┐
-                    │ URL Input   │
-                    └──────┬──────┘
-                           │
-                           ▼
-              ┌────────────────────────┐
-              │  Agent 1: Discover     │
-              │  (find pages)          │
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  HUMAN: Confirm pages  │◄─── Checkpoint 1
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  Agent 1: Extract      │
-              │  (crawl & extract)     │
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  Agent 2: Normalize    │
-              └───────────┬────────────┘
-                          │
-                          ▼
-              ┌────────────────────────┐
-              │  HUMAN: Review tokens  │◄─── Checkpoint 2 (Stage 1 UI)
-              └───────────┬────────────┘
-                          │
-          ┌───────────────┴───────────────┐
-          │                               │
-          ▼                               ▼
-┌──────────────────┐            ┌──────────────────┐
-│ Agent 3: Advise  │            │  (parallel)      │
-│ (best practices) │            │                  │
-└────────┬─────────┘            └──────────────────┘
-         │
-         ▼
-┌────────────────────────┐
-│  HUMAN: Select options │◄─── Checkpoint 3 (Stage 2 UI)
-└───────────┬────────────┘
-            │
-            ▼
-┌────────────────────────┐
-│  Agent 4: Generate     │
-│  (final JSON)          │
-└───────────┬────────────┘
-            │
-            ▼
-┌────────────────────────┐
-│  HUMAN: Export         │◄─── Checkpoint 4 (Stage 3 UI)
-└───────────┬────────────┘
-            │
-            ▼
-      ┌─────────┐
-      │   END   │
-      └─────────┘
-```
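The pause-at-checkpoint behavior in the diagram can be illustrated without LangGraph. The sketch below is a plain-Python stand-in, not the project's graph: the step names, the `approved` and `cursor` state keys, and `run_until_checkpoint` are all hypothetical.

```python
from typing import Callable

# (step name, is_human_checkpoint), in the order shown in the diagram.
WORKFLOW: list[tuple[str, bool]] = [
    ("discover_pages", False),
    ("confirm_pages", True),      # Checkpoint 1
    ("extract", False),
    ("normalize", False),
    ("review_tokens", True),      # Checkpoint 2 (Stage 1 UI)
    ("advise", False),
    ("select_options", True),     # Checkpoint 3 (Stage 2 UI)
    ("generate", False),
    ("export", True),             # Checkpoint 4 (Stage 3 UI)
]


def run_until_checkpoint(state: dict, handlers: dict[str, Callable[[dict], dict]]) -> dict:
    """Run agent steps from state['cursor'] and stop at the next unapproved human checkpoint."""
    cursor = state.get("cursor", 0)
    while cursor < len(WORKFLOW):
        name, is_human = WORKFLOW[cursor]
        if is_human and not state.get("approved", {}).get(name):
            # Pause here; the UI records approval and calls this function again.
            return {**state, "cursor": cursor, "waiting_for": name}
        if not is_human:
            state = handlers[name](state)
        cursor += 1
    return {**state, "cursor": cursor, "waiting_for": None}
```

Each call advances only as far as the next human decision, which matches the "agents propose, humans decide" rule: nothing past a checkpoint runs until the approval is recorded in the state.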
----
-## 🚦 Human-in-the-Loop Rules
-1. **No irreversible automation**
-2. **Agents propose → Humans decide**
-3. **Every auto action must be:**
-   - Visible
-   - Reversible
-   - Previewed
----
-## 📦 Output JSON Format
-```json
-{
-  "metadata": {
-    "source_url": "https://example.com",
-    "extracted_at": "2025-01-23T10:00:00Z",
-    "version": "v1-recovered",
-    "viewport": "desktop"
-  },
-  "colors": {
-    "primary": {
-      "50": { "value": "#e6f2ff", "source": "upgraded" },
-      "500": { "value": "#007bff", "source": "detected" },
-      "900": { "value": "#001a33", "source": "upgraded" }
-    }
-  },
-  "typography": {
-    "heading-xl": {
-      "fontFamily": "Inter",
-      "fontSize": "32px",
-      "fontWeight": 700,
-      "lineHeight": "1.2",
-      "source": "detected"
-    }
-  },
-  "spacing": {
-    "xs": { "value": "4px", "source": "upgraded" },
-    "sm": { "value": "8px", "source": "detected" },
-    "md": { "value": "16px", "source": "detected" }
-  }
-}
-```
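A simple sanity check on the exported structure is to confirm that every token carries one of the three `source` values from the token schema. The sketch below assumes, based only on the example above, that a leaf token is any object with a `value` or `fontFamily` key; `invalid_sources` is a hypothetical helper.

```python
ALLOWED_SOURCES = {"detected", "inferred", "upgraded"}


def invalid_sources(node: dict, path: str = "") -> list[str]:
    """Return dotted paths of tokens whose 'source' is missing or not an allowed value."""
    problems: list[str] = []
    for key, child in node.items():
        if not isinstance(child, dict):
            continue  # plain metadata values such as strings are not tokens
        child_path = f"{path}.{key}" if path else key
        if "value" in child or "fontFamily" in child:
            # Leaf token: check its source tag.
            if child.get("source") not in ALLOWED_SOURCES:
                problems.append(child_path)
        else:
            # Group (e.g. "colors" or "primary"): recurse into it.
            problems.extend(invalid_sources(child, child_path))
    return problems
```

An empty list means every token in the document is tagged with a recognized source.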
----
-## 🛠️ Implementation Phases
-### Phase 1 (Current)
-- [x] Project structure
-- [x] Configuration files
-- [ ] Token schema (Pydantic models)
-- [ ] Agent 1: Crawler
-- [ ] Agent 1: Extractor
-- [ ] Agent 2: Normalizer
-- [ ] Stage 1 UI
-- [ ] LangGraph basic workflow
-### Phase 2
-- [ ] Agent 3: Advisor
-- [ ] Stage 2 UI (Upgrade Playground)
-- [ ] Live preview system
-### Phase 3
-- [ ] Agent 4: Generator
-- [ ] Stage 3 UI
-- [ ] Export functionality
-### Phase 4
-- [ ] Full LangGraph orchestration
-- [ ] HF Spaces deployment
-- [ ] Persistent storage
----
-## 🔑 Environment Variables
-```env
-# Required
-HF_TOKEN=your_huggingface_token
-# Model Configuration (defaults shown — diverse providers)
-AGENT2_MODEL=microsoft/Phi-3.5-mini-instruct      # Microsoft - Fast naming
-AGENT3_MODEL=meta-llama/Llama-3.1-70B-Instruct    # Meta - Strong reasoning
-AGENT4_MODEL=mistralai/Codestral-22B-v0.1         # Mistral - Code/JSON
-# Optional
-DEBUG=true
-LOG_LEVEL=INFO
-```
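Reading these variables with the documented defaults might look like the sketch below. It is not the contents of `config/settings.py`: `load_settings` is a hypothetical name, and treating `DEBUG` as false when unset is an assumption, since the file only shows an example value.

```python
import os

# Defaults copied from the environment block above.
DEFAULT_MODELS = {
    "AGENT2_MODEL": "microsoft/Phi-3.5-mini-instruct",
    "AGENT3_MODEL": "meta-llama/Llama-3.1-70B-Instruct",
    "AGENT4_MODEL": "mistralai/Codestral-22B-v0.1",
}


def load_settings(env=None) -> dict:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required")
    settings = {name: env.get(name, default) for name, default in DEFAULT_MODELS.items()}
    settings["HF_TOKEN"] = token
    # Assumption: DEBUG is off unless explicitly set to "true".
    settings["DEBUG"] = env.get("DEBUG", "false").lower() == "true"
    settings["LOG_LEVEL"] = env.get("LOG_LEVEL", "INFO")
    return settings
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the real environment.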
----
-## 📝 Notes for Claude
-When continuing this project:
-1. **Check current phase** in Implementation Phases section
-2. **Review agent personas** in agents.yaml for consistent behavior
-3. **Follow token schema** defined in core/token_schema.py
-4. **Maintain LangGraph state** consistency across agents
-5. **Use Gradio components** from ui/components.py for consistency
-6. **Test with real websites** before deployment
----
-*Last updated: 2025-01-23*