HF Vision Models & Screenshot Storage Analysis
1. HF Vision Model Usage - Current Status
❌ Currently NOT Implemented
The system mentions HF vision models in documentation and state schema, but does not actually call them in the current implementation.
Current Detection Methods:
- ✅ Screenshot pixel-level comparison (PIL, NumPy)
- ✅ Color analysis (RGB delta calculation)
- ✅ Structural analysis (edge detection, MSE)
- ❌ HF Vision Model API calls (NOT implemented)
Where HF is Mentioned (But Not Used)
state_schema.py - Line 53:
detection_method: str  # "screenshot", "css", "hf_vision", "hybrid"

app.py - Lines 276-280:

hf_token = gr.Textbox(
    label="Hugging Face Token (Optional)",
    placeholder="hf_...",
    type="password",
    info="For enhanced vision model analysis"
)

requirements.txt - Lines 29-31:

huggingface-hub>=0.19.0
transformers>=4.30.0
torch>=2.0.0
What's Missing
To actually use HF vision models, we need to:
Import HF libraries:
from transformers import pipeline
from PIL import Image

Create a vision pipeline:

vision_pipeline = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",
    device=0  # GPU device; use device=-1 for CPU
)

Analyze images:

figma_caption = vision_pipeline(figma_image)
website_caption = vision_pipeline(website_image)
# Compare captions for semantic differences

Or use image classification:

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
figma_features = classifier(figma_image)
website_features = classifier(website_image)
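Once two captions are available, comparing them does not require another model: a stdlib string-similarity ratio is enough for a first pass. A minimal sketch, where `caption_similarity` is a hypothetical helper (not part of the current codebase):

```python
from difflib import SequenceMatcher

def caption_similarity(caption_a: str, caption_b: str) -> float:
    """Return a 0..1 similarity score between two generated captions."""
    a = caption_a.lower().strip()
    b = caption_b.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

# Captions that diverge on a single element score below 1.0
score = caption_similarity(
    "a checkout page with a blue submit button",
    "a checkout page with a green submit button",
)
```

A threshold on this score (e.g. flag anything below 0.9 for review) would be a tunable parameter, not a fixed rule.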
2. Screenshot Storage - Current Status
✅ Storage Directories Exist
data/
├── comparisons/       # Side-by-side comparison images
├── annotated/         # Screenshots with difference annotations
└── (raw screenshots)  # Original Figma and website captures
Storage Locations in Code
Agent 1 (Figma) - agents/agent_1_design_inspector.py:
- Saves to: design_screenshots[viewport] (in-memory path)
- Format: PNG files from Figma API

Agent 2 (Website) - agents/agent_2_website_inspector.py:
- Saves to: website_screenshots[viewport] (in-memory path)
- Format: PNG files from Playwright

Screenshot Annotator - screenshot_annotator.py:
- Saves to: data/annotated/ directory
- Format: PNG with colored circles marking differences

Comparison Generator - app.py:
- Reads from: data/comparisons/ directory
- Displays in Gradio gallery
Current Storage Issues
Problem 1: Screenshots Not Persisted
- Screenshots are stored in temporary paths
- Not saved to the persistent data/ directory
- Lost after execution completes
Problem 2: No Raw Screenshot Archive
- Only annotated/comparison images saved
- Original Figma and website captures not archived
- Can't review raw captures later
Problem 3: Storage Space Not Managed
- No cleanup of old screenshots
- No size limits
- Could fill up disk space over time
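Before adding size limits, the first step is simply measuring usage. A stdlib-only sketch of a directory-size check (the function name and demo are illustrative, not existing project code):

```python
import tempfile
from pathlib import Path

def directory_size_bytes(base_dir: str) -> int:
    """Total size in bytes of all files under base_dir, recursively."""
    return sum(p.stat().st_size for p in Path(base_dir).rglob("*") if p.is_file())

# Quick demo against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "shot.png").write_bytes(b"\x89PNG" + b"\x00" * 100)
    size = directory_size_bytes(tmp)  # 104 bytes
```

Wiring this into a periodic check against a configurable cap would address Problem 3 without any new dependencies.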
3. Recommended Improvements
A. Implement HF Vision Model Integration
Option 1: Image Captioning (Recommended)
from transformers import pipeline
from PIL import Image

class HFVisionAnalyzer:
    def __init__(self, hf_token=None):
        self.pipeline = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base",
            device=0  # GPU; use device=-1 for CPU
        )

    def analyze_image(self, image_path):
        """Generate a semantic description of an image."""
        image = Image.open(image_path)
        caption = self.pipeline(image)[0]['generated_text']
        return caption

    def compare_images(self, figma_path, website_path):
        """Compare the semantic content of two images."""
        figma_caption = self.analyze_image(figma_path)
        website_caption = self.analyze_image(website_path)
        # Use text similarity to find differences
        # (calculate_text_similarity is a helper still to be written)
        similarity = calculate_text_similarity(figma_caption, website_caption)
        return similarity, figma_caption, website_caption
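The `calculate_text_similarity` helper is referenced but never defined. One stdlib-only way to implement it (an assumption for illustration, not the project's actual code) is Jaccard similarity over word sets:

```python
def calculate_text_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity over lowercase word sets, in [0, 1]."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

# Captions differing in one word: 4 shared words out of 6 total
score = calculate_text_similarity(
    "a page with a red button",
    "a page with a blue button",
)
```

Jaccard ignores word order, which is usually acceptable for short captions; an embedding-based similarity would be more robust but pulls in extra dependencies.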
Option 2: Object Detection
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")

figma_objects = detector(figma_image)
website_objects = detector(website_image)

# Compare detected objects
# (find_missing_objects is a helper still to be written)
missing_objects = find_missing_objects(figma_objects, website_objects)
Option 3: Visual Question Answering
from transformers import pipeline
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
questions = [
"What is the header height?",
"What color is the button?",
"Are there any icons?",
"What is the text content?"
]
figma_answers = [vqa(figma_image, q) for q in questions]
website_answers = [vqa(website_image, q) for q in questions]
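The VQA pipeline returns, per question, a ranked list of `{"answer", "score"}` dicts. A sketch of comparing the two answer sets to flag disagreements (the function name and sample data are illustrative assumptions):

```python
def find_mismatched_answers(questions, figma_answers, website_answers):
    """Pair each question with both top answers; keep only disagreements.

    Each answers entry is a ranked list of {"answer": str, "score": float}
    dicts, matching the shape the VQA pipeline returns.
    """
    mismatches = []
    for q, fa, wa in zip(questions, figma_answers, website_answers):
        f_top, w_top = fa[0]["answer"], wa[0]["answer"]
        if f_top.lower() != w_top.lower():
            mismatches.append({"question": q, "figma": f_top, "website": w_top})
    return mismatches

# Illustrative data in the pipeline's output shape
questions = ["What color is the button?", "Are there any icons?"]
figma = [[{"answer": "blue", "score": 0.9}], [{"answer": "yes", "score": 0.8}]]
website = [[{"answer": "green", "score": 0.7}], [{"answer": "yes", "score": 0.9}]]
diffs = find_mismatched_answers(questions, figma, website)
```

Each mismatch carries the question text, so it can be surfaced directly as a candidate difference in the report.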
B. Improve Screenshot Storage
Option 1: Persistent Storage with Cleanup
from pathlib import Path
from datetime import datetime, timedelta

class ScreenshotStorage:
    def __init__(self, base_dir="data/screenshots"):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def save_screenshot(self, image, execution_id, viewport, screenshot_type):
        """Save a screenshot under its execution's directory."""
        # Create the execution directory
        exec_dir = self.base_dir / execution_id
        exec_dir.mkdir(exist_ok=True)
        # Save with a timestamped filename
        filename = f"{viewport}_{screenshot_type}_{datetime.now().isoformat()}.png"
        filepath = exec_dir / filename
        image.save(filepath)
        return str(filepath)

    def cleanup_old_screenshots(self, days=7):
        """Remove screenshots older than N days."""
        cutoff = datetime.now() - timedelta(days=days)
        for exec_dir in self.base_dir.iterdir():
            if exec_dir.is_dir():
                for screenshot in exec_dir.glob("*.png"):
                    mtime = datetime.fromtimestamp(screenshot.stat().st_mtime)
                    if mtime < cutoff:
                        screenshot.unlink()

    def get_execution_screenshots(self, execution_id):
        """Retrieve all screenshots for an execution."""
        exec_dir = self.base_dir / execution_id
        return list(exec_dir.glob("*.png")) if exec_dir.exists() else []
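The age-based cleanup can be exercised without waiting days by backdating file mtimes with `os.utime`. A self-contained demo of the same mtime-cutoff logic (standalone function, not the class method itself):

```python
import os
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

def remove_older_than(base_dir: Path, days: int) -> int:
    """Delete *.png files older than `days`; return how many were removed."""
    cutoff = datetime.now() - timedelta(days=days)
    removed = 0
    for screenshot in base_dir.rglob("*.png"):
        if datetime.fromtimestamp(screenshot.stat().st_mtime) < cutoff:
            screenshot.unlink()
            removed += 1
    return removed

with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old.png"
    new = Path(tmp) / "new.png"
    old.write_bytes(b"")
    new.write_bytes(b"")
    # Backdate one file's access/modification times by 10 days
    ten_days_ago = (datetime.now() - timedelta(days=10)).timestamp()
    os.utime(old, (ten_days_ago, ten_days_ago))
    removed = remove_older_than(Path(tmp), days=7)
    survivors = [p.name for p in Path(tmp).glob("*.png")]
```

The same backdating trick makes the cleanup path unit-testable in CI.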
Option 2: Cloud Storage (S3, GCS)
import io

import boto3

class S3ScreenshotStorage:
    def __init__(self, bucket_name, aws_access_key, aws_secret_key):
        self.s3 = boto3.client(
            's3',
            aws_access_key_id=aws_access_key,
            aws_secret_access_key=aws_secret_key
        )
        self.bucket = bucket_name

    def save_screenshot(self, image, execution_id, viewport, screenshot_type):
        """Save a screenshot to S3 and return its URI."""
        key = f"screenshots/{execution_id}/{viewport}_{screenshot_type}.png"
        # Convert the PIL image to bytes
        image_bytes = io.BytesIO()
        image.save(image_bytes, format='PNG')
        image_bytes.seek(0)
        # Upload to S3
        self.s3.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=image_bytes.getvalue(),
            ContentType='image/png'
        )
        return f"s3://{self.bucket}/{key}"
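Whichever backend is used, object keys are worth normalizing so that execution IDs or viewport names containing slashes or spaces can't produce surprising paths. A hypothetical stdlib-only key builder (the sanitization rule is an assumption, not existing project behavior):

```python
import re

def build_screenshot_key(execution_id: str, viewport: str, screenshot_type: str) -> str:
    """Build a flat, URL-safe S3 key for a screenshot."""
    def sanitize(part: str) -> str:
        # Keep letters, digits, dashes, underscores; replace everything else
        return re.sub(r"[^A-Za-z0-9_-]", "-", part)

    return "screenshots/{}/{}_{}.png".format(
        sanitize(execution_id), sanitize(viewport), sanitize(screenshot_type)
    )

# A slash or space in the inputs cannot create extra path segments
key = build_screenshot_key("exec/001", "desktop 1440", "figma")
```

This keeps local-disk and S3 layouts identical, so switching backends does not change retrieval logic.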
4. Implementation Plan
Phase 1: Add HF Vision Analysis (Recommended First)
Files to Modify:
- agents/agent_3_difference_analyzer.py - Add HF analysis
- state_schema.py - Add HF analysis results
- requirements.txt - Already has dependencies
Code Changes:
# In agent_3_difference_analyzer.py
from transformers import pipeline
from PIL import Image

class HFVisionAnalyzer:
    def __init__(self, hf_token=None):
        self.captioner = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base"
        )

    def analyze_differences(self, figma_path, website_path):
        """Use HF to analyze image differences."""
        figma_img = Image.open(figma_path)
        website_img = Image.open(website_path)
        figma_caption = self.captioner(figma_img)[0]['generated_text']
        website_caption = self.captioner(website_img)[0]['generated_text']
        # Find semantic differences
        # (_compare_captions is a helper still to be written)
        differences = self._compare_captions(figma_caption, website_caption)
        return differences
Phase 2: Improve Screenshot Storage
Files to Create:
- storage_manager.py - Screenshot storage and retrieval
- cloud_storage.py - Optional cloud integration
Code Changes:
# In agents/agent_1_design_inspector.py and agent_2_website_inspector.py
from storage_manager import ScreenshotStorage
storage = ScreenshotStorage()
# Save screenshot
screenshot_path = storage.save_screenshot(
    image=screenshot,
    execution_id=state.execution_id,
    viewport=viewport,
    screenshot_type="figma"
)
state.figma_screenshots[viewport].image_path = screenshot_path
5. Comparison: Current vs. Enhanced
| Feature | Current | Enhanced |
|---|---|---|
| HF Vision | ❌ Not used | ✅ Image captioning |
| Screenshot Storage | ⚠️ Temporary | ✅ Persistent |
| Raw Archives | ❌ Not saved | ✅ Saved per execution |
| Storage Cleanup | ❌ Manual | ✅ Automatic |
| Cloud Storage | ❌ No | ✅ Optional (S3/GCS) |
| Detection Methods | 1 (pixel) | 3 (pixel + CSS + HF) |
| Accuracy | ~38% | ~60%+ |
6. Storage Space Estimates
Disk Usage per Test Run
| Item | Size | Count |
|---|---|---|
| Figma screenshot (1440px) | ~200KB | 1 |
| Figma screenshot (375px) | ~50KB | 1 |
| Website screenshot (1440px) | ~300KB | 1 |
| Website screenshot (375px) | ~80KB | 1 |
| Annotated images | ~250KB (total) | 2 |
| Comparison images | ~300KB (total) | 2 |
| Total per run | ~1.2MB | - |
Storage for 100 Test Runs
- 120MB (without cleanup)
- Manageable on most systems
Storage for 1000 Test Runs
- 1.2GB (without cleanup)
- Cleanup recommended after 30 days
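The projections above are straight multiplication from the ~1.2MB per-run figure in the table:

```python
MB_PER_RUN = 1.2  # per-run total from the disk-usage table above

def storage_mb(runs: int, mb_per_run: float = MB_PER_RUN) -> float:
    """Projected disk usage in MB for a number of test runs, no cleanup."""
    return runs * mb_per_run

hundred = storage_mb(100)    # ~120 MB
thousand = storage_mb(1000)  # ~1200 MB, i.e. ~1.2 GB
```

Pairing this with the 7-day cleanup default means steady-state usage depends on runs per week, not total runs ever executed.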
7. Recommended Next Steps
Immediate (High Priority)
- ✅ Implement HF Vision image captioning
- ✅ Add persistent screenshot storage
- ✅ Create storage manager module
Short-term (Medium Priority)
- Add automatic cleanup of old screenshots
- Implement storage size monitoring
- Add screenshot retrieval/comparison features
Long-term (Low Priority)
- Add cloud storage integration (S3/GCS)
- Implement advanced HF models (object detection, VQA)
- Add screenshot versioning/history
8. Code Examples
Example 1: Using HF Vision
from transformers import pipeline
from PIL import Image
# Initialize
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# Analyze
figma_img = Image.open("figma_screenshot.png")
caption = captioner(figma_img)
print(caption[0]['generated_text'])
# Output: "A checkout page with a header, form fields, and a submit button"
Example 2: Persistent Storage
from storage_manager import ScreenshotStorage
storage = ScreenshotStorage(base_dir="data/screenshots")
# Save
path = storage.save_screenshot(image, "exec_001", "desktop", "figma")
# Output: "data/screenshots/exec_001/desktop_figma_2024-01-04T10:30:00.png"
# Retrieve
screenshots = storage.get_execution_screenshots("exec_001")
Summary
| Question | Answer |
|---|---|
| Are we using HF for analysis? | ❌ No (currently), but dependencies are installed |
| Do we have space to save screenshots? | ✅ Yes (data/ directories exist), but not persistent |
| Should we implement HF vision? | ✅ Yes (recommended for better accuracy) |
| Should we improve storage? | ✅ Yes (for better data management) |
Recommendation: Implement both HF Vision integration and persistent storage in the next phase for significant accuracy improvements.