
HF Vision Models & Screenshot Storage Analysis

1. HF Vision Model Usage - Current Status

❌ Currently NOT Implemented

The system mentions HF vision models in documentation and state schema, but does not actually call them in the current implementation.

Current Detection Methods:

  • ✅ Screenshot pixel-level comparison (PIL, NumPy)
  • ✅ Color analysis (RGB delta calculation)
  • ✅ Structural analysis (edge detection, MSE)
  • ❌ HF Vision Model API calls (NOT implemented)
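For context, the pixel-level comparison above can be sketched in a few lines of PIL/NumPy. This is a minimal illustration only; the function name `pixel_diff` and the returned keys are hypothetical, not the project's actual API:

```python
# Illustrative sketch of pixel-level comparison; `pixel_diff` is a
# hypothetical name, not a function that exists in this codebase.
import numpy as np
from PIL import Image

def pixel_diff(figma_path: str, website_path: str) -> dict:
    """Compare two screenshots: mean per-channel RGB delta and MSE."""
    a = np.asarray(Image.open(figma_path).convert("RGB"), dtype=np.float64)
    b = np.asarray(Image.open(website_path).convert("RGB"), dtype=np.float64)
    # Resize the second image if the viewports differ slightly
    if a.shape != b.shape:
        b = np.asarray(
            Image.fromarray(b.astype("uint8")).resize((a.shape[1], a.shape[0])),
            dtype=np.float64,
        )
    rgb_delta = np.abs(a - b).mean()   # average per-channel difference (0-255)
    mse = ((a - b) ** 2).mean()        # mean squared error
    return {"rgb_delta": rgb_delta, "mse": mse}
```

A high `rgb_delta` or `mse` flags a visual mismatch, but these metrics carry no semantic meaning, which is the gap the HF vision models below are meant to fill.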

Where HF is Mentioned (But Not Used)

  1. state_schema.py - Line 53:

    detection_method: str  # "screenshot", "css", "hf_vision", "hybrid"
    
  2. app.py - Line 276-280:

    hf_token = gr.Textbox(
        label="Hugging Face Token (Optional)",
        placeholder="hf_...",
        type="password",
        info="For enhanced vision model analysis"
    )
    
  3. requirements.txt - Lines 29-31:

    huggingface-hub>=0.19.0
    transformers>=4.30.0
    torch>=2.0.0
    

What's Missing

To actually use HF vision models, we need to:

  1. Import HF libraries:

    from transformers import pipeline
    from PIL import Image
    
  2. Create vision pipeline:

    vision_pipeline = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
        device=0  # GPU device; use device=-1 on CPU-only machines
    )
    
  3. Analyze images:

    figma_caption = vision_pipeline(figma_image)
    website_caption = vision_pipeline(website_image)
    # Compare captions for semantic differences
    
  4. Or use image classification:

    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
    figma_features = classifier(figma_image)
    website_features = classifier(website_image)
    

2. Screenshot Storage - Current Status

✅ Storage Directories Exist

data/
β”œβ”€β”€ comparisons/        # Side-by-side comparison images
β”œβ”€β”€ annotated/          # Screenshots with difference annotations
└── (raw screenshots)   # Original Figma and website captures

Storage Locations in Code

  1. Agent 1 (Figma) - agents/agent_1_design_inspector.py:

    • Saves to: design_screenshots[viewport] (in-memory path)
    • Format: PNG files from Figma API
  2. Agent 2 (Website) - agents/agent_2_website_inspector.py:

    • Saves to: website_screenshots[viewport] (in-memory path)
    • Format: PNG files from Playwright
  3. Screenshot Annotator - screenshot_annotator.py:

    • Saves to: data/annotated/ directory
    • Format: PNG with colored circles marking differences
  4. Comparison Generator - app.py:

    • Reads from: data/comparisons/ directory
    • Displays in Gradio gallery

Current Storage Issues

Problem 1: Screenshots Not Persisted

  • Screenshots are stored in temporary paths
  • Not saved to persistent data/ directory
  • Lost after execution completes

Problem 2: No Raw Screenshot Archive

  • Only annotated/comparison images saved
  • Original Figma and website captures not archived
  • Can't review raw captures later

Problem 3: Storage Space Not Managed

  • No cleanup of old screenshots
  • No size limits
  • Could fill up disk space over time

3. Recommended Improvements

A. Implement HF Vision Model Integration

Option 1: Image Captioning (Recommended)

from difflib import SequenceMatcher

from PIL import Image
from transformers import pipeline

class HFVisionAnalyzer:
    def __init__(self, hf_token=None):
        # hf_token is accepted for gated models; not required for this public model
        self.pipeline = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base",
            device=0  # use -1 to run on CPU
        )
    
    def analyze_image(self, image_path):
        """Generate semantic description of image"""
        image = Image.open(image_path)
        caption = self.pipeline(image)[0]['generated_text']
        return caption
    
    def compare_images(self, figma_path, website_path):
        """Compare semantic content of images"""
        figma_caption = self.analyze_image(figma_path)
        website_caption = self.analyze_image(website_path)
        
        # Use a 0..1 text-similarity ratio to flag semantic differences
        similarity = SequenceMatcher(None, figma_caption, website_caption).ratio()
        return similarity, figma_caption, website_caption

Option 2: Object Detection

from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")

figma_objects = detector(figma_image)
website_objects = detector(website_image)

# Compare detected object labels; anything in the design but not on the site is missing
figma_labels = {obj["label"] for obj in figma_objects}
website_labels = {obj["label"] for obj in website_objects}
missing_objects = figma_labels - website_labels

Option 3: Visual Question Answering

from transformers import pipeline

vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

questions = [
    "What is the header height?",
    "What color is the button?",
    "Are there any icons?",
    "What is the text content?"
]

figma_answers = [vqa(image=figma_image, question=q) for q in questions]
website_answers = [vqa(image=website_image, question=q) for q in questions]

B. Improve Screenshot Storage

Option 1: Persistent Storage with Cleanup

from pathlib import Path
from datetime import datetime, timedelta

class ScreenshotStorage:
    def __init__(self, base_dir="data/screenshots"):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)
    
    def save_screenshot(self, image, execution_id, viewport, screenshot_type):
        """Save screenshot with metadata"""
        # Create execution directory
        exec_dir = self.base_dir / execution_id
        exec_dir.mkdir(exist_ok=True)
        
        # Save with timestamp
        filename = f"{viewport}_{screenshot_type}_{datetime.now().isoformat()}.png"
        filepath = exec_dir / filename
        image.save(filepath)
        
        return str(filepath)
    
    def cleanup_old_screenshots(self, days=7):
        """Remove screenshots older than N days"""
        cutoff = datetime.now() - timedelta(days=days)
        
        for exec_dir in self.base_dir.iterdir():
            if exec_dir.is_dir():
                for screenshot in exec_dir.glob("*.png"):
                    mtime = datetime.fromtimestamp(screenshot.stat().st_mtime)
                    if mtime < cutoff:
                        screenshot.unlink()
    
    def get_execution_screenshots(self, execution_id):
        """Retrieve all screenshots for an execution"""
        exec_dir = self.base_dir / execution_id
        return list(exec_dir.glob("*.png")) if exec_dir.exists() else []

Option 2: Cloud Storage (S3, GCS)

import io

import boto3

class S3ScreenshotStorage:
    def __init__(self, bucket_name, aws_access_key, aws_secret_key):
        self.s3 = boto3.client(
            's3',
            aws_access_key_id=aws_access_key,
            aws_secret_access_key=aws_secret_key
        )
        self.bucket = bucket_name
    
    def save_screenshot(self, image, execution_id, viewport, screenshot_type):
        """Save screenshot to S3"""
        key = f"screenshots/{execution_id}/{viewport}_{screenshot_type}.png"
        
        # Convert PIL image to bytes
        image_bytes = io.BytesIO()
        image.save(image_bytes, format='PNG')
        image_bytes.seek(0)
        
        # Upload to S3
        self.s3.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=image_bytes.getvalue(),
            ContentType='image/png'
        )
        
        return f"s3://{self.bucket}/{key}"

4. Implementation Plan

Phase 1: Add HF Vision Analysis (Recommended First)

Files to Modify:

  1. agents/agent_3_difference_analyzer.py - Add HF analysis
  2. state_schema.py - Add HF analysis results
  3. requirements.txt - Already has dependencies

Code Changes:

# In agent_3_difference_analyzer.py

from transformers import pipeline
from PIL import Image

class HFVisionAnalyzer:
    def __init__(self, hf_token=None):
        self.captioner = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base"
        )
    
    def analyze_differences(self, figma_path, website_path):
        """Use HF to analyze image differences"""
        figma_img = Image.open(figma_path)
        website_img = Image.open(website_path)
        
        figma_caption = self.captioner(figma_img)[0]['generated_text']
        website_caption = self.captioner(website_img)[0]['generated_text']
        
        # Find semantic differences
        differences = self._compare_captions(figma_caption, website_caption)
        return differences
    
    def _compare_captions(self, figma_caption, website_caption):
        """Naive word-level diff between the two generated captions"""
        figma_words = set(figma_caption.lower().split())
        website_words = set(website_caption.lower().split())
        return {
            "missing_from_website": figma_words - website_words,
            "added_on_website": website_words - figma_words,
        }

Phase 2: Improve Screenshot Storage

Files to Create:

  1. storage_manager.py - Screenshot storage and retrieval
  2. cloud_storage.py - Optional cloud integration

Code Changes:

# In agents/agent_1_design_inspector.py and agent_2_website_inspector.py

from storage_manager import ScreenshotStorage

storage = ScreenshotStorage()

# Save screenshot
screenshot_path = storage.save_screenshot(
    image=screenshot,
    execution_id=state.execution_id,
    viewport=viewport,
    screenshot_type="figma"
)

state.figma_screenshots[viewport].image_path = screenshot_path

5. Comparison: Current vs. Enhanced

| Feature | Current | Enhanced |
| --- | --- | --- |
| HF Vision | ❌ Not used | ✅ Image captioning |
| Screenshot Storage | ⚠️ Temporary | ✅ Persistent |
| Raw Archives | ❌ Not saved | ✅ Saved per execution |
| Storage Cleanup | ❌ Manual | ✅ Automatic |
| Cloud Storage | ❌ No | ✅ Optional (S3/GCS) |
| Detection Methods | 1 (pixel) | 3 (pixel + CSS + HF) |
| Accuracy | ~38% | ~60%+ |

6. Storage Space Estimates

Disk Usage per Test Run

| Item | Size | Count |
| --- | --- | --- |
| Figma screenshot (1440px) | ~200KB | 1 |
| Figma screenshot (375px) | ~50KB | 1 |
| Website screenshot (1440px) | ~300KB | 1 |
| Website screenshot (375px) | ~80KB | 1 |
| Annotated images | ~250KB | 2 |
| Comparison images | ~300KB | 2 |
| Total per run | ~1.2MB | - |

Storage for 100 Test Runs

  • 120MB (without cleanup)
  • Manageable on most systems

Storage for 1000 Test Runs

  • 1.2GB (without cleanup)
  • Cleanup recommended after 30 days

7. Recommended Next Steps

Immediate (High Priority)

  1. ✅ Implement HF Vision image captioning
  2. ✅ Add persistent screenshot storage
  3. ✅ Create storage manager module

Short-term (Medium Priority)

  1. Add automatic cleanup of old screenshots
  2. Implement storage size monitoring
  3. Add screenshot retrieval/comparison features
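The size-monitoring item above can be sketched with the stdlib. The 500 MB default budget and both function names are illustrative assumptions, not decided values:

```python
# Illustrative storage-size monitoring sketch; function names and the
# 500 MB default budget are assumptions, not part of the project.
from pathlib import Path

def directory_size_mb(base_dir: str = "data/screenshots") -> float:
    """Total size of all files under base_dir, in megabytes."""
    root = Path(base_dir)
    if not root.exists():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)

def over_limit(base_dir: str = "data/screenshots", limit_mb: float = 500.0) -> bool:
    """True when the screenshot directory exceeds the configured budget."""
    return directory_size_mb(base_dir) > limit_mb
```

A check like `over_limit()` could run at the start of each execution and trigger the cleanup routine from `ScreenshotStorage` when it returns True.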

Long-term (Low Priority)

  1. Add cloud storage integration (S3/GCS)
  2. Implement advanced HF models (object detection, VQA)
  3. Add screenshot versioning/history

8. Code Examples

Example 1: Using HF Vision

from transformers import pipeline
from PIL import Image

# Initialize
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# Analyze
figma_img = Image.open("figma_screenshot.png")
caption = captioner(figma_img)
print(caption[0]['generated_text'])
# Output: "A checkout page with a header, form fields, and a submit button"

Example 2: Persistent Storage

from storage_manager import ScreenshotStorage

storage = ScreenshotStorage(base_dir="data/screenshots")

# Save
path = storage.save_screenshot(image, "exec_001", "desktop", "figma")
# Output: "data/screenshots/exec_001/desktop_figma_2024-01-04T10:30:00.png"

# Retrieve
screenshots = storage.get_execution_screenshots("exec_001")

Summary

| Question | Answer |
| --- | --- |
| Are we using HF for analysis? | ❌ No (currently), but dependencies are installed |
| Do we have space to save screenshots? | ✅ Yes (data/ directories exist), but not persistent |
| Should we implement HF vision? | ✅ Yes (recommended for better accuracy) |
| Should we improve storage? | ✅ Yes (for better data management) |

Recommendation: Implement both HF Vision integration and persistent storage in the next phase for significant accuracy improvements.