HF Vision Models & Screenshot Storage Analysis
1. HF Vision Model Usage - Current Status
❌ Currently NOT Implemented
The system mentions HF vision models in documentation and state schema, but does not actually call them in the current implementation.
Current Detection Methods:
- ✅ Screenshot pixel-level comparison (PIL, NumPy)
- ✅ Color analysis (RGB delta calculation)
- ✅ Structural analysis (edge detection, MSE)
- ❌ HF Vision Model API calls (NOT implemented)
Where HF is Mentioned (But Not Used)
state_schema.py - Line 53:
detection_method: str  # "screenshot", "css", "hf_vision", "hybrid"

app.py - Lines 276-280:

hf_token = gr.Textbox(
    label="Hugging Face Token (Optional)",
    placeholder="hf_...",
    type="password",
    info="For enhanced vision model analysis"
)

requirements.txt - Lines 29-31:

huggingface-hub>=0.19.0
transformers>=4.30.0
torch>=2.0.0
What's Missing
To actually use HF vision models, we need to:
Import HF libraries:
from transformers import pipeline
from PIL import Image

Create a vision pipeline:

vision_pipeline = pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",
    device=0  # GPU device; use device=-1 for CPU
)

Analyze images:

figma_caption = vision_pipeline(figma_image)
website_caption = vision_pipeline(website_image)
# Compare captions for semantic differences

Or use image classification:

classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
figma_features = classifier(figma_image)
website_features = classifier(website_image)
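Once two captions are available, comparing them does not require another model: a stdlib string-similarity ratio is enough for a first pass. A minimal sketch, where `caption_similarity` is a hypothetical helper (not part of the current codebase):

```python
from difflib import SequenceMatcher

def caption_similarity(caption_a: str, caption_b: str) -> float:
    """Return a 0..1 similarity score between two generated captions."""
    a = caption_a.lower().strip()
    b = caption_b.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

# Captions that diverge on a single element score below 1.0
score = caption_similarity(
    "a checkout page with a blue submit button",
    "a checkout page with a green submit button",
)
```

A threshold on this score (e.g. flag anything below 0.9 for review) would be a tunable parameter, not a fixed rule.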
2. Screenshot Storage - Current Status
✅ Storage Directories Exist
data/
├── comparisons/       # Side-by-side comparison images
├── annotated/         # Screenshots with difference annotations
└── (raw screenshots)  # Original Figma and website captures
Storage Locations in Code
Agent 1 (Figma) - agents/agent_1_design_inspector.py:
- Saves to: design_screenshots[viewport] (in-memory path)
- Format: PNG files from Figma API

Agent 2 (Website) - agents/agent_2_website_inspector.py:
- Saves to: website_screenshots[viewport] (in-memory path)
- Format: PNG files from Playwright

Screenshot Annotator - screenshot_annotator.py:
- Saves to: data/annotated/ directory
- Format: PNG with colored circles marking differences

Comparison Generator - app.py:
- Reads from: data/comparisons/ directory
- Displays in Gradio gallery
Current Storage Issues
Problem 1: Screenshots Not Persisted
- Screenshots are stored in temporary paths
- Not saved to the persistent data/ directory
- Lost after execution completes
Problem 2: No Raw Screenshot Archive
- Only annotated/comparison images saved
- Original Figma and website captures not archived
- Can't review raw captures later
Problem 3: Storage Space Not Managed
- No cleanup of old screenshots
- No size limits
- Could fill up disk space over time
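Before adding size limits, the first step is simply measuring usage. A stdlib-only sketch of a directory-size check (the function name and demo are illustrative, not existing project code):

```python
import tempfile
from pathlib import Path

def directory_size_bytes(base_dir: str) -> int:
    """Total size in bytes of all files under base_dir, recursively."""
    return sum(p.stat().st_size for p in Path(base_dir).rglob("*") if p.is_file())

# Quick demo against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "shot.png").write_bytes(b"\x89PNG" + b"\x00" * 100)
    size = directory_size_bytes(tmp)  # 104 bytes
```

Wiring this into a periodic check against a configurable cap would address Problem 3 without any new dependencies.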
3. Recommended Improvements
A. Implement HF Vision Model Integration
Option 1: Image Captioning (Recommended)
from transformers import pipeline
from PIL import Image

class HFVisionAnalyzer:
    def __init__(self, hf_token=None):
        self.pipeline = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base",
            device=0  # GPU; use device=-1 for CPU
        )

    def analyze_image(self, image_path):
        """Generate a semantic description of an image."""
        image = Image.open(image_path)
        caption = self.pipeline(image)[0]['generated_text']
        return caption

    def compare_images(self, figma_path, website_path):
        """Compare the semantic content of two images."""
        figma_caption = self.analyze_image(figma_path)
        website_caption = self.analyze_image(website_path)
        # Use text similarity to find differences
        # (calculate_text_similarity is a helper still to be written)
        similarity = calculate_text_similarity(figma_caption, website_caption)
        return similarity, figma_caption, website_caption
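The `calculate_text_similarity` helper is referenced but never defined. One stdlib-only way to implement it (an assumption for illustration, not the project's actual code) is Jaccard similarity over word sets:

```python
def calculate_text_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity over lowercase word sets, in [0, 1]."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

# Captions differing in one word: 4 shared words out of 6 total
score = calculate_text_similarity(
    "a page with a red button",
    "a page with a blue button",
)
```

Jaccard ignores word order, which is usually acceptable for short captions; an embedding-based similarity would be more robust but pulls in extra dependencies.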
Option 2: Object Detection
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")

figma_objects = detector(figma_image)
website_objects = detector(website_image)

# Compare detected objects
# (find_missing_objects is a helper still to be written)
missing_objects = find_missing_objects(figma_objects, website_objects)
Option 3: Visual Question Answering
from transformers import pipeline
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
questions = [
"What is the header height?",
"What color is the button?",
"Are there any icons?",
"What is the text content?"
]
figma_answers = [vqa(figma_image, q) for q in questions]
website_answers = [vqa(website_image, q) for q in questions]
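The VQA pipeline returns, per question, a ranked list of `{"answer", "score"}` dicts. A sketch of comparing the two answer sets to flag disagreements (the function name and sample data are illustrative assumptions):

```python
def find_mismatched_answers(questions, figma_answers, website_answers):
    """Pair each question with both top answers; keep only disagreements.

    Each answers entry is a ranked list of {"answer": str, "score": float}
    dicts, matching the shape the VQA pipeline returns.
    """
    mismatches = []
    for q, fa, wa in zip(questions, figma_answers, website_answers):
        f_top, w_top = fa[0]["answer"], wa[0]["answer"]
        if f_top.lower() != w_top.lower():
            mismatches.append({"question": q, "figma": f_top, "website": w_top})
    return mismatches

# Illustrative data in the pipeline's output shape
questions = ["What color is the button?", "Are there any icons?"]
figma = [[{"answer": "blue", "score": 0.9}], [{"answer": "yes", "score": 0.8}]]
website = [[{"answer": "green", "score": 0.7}], [{"answer": "yes", "score": 0.9}]]
diffs = find_mismatched_answers(questions, figma, website)
```

Each mismatch carries the question text, so it can be surfaced directly as a candidate difference in the report.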
B. Improve Screenshot Storage
Option 1: Persistent Storage with Cleanup
from pathlib import Path
from datetime import datetime, timedelta

class ScreenshotStorage:
    def __init__(self, base_dir="data/screenshots"):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def save_screenshot(self, image, execution_id, viewport, screenshot_type):
        """Save a screenshot under its execution's directory."""
        # Create the execution directory
        exec_dir = self.base_dir / execution_id
        exec_dir.mkdir(exist_ok=True)
        # Save with a timestamped filename
        filename = f"{viewport}_{screenshot_type}_{datetime.now().isoformat()}.png"
        filepath = exec_dir / filename
        image.save(filepath)
        return str(filepath)

    def cleanup_old_screenshots(self, days=7):
        """Remove screenshots older than N days."""
        cutoff = datetime.now() - timedelta(days=days)
        for exec_dir in self.base_dir.iterdir():
            if exec_dir.is_dir():
                for screenshot in exec_dir.glob("*.png"):
                    mtime = datetime.fromtimestamp(screenshot.stat().st_mtime)
                    if mtime < cutoff:
                        screenshot.unlink()

    def get_execution_screenshots(self, execution_id):
        """Retrieve all screenshots for an execution."""
        exec_dir = self.base_dir / execution_id
        return list(exec_dir.glob("*.png")) if exec_dir.exists() else []
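The age-based cleanup can be exercised without waiting days by backdating file mtimes with `os.utime`. A self-contained demo of the same mtime-cutoff logic (standalone function, not the class method itself):

```python
import os
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

def remove_older_than(base_dir: Path, days: int) -> int:
    """Delete *.png files older than `days`; return how many were removed."""
    cutoff = datetime.now() - timedelta(days=days)
    removed = 0
    for screenshot in base_dir.rglob("*.png"):
        if datetime.fromtimestamp(screenshot.stat().st_mtime) < cutoff:
            screenshot.unlink()
            removed += 1
    return removed

with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old.png"
    new = Path(tmp) / "new.png"
    old.write_bytes(b"")
    new.write_bytes(b"")
    # Backdate one file's access/modification times by 10 days
    ten_days_ago = (datetime.now() - timedelta(days=10)).timestamp()
    os.utime(old, (ten_days_ago, ten_days_ago))
    removed = remove_older_than(Path(tmp), days=7)
    survivors = [p.name for p in Path(tmp).glob("*.png")]
```

The same backdating trick makes the cleanup path unit-testable in CI.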
Option 2: Cloud Storage (S3, GCS)
import io

import boto3

class S3ScreenshotStorage:
    def __init__(self, bucket_name, aws_access_key, aws_secret_key):
        self.s3 = boto3.client(
            's3',
            aws_access_key_id=aws_access_key,
            aws_secret_access_key=aws_secret_key
        )
        self.bucket = bucket_name

    def save_screenshot(self, image, execution_id, viewport, screenshot_type):
        """Save a screenshot to S3 and return its URI."""
        key = f"screenshots/{execution_id}/{viewport}_{screenshot_type}.png"
        # Convert the PIL image to bytes
        image_bytes = io.BytesIO()
        image.save(image_bytes, format='PNG')
        image_bytes.seek(0)
        # Upload to S3
        self.s3.put_object(
            Bucket=self.bucket,
            Key=key,
            Body=image_bytes.getvalue(),
            ContentType='image/png'
        )
        return f"s3://{self.bucket}/{key}"
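Whichever backend is used, object keys are worth normalizing so that execution IDs or viewport names containing slashes or spaces can't produce surprising paths. A hypothetical stdlib-only key builder (the sanitization rule is an assumption, not existing project behavior):

```python
import re

def build_screenshot_key(execution_id: str, viewport: str, screenshot_type: str) -> str:
    """Build a flat, URL-safe S3 key for a screenshot."""
    def sanitize(part: str) -> str:
        # Keep letters, digits, dashes, underscores; replace everything else
        return re.sub(r"[^A-Za-z0-9_-]", "-", part)

    return "screenshots/{}/{}_{}.png".format(
        sanitize(execution_id), sanitize(viewport), sanitize(screenshot_type)
    )

# A slash or space in the inputs cannot create extra path segments
key = build_screenshot_key("exec/001", "desktop 1440", "figma")
```

This keeps local-disk and S3 layouts identical, so switching backends does not change retrieval logic.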
4. Implementation Plan
Phase 1: Add HF Vision Analysis (Recommended First)
Files to Modify:
- agents/agent_3_difference_analyzer.py - Add HF analysis
- state_schema.py - Add HF analysis results
- requirements.txt - Already has dependencies
Code Changes:
# In agent_3_difference_analyzer.py
from transformers import pipeline
from PIL import Image

class HFVisionAnalyzer:
    def __init__(self, hf_token=None):
        self.captioner = pipeline(
            "image-to-text",
            model="Salesforce/blip-image-captioning-base"
        )

    def analyze_differences(self, figma_path, website_path):
        """Use HF to analyze image differences."""
        figma_img = Image.open(figma_path)
        website_img = Image.open(website_path)
        figma_caption = self.captioner(figma_img)[0]['generated_text']
        website_caption = self.captioner(website_img)[0]['generated_text']
        # Find semantic differences
        # (_compare_captions is a helper still to be written)
        differences = self._compare_captions(figma_caption, website_caption)
        return differences
Phase 2: Improve Screenshot Storage
Files to Create:
- storage_manager.py - Screenshot storage and retrieval
- cloud_storage.py - Optional cloud integration
Code Changes:
# In agents/agent_1_design_inspector.py and agent_2_website_inspector.py
from storage_manager import ScreenshotStorage
storage = ScreenshotStorage()
# Save screenshot
screenshot_path = storage.save_screenshot(
    image=screenshot,
    execution_id=state.execution_id,
    viewport=viewport,
    screenshot_type="figma"
)
state.figma_screenshots[viewport].image_path = screenshot_path
5. Comparison: Current vs. Enhanced
| Feature | Current | Enhanced |
|---|---|---|
| HF Vision | ❌ Not used | ✅ Image captioning |
| Screenshot Storage | ⚠️ Temporary | ✅ Persistent |
| Raw Archives | ❌ Not saved | ✅ Saved per execution |
| Storage Cleanup | ❌ Manual | ✅ Automatic |
| Cloud Storage | ❌ No | ✅ Optional (S3/GCS) |
| Detection Methods | 1 (pixel) | 3 (pixel + CSS + HF) |
| Accuracy | ~38% | ~60%+ |
6. Storage Space Estimates
Disk Usage per Test Run
| Item | Size | Count |
|---|---|---|
| Figma screenshot (1440px) | ~200KB | 1 |
| Figma screenshot (375px) | ~50KB | 1 |
| Website screenshot (1440px) | ~300KB | 1 |
| Website screenshot (375px) | ~80KB | 1 |
| Annotated images | ~250KB (total) | 2 |
| Comparison images | ~300KB (total) | 2 |
| Total per run | ~1.2MB | - |
Storage for 100 Test Runs
- 120MB (without cleanup)
- Manageable on most systems
Storage for 1000 Test Runs
- 1.2GB (without cleanup)
- Cleanup recommended after 30 days
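The projections above are straight multiplication from the ~1.2MB per-run figure in the table:

```python
MB_PER_RUN = 1.2  # per-run total from the disk-usage table above

def storage_mb(runs: int, mb_per_run: float = MB_PER_RUN) -> float:
    """Projected disk usage in MB for a number of test runs, no cleanup."""
    return runs * mb_per_run

hundred = storage_mb(100)    # ~120 MB
thousand = storage_mb(1000)  # ~1200 MB, i.e. ~1.2 GB
```

Pairing this with the 7-day cleanup default means steady-state usage depends on runs per week, not total runs ever executed.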
7. Recommended Next Steps
Immediate (High Priority)
- ✅ Implement HF Vision image captioning
- ✅ Add persistent screenshot storage
- ✅ Create storage manager module
Short-term (Medium Priority)
- Add automatic cleanup of old screenshots
- Implement storage size monitoring
- Add screenshot retrieval/comparison features
Long-term (Low Priority)
- Add cloud storage integration (S3/GCS)
- Implement advanced HF models (object detection, VQA)
- Add screenshot versioning/history
8. Code Examples
Example 1: Using HF Vision
from transformers import pipeline
from PIL import Image
# Initialize
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
# Analyze
figma_img = Image.open("figma_screenshot.png")
caption = captioner(figma_img)
print(caption[0]['generated_text'])
# Output: "A checkout page with a header, form fields, and a submit button"
Example 2: Persistent Storage
from storage_manager import ScreenshotStorage
storage = ScreenshotStorage(base_dir="data/screenshots")
# Save
path = storage.save_screenshot(image, "exec_001", "desktop", "figma")
# Output: "data/screenshots/exec_001/desktop_figma_2024-01-04T10:30:00.png"
# Retrieve
screenshots = storage.get_execution_screenshots("exec_001")
Summary
| Question | Answer |
|---|---|
| Are we using HF for analysis? | ❌ No (currently), but dependencies are installed |
| Do we have space to save screenshots? | ✅ Yes (data/ directories exist), but not persistent |
| Should we implement HF vision? | ✅ Yes (recommended for better accuracy) |
| Should we improve storage? | ✅ Yes (for better data management) |
Recommendation: Implement both HF Vision integration and persistent storage in the next phase for significant accuracy improvements.