Spaces:

prthm11
/

Scratch_Vision_Game

Sleeping

App Files Files Community

prthm11 commited on Sep 25, 2025

Commit

c37925f

verified ·

1 Parent(s): 9fa149c

Upload README2.md

Browse files

Files changed (1) hide show

README2.md +765 -0

README2.md ADDED Viewed

	@@ -0,0 +1,765 @@

+# Scratch Vision Game - Technical Documentation
+## Overview
+The Scratch Vision Game is an AI-powered system that converts visual Scratch programming blocks from images/PDFs into functional Scratch 3.0 projects (.sb3 files). The system uses computer vision, OCR, and large language models to analyze, interpret, and reconstruct Scratch programs from visual inputs.
+## System Architecture
+### Core Components
+1. **Image Processing Pipeline** (`app.py`)
+   - PDF extraction and image preprocessing
+   - Multi-modal image enhancement using OpenCV
+   - OCR text extraction with Tesseract
+   - Visual similarity matching using multiple algorithms
+2. **Block Recognition System** (`utils/block_relation_builder.py`)
+   - Scratch block catalog management
+   - Pseudocode to JSON conversion
+   - Block relationship building and validation
+   - Project structure generation
+3. **AI Processing Layer**
+   - LLM-based code interpretation using Groq/LLaMA
+   - Multi-modal vision models for image captioning
+   - Semantic understanding of Scratch programming concepts
+## Process Flow & System Tree Structure
+### Complete User Journey Tree
+```
+USER INPUT (PDF File via Web Interface)
+│
+├── 📁 /process_pdf [POST] - Flask Route Handler
+│   │
+│   ├── 🔍 PDF Validation & Security
+│   │   ├── secure_filename() - Sanitize filename
+│   │   ├── tempfile.mkdtemp() - Create temp directory
+│   │   └── pdf_file.save() - Save to temp location
+│   │
+│   ├── 📄 PDF Processing Pipeline
+│   │   │
+│   │   ├── 🎯 extract_images_from_pdf()
+│   │   │   ├── partition_pdf() - Unstructured library extraction
+│   │   │   │   ├── strategy="hi_res"
+│   │   │   │   ├── extract_image_block_types=["Image"]
+│   │   │   │   └── extract_image_block_to_payload=True
+│   │   │   │
+│   │   │   ├── 💾 Save extracted.json
+│   │   │   │   └── /outputs/EXTRACTED_JSON/{pdf_name}/extracted.json
+│   │   │   │
+│   │   │   └── 🔄 For Each Extracted Image:
+│   │   │       │
+│   │   │       ├── 🖼️ Image Processing Branch
+│   │   │       │   ├── base64.b64decode() - Decode image data
+│   │   │       │   ├── Image.open() - PIL image creation
+│   │   │       │   ├── image.save() - Save as PNG
+│   │   │       │   └── /outputs/DETECTED_IMAGE/{pdf_name}/Sprite_{i}.png
+│   │   │       │
+│   │   │       └── 🤖 AI Analysis Branch (Parallel)
+│   │   │           │
+│   │   │           ├── 📝 Description Generation
+│   │   │           │   ├── LangGraph Agent (Groq LLaMA)
+│   │   │           │   ├── Prompt: "Give a brief Captioning."
+│   │   │           │   └── response["messages"][-1].content
+│   │   │           │
+│   │   │           ├── 🏷️ Name Generation
+│   │   │           │   ├── LangGraph Agent (Groq LLaMA)
+│   │   │           │   ├── Prompt: "give a short name caption"
+│   │   │           │   └── response["messages"][-1].content
+│   │   │           │
+│   │   │           └── 📋 Metadata Assembly
+│   │   │               └── extracted_sprites.json
+│   │   │                   ├── "Sprite {count}": {
+│   │   │                   │   ├── "name": AI_generated_name
+│   │   │                   │   ├── "base64": image_data
+│   │   │                   │   ├── "file-path": pdf_directory
+│   │   │                   │   └── "description": AI_description
+│   │   │                   └── }
+│   │
+│   └── 🎮 Project Generation Pipeline
+│       │
+│       ├── 🔍 similarity_matching()
+│       │   │
+│       │   ├── 📊 Embedding Generation Branch
+│       │   │   │
+│       │   │   ├── 🎯 Query Processing
+│       │   │   │   ├── base64.b64decode() - Decode sprite images
+│       │   │   │   ├── tempfile.mkdtemp() - Create temp workspace
+│       │   │   │   └── Image.save() - Save temp sprite files
+│       │   │   │
+│       │   │   ├── 🧠 CLIP Embeddings
+│       │   │   │   ├── OpenCLIPEmbeddings() - Initialize embedder
+│       │   │   │   ├── clip_embd.embed_image() - Generate embeddings
+│       │   │   │   └── sprite_features = np.array()
+│       │   │   │
+│       │   │   └── 📈 Similarity Computation
+│       │   │       ├── Load: /outputs/embeddings.json
+│       │   │       ├── np.matmul(sprite_matrix, img_matrix.T)
+│       │   │       └── np.argmax(similarity, axis=1)
+│       │   │
+│       │   ├── 🎨 Asset Matching & Collection
+│       │   │   │
+│       │   │   ├── 🧙‍♂️ Sprite Assets Branch
+│       │   │   │   ├── Match: /blocks/sprites/{matched_folder}/
+│       │   │   │   ├── Load: sprite.json
+│       │   │   │   ├── Copy: All files except matched image & sprite.json
+│       │   │   │   └── Append to: project_data[]
+│       │   │   │
+│       │   │   └── 🌄 Backdrop Assets Branch (Parallel)
+│       │   │       ├── Match: /blocks/Backdrops/{matched_folder}/
+│       │   │       ├── Load: project.json
+│       │   │       ├── Copy: All files except matched image & project.json
+│       │   │       └── Extract: Stage targets → backdrop_data[]
+│       │   │
+│       │   └── 🏗️ Project Assembly
+│       │       │
+│       │       ├── 📋 JSON Structure Creation
+│       │       │   ├── final_project = {
+│       │       │   │   ├── "targets": []
+│       │       │   │   ├── "monitors": []
+│       │       │   │   ├── "extensions": []
+│       │       │   │   └── "meta": {...}
+│       │       │   └── }
+│       │       │
+│       │       ├── 🧙‍♂️ Sprite Integration
+│       │       │   └── For sprite in project_data:
+│       │       │       └── if not sprite.get("isStage"):
+│       │       │           └── final_project["targets"].append(sprite)
+│       │       │
+│       │       ├── 🌄 Stage/Backdrop Integration
+│       │       │   └── if backdrop_data:
+│       │       │       ├── Merge: all_costumes.extend()
+│       │       │       ├── Merge: sounds from first backdrop
+│       │       │       └── Create: Stage target with merged assets
+│       │       │
+│       │       └── 💾 Final Output
+│       │           ├── /outputs/project_{uuid}/project.json
+│       │           └── Return: project_json_path
+│
+├── 📤 Response Generation
+│   └── JSON Response:
+│       ├── "message": "✅ PDF processed successfully"
+│       ├── "output_json": extracted_sprites_path
+│       ├── "sprites": sprite_metadata
+│       ├── "project_output_json": final_project_path
+│       └── "test_url": download_link
+│
+└── 📥 /download_sb3/{project_id} [GET] - Download Endpoint
+    ├── Locate: /game_samples/{project_id}.sb3
+    ├── Validate: File existence
+    └── send_from_directory() - Serve .sb3 file
+```
+### Parallel Processing Branches
+```
+🔄 CONCURRENT OPERATIONS DURING PDF PROCESSING:
+├── 🖼️ Image Processing Thread
+│   ├── OpenCV Enhancement Pipeline
+│   │   ├── upscale_image_cv() - 2x cubic interpolation
+│   │   ├── reduce_noise_cv() - Non-local means denoising
+│   │   ├── sharpen_cv() - Kernel-based sharpening
+│   │   └── enhance_contrast_cv() - Contrast enhancement
+│   │
+│   └── Multi-Algorithm Similarity Matching
+│       ├── DINOv2 Embeddings (Semantic)
+│       ├── PHash (Perceptual Hashing)
+│       └── Image Signatures (Goldberg Algorithm)
+├── 🤖 AI Processing Thread
+│   ├── SmolVLM Vision Model
+│   │   ├── Image Captioning
+│   │   └── Name Generation
+│   │
+│   └── Groq LLaMA Language Model
+│       ├── OCR Text Refinement
+│       ├── Pseudocode Generation
+│       └── JSON Structure Validation
+└── 💾 I/O Operations Thread
+    ├── File System Operations
+    │   ├── Directory Creation
+    │   ├── Image Saving/Loading
+    │   └── JSON Serialization
+    │
+    └── Asset Management
+        ├── Reference Asset Loading
+        ├── Project Asset Copying
+        └── Final Project Assembly
+```
+### Data Flow Diagram
+```
+📊 DATA TRANSFORMATION PIPELINE:
+PDF Bytes → Images → Enhanced Images → Embeddings → Similarities → Assets → .sb3
+    ↓           ↓            ↓             ↓            ↓          ↓       ↓
+[Binary]   [PIL.Image]  [np.ndarray]  [np.float32]  [indices]  [JSON]  [ZIP]
+    │           │            │             │            │          │       │
+    ├─ OCR ─────┼─ AI ───────┼─ Models ────┼─ Search ───┼─ Match ──┼─ Build┤
+    │           │            │             │            │          │       │
+    └─ Text ────┴─ Metadata ─┴─ Features ──┴─ Ranking ──┴─ Select ─┴─ Pack ┘
+```
+### Key Processing Functions
+**Input Processing:**
+- `extract_images_from_pdf()` - Extracts images from PDF using unstructured library
+- `process_image_cv2_from_pil()` - Enhances images using OpenCV (upscaling, denoising, sharpening)
+### 2. Visual Similarity Matching
+```
+Query Image → Multi-Algorithm Matching → Asset Selection → Project Assembly
+```
+**Algorithms Used:**
+- **DINOv2 Embeddings**: Deep learning-based semantic similarity
+- **Perceptual Hashing (PHash)**: Structural image comparison
+- **Image Signatures**: Goldberg algorithm for visual fingerprinting
+**Implementation:**
+```python
+def run_query_search_flow(query_b64, embeddings_dict, hash_dict, signature_obj_map):
+    # 1. Preprocess query image
+    enhanced_query_pil = process_image_cv2_from_pil(query_from_b64, scale=2)
+    # 2. Generate embeddings
+    query_emb = get_dinov2_embedding_from_pil(prepped)
+    query_phash = phash.encode_image(image_array=query_hash_arr)
+    query_sig = gis.generate_signature(query_sig_path)
+    # 3. Compute similarities
+    emb_sim = cosine_similarity(query_emb, stored_emb)
+    ph_sim = 1.0 - (hamming_distance / MAX_PHASH_BITS)
+    im_sim = 1.0 - gis.normalized_distance(stored_sig, query_sig)
+    # 4. Combine scores
+    combined = (emb_clamped + ph_sim + im_sim) / 3.0
+```
+### 3. Code Block Recognition
+```
+OCR Text → LLM Processing → Pseudocode → Block Mapping → JSON Generation
+```
+**LLM System Prompt:**
+```python
+SYSTEM_PROMPT = """Your task is to process OCR-extracted text from images of Scratch 3.0 code blocks and produce precisely formatted pseudocode JSON.
+### Core Role
+- Treat this as an OCR refinement task: the input may contain typos or spacing issues.
+- Intelligently correct OCR mistakes to align with valid Scratch 3.0 block syntax.
+### Universal Rules
+1. Code Detection: If no Scratch blocks are detected, the `pseudocode` value must be "No Code-blocks".
+2. Script Ownership: Determine the target from "Script for:". If it matches a `Stage_costumes` name, set `name_variable` to "Stage".
+3. Pseudocode Structure: The pseudocode must be a single JSON string with `\n` for newlines.
+"""
+```
+### 4. Project Generation
+```
+Pseudocode → Block Definitions → Relationship Building → .sb3 Assembly
+```
+## Libraries and Dependencies
+### Core Libraries
+#### Computer Vision & Image Processing
+- **OpenCV** (`cv2`): Image enhancement, filtering, and preprocessing
+- **PIL/Pillow**: Image manipulation and format conversion
+- **imagededup**: Perceptual hashing for duplicate detection
+- **image-match**: Visual similarity using Goldberg signatures
+#### Machine Learning & AI
+- **transformers**: Hugging Face models (DINOv2, SmolVLM)
+- **torch**: PyTorch for deep learning inference
+- **sentence-transformers**: Text and image embeddings
+- **faiss-cpu**: Fast similarity search and clustering
+- **open_clip_torch**: OpenAI CLIP embeddings
+#### Language Models
+- **langchain**: LLM orchestration and chaining
+- **langchain-groq**: Groq API integration
+- **langgraph**: Graph-based agent workflows
+#### Document Processing
+- **unstructured**: PDF parsing and content extraction
+- **pdf2image**: PDF to image conversion
+- **pytesseract**: OCR text extraction
+- **PyPDF2**: PDF manipulation
+#### Web Framework
+- **Flask**: Web application framework
+- **Flask-SocketIO**: Real-time communication
+- **gunicorn**: WSGI HTTP server
+### Model Specifications
+#### Vision Models
+```python
+# DINOv2 for semantic image understanding
+DINOV2_MODEL = "facebook/dinov2-small"
+dinov2_processor = AutoImageProcessor.from_pretrained(DINOV2_MODEL)
+dinov2_model = AutoModel.from_pretrained(DINOV2_MODEL)
+# SmolVLM for image captioning
+smolvlm256m_processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
+smolvlm256m_model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
+```
+#### Language Model
+```python
+# Groq LLaMA for code interpretation
+llm = ChatGroq(
+    model="meta-llama/llama-4-scout-17b-16e-instruct",
+    temperature=0,
+    max_tokens=None,
+)
+```
+## Technical Approaches
+### 1. Multi-Modal Image Enhancement
+**OpenCV Pipeline:**
+```python
+def process_image_cv2_from_pil(pil_img, scale=2):
+    bgr = pil_to_bgr_np(pil_img)
+    bgr = upscale_image_cv(bgr, scale=scale)  # Cubic interpolation
+    bgr = reduce_noise_cv(bgr)                # Non-local means denoising
+    bgr = sharpen_cv(bgr)                     # Kernel-based sharpening
+    bgr = enhance_contrast_cv(bgr)            # Contrast enhancement
+    return bgr_np_to_pil(bgr)
+```
+### 2. Hybrid Similarity Scoring
+**Multi-Algorithm Consensus:**
+```python
+def choose_top_candidates(embedding_results, phash_results, imgmatch_results):
+    # Method A: Normalized weighted average
+    weighted_scores[p] = (w_emb * emb_norm[p] + w_ph * ph_norm[p] + w_im * im_norm[p])
+    # Method B: Rank-sum (Borda count)
+    rank_sum[p] = rank_emb[p] + rank_ph[p] + rank_im[p]
+    # Method C: Harmonic mean (penalizes missing values)
+    harm = 3.0 / ((1.0/a) + (1.0/b) + (1.0/c))
+```
+### 3. Block Relationship Building
+**Scratch Block Catalog System:**
+```python
+def generate_blocks_from_opcodes(opcode_counts, all_block_definitions):
+    """
+    Generates Scratch blocks with proper parent-child relationships
+    - Hat blocks: topLevel=True, parent=None
+    - Stack blocks: Linked via 'next' field
+    - C-blocks: Contains SUBSTACK inputs
+    - Shadow blocks: Linked as input values
+    """
+```
+### 4. Project Assembly Pipeline
+**JSON Structure Generation:**
+```python
+final_project = {
+    "targets": [],      # Sprites and Stage
+    "monitors": [],     # Variable/list monitors
+    "extensions": [],   # Scratch extensions
+    "meta": {
+        "semver": "3.0.0",
+        "vm": "11.3.0",
+        "agent": "OpenAI ScratchVision Agent"
+    }
+}
+```
+## File System Architecture
+### Project Directory Structure
+```
+📁 scratch-vision-game/
+├── 🐍 app.py                          # Main Flask application (PRIMARY)
+├── 📋 requirements.txt                # Python dependencies
+├── 🐳 Dockerfile                      # Container configuration
+├── 📖 README.md                       # Basic project info
+├── 📖 README2.md                      # Technical documentation
+│
+├── 📁 utils/                          # Core processing utilities
+│   └── 🔧 block_relation_builder.py   # Scratch block logic & JSON generation
+│
+├── 📁 blocks/                         # Scratch block definitions & assets
+│   ├── 📊 blocks.json                 # Main block catalog
+│   ├── 📊 boolean_blocks.json         # Boolean/condition blocks
+│   ├── 📊 cap_blocks.json            # Terminal blocks (stop, delete clone)
+│   ├── 📊 c_blocks.json              # Control flow blocks (if, repeat, forever)
+│   ├── 📊 control_blocks.json        # Control category blocks
+│   ├── 📊 data_blocks.json           # Variables and lists blocks
+│   ├── 📊 event_blocks.json          # Event/trigger blocks
+│   ├── 📊 hat_blocks.json            # Script starter blocks
+│   ├── 📊 looks_blocks.json          # Appearance blocks
+│   ├── 📊 motion_blocks.json         # Movement blocks
+│   ├── 📊 operator_blocks.json       # Math and logic operators
+│   ├── 📊 reporter_blocks.json       # Value reporter blocks
+│   ├── 📊 sensing_blocks.json        # Sensor blocks
+│   ├── 📊 sound_blocks.json          # Audio blocks
+│   ├── 📊 stack_blocks.json          # Sequential action blocks
+│   │
+│   ├── 📁 sprites/                    # Reference sprite assets
+│   │   ├── 📁 {sprite_name}/
+│   │   │   ├── 🖼️ {sprite_image}.png
+│   │   │   ├── 📊 sprite.json         # Sprite definition
+│   │   │   └── 🎵 {sounds}.wav
+│   │   └── ...
+│   │
+│   ├── 📁 Backdrops/                  # Reference backdrop assets
+│   │   ├── 📁 {backdrop_name}/
+│   │   │   ├── 🖼️ {backdrop_image}.png
+│   │   │   ├── 📊 project.json        # Stage definition
+│   │   │   └── 🎵 {sounds}.wav
+│   │   └── ...
+│   │
+│   └── 📁 sound/                      # Audio assets library
+│       └── 🎵 *.wav
+│
+├── 📁 templates/                      # Flask HTML templates
+│   └── 🌐 *.html
+│
+├── 📁 static/                         # Web static assets
+│   ├── 🎨 css/
+│   ├── 📜 js/
+│   └── 🖼️ images/
+│
+├── 📁 game_samples/                   # Pre-built .sb3 files
+│   └── 🎮 *.sb3
+│
+├── 📁 generated_projects/             # Runtime generated projects
+│   └── 📁 project_{uuid}/
+│       ├── 📊 project.json
+│       ├── 🖼️ *.png
+│       └── 🎵 *.wav
+│
+└── 📁 outputs/                        # Processing outputs (Runtime)
+    ├── 📁 DETECTED_IMAGE/             # Extracted & processed images
+    │   └── 📁 {pdf_name}/
+    │       └── 🖼️ Sprite_*.png
+    │
+    ├── 📁 SCANNED_IMAGE/              # Original scanned images
+    │
+    ├── 📁 EXTRACTED_JSON/             # Intermediate JSON data
+    │   └── 📁 {pdf_name}/
+    │       ├── 📊 extracted.json      # Raw PDF extraction
+    │       └── 📊 extracted_sprites.json  # AI-processed sprites
+    │
+    └── 📊 embeddings.json             # Pre-computed embeddings cache
+```
+### Runtime Directory Creation Flow
+```
+🏗️ DYNAMIC DIRECTORY CREATION:
+User Upload → PDF Processing → Directory Structure
+     │              │                    │
+     ├─ temp_dir ───┼─ pdf_filename ─────┼─ /outputs/DETECTED_IMAGE/{pdf_name}/
+     │              │                    ├─ /outputs/EXTRACTED_JSON/{pdf_name}/
+     │              │                    └─ /generated_projects/project_{uuid}/
+     │              │
+     └─ secure_filename() ──────────────────→ Sanitized paths
+```
+### Data Persistence Locations
+```
+💾 PERSISTENT DATA STORAGE:
+├── 🔄 Input Processing
+│   ├── /tmp/{random}/ - Temporary PDF storage
+│   ├── /outputs/DETECTED_IMAGE/ - Extracted sprite images
+│   ├── /outputs/EXTRACTED_JSON/ - Processing metadata
+│   └── /outputs/embeddings.json - Similarity search cache
+│
+├── 🎯 Asset Matching
+│   ├── /blocks/sprites/ - Reference sprite library
+│   ├── /blocks/Backdrops/ - Reference backdrop library
+│   └── /blocks/*.json - Block definition catalogs
+│
+└── 🎮 Final Output
+    ├── /generated_projects/project_{uuid}/ - Assembled project
+    ├── /game_samples/{project_id}.sb3 - Downloadable Scratch file
+    └── /logs/app.log - Application logs
+```
+## API Endpoints
+### `/process_pdf` (POST)
+Processes uploaded PDF files containing Scratch code blocks.
+**Request:**
+```
+Content-Type: multipart/form-data
+pdf_file: <PDF file>
+```
+**Response:**
+```json
+{
+    "message": "✅ PDF processed successfully",
+    "output_json": "path/to/extracted.json",
+    "sprites": {...},
+    "project_output_json": "path/to/project.json"
+}
+```
+### `/download_sb3/<project_id>` (GET)
+Downloads generated Scratch 3.0 project files.
+## Processing Timeline & Performance
+### Execution Timeline Tree
+```
+⏱️ PROCESSING TIMELINE (Typical PDF with 5 images):
+📤 User Upload (0.0s)
+│
+├── 🔍 PDF Validation (0.1s)
+│   └── File security & temp storage
+│
+├── 📄 PDF Extraction (2-5s)
+│   ├── partition_pdf() - Unstructured processing
+│   ├── Image extraction & base64 encoding
+│   └── extracted.json creation
+│
+├── 🤖 AI Processing (10-15s per image)
+│   ├── 📝 Description Generation (5-7s)
+│   │   ├── LangGraph agent initialization
+│   │   ├── Groq API call
+│   │   └── Response processing
+│   │
+│   ├── 🏷️ Name Generation (5-7s)
+│   │   ├── Second LangGraph agent call
+│   │   ├── Groq API call
+│   │   └── Response processing
+│   │
+│   └── 📋 Metadata Assembly (0.1s)
+│       └── JSON structure creation
+│
+├── 🔍 Similarity Matching (3-8s)
+│   ├── 🎯 Image Decoding (0.5s)
+│   ├── 🧠 CLIP Embeddings (2-3s)
+│   ├── 📈 Similarity Computation (0.5s)
+│   └── 🎨 Asset Matching (2-4s)
+│
+├── 🏗️ Project Assembly (1-2s)
+│   ├── JSON merging
+│   ├── Asset copying
+│   └── Final project creation
+│
+└── 📤 Response Generation (0.1s)
+    └── JSON response formatting
+TOTAL: ~60-90 seconds for 5-image PDF
+```
+### Performance Bottlenecks & Optimizations
+```
+🚀 PERFORMANCE OPTIMIZATION STRATEGIES:
+├── 🧠 Model Loading (Startup Cost)
+│   ├── ✅ Pre-loaded global models
+│   │   ├── DINOv2: ~2GB VRAM
+│   │   ├── SmolVLM: ~1GB VRAM
+│   │   └── CLIP: ~500MB VRAM
+│   │
+│   ├── ✅ GPU Acceleration (when available)
+│   │   └── torch.device("cuda" if torch.cuda.is_available() else "cpu")
+│   │
+│   └── ✅ CPU Optimization
+│       └── torch.set_num_threads(4)
+│
+├── 🖼️ Image Processing Pipeline
+│   ├── ✅ Efficient NumPy Operations
+│   │   ├── Vectorized computations
+│   │   ├── In-place operations where possible
+│   │   └── Memory-mapped file access
+│   │
+│   ├── ✅ OpenCV Optimizations
+│   │   ├── Multi-threaded operations
+│   │   ├── SIMD instructions
+│   │   └── Optimized algorithms
+│   │
+│   └── ✅ Memory Management
+│       ├── Garbage collection hints
+│       ├── Temporary file cleanup
+│       └── Buffer reuse
+│
+├── 🔍 Similarity Search Acceleration
+│   ├── ✅ Pre-computed Embeddings Cache
+│   │   └── /outputs/embeddings.json (persistent)
+│   │
+│   ├── ✅ Normalized Embeddings
+│   │   ├── Cosine similarity via dot product
+│   │   └── L2 normalization preprocessing
+│   │
+│   └── ✅ Parallel Algorithm Execution
+│       ├── DINOv2, PHash, ImageMatch concurrent
+│       └── Multi-threaded similarity computation
+│
+└── 🌐 API & I/O Optimizations
+    ├── ✅ Async File Operations
+    ├── ✅ Streaming Responses
+    ├── ✅ Connection Pooling
+    └── ✅ Compression (gzip)
+```
+### Memory Usage Profile
+```
+💾 MEMORY CONSUMPTION BREAKDOWN:
+├── 🧠 AI Models (Peak: ~4GB)
+│   ├── DINOv2 Model: ~2GB
+│   ├── SmolVLM Model: ~1GB
+│   ├── CLIP Embeddings: ~500MB
+│   └── Groq API Client: ~100MB
+│
+├── 🖼️ Image Processing (Peak: ~500MB per image)
+│   ├── Original PIL Images: ~50MB each
+│   ├── Enhanced Images: ~100MB each
+│   ├── OpenCV Buffers: ~200MB each
+│   └── Embedding Vectors: ~2KB each
+│
+├── 📊 Data Structures (Peak: ~200MB)
+│   ├── Block Definitions: ~50MB
+│   ├── Asset Metadata: ~100MB
+│   ├── Similarity Matrices: ~50MB
+│   └── JSON Structures: ~10MB
+│
+└── 🌐 Web Framework (Baseline: ~100MB)
+    ├── Flask Application: ~50MB
+    ├── Request Buffers: ~30MB
+    └── Response Caching: ~20MB
+TOTAL PEAK: ~5GB (with GPU models loaded)
+TOTAL BASELINE: ~1GB (CPU-only, no active processing)
+```
+### Performance Optimizations
+### 1. Model Caching
+- Pre-loaded models with global variables
+- GPU acceleration when available
+- Batch processing for multiple images
+### 2. Image Processing
+- Efficient numpy operations
+- OpenCV optimizations
+- Memory management for large images
+### 3. Similarity Search
+- FAISS indexing for fast nearest neighbor search
+- Normalized embeddings for cosine similarity
+- Parallel processing of multiple algorithms
+## Error Handling
+### 1. Graceful Degradation
+```python
+def process_image_cv2_from_pil(pil_img, scale=2):
+    try:
+        # OpenCV enhancement pipeline
+        return enhanced_image
+    except Exception as e:
+        print(f"Enhancement failed: {e}")
+        return original_image  # Fallback to original
+```
+### 2. JSON Validation
+```python
+agent_json_resolver = create_react_agent(
+    model=llm,
+    prompt=SYSTEM_PROMPT_JSON_CORRECTOR
+)
+```
+## Deployment
+### Docker Configuration
+```dockerfile
+FROM python:3.11-slim
+# System dependencies: tesseract-ocr, poppler-utils, libgl1
+# Python dependencies: requirements.txt
+# Environment: Flask production mode
+EXPOSE 7860
+CMD ["python", "app.py"]
+```
+### Environment Variables
+- `GROQ_API_KEY`: API key for Groq language model
+- `TRANSFORMERS_CACHE`: Model cache directory
+- `HF_HOME`: Hugging Face cache directory
+## Future Enhancements
+1. **Real-time Processing**: WebSocket integration for live feedback
+2. **Advanced OCR**: Custom trained models for Scratch block recognition
+3. **Multi-language Support**: International Scratch block recognition
+4. **Collaborative Features**: Multi-user project editing
+5. **Performance Monitoring**: Detailed analytics and optimization metrics
+## Contributing
+The system is designed with modularity in mind:
+- Add new block definitions in `blocks/` directory
+- Extend similarity algorithms in the matching pipeline
+- Enhance OCR accuracy with custom preprocessing
+- Improve LLM prompts for better code interpretation
+## License
+Apache 2.0 License - See project repository for full details.