# Scratch Vision Game - Technical Documentation

## Overview

The Scratch Vision Game is an AI-powered system that converts visual Scratch programming blocks from images/PDFs into functional Scratch 3.0 projects (.sb3 files). The system uses computer vision, OCR, and large language models to analyze, interpret, and reconstruct Scratch programs from visual inputs.

## System Architecture

### Core Components

1. **Image Processing Pipeline** (`app.py`)

   - PDF extraction and image preprocessing
   - Multi-modal image enhancement using OpenCV
   - OCR text extraction with Tesseract
   - Visual similarity matching using multiple algorithms

2. **Block Recognition System** (`utils/block_relation_builder.py`)

   - Scratch block catalog management
   - Pseudocode to JSON conversion
   - Block relationship building and validation
   - Project structure generation

3. **AI Processing Layer**
   - LLM-based code interpretation using Groq/LLaMA
   - Multi-modal vision models for image captioning
   - Semantic understanding of Scratch programming concepts

## Process Flow & System Tree Structure

### Complete User Journey Tree

```
USER INPUT (PDF File via Web Interface)
│
├── 📁 /process_pdf [POST] - Flask Route Handler
│   │
│   ├── 🔍 PDF Validation & Security
│   │   ├── secure_filename() - Sanitize filename
│   │   ├── tempfile.mkdtemp() - Create temp directory
│   │   └── pdf_file.save() - Save to temp location
│   │
│   ├── 📄 PDF Processing Pipeline
│   │   │
│   │   ├── 🎯 extract_images_from_pdf()
│   │   │   ├── partition_pdf() - Unstructured library extraction
│   │   │   │   ├── strategy="hi_res"
│   │   │   │   ├── extract_image_block_types=["Image"]
│   │   │   │   └── extract_image_block_to_payload=True
│   │   │   │
│   │   │   ├── 💾 Save extracted.json
│   │   │   │   └── /outputs/EXTRACTED_JSON/{pdf_name}/extracted.json
│   │   │   │
│   │   │   └── 🔄 For Each Extracted Image:
│   │   │       │
│   │   │       ├── 🖼️ Image Processing Branch
│   │   │       │   ├── base64.b64decode() - Decode image data
│   │   │       │   ├── Image.open() - PIL image creation
│   │   │       │   ├── image.save() - Save as PNG
│   │   │       │   └── /outputs/DETECTED_IMAGE/{pdf_name}/Sprite_{i}.png
│   │   │       │
│   │   │       └── 🤖 AI Analysis Branch (Parallel)
│   │   │           │
│   │   │           ├── 📝 Description Generation
│   │   │           │   ├── LangGraph Agent (Groq LLaMA)
│   │   │           │   ├── Prompt: "Give a brief Captioning."
│   │   │           │   └── response["messages"][-1].content
│   │   │           │
│   │   │           ├── 🏷️ Name Generation
│   │   │           │   ├── LangGraph Agent (Groq LLaMA)
│   │   │           │   ├── Prompt: "give a short name caption"
│   │   │           │   └── response["messages"][-1].content
│   │   │           │
│   │   │           └── 📋 Metadata Assembly
│   │   │               └── extracted_sprites.json
│   │   │                   ├── "Sprite {count}": {
│   │   │                   │   ├── "name": AI_generated_name
│   │   │                   │   ├── "base64": image_data
│   │   │                   │   ├── "file-path": pdf_directory
│   │   │                   │   └── "description": AI_description
│   │   │                   └── }
│   │
│   └── 🎮 Project Generation Pipeline
│       │
│       ├── 🔍 similarity_matching()
│       │   │
│       │   ├── 📊 Embedding Generation Branch
│       │   │   │
│       │   │   ├── 🎯 Query Processing
│       │   │   │   ├── base64.b64decode() - Decode sprite images
│       │   │   │   ├── tempfile.mkdtemp() - Create temp workspace
│       │   │   │   └── Image.save() - Save temp sprite files
│       │   │   │
│       │   │   ├── 🧠 CLIP Embeddings
│       │   │   │   ├── OpenCLIPEmbeddings() - Initialize embedder
│       │   │   │   ├── clip_embd.embed_image() - Generate embeddings
│       │   │   │   └── sprite_features = np.array()
│       │   │   │
│       │   │   └── 📈 Similarity Computation
│       │   │       ├── Load: /outputs/embeddings.json
│       │   │       ├── np.matmul(sprite_matrix, img_matrix.T)
│       │   │       └── np.argmax(similarity, axis=1)
│       │   │
│       │   ├── 🎨 Asset Matching & Collection
│       │   │   │
│       │   │   ├── 🧙‍♂️ Sprite Assets Branch
│       │   │   │   ├── Match: /blocks/sprites/{matched_folder}/
│       │   │   │   ├── Load: sprite.json
│       │   │   │   ├── Copy: All files except matched image & sprite.json
│       │   │   │   └── Append to: project_data[]
│       │   │   │
│       │   │   └── 🌄 Backdrop Assets Branch (Parallel)
│       │   │       ├── Match: /blocks/Backdrops/{matched_folder}/
│       │   │       ├── Load: project.json
│       │   │       ├── Copy: All files except matched image & project.json
│       │   │       └── Extract: Stage targets → backdrop_data[]
│       │   │
│       │   └── 🏗️ Project Assembly
│       │       │
│       │       ├── 📋 JSON Structure Creation
│       │       │   ├── final_project = {
│       │       │   │   ├── "targets": []
│       │       │   │   ├── "monitors": []
│       │       │   │   ├── "extensions": []
│       │       │   │   └── "meta": {...}
│       │       │   └── }
│       │       │
│       │       ├── 🧙‍♂️ Sprite Integration
│       │       │   └── For sprite in project_data:
│       │       │       └── if not sprite.get("isStage"):
│       │       │           └── final_project["targets"].append(sprite)
│       │       │
│       │       ├── 🌄 Stage/Backdrop Integration
│       │       │   └── if backdrop_data:
│       │       │       ├── Merge: all_costumes.extend()
│       │       │       ├── Merge: sounds from first backdrop
│       │       │       └── Create: Stage target with merged assets
│       │       │
│       │       └── 💾 Final Output
│       │           ├── /outputs/project_{uuid}/project.json
│       │           └── Return: project_json_path
│
├── 📤 Response Generation
│   └── JSON Response:
│       ├── "message": "✅ PDF processed successfully"
│       ├── "output_json": extracted_sprites_path
│       ├── "sprites": sprite_metadata
│       ├── "project_output_json": final_project_path
│       └── "test_url": download_link
│
└── 📥 /download_sb3/{project_id} [GET] - Download Endpoint
    ├── Locate: /game_samples/{project_id}.sb3
    ├── Validate: File existence
    └── send_from_directory() - Serve .sb3 file
```

### Parallel Processing Branches

```
🔄 CONCURRENT OPERATIONS DURING PDF PROCESSING:

├── 🖼️ Image Processing Thread
│   ├── OpenCV Enhancement Pipeline
│   │   ├── upscale_image_cv() - 2x cubic interpolation
│   │   ├── reduce_noise_cv() - Non-local means denoising
│   │   ├── sharpen_cv() - Kernel-based sharpening
│   │   └── enhance_contrast_cv() - Contrast enhancement
│   │
│   └── Multi-Algorithm Similarity Matching
│       ├── DINOv2 Embeddings (Semantic)
│       ├── PHash (Perceptual Hashing)
│       └── Image Signatures (Goldberg Algorithm)

├── 🤖 AI Processing Thread
│   ├── SmolVLM Vision Model
│   │   ├── Image Captioning
│   │   └── Name Generation
│   │
│   └── Groq LLaMA Language Model
│       ├── OCR Text Refinement
│       ├── Pseudocode Generation
│       └── JSON Structure Validation

└── 💾 I/O Operations Thread
    ├── File System Operations
    │   ├── Directory Creation
    │   ├── Image Saving/Loading
    │   └── JSON Serialization
    │
    └── Asset Management
        ├── Reference Asset Loading
        ├── Project Asset Copying
        └── Final Project Assembly
```

### Data Flow Diagram

```
📊 DATA TRANSFORMATION PIPELINE:

PDF Bytes → Images → Enhanced Images → Embeddings → Similarities → Assets → .sb3
↓           ↓            ↓             ↓            ↓          ↓       ↓
[Binary]    [PIL.Image]  [np.ndarray]  [np.float32] [indices]  [JSON]  [ZIP]
│           │            │             │            │          │       │
├─ OCR ─────┼─ AI ───────┼─ Models ────┼─ Search ───┼─ Match ──┼─ Build┤
│           │            │             │            │          │       │
└─ Text ────┴─ Metadata ─┴─ Features ──┴─ Ranking ──┴─ Select ─┴─ Pack ┘
```

### Key Processing Functions

**Input Processing:**

- `extract_images_from_pdf()` - Extracts images from a PDF using the `unstructured` library
- `process_image_cv2_from_pil()` - Enhances images using OpenCV (upscaling, denoising, sharpening, contrast enhancement)
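
As a rough illustration of the extraction step, the sketch below decodes base64 image payloads and writes them to `Sprite_{i}.png` files under a per-PDF directory, mirroring the `DETECTED_IMAGE/{pdf_name}/` layout shown in the process tree. It uses only the standard library and writes the decoded bytes unchanged, whereas the pipeline described here opens each image with PIL and re-saves it as PNG. The function name `save_extracted_images` and the 1-based index are assumptions made for this example.

```python
import base64
from pathlib import Path


def save_extracted_images(payloads, output_root, pdf_name):
    """Decode base64 image payloads and write them as Sprite_{i}.png files.

    Hypothetical helper: the bytes are written unchanged, so the inputs are
    assumed to already contain PNG data.
    """
    target_dir = Path(output_root) / "DETECTED_IMAGE" / pdf_name
    target_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    # The 1-based starting index is an assumption, not taken from app.py.
    for i, b64_data in enumerate(payloads, start=1):
        image_bytes = base64.b64decode(b64_data)
        path = target_dir / f"Sprite_{i}.png"
        path.write_bytes(image_bytes)
        saved.append(path)
    return saved
```

Returning the written paths lets a caller pass them straight to the captioning and matching steps without re-scanning the directory.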

### 2. Visual Similarity Matching

```
Query Image → Multi-Algorithm Matching → Asset Selection → Project Assembly
```

**Algorithms Used:**

- **DINOv2 Embeddings**: Deep learning-based semantic similarity
- **Perceptual Hashing (PHash)**: Structural image comparison
- **Image Signatures**: Goldberg algorithm for visual fingerprinting

**Implementation:**

```python
def run_query_search_flow(query_b64, embeddings_dict, hash_dict, signature_obj_map):
    # Abridged excerpt: intermediates such as `prepped`, `stored_emb` and
    # `hamming_distance` are computed in code that is elided here.

    # 1. Preprocess query image
    enhanced_query_pil = process_image_cv2_from_pil(query_from_b64, scale=2)

    # 2. Generate embedding, perceptual hash and image signature
    query_emb = get_dinov2_embedding_from_pil(prepped)
    query_phash = phash.encode_image(image_array=query_hash_arr)
    query_sig = gis.generate_signature(query_sig_path)

    # 3. Compute similarities
    emb_sim = cosine_similarity(query_emb, stored_emb)
    ph_sim = 1.0 - (hamming_distance / MAX_PHASH_BITS)
    im_sim = 1.0 - gis.normalized_distance(stored_sig, query_sig)

    # 4. Combine scores
    combined = (emb_clamped + ph_sim + im_sim) / 3.0
```
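
The scoring arithmetic in steps 3 and 4 can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it assumes hex-encoded 64-bit perceptual hashes (so `max_bits=64` stands in for `MAX_PHASH_BITS`) and assumes that `emb_clamped` means the cosine score clamped to `[0, 1]`. The image-signature score is passed in as a plain number.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def phash_similarity(hash_a, hash_b, max_bits=64):
    """1 - normalized Hamming distance between two hex-encoded hashes.

    max_bits=64 is an assumption about the hash length.
    """
    hamming = bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")
    return 1.0 - hamming / max_bits


def combined_score(emb_sim, ph_sim, im_sim):
    """Unweighted mean of the three scores, cosine score clamped to [0, 1]."""
    emb_clamped = max(0.0, min(1.0, emb_sim))  # assumed meaning of emb_clamped
    return (emb_clamped + ph_sim + im_sim) / 3.0
```

Clamping matters because cosine similarity can be negative while the other two scores lie in `[0, 1]`; without it, a single negative value would drag the average below what the other two algorithms support.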

### 3. Code Block Recognition

```
OCR Text → LLM Processing → Pseudocode → Block Mapping → JSON Generation
```

**LLM System Prompt:**

```python
# Raw string so that `\n` in rule 3 reaches the model as a literal backslash-n
# instead of being turned into a line break by Python.
SYSTEM_PROMPT = r"""Your task is to process OCR-extracted text from images of Scratch 3.0 code blocks and produce precisely formatted pseudocode JSON.

### Core Role
- Treat this as an OCR refinement task: the input may contain typos or spacing issues.
- Intelligently correct OCR mistakes to align with valid Scratch 3.0 block syntax.

### Universal Rules
1. Code Detection: If no Scratch blocks are detected, the `pseudocode` value must be "No Code-blocks".
2. Script Ownership: Determine the target from "Script for:". If it matches a `Stage_costumes` name, set `name_variable` to "Stage".
3. Pseudocode Structure: The pseudocode must be a single JSON string with `\n` for newlines.
"""
```
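
The three universal rules in the prompt can also be expressed as a small post-processing check. The sketch below is an illustration under assumptions: the two-key result shape, the function names, and the exact-match comparison against stage costume names are hypothetical; only the key names `pseudocode` and `name_variable` and the literal `"No Code-blocks"` come from the prompt.

```python
import json


def resolve_script_owner(script_for, stage_costumes):
    """Rule 2: a target that names a stage costume belongs to the Stage."""
    target = script_for.strip()
    if target in stage_costumes:  # exact match is an assumption
        return "Stage"
    return target


def build_result(script_for, stage_costumes, pseudocode_lines):
    """Assemble a result dict following rules 1 to 3 of the prompt."""
    if pseudocode_lines:
        # Rule 3: one string, lines joined by newline characters.
        pseudocode = "\n".join(pseudocode_lines)
    else:
        # Rule 1: fixed marker when no blocks were detected.
        pseudocode = "No Code-blocks"
    return {
        "name_variable": resolve_script_owner(script_for, stage_costumes),
        "pseudocode": pseudocode,
    }


def to_json(result):
    """Serialize; json.dumps escapes each newline as the two characters \\n."""
    return json.dumps(result)
```

Serializing with `json.dumps` is what produces the single-line JSON string with `\n` escapes that rule 3 asks for.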

### 4. Project Generation

```
Pseudocode → Block Definitions → Relationship Building → .sb3 Assembly
```

## Libraries and Dependencies

### Core Libraries

#### Computer Vision & Image Processing

- **OpenCV** (`cv2`): Image enhancement, filtering, and preprocessing
- **PIL/Pillow**: Image manipulation and format conversion
- **imagededup**: Perceptual hashing for duplicate detection
- **image-match**: Visual similarity using Goldberg signatures

#### Machine Learning & AI

- **transformers**: Hugging Face models (DINOv2, SmolVLM)
- **torch**: PyTorch for deep learning inference
- **sentence-transformers**: Text and image embeddings
- **faiss-cpu**: Fast similarity search and clustering
- **open_clip_torch**: CLIP embeddings via the OpenCLIP implementation

#### Language Models

- **langchain**: LLM orchestration and chaining
- **langchain-groq**: Groq API integration
- **langgraph**: Graph-based agent workflows

#### Document Processing

- **unstructured**: PDF parsing and content extraction
- **pdf2image**: PDF to image conversion
- **pytesseract**: OCR text extraction
- **PyPDF2**: PDF manipulation

#### Web Framework

- **Flask**: Web application framework
- **Flask-SocketIO**: Real-time communication
- **gunicorn**: WSGI HTTP server

### Model Specifications

#### Vision Models

```python
from transformers import (
    AutoImageProcessor,
    AutoModel,
    AutoModelForVision2Seq,
    AutoProcessor,
)

# DINOv2 for semantic image understanding
DINOV2_MODEL = "facebook/dinov2-small"
dinov2_processor = AutoImageProcessor.from_pretrained(DINOV2_MODEL)
dinov2_model = AutoModel.from_pretrained(DINOV2_MODEL)

# SmolVLM for image captioning
smolvlm256m_processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
smolvlm256m_model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")
```

#### Language Model

```python
from langchain_groq import ChatGroq

# Groq LLaMA for code interpretation
llm = ChatGroq(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    temperature=0,  # deterministic output for code interpretation
    max_tokens=None,
)
```

## Technical Approaches

### 1. Multi-Modal Image Enhancement

**OpenCV Pipeline:**

```python
def process_image_cv2_from_pil(pil_img, scale=2):
    bgr = pil_to_bgr_np(pil_img)
    bgr = upscale_image_cv(bgr, scale=scale)  # Cubic interpolation
    bgr = reduce_noise_cv(bgr)                # Non-local means denoising
    bgr = sharpen_cv(bgr)                     # Kernel-based sharpening
    bgr = enhance_contrast_cv(bgr)            # Contrast enhancement
    return bgr_np_to_pil(bgr)
```

### 2. Hybrid Similarity Scoring

**Multi-Algorithm Consensus:**

```python
def choose_top_candidates(embedding_results, phash_results, imgmatch_results):
    # Method A: Normalized weighted average
    weighted_scores[p] = (w_emb * emb_norm[p] + w_ph * ph_norm[p] + w_im * im_norm[p])

    # Method B: Rank-sum (Borda count)
    rank_sum[p] = rank_emb[p] + rank_ph[p] + rank_im[p]

    # Method C: Harmonic mean (penalizes missing values)
    harm = 3.0 / ((1.0/a) + (1.0/b) + (1.0/c))
```
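
The three consensus methods can be written out as standalone functions. The following is a sketch under assumptions rather than the project's `choose_top_candidates`: scores are given as `{candidate: score}` dicts with identical keys, min-max scaling is used for the normalization step, the default weights are placeholders, and a small `eps` guards the harmonic mean against division by zero.

```python
def min_max_normalize(scores):
    """Scale a {candidate: score} mapping to [0, 1]; constant inputs map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def ranks(scores):
    """Rank candidates from 1 (highest score) downward."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {k: i for i, k in enumerate(ordered, start=1)}


def weighted_average(emb, ph, im, w_emb=0.5, w_ph=0.3, w_im=0.2):
    """Method A: weighted mean of normalized scores (weights are placeholders)."""
    e, p, i = min_max_normalize(emb), min_max_normalize(ph), min_max_normalize(im)
    return {k: w_emb * e[k] + w_ph * p[k] + w_im * i[k] for k in emb}


def rank_sum(emb, ph, im):
    """Method B: sum of per-algorithm ranks; lower totals are better."""
    r_e, r_p, r_i = ranks(emb), ranks(ph), ranks(im)
    return {k: r_e[k] + r_p[k] + r_i[k] for k in emb}


def harmonic_mean(a, b, c, eps=1e-9):
    """Method C: harmonic mean; a near-zero score pulls the result toward zero."""
    a, b, c = (max(x, eps) for x in (a, b, c))
    return 3.0 / (1.0 / a + 1.0 / b + 1.0 / c)
```

Rank-sum ignores score magnitudes entirely, which makes it robust when the three algorithms produce scores on different scales, while the harmonic mean rewards candidates that all three algorithms agree on.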

### 3. Block Relationship Building

**Scratch Block Catalog System:**

```python
def generate_blocks_from_opcodes(opcode_counts, all_block_definitions):
    """
    Generates Scratch blocks with proper parent-child relationships
    - Hat blocks: topLevel=True, parent=None
    - Stack blocks: Linked via 'next' field
    - C-blocks: Contains SUBSTACK inputs
    - Shadow blocks: Linked as input values
    """
```
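
To make the parent/next linking concrete, here is a minimal sketch of how a script could be chained in the Scratch 3.0 `project.json` block format. The helper names `link_stack` and `attach_substack` and the generated block IDs are hypothetical, block `inputs` and `fields` are left empty for brevity, and real top-level blocks also carry `x`/`y` coordinates, which are omitted here.

```python
def link_stack(opcodes, id_prefix="block"):
    """Chain opcodes into one script using parent/next links.

    The first block is treated as the top-level (hat) block.
    """
    ids = [f"{id_prefix}_{i}" for i in range(len(opcodes))]
    blocks = {}
    for i, opcode in enumerate(opcodes):
        blocks[ids[i]] = {
            "opcode": opcode,
            "next": ids[i + 1] if i + 1 < len(ids) else None,
            "parent": ids[i - 1] if i > 0 else None,
            "inputs": {},
            "fields": {},
            "shadow": False,
            "topLevel": i == 0,
        }
    return blocks


def attach_substack(blocks, c_block_id, substack):
    """Nest `substack` (a dict from link_stack) inside a C-block."""
    first_id = next(iter(substack))  # dicts preserve insertion order
    blocks.update(substack)
    blocks[first_id]["parent"] = c_block_id
    blocks[first_id]["topLevel"] = False
    # [2, id] encodes "this input holds a block and has no shadow".
    blocks[c_block_id]["inputs"]["SUBSTACK"] = [2, first_id]
    return blocks
```

For example, `link_stack(["event_whenflagclicked", "control_forever"], "s")` followed by `attach_substack(script, "s_1", link_stack(["motion_movesteps"], "b"))` yields a flag-triggered forever loop whose body is a single move block.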

### 4. Project Assembly Pipeline

**JSON Structure Generation:**

```python
final_project = {
    "targets": [],     # Sprites and Stage
    "monitors": [],    # Variable/list monitors
    "extensions": [],  # Scratch extensions
    "meta": {
        "semver": "3.0.0",
        "vm": "11.3.0",
        "agent": "OpenAI ScratchVision Agent"
    }
}
```
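
An `.sb3` file is a ZIP archive with `project.json` at its root next to the costume and sound files. The packing step could look roughly like the sketch below; the function name `pack_sb3` is hypothetical and this is not taken from `app.py`. It assumes the asset file names already match the names referenced by the targets in `project.json`.

```python
import json
import zipfile
from pathlib import Path


def pack_sb3(project, asset_paths, sb3_path):
    """Write project.json plus asset files into a ZIP archive named *.sb3."""
    with zipfile.ZipFile(sb3_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("project.json", json.dumps(project))
        for asset in asset_paths:
            # Assets are stored at the archive root, next to project.json.
            archive.write(asset, arcname=Path(asset).name)
    return sb3_path
```

Storing assets by base name keeps the archive flat, which is the layout Scratch expects when it resolves costume and sound references.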

## File System Architecture

### Project Directory Structure

```
📁 scratch-vision-game/
├── 🐍 app.py  # Main Flask application (PRIMARY)
├── 📋 requirements.txt  # Python dependencies
├── 🐳 Dockerfile  # Container configuration
├── 📖 README.md  # Basic project info
├── 📖 README2.md  # Technical documentation
│
├── 📁 utils/  # Core processing utilities
│   └── 🔧 block_relation_builder.py  # Scratch block logic & JSON generation
│
├── 📁 blocks/  # Scratch block definitions & assets
│   ├── 📊 blocks.json  # Main block catalog
│   ├── 📊 boolean_blocks.json  # Boolean/condition blocks
│   ├── 📊 cap_blocks.json  # Terminal blocks (stop, delete clone)
│   ├── 📊 c_blocks.json  # Control flow blocks (if, repeat, forever)
│   ├── 📊 control_blocks.json  # Control category blocks
│   ├── 📊 data_blocks.json  # Variables and lists blocks
│   ├── 📊 event_blocks.json  # Event/trigger blocks
│   ├── 📊 hat_blocks.json  # Script starter blocks
│   ├── 📊 looks_blocks.json  # Appearance blocks
│   ├── 📊 motion_blocks.json  # Movement blocks
│   ├── 📊 operator_blocks.json  # Math and logic operators
│   ├── 📊 reporter_blocks.json  # Value reporter blocks
│   ├── 📊 sensing_blocks.json  # Sensor blocks
│   ├── 📊 sound_blocks.json  # Audio blocks
│   ├── 📊 stack_blocks.json  # Sequential action blocks
│   │
│   ├── 📁 sprites/  # Reference sprite assets
│   │   ├── 📁 {sprite_name}/
│   │   │   ├── 🖼️ {sprite_image}.png
│   │   │   ├── 📊 sprite.json  # Sprite definition
│   │   │   └── 🎵 {sounds}.wav
│   │   └── ...
│   │
│   ├── 📁 Backdrops/  # Reference backdrop assets
│   │   ├── 📁 {backdrop_name}/
│   │   │   ├── 🖼️ {backdrop_image}.png
│   │   │   ├── 📊 project.json  # Stage definition
│   │   │   └── 🎵 {sounds}.wav
│   │   └── ...
│   │
│   └── 📁 sound/  # Audio assets library
│       └── 🎵 *.wav
│
├── 📁 templates/  # Flask HTML templates
│   └── 🌐 *.html
│
├── 📁 static/  # Web static assets
│   ├── 🎨 css/
│   ├── 📜 js/
│   └── 🖼️ images/
│
├── 📁 game_samples/  # Pre-built .sb3 files
│   └── 🎮 *.sb3
│
├── 📁 generated_projects/  # Runtime generated projects
│   └── 📁 project_{uuid}/
│       ├── 📊 project.json
│       ├── 🖼️ *.png
│       └── 🎵 *.wav
│
└── 📁 outputs/  # Processing outputs (Runtime)
    ├── 📁 DETECTED_IMAGE/  # Extracted & processed images
    │   └── 📁 {pdf_name}/
    │       └── 🖼️ Sprite_*.png
    │
    ├── 📁 SCANNED_IMAGE/  # Original scanned images
    │
    ├── 📁 EXTRACTED_JSON/  # Intermediate JSON data
    │   └── 📁 {pdf_name}/
    │       ├── 📊 extracted.json  # Raw PDF extraction
    │       └── 📊 extracted_sprites.json  # AI-processed sprites
    │
    └── 📊 embeddings.json  # Pre-computed embeddings cache
```

### Runtime Directory Creation Flow

```
🏗️ DYNAMIC DIRECTORY CREATION:

User Upload → PDF Processing → Directory Structure
│              │                    │
├─ temp_dir ───┼─ pdf_filename ─────┼─ /outputs/DETECTED_IMAGE/{pdf_name}/
│              │                    ├─ /outputs/EXTRACTED_JSON/{pdf_name}/
│              │                    └─ /generated_projects/project_{uuid}/
│              │
└─ secure_filename() ──────────────────→ Sanitized paths
```

### Data Persistence Locations

```
💾 PERSISTENT DATA STORAGE:

├── 🔄 Input Processing
│   ├── /tmp/{random}/ - Temporary PDF storage
│   ├── /outputs/DETECTED_IMAGE/ - Extracted sprite images
│   ├── /outputs/EXTRACTED_JSON/ - Processing metadata
│   └── /outputs/embeddings.json - Similarity search cache
│
├── 🎯 Asset Matching
│   ├── /blocks/sprites/ - Reference sprite library
│   ├── /blocks/Backdrops/ - Reference backdrop library
│   └── /blocks/*.json - Block definition catalogs
│
└── 🎮 Final Output
    ├── /generated_projects/project_{uuid}/ - Assembled project
    ├── /game_samples/{project_id}.sb3 - Downloadable Scratch file
    └── /logs/app.log - Application logs
```

## API Endpoints

### `/process_pdf` (POST)

Processes uploaded PDF files containing Scratch code blocks.

**Request:**

```
Content-Type: multipart/form-data
pdf_file: <PDF file>
```

**Response:**

```json
{
  "message": "✅ PDF processed successfully",
  "output_json": "path/to/extracted.json",
  "sprites": {...},
  "project_output_json": "path/to/project.json"
}
```
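
A client for this endpoint can be sketched with the standard library alone. The example below builds the `multipart/form-data` request by hand and prepares it without sending it. The helper names are hypothetical, and the base URL used in the usage note (`http://localhost:7860`) is an assumption based on the port exposed in the Docker configuration.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, file_bytes, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"


def build_process_pdf_request(base_url, pdf_name, pdf_bytes):
    """Prepare (but do not send) a POST request to /process_pdf."""
    body, content_type = build_multipart("pdf_file", pdf_name, pdf_bytes)
    return urllib.request.Request(
        f"{base_url}/process_pdf",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(...)` and parse the response body with `json.load`, for example with `build_process_pdf_request("http://localhost:7860", "demo.pdf", pdf_bytes)`.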

### `/download_sb3/<project_id>` (GET)

Downloads generated Scratch 3.0 project files.

## Processing Timeline & Performance

### Execution Timeline Tree

```
⏱️ PROCESSING TIMELINE (Typical PDF with 5 images):

📤 User Upload (0.0s)
│
├── 🔍 PDF Validation (0.1s)
│   └── File security & temp storage
│
├── 📄 PDF Extraction (2-5s)
│   ├── partition_pdf() - Unstructured processing
│   ├── Image extraction & base64 encoding
│   └── extracted.json creation
│
├── 🤖 AI Processing (10-15s per image)
│   ├── 📝 Description Generation (5-7s)
│   │   ├── LangGraph agent initialization
│   │   ├── Groq API call
│   │   └── Response processing
│   │
│   ├── 🏷️ Name Generation (5-7s)
│   │   ├── Second LangGraph agent call
│   │   ├── Groq API call
│   │   └── Response processing
│   │
│   └── 📋 Metadata Assembly (0.1s)
│       └── JSON structure creation
│
├── 🔍 Similarity Matching (3-8s)
│   ├── 🎯 Image Decoding (0.5s)
│   ├── 🧠 CLIP Embeddings (2-3s)
│   ├── 📈 Similarity Computation (0.5s)
│   └── 🎨 Asset Matching (2-4s)
│
├── 🏗️ Project Assembly (1-2s)
│   ├── JSON merging
│   ├── Asset copying
│   └── Final project creation
│
└── 📤 Response Generation (0.1s)
    └── JSON response formatting

TOTAL: ~60-90 seconds for 5-image PDF
```
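
The quoted total can be reproduced by summing the stage ranges from the timeline above. The sketch below assumes the stages run one after another and that the per-image AI step is not parallelized across images; the function name is hypothetical. For five images it gives roughly 56 to 90 seconds, in line with the ~60-90 second figure.

```python
def estimate_total_seconds(num_images):
    """Sum the (low, high) stage ranges from the timeline, in seconds."""
    fixed_stages = {
        "pdf_validation": (0.1, 0.1),
        "pdf_extraction": (2.0, 5.0),
        "similarity_matching": (3.0, 8.0),
        "project_assembly": (1.0, 2.0),
        "response_generation": (0.1, 0.1),
    }
    ai_per_image = (10.0, 15.0)  # scales with the number of images

    low = sum(lo for lo, _ in fixed_stages.values()) + num_images * ai_per_image[0]
    high = sum(hi for _, hi in fixed_stages.values()) + num_images * ai_per_image[1]
    return low, high
```

The breakdown makes the bottleneck visible: the fixed stages add up to about 6 to 15 seconds, so nearly all of the remaining time is spent in the per-image LLM calls.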

### Performance Bottlenecks & Optimizations

```
🚀 PERFORMANCE OPTIMIZATION STRATEGIES:

├── 🧠 Model Loading (Startup Cost)
│   ├── ✅ Pre-loaded global models
│   │   ├── DINOv2: ~2GB VRAM
│   │   ├── SmolVLM: ~1GB VRAM
│   │   └── CLIP: ~500MB VRAM
│   │
│   ├── ✅ GPU Acceleration (when available)
│   │   └── torch.device("cuda" if torch.cuda.is_available() else "cpu")
│   │
│   └── ✅ CPU Optimization
│       └── torch.set_num_threads(4)
│
├── 🖼️ Image Processing Pipeline
│   ├── ✅ Efficient NumPy Operations
│   │   ├── Vectorized computations
│   │   ├── In-place operations where possible
│   │   └── Memory-mapped file access
│   │
│   ├── ✅ OpenCV Optimizations
│   │   ├── Multi-threaded operations
│   │   ├── SIMD instructions
│   │   └── Optimized algorithms
│   │
│   └── ✅ Memory Management
│       ├── Garbage collection hints
│       ├── Temporary file cleanup
│       └── Buffer reuse
│
├── 🔍 Similarity Search Acceleration
│   ├── ✅ Pre-computed Embeddings Cache
│   │   └── /outputs/embeddings.json (persistent)
│   │
│   ├── ✅ Normalized Embeddings
│   │   ├── Cosine similarity via dot product
│   │   └── L2 normalization preprocessing
│   │
│   └── ✅ Parallel Algorithm Execution
│       ├── DINOv2, PHash, ImageMatch concurrent
│       └── Multi-threaded similarity computation
│
└── 🌐 API & I/O Optimizations
    ├── ✅ Async File Operations
    ├── ✅ Streaming Responses
    ├── ✅ Connection Pooling
    └── ✅ Compression (gzip)
```

### Memory Usage Profile

```
💾 MEMORY CONSUMPTION BREAKDOWN:

├── 🧠 AI Models (Peak: ~4GB)
│   ├── DINOv2 Model: ~2GB
│   ├── SmolVLM Model: ~1GB
│   ├── CLIP Embeddings: ~500MB
│   └── Groq API Client: ~100MB
│
├── 🖼️ Image Processing (Peak: ~500MB per image)
│   ├── Original PIL Images: ~50MB each
│   ├── Enhanced Images: ~100MB each
│   ├── OpenCV Buffers: ~200MB each
│   └── Embedding Vectors: ~2KB each
│
├── 📊 Data Structures (Peak: ~200MB)
│   ├── Block Definitions: ~50MB
│   ├── Asset Metadata: ~100MB
│   ├── Similarity Matrices: ~50MB
│   └── JSON Structures: ~10MB
│
└── 🌐 Web Framework (Baseline: ~100MB)
    ├── Flask Application: ~50MB
    ├── Request Buffers: ~30MB
    └── Response Caching: ~20MB

TOTAL PEAK: ~5GB (with GPU models loaded)
TOTAL BASELINE: ~1GB (CPU-only, no active processing)
```

### Performance Optimizations

#### 1. Model Caching

- Pre-loaded models with global variables
- GPU acceleration when available
- Batch processing for multiple images

#### 2. Image Processing

- Efficient numpy operations
- OpenCV optimizations
- Memory management for large images

#### 3. Similarity Search

- FAISS indexing for fast nearest neighbor search
- Normalized embeddings for cosine similarity
- Parallel processing of multiple algorithms

## Error Handling

### 1. Graceful Degradation

```python
def process_image_cv2_from_pil(pil_img, scale=2):
    try:
        # OpenCV enhancement pipeline
        bgr = pil_to_bgr_np(pil_img)
        bgr = upscale_image_cv(bgr, scale=scale)
        bgr = reduce_noise_cv(bgr)
        bgr = sharpen_cv(bgr)
        bgr = enhance_contrast_cv(bgr)
        return bgr_np_to_pil(bgr)
    except Exception as e:
        print(f"Enhancement failed: {e}")
        return pil_img  # Fallback to the original image
```

### 2. JSON Validation

```python
from langgraph.prebuilt import create_react_agent

agent_json_resolver = create_react_agent(
    model=llm,
    tools=[],  # required argument; assumes the corrector agent calls no tools
    prompt=SYSTEM_PROMPT_JSON_CORRECTOR
)
```
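
Before handing output to an LLM-based corrector, a cheap local parse can separate valid JSON from text that actually needs repair. The helper below is a hypothetical sketch, not part of the project: it strips an optional Markdown code fence, tries `json.loads`, and returns either the parsed value or an error message that could be passed on to the corrector.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker


def try_parse_json(text):
    """Return (parsed, None) on success or (None, error_message) on failure."""
    cleaned = text.strip()
    # LLM replies are often wrapped in a Markdown code fence; drop it if present.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()
        if lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines[1:])
    try:
        return json.loads(cleaned), None
    except json.JSONDecodeError as exc:
        return None, f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
```

Only inputs for which the second element is not `None` would need a round trip through the corrector agent, which saves an API call whenever the first response is already well-formed.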

## Deployment

### Docker Configuration

```dockerfile
FROM python:3.11-slim
# System dependencies: tesseract-ocr, poppler-utils, libgl1
# Python dependencies: requirements.txt
# Environment: Flask production mode
EXPOSE 7860
CMD ["python", "app.py"]
```

### Environment Variables

- `GROQ_API_KEY`: API key for Groq language model
- `TRANSFORMERS_CACHE`: Model cache directory
- `HF_HOME`: Hugging Face cache directory
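
A startup check for these variables might look like the following sketch. The function name `load_settings` and the returned dictionary keys are assumptions for illustration; treating `GROQ_API_KEY` as required and the two cache variables as optional is also an assumption.

```python
import os


def load_settings(env=None):
    """Read the documented environment variables; GROQ_API_KEY is required."""
    env = os.environ if env is None else env
    api_key = env.get("GROQ_API_KEY")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return {
        "groq_api_key": api_key,
        # Optional cache locations; None means "use the library default".
        "transformers_cache": env.get("TRANSFORMERS_CACHE"),
        "hf_home": env.get("HF_HOME"),
    }
```

Failing fast at startup gives a clearer error than a rejected API call in the middle of processing an upload.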

## Future Enhancements

1. **Real-time Processing**: WebSocket integration for live feedback
2. **Advanced OCR**: Custom trained models for Scratch block recognition
3. **Multi-language Support**: International Scratch block recognition
4. **Collaborative Features**: Multi-user project editing
5. **Performance Monitoring**: Detailed analytics and optimization metrics

## Contributing

The system is designed with modularity in mind:

- Add new block definitions in the `blocks/` directory
- Extend similarity algorithms in the matching pipeline
- Enhance OCR accuracy with custom preprocessing
- Improve LLM prompts for better code interpretation

## License

Apache 2.0 License - See project repository for full details.