deepvision-prompt-builder

Runtime error

App Files Files Community

deepvision-prompt-builder / README.md

Salman Abjam

Fix: Revert to Gradio 4.44.0 for HF Spaces compatibility

ecd082f 6 months ago

preview code

raw

history blame contribute delete

3.21 kB

A newer version of the Gradio SDK is available: 6.12.0

Upgrade

metadata

title: DeepVision Prompt Builder
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

🎯 DeepVision Prompt Builder

AI-Powered Image & Video Analysis with Automatic JSON Prompt Generation

Overview

DeepVision is a modular AI system that analyzes images and videos to generate structured JSON prompts. Perfect for:

📸 Automated image tagging
🎬 Video content analysis
🤖 AI training data preparation
📊 Media cataloging
🎨 Creative prompt generation

Features

Available Plugins

🎨 Color Analyzer (Fast): Extract dominant colors, color schemes, brightness, and saturation
🔍 Object Detector (CLIP): Zero-shot object detection with confidence scores
💬 Caption Generator (BLIP-2): Natural language image descriptions

Supported Formats

Images: JPG, PNG, WebP, BMP, GIF
Videos: MP4, AVI, MOV, MKV

Usage

Upload an image or video file
Select which analysis plugins to use
Click "Analyze" to process
View results in formatted or JSON format
Download JSON output for use in other systems

Performance Notes

Color Analyzer: ~1-2 seconds per image, lightweight
Object Detector: First use downloads ~2GB CLIP model, then ~5-10 seconds per image
Caption Generator: First use downloads ~2-5GB BLIP-2 model, then ~8-15 seconds per image
Video Analysis: Processes N keyframes (configurable 1-20 frames)

Example Output

{
  "results": {
    "color_analyzer": {
      "dominant_colors": [
        {"color": [45, 85, 125], "percentage": 35.2, "name": "blue"}
      ],
      "color_scheme": "cool",
      "average_brightness": 128.5,
      "average_saturation": 0.65
    }
  },
  "metadata": {
    "file": {
      "filename": "example.jpg",
      "size_mb": 2.4,
      "width": 1920,
      "height": 1080
    },
    "processing": {
      "duration_seconds": 1.234,
      "plugins_used": ["color_analyzer"]
    }
  }
}

Technology Stack

Framework: Python 3.10+
UI: Gradio 4.44+
CV: OpenCV, PIL, NumPy
AI Models: CLIP, BLIP-2 (via HuggingFace Transformers)
Logging: Loguru

Architecture

DeepVision uses a plugin-based architecture:

Core Engine: Orchestrates analysis pipeline
Plugin System: Modular, extensible analysis components
Result Manager: Aggregates and formats outputs

Local Development

# Clone repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/deepvision
cd deepvision

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py

License

MIT License - Free to use and modify

Credits

Built by AI Dev Collective v9.0

Astro (Lead Developer)
Lyra (Research)
Nexus (Code Quality)
CryptoX (Security)
NOVA (UI/UX)
Echo (Performance)
Sage (Documentation)
Pulse (DevOps)

Links

Version: 0.1.0
Last Updated: January 2025