Salman Abjam
Fix: Revert to Gradio 4.44.0 for HF Spaces compatibility
ecd082f

A newer version of the Gradio SDK is available: 6.12.0

Upgrade
metadata
title: DeepVision Prompt Builder
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

🎯 DeepVision Prompt Builder

AI-Powered Image & Video Analysis with Automatic JSON Prompt Generation

Overview

DeepVision is a modular AI system that analyzes images and videos to generate structured JSON prompts. Perfect for:

  • πŸ“Έ Automated image tagging
  • 🎬 Video content analysis
  • πŸ€– AI training data preparation
  • πŸ“Š Media cataloging
  • 🎨 Creative prompt generation

Features

Available Plugins

  • 🎨 Color Analyzer (Fast): Extract dominant colors, color schemes, brightness, and saturation
  • πŸ” Object Detector (CLIP): Zero-shot object detection with confidence scores
  • πŸ’¬ Caption Generator (BLIP-2): Natural language image descriptions

Supported Formats

  • Images: JPG, PNG, WebP, BMP, GIF
  • Videos: MP4, AVI, MOV, MKV

Usage

  1. Upload an image or video file
  2. Select which analysis plugins to use
  3. Click "Analyze" to process
  4. View results in formatted or JSON format
  5. Download JSON output for use in other systems

Performance Notes

  • Color Analyzer: ~1-2 seconds per image, lightweight
  • Object Detector: First use downloads ~2GB CLIP model, then ~5-10 seconds per image
  • Caption Generator: First use downloads ~2-5GB BLIP-2 model, then ~8-15 seconds per image
  • Video Analysis: Processes N keyframes (configurable 1-20 frames)

Example Output

{
  "results": {
    "color_analyzer": {
      "dominant_colors": [
        {"color": [45, 85, 125], "percentage": 35.2, "name": "blue"}
      ],
      "color_scheme": "cool",
      "average_brightness": 128.5,
      "average_saturation": 0.65
    }
  },
  "metadata": {
    "file": {
      "filename": "example.jpg",
      "size_mb": 2.4,
      "width": 1920,
      "height": 1080
    },
    "processing": {
      "duration_seconds": 1.234,
      "plugins_used": ["color_analyzer"]
    }
  }
}

Technology Stack

  • Framework: Python 3.10+
  • UI: Gradio 4.44+
  • CV: OpenCV, PIL, NumPy
  • AI Models: CLIP, BLIP-2 (via HuggingFace Transformers)
  • Logging: Loguru

Architecture

DeepVision uses a plugin-based architecture:

  • Core Engine: Orchestrates analysis pipeline
  • Plugin System: Modular, extensible analysis components
  • Result Manager: Aggregates and formats outputs

Local Development

# Clone repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/deepvision
cd deepvision

# Install dependencies
pip install -r requirements.txt

# Run locally
python app.py

License

MIT License - Free to use and modify

Credits

Built by AI Dev Collective v9.0

  • Astro (Lead Developer)
  • Lyra (Research)
  • Nexus (Code Quality)
  • CryptoX (Security)
  • NOVA (UI/UX)
  • Echo (Performance)
  • Sage (Documentation)
  • Pulse (DevOps)

Links


Version: 0.1.0
Last Updated: January 2025