---
license: mit
language:
  - en
tags:
  - deep-learning
  - computer-vision
  - vision-language
  - segmentation
  - multimodal
  - pytorch
library_name: pytorch
---

DEEP – Vision-Language Intelligence Framework

🔥 Overview

DEEP is a multimodal AI framework that integrates computer vision and language understanding to perform intelligent visual reasoning tasks.

The system is designed for:

  • 🧠 Vision-Language Understanding
  • 🖼 Image Segmentation
  • 📝 Visual Question Answering
  • 🔍 Prompt-driven Object Localization
  • 🤖 AI Agent-based Visual Reasoning

This repository contains the model weights, training scripts, and an inference pipeline.


🏗 Architecture

The architecture integrates:

  • Vision Encoder (CNN / ViT)
  • Text Encoder (Transformer-based)
  • Cross-Modal Attention Fusion
  • Task-specific Heads (Segmentation / QA / Classification)
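The cross-modal attention fusion step can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not the released DEEP code: the class name `CrossModalFusion`, the 256-dimensional shared embedding, the 8 attention heads, and the choice of text tokens as queries over image patches (wrapped in a post-norm residual block with a feed-forward layer) are all hypothetical.

```python
import torch
from torch import nn


class CrossModalFusion(nn.Module):
    """Hypothetical fusion block: text tokens (queries) attend to image patches (keys/values)."""

    def __init__(self, dim: int = 256, num_heads: int = 8, dropout: float = 0.0):
        super().__init__()
        # batch_first=True means tensors are laid out as (batch, sequence, dim).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens, image_padding_mask=None):
        # Each text token gathers the visual evidence most relevant to it.
        attended, weights = self.cross_attn(
            query=text_tokens,
            key=image_tokens,
            value=image_tokens,
            key_padding_mask=image_padding_mask,
        )
        # Residual connection + LayerNorm, then a position-wise feed-forward sublayer.
        x = self.norm1(text_tokens + attended)
        x = self.norm2(x + self.ffn(x))
        # weights are averaged over heads: (batch, text tokens, image patches)
        return x, weights


if __name__ == "__main__":
    fusion = CrossModalFusion(dim=256, num_heads=8)
    text = torch.randn(2, 12, 256)    # (batch, text tokens, dim)
    image = torch.randn(2, 196, 256)  # (batch, 14x14 patches, dim)
    fused, attn = fusion(text, image)
    print(fused.shape, attn.shape)  # torch.Size([2, 12, 256]) torch.Size([2, 12, 196])
```

Swapping the query and key/value roles (image patches attending to the prompt) yields per-patch features instead, which suits dense outputs such as segmentation masks; per-token features like the ones above suit QA or classification heads.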

Pipeline Flow:

Image → Vision Encoder → visual features
Text Prompt → Text Encoder → text features
Visual + text features → Cross-Modal Attention Fusion
Fused features → Task Head → Output
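The four-stage flow above can be traced end to end in a toy model. Everything in this sketch is hypothetical (`PromptSegmenter`, its sizes, the ViT-style patch embedding, and the segmentation head are illustrative choices, and positional embeddings are omitted for brevity); it shows how tensors move from each encoder through cross-attention into a task head, not how the released model is built.

```python
import torch
import torch.nn.functional as F
from torch import nn


class PromptSegmenter(nn.Module):
    """Hypothetical toy model following the Image/Text -> Fusion -> Task Head flow."""

    def __init__(self, vocab_size: int = 1000, dim: int = 128, patch: int = 16,
                 heads: int = 4, layers: int = 2):
        super().__init__()
        # Vision encoder: ViT-style patch embedding, then Transformer layers.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.vision_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True),
            num_layers=layers,
        )
        # Text encoder: token embeddings, then Transformer layers.
        self.token_embed = nn.Embedding(vocab_size, dim)
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True),
            num_layers=layers,
        )
        # Fusion: image patches (queries) attend to prompt tokens (keys/values).
        self.fusion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Task head: one mask logit per patch.
        self.seg_head = nn.Linear(dim, 1)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        b, _, h, w = images.shape
        # Image -> Vision Encoder: (B, 3, H, W) -> (B, dim, H/p, W/p) -> (B, N, dim)
        grid = self.patch_embed(images)
        gh, gw = grid.shape[-2:]
        img_tokens = self.vision_encoder(grid.flatten(2).transpose(1, 2))
        # Text Prompt -> Text Encoder: (B, T) -> (B, T, dim)
        txt_tokens = self.text_encoder(self.token_embed(token_ids))
        # Fusion -> Cross Attention, with a residual connection and LayerNorm
        attended, _ = self.fusion(img_tokens, txt_tokens, txt_tokens)
        fused = self.norm(img_tokens + attended)
        # Output -> Task Head: per-patch logits on the patch grid, upsampled to input size
        logits = self.seg_head(fused).transpose(1, 2).reshape(b, 1, gh, gw)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = PromptSegmenter()
    images = torch.randn(2, 3, 224, 224)
    token_ids = torch.randint(0, 1000, (2, 8))  # stand-in for tokenized prompts
    mask_logits = model(images, token_ids)
    print(mask_logits.shape)  # torch.Size([2, 1, 224, 224])
```

Letting the image patches act as queries keeps one fused vector per patch, so a single linear layer plus upsampling is enough to produce a prompt-conditioned mask; a QA or classification head would instead pool the fused tokens.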


📊 Training Details

  • Framework: PyTorch
  • Optimizer: AdamW
  • Loss: Cross-Entropy / Contrastive Loss
  • Training Strategy: Supervised Learning
  • Hardware: GPU-based Training
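The training recipe can be illustrated with a single optimization step. This sketch assumes a CLIP-style symmetric contrastive objective (InfoNCE, which is built on cross-entropy) that aligns paired image and text embeddings; the projection sizes, the temperature of 0.07, the learning rate, and the weight decay are illustrative assumptions, not values documented for DEEP.

```python
import torch
import torch.nn.functional as F
from torch import nn


def contrastive_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B); row i should peak at column i
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: image->text over rows, text->image over columns.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def train_step(image_proj: nn.Module, text_proj: nn.Module, optimizer: torch.optim.Optimizer,
               image_feats: torch.Tensor, text_feats: torch.Tensor) -> float:
    image_proj.train()
    text_proj.train()
    optimizer.zero_grad()
    loss = contrastive_loss(image_proj(image_feats), text_proj(text_feats))
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical projection heads mapping encoder outputs into a shared 128-d space.
    image_proj = nn.Linear(512, 128)
    text_proj = nn.Linear(384, 128)
    # AdamW applies decoupled weight decay; lr and weight_decay here are illustrative.
    optimizer = torch.optim.AdamW(
        list(image_proj.parameters()) + list(text_proj.parameters()), lr=1e-4, weight_decay=0.01
    )
    image_feats = torch.randn(8, 512)  # stand-in for a batch of pooled vision features
    text_feats = torch.randn(8, 384)   # stand-in for the matching pooled text features
    print(train_step(image_proj, text_proj, optimizer, image_feats, text_feats))
```

For the supervised task heads, the same step would apply `F.cross_entropy` to the head's logits and ground-truth class labels instead of, or alongside, the contrastive term.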

🚀 Usage

Install Dependencies

pip install torch torchvision transformers
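After installing, a quick environment check can confirm that the three packages are importable and whether a GPU is visible. The helper names `installed_versions` and `pick_device` are hypothetical; the sketch only uses the standard-library `importlib.util.find_spec` and `importlib.metadata.version` calls plus `torch.cuda.is_available()`.

```python
import importlib.metadata
import importlib.util

import torch

REQUIRED_PACKAGES = ("torch", "torchvision", "transformers")


def installed_versions(packages=REQUIRED_PACKAGES):
    """Map each package name to its installed version, or None if it cannot be imported."""
    versions = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            versions[name] = None
            continue
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = "unknown"  # importable, but no distribution metadata found
    return versions


def pick_device() -> torch.device:
    # Prefer a CUDA GPU when one is visible, otherwise fall back to the CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'MISSING'}")
    print("device:", pick_device())
```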