Update README.md
README.md
CHANGED
@@ -1,140 +1,9 @@
-
-
-
-
-
-
-
-
-
-## 🎯 Project Overview
-
-**VisionExtract** is a specialized machine learning solution designed to automatically detect and extract the main subject from any given image. Built for professional automation, the system isolates the foreground subject and renders the background pixels as complete black, creating a high-fidelity "cutout" for use in digital art, photography, and augmented reality.
-
-### 📝 Project Statement
-> "The goal of this project is to build a machine learning model capable of automatically extracting the main subject from an image. The output is a new image where only the subject is displayed as in the original photo, while the rest of the pixels are set to black."
-
----
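The project statement in the removed README describes the output as the original pixels where the subject is, and black everywhere else. A minimal sketch of that compositing step, assuming the model's prediction has already been thresholded into a binary mask; `apply_mask` and the nested-list image representation are hypothetical and chosen only to keep the example dependency-free:

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def apply_mask(image: List[List[Pixel]], mask: List[List[int]]) -> List[List[Pixel]]:
    """Keep pixels where the mask is 1 and set every other pixel to black.

    Hypothetical helper: the repository's own inference code is not shown
    in this diff and may operate on arrays or tensors instead.
    """
    same_height = len(image) == len(mask)
    same_width = all(len(r) == len(m) for r, m in zip(image, mask))
    if not (same_height and same_width):
        raise ValueError("image and mask must have the same height and width")

    black: Pixel = (0, 0, 0)
    return [
        [px if keep else black for px, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# A 2x2 toy image: the mask keeps the top-left and bottom-right pixels.
image = [[(200, 10, 10), (10, 200, 10)],
         [(10, 10, 200), (255, 255, 255)]]
mask = [[1, 0],
        [0, 1]]
cutout = apply_mask(image, mask)
```

With array libraries the same operation is usually a single element-wise multiplication of the image by the mask.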
-
-## 🚀 Key Features
-
-* **⚡ Automated Subject Isolation**: Intelligent detection and extraction of primary subjects across diverse categories.
-* **🧩 Aspect-Ratio Awareness**: Advanced preprocessing using **LongestMaxSize** to ensure subjects maintain their natural proportions without distortion.
-* **🔄 Virtual Background Integration**: Real-world application of isolation technology allowing real-time subject matting onto Office, Nature, and Studio environments.
-* **🖼️ High-Fidelity Alpha Blending**: Smooth, anti-aliased edge transitions for professional-grade matting.
-* **📊 Production Dashboard**: A premium Streamlit-based interface featuring real-time performance metrics and batch processing capabilities.
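The "Aspect-Ratio Awareness" item above relies on a longest-side resize: the image is scaled so that its longer side equals a target size, and the shorter side shrinks by the same factor. The sketch below shows only that size arithmetic. It is not Albumentations code; `longest_max_size_shape` is a hypothetical helper, the default of 256 is taken from the resolution the README quotes, and the library's own rounding may differ by a pixel.

```python
from typing import Tuple


def longest_max_size_shape(height: int, width: int, max_size: int = 256) -> Tuple[int, int]:
    """Return the (height, width) after scaling the longest side to max_size.

    Both sides are multiplied by the same factor, so the aspect ratio is
    preserved up to rounding.
    """
    if height <= 0 or width <= 0 or max_size <= 0:
        raise ValueError("height, width and max_size must be positive")

    scale = max_size / max(height, width)
    # Never let the short side collapse to zero on extreme aspect ratios.
    new_height = max(1, round(height * scale))
    new_width = max(1, round(width * scale))
    return new_height, new_width


landscape = longest_max_size_shape(1080, 1920)  # a Full HD landscape frame
portrait = longest_max_size_shape(1920, 1080)   # the same frame rotated
square = longest_max_size_shape(500, 500)
```

A plain resize to 256x256 would instead stretch the landscape frame vertically, which is the distortion the feature is meant to avoid.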
-
----
-
-## 🛠️ Technical Stack
-
-* **Architecture**: ResNet34-UNet (Transfer Learning)
-* **Framework**: PyTorch
-* **Preprocessing**: Albumentations (Standardized Evaluation Pipeline)
-* **Frontend**: Streamlit (AI Showcase Dashboard)
-* **Acceleration**: CUDA Support with AMP (Automatic Mixed Precision)
-
----
-
-## 📖 Implementation Workflow
-
-1. **Architecture**: Utilizes a deep **UNet** structure with a pre-trained **ResNet34 backbone** for high-precision spatial feature extraction.
-2. **Training**: Optimized over **110 epochs** using **IoU-based checkpointing** to ensure the most accurate weights.
-3. **Inference**: A standardized 256px resolution pipeline ensures architectural consistency and sub-second processing speeds.
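The training step above mentions IoU-based checkpointing, which generally means saving weights only when validation IoU improves on the best value seen so far. A minimal sketch of that bookkeeping, under stated assumptions: `binary_iou` and `BestIoUTracker` are hypothetical names, the masks are plain nested lists, and the validation scores in `history` are made-up numbers for illustration, not results from this project.

```python
from typing import List, Optional


def binary_iou(pred: List[List[int]], target: List[List[int]]) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; define their IoU as 1.0.
    return intersection / union if union else 1.0


class BestIoUTracker:
    """Remembers the best validation IoU and reports when it improves."""

    def __init__(self) -> None:
        self.best_iou: float = float("-inf")
        self.best_epoch: Optional[int] = None

    def update(self, epoch: int, val_iou: float) -> bool:
        """Return True when this epoch's weights should be checkpointed."""
        if val_iou > self.best_iou:
            self.best_iou = val_iou
            self.best_epoch = epoch
            return True
        return False


tracker = BestIoUTracker()
history = [0.41, 0.55, 0.53, 0.60]  # made-up validation IoU per epoch
saved_epochs = [
    epoch
    for epoch, val_iou in enumerate(history, start=1)
    # In a PyTorch loop, a True result is where one would typically call
    # torch.save(model.state_dict(), checkpoint_path).
    if tracker.update(epoch, val_iou)
]
```

Checkpointing on IoU rather than on training loss keeps the saved weights aligned with the metric the README reports.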
-
----
-
-## 📉 Performance Benchmarks
-
-Following a **110-epoch training cycle** (including a 10-epoch **Refinement Phase** at an optimized learning rate of `0.00005`), the model achieved the following benchmarks:
-
-| Metric | Achievement |
-| :--- | :--- |
-| **Model Architecture** | **ResNet34-UNet** |
-| **Mean IoU** | **0.64+** |
-| **Dice Score** | **0.78+** |
-| **Inference Time** | **~0.15s (GPU accelerated)** |
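The IoU and Dice figures in the table above are related: for a single prediction and ground-truth pair, Dice = 2·IoU / (1 + IoU). The sketch below applies that identity to the reported IoU as a sanity check. The function names are hypothetical, and because the README reports means over a dataset, the identity is only expected to hold approximately there.

```python
def dice_from_iou(iou: float) -> float:
    """Convert IoU to Dice for a single mask pair: D = 2J / (1 + J)."""
    if not 0.0 <= iou <= 1.0:
        raise ValueError("IoU must lie in [0, 1]")
    return 2.0 * iou / (1.0 + iou)


def iou_from_dice(dice: float) -> float:
    """Inverse conversion: J = D / (2 - D)."""
    if not 0.0 <= dice <= 1.0:
        raise ValueError("Dice must lie in [0, 1]")
    return dice / (2.0 - dice)


# 2 * 0.64 / 1.64 is roughly 0.78, in line with the Dice row of the table.
dice_at_reported_iou = dice_from_iou(0.64)
```

Dice is never smaller than IoU for the same mask pair, which is why the Dice score in the table is the larger of the two.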
-
-### 📊 Model Comparison
-| Model | IoU |
-| :--- | :--- |
-| **UNet** | **0.47** |
-| **ResNetUNet** | **0.62** |
-
----
-
-## 🖼️ Visual Results (Gallery)
-
-The following samples from the `outputs/` folder demonstrate the final refined output:
-
-| Input Image | Isolated Subject (VisionExtract) |
-| :---: | :---: |
-|  |  |
-|  |  |
-|  |  |
-
----
-
-## 📂 Project Structure
-
-```text
-VisionExtract/
-├── src/ # Production Logic (Model, Training, Inference, App)
-├── outputs/ # Sample Results (Inputs & Predicted Cutouts)
-├── docs/ # Project Assets (Banners, Backgrounds, Documentation)
-├── checkpoints/ # Trained Model Weights (.pth)
-├── requirements.txt # Dependency Configuration
-└── README.md # Technical Overview
-```
-
----
-
-## 🏃 Getting Started
-
-### 1. Environment Setup
-```bash
-git clone https://github.com/biswajeet111/VisionExtract.git
-cd VisionExtract
-python -m venv venv
-venv\Scripts\activate
-pip install -r requirements.txt
-```
-
-### 2. Launching the Web Showcase
-Experience the real-time extraction engine and background switcher.
-```bash
-streamlit run src/app.py
-```
-
-### 3. Command Line Interface (CLI)
-```bash
-# Single Image Processing
-python src/inference.py --image path/to/sample.jpg --display
-```
-
-### 4. Model Training & Refinement
-```bash
-# Full Training Cycle
-python src/train.py
-```
-
----
-
-## ⚠️ Limitations
-
-While VisionExtract is highly effective, it has certain constraints:
-- **Struggles on extremely crowded scenes**: Multiple overlapping subjects can lead to merged or incomplete masks.
-- **High-resolution increases inference time**: Processing images significantly larger than the base 256px resolution requires more VRAM and compute.
-- **Small object segmentation may vary**: Tiny details (like thin strands of hair or distant objects) may be smoothed out during upscale.
-
----
-
-## 👤 Author
-
-**Biswajeet Kumar**
-* **Portfolio**: [GitHub](https://github.com/biswajeet111)
-* **Connect**: [LinkedIn](https://www.linkedin.com/in/biswajeet-kumar-a70043362)
-
----
-
-*Developed as a high-performance solution for Automated Subject Isolation and AI Segmentation.*
+---
+title: VisionExtract AI
+emoji: 🧠
+colorFrom: blue
+colorTo: purple
+sdk: docker
+app_file: app.py
+pinned: false
+---