manhteky123 committed
Commit 643395e · verified · 1 parent: b181820

Upload 23 files

Files changed (2)
  1. Dockerfile +67 -156
  2. README.md +1 -208
Dockerfile CHANGED
@@ -1,156 +1,67 @@
- # Start with NVIDIA CUDA base image
- FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
-
- # Set environment variables
- ENV PYTHONDONTWRITEBYTECODE=1
- ENV PYTHONUNBUFFERED=1
- ENV DEBIAN_FRONTEND=noninteractive
-
- # Set working directory
- WORKDIR /app
-
- # Install system dependencies
- RUN apt-get update && apt-get install -y \
- git \
- wget \
- python3-pip \
- python3-dev \
- && rm -rf /var/lib/apt/lists/*
-
- # Create symlink for python
- RUN ln -sf /usr/bin/python3 /usr/bin/python
-
- # Copy requirements files
- COPY requirements_lavis.txt /app/
- COPY requirements_emo.txt /app/
-
- # Install Python dependencies
- RUN pip3 install --no-cache-dir --upgrade pip
- RUN pip3 install --no-cache-dir -r requirements_lavis.txt
- RUN pip3 install --no-cache-dir gradio
-
- # Install PyTorch with CUDA 11.8
- RUN pip3 install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
-
- # Clone LAVIS repository
- RUN git clone https://github.com/salesforce/LAVIS.git
- WORKDIR /app/LAVIS
- RUN pip3 install -e .
- WORKDIR /app
-
- # Create directories for model weights
- RUN mkdir -p /app/LAVIS/lavis/weight/vicuna-7b-2/
- RUN mkdir -p /app/LAVIS/lavis/models/blip2_models/
-
- # Copy model files
- COPY blip2_vicuna_instruct.py /app/LAVIS/lavis/models/blip2_models/
- COPY FT.yaml /app/LAVIS/
-
- # Download trained weights
- RUN mkdir -p /app/weights
-
- # Create Gradio app
- COPY app.py /app/app.py
-
- # Create start script with model setup
- RUN echo '#!/bin/bash\n\
- # Download Vicuna model if not present\n\
- MODEL_PATH="/app/LAVIS/lavis/weight/vicuna-7b-2"\n\
- WEIGHTS_URL="https://drive.google.com/file/d/1zaYOSlt3mLVMdiNfAKdJcwvVc-4LHfdr/view?usp=drive_link"\n\
- \n\
- # Check if we need to download Vicuna model weights\n\
- if [ ! -f "$MODEL_PATH/config.json" ]; then\n\
- echo "Downloading Vicuna-7b model weights..."\n\
- apt-get update && apt-get install -y git-lfs && rm -rf /var/lib/apt/lists/*\n\
- git lfs install\n\
- git clone https://huggingface.co/lmsys/vicuna-7b-v1.1 $MODEL_PATH\n\
- echo "Vicuna model downloaded successfully!"\n\
- fi\n\
- \n\
- # Download EmoVIT trained weights if not present\n\
- if [ ! -f "/app/weights/model_weights1.pth" ]; then\n\
- echo "Downloading EmoVIT trained weights..."\n\
- apt-get update && apt-get install -y curl gdown && rm -rf /var/lib/apt/lists/*\n\
- gdown --id 1zaYOSlt3mLVMdiNfAKdJcwvVc-4LHfdr -O /app/weights/model_weights1.pth\n\
- echo "EmoVIT weights downloaded successfully!"\n\
- fi\n\
- \n\
- # Start the app\n\
- python /app/app.py\n'\
- > /app/start.sh
-
- RUN chmod +x /app/start.sh
-
- # Create a proper app.py with Gradio interface
- RUN echo 'import gradio as gr\n\
- import torch\n\
- import os\n\
- from PIL import Image\n\
- from lavis.models import load_model_and_preprocess\n\
- \n\
- # Set device\n\
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")\n\
- print(f"Using device: {device}")\n\
- \n\
- # Set model path\n\
- os.environ["TORCH_HOME"] = "/app/weights"\n\
- \n\
- # Load the model\n\
- print("Loading EmoVIT model...")\n\
- model, vis_processors, txt_processors = load_model_and_preprocess(\n\
- name="blip2_vicuna_instruct",\n\
- model_type="vicuna7b",\n\
- is_eval=True,\n\
- device=device\n\
- )\n\
- \n\
- # Load the fine-tuned weights\n\
- if os.path.exists("/app/weights/model_weights1.pth"):\n\
- print("Loading fine-tuned weights...")\n\
- model.load_state_dict(torch.load("/app/weights/model_weights1.pth", map_location=device))\n\
- print("Fine-tuned weights loaded successfully!")\n\
- else:\n\
- print("Warning: Fine-tuned weights not found!")\n\
- \n\
- print("Model initialization complete!")\n\
- \n\
- def predict(image, prompt):\n\
- if image is None:\n\
- return "Please upload an image."\n\
- \n\
- # Process the image\n\
- image_tensor = vis_processors["eval"](image).unsqueeze(0).to(device)\n\
- \n\
- # For emotion reasoning, format prompt if needed\n\
- if "reason" in prompt.lower() and not prompt.lower().startswith("predicted emotion"):\n\
- prompt = f"Predicted emotion: [emotion]. Reason: [explanation]. {prompt}"\n\
- \n\
- # Generate response\n\
- with torch.no_grad():\n\
- response = model.generate({{"image": image_tensor, "prompt": prompt}})\n\
- \n\
- return response[0]\n\
- \n\
- # Define Gradio interface with examples\n\
- examples = [\n\
- ["example_image.jpg", "What emotion is expressed in this image?"],\n\
- ["example_image.jpg", "Predicted emotion: [emotion]. Reason: [explanation]."],\n\
- ]\n\
- \n\
- demo = gr.Interface(\n\
- fn=predict,\n\
- inputs=[\n\
- gr.Image(type="pil", label="Upload Image"),\n\
- gr.Textbox(lines=2, placeholder="Enter your prompt here...", label="Prompt")\n\
- ],\n\
- outputs=gr.Textbox(label="Model Response"),\n\
- title="EmoVIT: Visual Emotion Analysis with Instruction Tuning",\n\
- description="Upload an image and enter a prompt to analyze emotions. For emotion reasoning, format your prompt as: \\"Predicted emotion: [emotion]. Reason: [explanation].\\"")\n\
- \n\
- # Launch the app\n\
- if __name__ == "__main__":\n\
- demo.launch(server_name="0.0.0.0", server_port=7860)\n'\
- > /app/app.py
-
- # Set the entry point
- ENTRYPOINT ["/app/start.sh"]
 
+ FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
+
+ WORKDIR /app
+
+ # Set environment variables
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+ PYTHONUNBUFFERED=1 \
+ DEBIAN_FRONTEND=noninteractive \
+ TRANSFORMERS_CACHE=/app/.cache/transformers \
+ HF_HOME=/app/.cache/huggingface \
+ TORCH_HOME=/app/.cache/torch \
+ HF_DATASETS_CACHE=/app/.cache/datasets
+
+ # Install basic dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+ python3.8 \
+ python3.8-dev \
+ python3-pip \
+ python3-setuptools \
+ git \
+ wget \
+ ca-certificates \
+ && apt-get clean \
+ && rm -rf /var/lib/apt/lists/*
+
+ # Create symbolic link for python
+ RUN ln -sf /usr/bin/python3.8 /usr/bin/python
+
+ # Upgrade pip
+ RUN pip install --no-cache-dir --upgrade pip
+
+ # Create cache directories
+ RUN mkdir -p /app/.cache/transformers \
+ /app/.cache/huggingface \
+ /app/.cache/torch \
+ /app/.cache/datasets
+
+ # Clone LAVIS repository to temp directory for installation
+ RUN git clone https://github.com/salesforce/LAVIS.git /tmp/LAVIS \
+ && cd /tmp/LAVIS \
+ && sed -i '/open3d/d' requirements.txt \
+ && pip install --no-cache-dir -e . \
+ && cd / \
+ && cp -r /tmp/LAVIS/lavis /app/LAVIS/ \
+ && rm -rf /tmp/LAVIS
+
+ # Install PyTorch with CUDA support
+ RUN pip install --no-cache-dir torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --extra-index-url https://download.pytorch.org/whl/cu118
+
+ # Copy requirements files and install dependencies
+ COPY requirements_lavis.txt requirements_emo.txt ./
+ RUN pip install --no-cache-dir -r requirements_lavis.txt -r requirements_emo.txt
+
+ # Copy model and application files
+ COPY app.py blip2_vicuna_instruct.py ./
+ COPY static/ ./static/
+ COPY templates/ ./templates/
+ COPY LAVIS/ ./LAVIS/
+
+ # Create directory for model weights (to be mounted or downloaded at runtime)
+ RUN mkdir -p ./LAVIS/lavis/weight/vicuna-7b-2/
+
+ # Set up a volume for persistent cache
+ VOLUME /app/.cache
+
+ # Set the default command to run the Flask app
+ CMD ["python", "app.py"]
 
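Unlike the old image, the new one defers all model downloads to runtime: `app.py` (copied above but not included in this diff) is expected to load BLIP2-Vicuna at startup, with Hugging Face and Torch downloads landing under the `/app/.cache` volume. Below is a minimal sketch of that bootstrap, adapted from the Gradio app embedded in the old Dockerfile; the fine-tuned weight path is the one the removed `start.sh` used and is an assumption here.

```python
import os

import torch
from lavis.models import load_model_and_preprocess

# Caches resolve under /app/.cache thanks to the ENV lines above, so mounting
# that volume keeps the Vicuna/EVA-CLIP downloads across container restarts.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model, vis_processors, txt_processors = load_model_and_preprocess(
    name="blip2_vicuna_instruct",
    model_type="vicuna7b",
    is_eval=True,
    device=device,
)

# Fine-tuned EmoVIT weights are expected to be mounted or downloaded at runtime
# (path assumed from the removed start.sh, not confirmed by this commit).
weights_path = "/app/weights/model_weights1.pth"
if os.path.exists(weights_path):
    model.load_state_dict(torch.load(weights_path, map_location=device))
```

Declaring `/app/.cache` as a `VOLUME` is what makes this runtime-download approach practical: the large checkpoint downloads persist across container restarts instead of being re-fetched on every boot.
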
README.md CHANGED
@@ -1,211 +1,4 @@
- ---
- title: EmoVIT
- emoji: 😻
- colorFrom: gray
- colorTo: purple
- sdk: docker
- pinned: false
- ---
-
- # EmoVIT - Emotion Detection with BLIP2-Vicuna
-
- 🚀 **AI-Powered Emotion Detection Web Application**
-
- EmoVIT is a sophisticated emotion detection application that leverages the power of BLIP2-Vicuna model to analyze emotions in images through natural language understanding.
-
- ## 🌟 Features
-
- - **🖼️ Image Upload**: Easy drag-and-drop or click-to-upload interface
- - **🧠 AI Analysis**: Advanced emotion detection using BLIP2-Vicuna model
- - **💬 Custom Prompts**: Personalize your analysis with custom text prompts
- - **🎨 Beautiful UI**: Modern, responsive design with smooth animations
- - **⚡ Real-time Processing**: Fast inference with optimized model loading
- - **📱 Mobile Friendly**: Works seamlessly on all devices
-
- ## 🛠️ Technology Stack
-
- - **Backend**: Flask (Python web framework)
- - **AI Model**: BLIP2-Vicuna (Vision-Language model)
- - **Frontend**: HTML5, CSS3, JavaScript, Bootstrap 5
- - **Deployment**: Docker + Hugging Face Spaces
-
- ## 🚀 Quick Start
-
- ### Local Development
-
- 1. **Clone the repository**
- ```bash
- git clone <your-repo-url>
- cd EmoVIT
- ```
-
- 2. **Install dependencies**
- ```bash
- pip install -r requirements.txt
- ```
-
- 3. **Run the application**
- ```bash
- python app.py
- ```
-
- 4. **Open in browser**
- Navigate to `http://localhost:7860`
-
- ### Docker Deployment
-
- 1. **Build the Docker image**
- ```bash
- docker build -t emovit .
- ```
-
- 2. **Run the container**
- ```bash
- docker run -p 7860:7860 emovit
- ```
-
- ## 🌐 Hugging Face Spaces Deployment
-
- This application is configured for seamless deployment on Hugging Face Spaces:
-
- 1. **Create a new Space** on [Hugging Face Spaces](https://huggingface.co/spaces)
- 2. **Select Docker** as the SDK
- 3. **Upload your files** to the Space repository
- 4. **The app will automatically deploy** using the provided Dockerfile
-
- ### Required Files for HF Spaces:
- - `app.py` - Main Flask application
- - `Dockerfile` - Container configuration
- - `requirements.txt` - Python dependencies
- - `templates/` - HTML templates
- - `static/` - CSS and static assets
- - `blip2_vicuna_instruct.py` - Model implementation
-
- ## 📁 Project Structure
-
- ```
- EmoVIT/
- ├── app.py                     # Main Flask application
- ├── blip2_vicuna_instruct.py   # BLIP2-Vicuna model implementation
- ├── requirements.txt           # Python dependencies
- ├── Dockerfile                 # Docker configuration
- ├── README.md                  # This file
- ├── templates/
- │   └── index.html             # Main HTML template
- ├── static/
- │   └── css/
- │       └── style.css          # Custom CSS styles
- └── emo/                       # Emotion datasets and utilities
-     ├── train.json
-     ├── val.json
-     └── test.json
- ```
-
- ## 🎯 How It Works
-
- 1. **Upload Image**: Users upload an image through the web interface
- 2. **Enter Prompt**: Optionally customize the analysis prompt
- 3. **AI Processing**: The BLIP2-Vicuna model processes the image and prompt
- 4. **Results Display**: Emotion analysis results are displayed with the original image
-
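A minimal sketch of how these four steps could map onto a single Flask route; the `/predict` endpoint, form field names, and fallback prompt are illustrative assumptions, while the LAVIS calls mirror the Gradio app removed in this commit.

```python
import torch
from flask import Flask, jsonify, request
from lavis.models import load_model_and_preprocess
from PIL import Image

app = Flask(__name__)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Loaded once at startup, as in the Gradio app removed by this commit.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_vicuna_instruct", model_type="vicuna7b", is_eval=True, device=device
)


@app.route("/predict", methods=["POST"])  # endpoint name is an illustrative assumption
def predict():
    # 1. Upload Image: read the uploaded file into a PIL image.
    image = Image.open(request.files["image"].stream).convert("RGB")
    # 2. Enter Prompt: fall back to a generic emotion question if none is given.
    prompt = request.form.get("prompt", "What emotion is expressed in this image?")
    # 3. AI Processing: preprocess the image and run BLIP2-Vicuna.
    tensor = vis_processors["eval"](image).unsqueeze(0).to(device)
    with torch.no_grad():
        answer = model.generate({"image": tensor, "prompt": prompt})[0]
    # 4. Results Display: the front end renders this next to the original image.
    return jsonify({"result": answer})
```
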
- ## 🔧 Configuration
-
- ### Model Configuration
- The model can be configured in `app.py`:
-
- ```python
- model_config = {
-     "vit_model": "eva_clip_g",
-     "img_size": 224,
-     "num_query_token": 32,
-     "llm_model": "vicuna-7b-v1.1",
-     "max_txt_len": 128,
-     "max_output_txt_len": 256,
-     # ... other configurations
- }
- ```
-
- ### Environment Variables
- - `PORT`: Application port (default: 7860)
- - `FLASK_ENV`: Flask environment (production/development)
-
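A minimal sketch of how `app.py` might read these variables at startup; the actual startup code is not part of this diff, so treat the details as assumptions.

```python
import os

from flask import Flask  # the README lists Flask as the backend

app = Flask(__name__)

if __name__ == "__main__":
    # PORT defaults to 7860, matching the Docker and Spaces examples above.
    port = int(os.environ.get("PORT", 7860))
    # FLASK_ENV=development enables debug mode; production leaves it off.
    debug = os.environ.get("FLASK_ENV") == "development"
    app.run(host="0.0.0.0", port=port, debug=debug)
```
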
- ## 🤖 Model Details
-
- **BLIP2-Vicuna** combines:
- - **Vision Encoder**: EVA-CLIP for image understanding
- - **Q-Former**: Querying transformer for cross-modal alignment
- - **Language Model**: Vicuna (LLaMA-based) for text generation
-
- This architecture enables sophisticated vision-language understanding for emotion detection tasks.
-
- ## 📊 Performance & Optimization
-
- - **GPU Support**: Automatic CUDA detection and utilization
- - **Memory Efficient**: Optimized model loading and inference
- - **Caching**: Smart caching for improved response times
- - **Error Handling**: Robust error handling and user feedback
-
- ## 🎨 UI/UX Features
-
- - **Responsive Design**: Works on desktop, tablet, and mobile
- - **Modern Aesthetics**: Clean, professional interface
- - **Smooth Animations**: Engaging user interactions
- - **Loading States**: Clear feedback during processing
- - **Error Handling**: User-friendly error messages
-
- ## 🔒 Security Features
-
- - **File Size Limits**: 16MB maximum upload size
- - **File Type Validation**: Only image files accepted
- - **Input Sanitization**: Secure handling of user inputs
- - **CORS Protection**: Appropriate cross-origin policies
-
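A minimal sketch of enforcing these limits in Flask; the 16 MB figure comes from the list above, while the route name and extension set are illustrative assumptions.

```python
from flask import Flask, abort, request

app = Flask(__name__)
# Flask rejects request bodies above this size with a 413 error.
app.config["MAX_CONTENT_LENGTH"] = 16 * 1024 * 1024

ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "bmp"}  # illustrative set


def allowed_file(filename: str) -> bool:
    """Accept only filenames whose extension looks like an image."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route("/predict", methods=["POST"])  # route name is an assumption
def predict():
    file = request.files.get("image")
    if file is None or not allowed_file(file.filename):
        abort(400, description="Please upload a valid image file.")
    # ... hand the validated image to the model ...
    return {"status": "ok"}
```
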
- ## 🚀 Deployment Options
-
- ### 1. Hugging Face Spaces (Recommended)
- - Zero-configuration deployment
- - Automatic scaling
- - Free tier available
- - Built-in GPU support
-
- ### 2. Docker
- - Consistent environments
- - Easy scaling
- - Platform independent
-
- ### 3. Local Development
- - Quick testing
- - Development workflow
- - Custom configurations
-
- ## 🛠️ Development
-
- ### Adding New Features
- 1. Update `app.py` for backend changes
- 2. Modify `templates/index.html` for UI changes
- 3. Update `static/css/style.css` for styling
- 4. Test locally before deployment
-
- ### Model Updates
- 1. Update `blip2_vicuna_instruct.py`
- 2. Adjust configuration in `app.py`
- 3. Update requirements if needed
-
- ## 📄 License
-
- This project is open-source and available under the MIT License.
-
- ## 🤝 Contributing
-
- Contributions are welcome! Please feel free to submit a Pull Request.
-
- ## 📞 Support
-
- For questions or support, please open an issue in the repository.
-
- ---
-
- **Built with ❤️ using BLIP2-Vicuna and modern web technologies**
+ # EmoVIT
  Official code for the paper **"EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning"** | CVPR 2024

  ## 🔄 Update Log – 2025/04/07