---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---
# LLaVA Chat

A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.
## Features

- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with Gradio interface
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces
## Quick Start

1. Visit the [Hugging Face Space](https://huggingface.co/spaces/Prashant26am/llava-chat)
2. Upload an image
3. Ask a question about the image
4. Read the model's response
## Local Development

1. Clone the repository:
```bash
git clone https://github.com/Prashant-ambati/llava-implementation.git
cd llava-implementation
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python llava-chat/app.py
```
## Model Architecture

- Vision Model: CLIP ViT-Base
- Language Model: TinyLlama-1.1B-Chat
- Projection Layer: MLP with configurable hidden dimensions
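
The projection layer maps vision features into the language model's embedding space. The following is a minimal shape-bookkeeping sketch, not the repository's code. It assumes a two-layer MLP (the list above only says "MLP"), a CLIP ViT-Base hidden size of 768, and a TinyLlama-1.1B hidden size of 2048. `ProjectorConfig` and the default `hidden_dim` of 1024 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProjectorConfig:
    """Dimensions of an assumed two-layer MLP projector (illustrative only)."""

    vision_dim: int = 768   # CLIP ViT-Base hidden size
    hidden_dim: int = 1024  # hypothetical; the README only says "configurable"
    text_dim: int = 2048    # TinyLlama-1.1B hidden size

    def num_parameters(self) -> int:
        """Weights plus biases of Linear(vision, hidden) and Linear(hidden, text)."""
        first = self.vision_dim * self.hidden_dim + self.hidden_dim
        second = self.hidden_dim * self.text_dim + self.text_dim
        return first + second

    def output_shape(self, num_patches: int) -> Tuple[int, int]:
        """Each image patch becomes one token-sized vector for the language model."""
        return (num_patches, self.text_dim)


cfg = ProjectorConfig()
print("projector parameters:", cfg.num_parameters())
# 49 patches corresponds to a 224x224 image split into 32x32 patches (an assumption).
print("projected shape:", cfg.output_shape(num_patches=49))
```

Under these assumptions the projector is small compared with either backbone, so changing `hidden_dim` has little effect on total memory.
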
## Memory Optimization

The implementation includes several memory optimization techniques:

- 8-bit quantization of the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision
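
To give a sense of what 8-bit quantization buys, weight memory scales with the number of bytes stored per parameter. The sketch below is a rough estimate only. It assumes a nominal 1.1 billion parameters (read off the model name, not measured) and counts weights alone, ignoring activations, the vision model, and any layers a quantization library keeps in higher precision. `weight_memory_gib` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

# Nominal parameter count taken from the "1.1B" in the model name (assumption).
TINYLLAMA_PARAMS = 1_100_000_000


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the approximate weight storage in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    gib = weight_memory_gib(TINYLLAMA_PARAMS, dtype)
    print(f"{dtype:>8}: {gib:.2f} GiB")
```

By this estimate the language model weights take about 4.1 GiB in float32, 2.0 GiB in float16, and 1.0 GiB in int8. That difference matters on memory-constrained Spaces hardware.
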
## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status
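
The request and response formats are not documented here, so the following client is only a sketch under stated assumptions. It assumes `/process_image` accepts a multipart form with fields named `prompt` and `image`, and that both endpoints return JSON. Those field names, the helper functions, and the base URL you pass in are all hypothetical. Check `app.py` for the actual contract.

```python
import json
import uuid
from urllib import request


def encode_multipart(prompt: str, image_bytes: bytes, filename: str = "image.png"):
    """Build a multipart/form-data body with a text field and a file field.

    The field names "prompt" and "image" are assumptions, not a documented API.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="prompt"\r\n\r\n'
        f"{prompt}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + image_bytes + tail, f"multipart/form-data; boundary={boundary}"


def process_image(base_url: str, image_path: str, prompt: str, timeout: float = 120.0):
    """POST an image and a prompt to /process_image and decode the JSON reply."""
    with open(image_path, "rb") as fh:
        body, content_type = encode_multipart(prompt, fh.read())
    req = request.Request(
        base_url.rstrip("/") + "/process_image",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_status(base_url: str, timeout: float = 10.0):
    """GET /status and decode the JSON reply."""
    with request.urlopen(base_url.rstrip("/") + "/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `get_status(...)` first with the address where the app is served is a cheap way to confirm the model has finished loading before sending an image.
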
## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments

- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio