---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---

# LLaVA Chat

A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.

## Features

- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with Gradio interface
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces

## Quick Start

1. Visit the [Hugging Face Space](https://huggingface.co/spaces/Prashant26am/llava-chat)
2. Upload an image
3. Ask questions about the image
4. Get AI-powered responses

## Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/Prashant-ambati/llava-implementation.git
   cd llava-implementation
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python llava-chat/app.py
   ```

## Model Architecture

- Vision Model: CLIP ViT-Base
- Language Model: TinyLlama-1.1B-Chat
- Projection Layer: MLP with configurable hidden dimensions

## Memory Optimization

The implementation includes several memory-optimization techniques:

- 8-bit quantization for the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision

## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio
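The projection layer named in the Model Architecture section maps CLIP image features into the language model's embedding space. The sketch below illustrates that idea in pure Python; the dimensions (512 → 256 → 2048), ReLU activation, and class name are illustrative assumptions, and a real implementation would use `torch.nn` modules.

```python
import math
import random

class MLPProjector:
    """Toy two-layer MLP: vision_dim -> hidden_dim -> text_dim.

    Conceptual sketch only; the hidden dimension is configurable,
    mirroring the "configurable hidden dimensions" in the README.
    """

    def __init__(self, vision_dim, hidden_dim, text_dim, seed=0):
        rng = random.Random(seed)

        def layer(n_in, n_out):
            # Uniform init scaled by fan-in, one row of weights per output unit.
            bound = 1.0 / math.sqrt(n_in)
            return [[rng.uniform(-bound, bound) for _ in range(n_in)]
                    for _ in range(n_out)]

        self.w1 = layer(vision_dim, hidden_dim)
        self.w2 = layer(hidden_dim, text_dim)

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def __call__(self, image_features):
        # ReLU here for simplicity; GELU is the more common choice in practice.
        hidden = [max(0.0, h) for h in self._matvec(self.w1, image_features)]
        return self._matvec(self.w2, hidden)

# An image feature vector is projected to the LLM's embedding width.
proj = MLPProjector(vision_dim=512, hidden_dim=256, text_dim=2048)
tokens = proj([0.1] * 512)
assert len(tokens) == 2048
```

The projected vector can then be prepended to the text-token embeddings so the language model attends to image content like ordinary context.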
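Of the memory-optimization techniques listed above, 8-bit quantization gives the largest saving: weights are stored as int8 codes plus a scale instead of 16- or 32-bit floats. The toy absmax scheme below is conceptual only; the app would rely on a quantization library (such as bitsandbytes via Transformers) rather than code like this.

```python
def quantize_int8(weights):
    """Map floats to int8 codes using a per-tensor absmax scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]  # each code fits in [-127, 127]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.003, 0.51]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Every restored weight is within one quantization step of the original,
# while storage drops from 4 bytes to 1 byte per weight.
assert all(abs(w - r) <= scale for w, r in zip(weights, restored))
```

Real schemes refine this with per-block scales and outlier handling, but the storage trade-off is the same.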
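The API endpoints above can be exercised with a small client. Below is a standard-library sketch for the `GET /status` check; the base URL and port, and the assumption that the endpoint returns JSON, are not specified in this README. Uploading an image to `POST /process_image` would typically use a multipart form, which is easier with a client library such as `requests`.

```python
import json
import urllib.request

# Assumed local address for the app; adjust to wherever the server runs.
BASE_URL = "http://localhost:7860"

def check_status(base_url: str = BASE_URL) -> dict:
    """Call GET /status and return the parsed body (assumed to be JSON)."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A deployment health check could call `check_status()` before routing traffic to the Space.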