---
title: LLaVA Chat
emoji: 🖼️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.19.2
app_file: app.py
pinned: false
license: mit
---

# LLaVA Chat

A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.

## Features

- Efficient model loading with 8-bit quantization
- Memory-optimized inference
- FastAPI backend with Gradio interface
- Support for image understanding and visual conversations
- Optimized for deployment on Hugging Face Spaces

## Quick Start

1. Visit the [Hugging Face Space](https://huggingface.co/spaces/Prashant26am/llava-chat)
2. Upload an image
3. Ask questions about the image
4. Get AI-powered responses

## Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/Prashant-ambati/llava-implementation.git
   cd llava-implementation
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python llava-chat/app.py
   ```

## Model Architecture

- Vision Model: CLIP ViT-Base
- Language Model: TinyLlama-1.1B-Chat
- Projection Layer: MLP with configurable hidden dimensions

## Memory Optimization

The implementation includes several memory-optimization techniques:

- 8-bit quantization for the language model
- Efficient image processing
- Gradient checkpointing
- Memory-efficient attention
- Automatic mixed precision

## API Endpoints

- `POST /process_image`: Process an image with a prompt
- `GET /status`: Check model and application status

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
- Uses models from Hugging Face Transformers
- Built with FastAPI and Gradio
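The projection layer named in the Model Architecture section maps CLIP image features into the language model's embedding space. The sketch below illustrates that idea in pure Python; the dimensions (512 → 256 → 2048), ReLU activation, and class name are illustrative assumptions, and a real implementation would use `torch.nn` modules.

```python
import math
import random

class MLPProjector:
    """Toy two-layer MLP: vision_dim -> hidden_dim -> text_dim.

    Conceptual sketch only; the hidden dimension is configurable,
    mirroring the "configurable hidden dimensions" in the README.
    """

    def __init__(self, vision_dim, hidden_dim, text_dim, seed=0):
        rng = random.Random(seed)

        def layer(n_in, n_out):
            # Uniform init scaled by fan-in, one row of weights per output unit.
            bound = 1.0 / math.sqrt(n_in)
            return [[rng.uniform(-bound, bound) for _ in range(n_in)]
                    for _ in range(n_out)]

        self.w1 = layer(vision_dim, hidden_dim)
        self.w2 = layer(hidden_dim, text_dim)

    @staticmethod
    def _matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def __call__(self, image_features):
        # ReLU here for simplicity; GELU is the more common choice in practice.
        hidden = [max(0.0, h) for h in self._matvec(self.w1, image_features)]
        return self._matvec(self.w2, hidden)

# An image feature vector is projected to the LLM's embedding width.
proj = MLPProjector(vision_dim=512, hidden_dim=256, text_dim=2048)
tokens = proj([0.1] * 512)
assert len(tokens) == 2048
```

The projected vector can then be prepended to the text-token embeddings so the language model attends to image content like ordinary context.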
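Of the memory-optimization techniques listed above, 8-bit quantization gives the largest saving: weights are stored as int8 codes plus a scale instead of 16- or 32-bit floats. The toy absmax scheme below is conceptual only; the app would rely on a quantization library (such as bitsandbytes via Transformers) rather than code like this.

```python
def quantize_int8(weights):
    """Map floats to int8 codes using a per-tensor absmax scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]  # each code fits in [-127, 127]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.003, 0.51]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# Every restored weight is within one quantization step of the original,
# while storage drops from 4 bytes to 1 byte per weight.
assert all(abs(w - r) <= scale for w, r in zip(weights, restored))
```

Real schemes refine this with per-block scales and outlier handling, but the storage trade-off is the same.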
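The API endpoints above can be exercised with a small client. Below is a standard-library sketch for the `GET /status` check; the base URL and port, and the assumption that the endpoint returns JSON, are not specified in this README. Uploading an image to `POST /process_image` would typically use a multipart form, which is easier with a client library such as `requests`.

```python
import json
import urllib.request

# Assumed local address for the app; adjust to wherever the server runs.
BASE_URL = "http://localhost:7860"

def check_status(base_url: str = BASE_URL) -> dict:
    """Call GET /status and return the parsed body (assumed to be JSON)."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A deployment health check could call `check_status()` before routing traffic to the Space.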