Anvit25's picture
Upload readme
a8cbf37 verified
๐Ÿง  Multi-Modal AI Chatbot
A multi-modal conversational assistant that can:
๐Ÿ’ฌ Chat naturally using an LLM.
๐Ÿ–ผ๏ธ Analyze uploaded images (with summarization via Groq API).
๐ŸŽต Analyze audio files and return transcriptions/predictions.
๐Ÿ” Perform local semantic image search using pre-computed embeddings.
๐ŸŽฏ Handle intent classification via rule-based matching with intents.json.
Built with Gradio, Sentence Transformers, and Groq API, this project combines text, image, and audio workflows into a unified chat interface.
๐Ÿ“‚ Project Structure
.
โ”œโ”€โ”€ .env # Environment variables (must contain GROQ_API_KEY)
โ”œโ”€โ”€ .gitattributes
โ”œโ”€โ”€ .gitignore
โ”œโ”€โ”€ app.py # Main Gradio application
โ”œโ”€โ”€ image.json # Metadata for local image descriptions
โ”œโ”€โ”€ intents.json # Rule-based intent classifier definitions
โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”œโ”€โ”€ README.md # Project documentation (this file)
โ””โ”€โ”€ images/ # (Optional) Local image directory
โš™๏ธ Setup
1. Clone the Repository
git clone https://github.com/<your-username>/<your-repo>.git
cd <your-repo>
2. Create Virtual Environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
3. Install Dependencies
pip install -r requirements.txt
4. Environment Variables
Create a .env file in the root directory:
GROQ_API_KEY=your_api_key_here
You can obtain an API key from Groq Cloud
.
๐Ÿš€ Run the App
python app.py
By default, Gradio will launch at:
http://127.0.0.1:7860
๐Ÿ›  Features
Chat Mode
Uses a hosted LLM via gradio_client.
Fallback to rule-based intent classifier for special queries.
Image Analysis
Upload .png, .jpg, .jpeg, .webp images.
Analyzed by vision model โ†’ summarized using Groq API.
Audio Analysis
Upload .wav, .mp3, .flac audio.
Returns friendly analysis result.
Local Image Search
Loads image metadata from image.json.
Embeddings computed with all-MiniLM-L6-v2.
Finds best semantic match from images/ folder.
Intent Classification
Rule-based, defined in intents.json.
Supports custom triggers like "search_local_image", "request_audio_analysis", etc.
๐Ÿ“Œ Example Usage
General Chat:
User: โ€œTell me something interesting.โ€
Bot: Generates response via chatbot client.
Search Local Image:
User: โ€œFind me the blueprint diagram.โ€
Bot: Returns matching local image + description.
Image Analysis:
User uploads engine_part.jpg โ†’ Bot analyzes and summarizes.
Audio Analysis:
User uploads sample.wav โ†’ Bot outputs recognized text/prediction.
๐Ÿ“ฆ Requirements
Key dependencies:
gradio
gradio_client
sentence-transformers
numpy
requests
python-dotenv
Full list in requirements.txt
.
๐Ÿ”ฎ Future Improvements
Add vector database for scalable image/document retrieval.
Enhance intent detection with hybrid (rules + semantic).
Extend multimodal support (video, PDFs).
Dockerize deployment for cloud environments.
๐Ÿ‘จโ€๐Ÿ’ป Author
Built with โค๏ธ by Anvit