Spaces:
Runtime error
Runtime error
File size: 3,204 Bytes
a8cbf37 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 | ๐ง Multi-Modal AI Chatbot
A multi-modal conversational assistant that can:
๐ฌ Chat naturally using an LLM.
๐ผ๏ธ Analyze uploaded images (with summarization via Groq API).
๐ต Analyze audio files and return transcriptions/predictions.
๐ Perform local semantic image search using pre-computed embeddings.
๐ฏ Handle intent classification via rule-based matching with intents.json.
Built with Gradio, Sentence Transformers, and Groq API, this project combines text, image, and audio workflows into a unified chat interface.
๐ Project Structure
.
โโโ .env # Environment variables (must contain GROQ_API_KEY)
โโโ .gitattributes
โโโ .gitignore
โโโ app.py # Main Gradio application
โโโ image.json # Metadata for local image descriptions
โโโ intents.json # Rule-based intent classifier definitions
โโโ requirements.txt # Python dependencies
โโโ README.md # Project documentation (this file)
โโโ images/ # (Optional) Local image directory
โ๏ธ Setup
1. Clone the Repository
git clone https://github.com/<your-username>/<your-repo>.git
cd <your-repo>
2. Create Virtual Environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
.venv\Scripts\activate # Windows
3. Install Dependencies
pip install -r requirements.txt
4. Environment Variables
Create a .env file in the root directory:
GROQ_API_KEY=your_api_key_here
You can obtain an API key from Groq Cloud
.
๐ Run the App
python app.py
By default, Gradio will launch at:
http://127.0.0.1:7860
๐ Features
Chat Mode
Uses a hosted LLM via gradio_client.
Fallback to rule-based intent classifier for special queries.
Image Analysis
Upload .png, .jpg, .jpeg, .webp images.
Analyzed by vision model โ summarized using Groq API.
Audio Analysis
Upload .wav, .mp3, .flac audio.
Returns friendly analysis result.
Local Image Search
Loads image metadata from image.json.
Embeddings computed with all-MiniLM-L6-v2.
Finds best semantic match from images/ folder.
Intent Classification
Rule-based, defined in intents.json.
Supports custom triggers like "search_local_image", "request_audio_analysis", etc.
๐ Example Usage
General Chat:
User: โTell me something interesting.โ
Bot: Generates response via chatbot client.
Search Local Image:
User: โFind me the blueprint diagram.โ
Bot: Returns matching local image + description.
Image Analysis:
User uploads engine_part.jpg โ Bot analyzes and summarizes.
Audio Analysis:
User uploads sample.wav โ Bot outputs recognized text/prediction.
๐ฆ Requirements
Key dependencies:
gradio
gradio_client
sentence-transformers
numpy
requests
python-dotenv
Full list in requirements.txt
.
๐ฎ Future Improvements
Add vector database for scalable image/document retrieval.
Enhance intent detection with hybrid (rules + semantic).
Extend multimodal support (video, PDFs).
Dockerize deployment for cloud environments.
๐จโ๐ป Author
Built with โค๏ธ by Anvit |