🧠 Multi-Modal AI Chatbot

A multi-modal conversational assistant that can:
- 💬 Chat naturally using an LLM.
- 🖼️ Analyze uploaded images (with summarization via Groq API).
- 🎵 Analyze audio files and return transcriptions/predictions.
- Perform local semantic image search using pre-computed embeddings.
- 🎯 Handle intent classification via rule-based matching with intents.json.

Built with Gradio, Sentence Transformers, and Groq API, this project combines text, image, and audio workflows into a unified chat interface.

Project Structure

    .
    ├── .env              # Environment variables (must contain GROQ_API_KEY)
    ├── .gitattributes
    ├── .gitignore
    ├── app.py            # Main Gradio application
    ├── image.json        # Metadata for local image descriptions
    ├── intents.json      # Rule-based intent classifier definitions
    ├── requirements.txt  # Python dependencies
    ├── README.md         # Project documentation (this file)
    └── images/           # (Optional) Local image directory

⚙️ Setup

1. Clone the Repository

    git clone https://github.com/<your-username>/<your-repo>.git
    cd <your-repo>

2. Create Virtual Environment

    python -m venv .venv
    source .venv/bin/activate   # Linux/Mac
    .venv\Scripts\activate      # Windows

3. Install Dependencies

    pip install -r requirements.txt

4. Environment Variables

Create a .env file in the root directory:

    GROQ_API_KEY=your_api_key_here

You can obtain an API key from Groq Cloud.
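
A minimal sketch of how the key could be read at startup, assuming python-dotenv (listed under the key dependencies) is used to load the .env file. The helper name `load_groq_api_key` is hypothetical and not taken from app.py; the sketch also rejects the placeholder value shown above, so a forgotten edit fails early instead of at the first API call.

```python
import os


def load_groq_api_key(env_path: str = ".env") -> str:
    """Return GROQ_API_KEY from the environment, loading .env first if possible."""
    try:
        # load_dotenv copies entries from the .env file into os.environ.
        # With its default override=False, already-exported variables win.
        from dotenv import load_dotenv
        load_dotenv(dotenv_path=env_path)
    except ImportError:
        # python-dotenv not installed: rely on variables exported in the shell.
        pass

    key = os.environ.get("GROQ_API_KEY", "").strip()
    if not key or key == "your_api_key_here":
        raise RuntimeError("GROQ_API_KEY is missing; add it to .env or export it.")
    return key
```

Calling this once near the top of the application and passing the result to whatever client talks to Groq keeps the key out of the source code.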

Run the App

    python app.py

By default, Gradio will launch at:

    http://127.0.0.1:7860

Features

Chat Mode
- Uses a hosted LLM via gradio_client.
- Falls back to a rule-based intent classifier for special queries.

Image Analysis
- Upload .png, .jpg, .jpeg, or .webp images.
- Images are analyzed by a vision model, then summarized using the Groq API.

Audio Analysis
- Upload .wav, .mp3, or .flac audio.
- Returns a friendly analysis result.
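
The accepted extensions above suggest a simple routing step before any model is called. The following is a sketch under that assumption; `classify_upload` and the two constant names are hypothetical, and the real app.py may route uploads differently (for example through separate Gradio components).

```python
from pathlib import Path

# Extensions taken from the feature list above.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def classify_upload(filename: str) -> str:
    """Return "image", "audio", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: ".JPG" == ".jpg"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"
```

An extension check is only a first filter; a file with the right suffix can still fail to decode, so the analysis step should handle that error as well.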

Local Image Search
- Loads image metadata from image.json.
- Embeddings are computed with all-MiniLM-L6-v2.
- Finds the best semantic match from the images/ folder.
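
The search described above can be sketched as a cosine-similarity lookup over pre-computed description embeddings. Everything here is illustrative: the entry layout (`file` and `description` keys) is an assumed shape for image.json, the function names are hypothetical, and `toy_embed` is a stand-in so the example runs without downloading a model. In the real app the `embed` argument would presumably be the `encode` method of `SentenceTransformer("all-MiniLM-L6-v2")`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(entries, embed):
    """Pre-compute one embedding per image description."""
    return [(entry, embed(entry["description"])) for entry in entries]


def best_match(query, index, embed):
    """Return (entry, score) for the description most similar to the query."""
    if not index:
        return None, 0.0
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), entry) for entry, vec in index]
    score, entry = max(scored, key=lambda pair: pair[0])
    return entry, score


def toy_embed(text):
    """Stand-in embedder: counts a few keywords. NOT a real sentence embedding."""
    vocab = ["blueprint", "diagram", "engine", "photo"]
    words = text.lower().split()
    return [float(words.count(word)) for word in vocab]


# Assumed image.json layout, for illustration only.
entries = [
    {"file": "images/blueprint.png", "description": "blueprint diagram of the housing"},
    {"file": "images/engine.jpg", "description": "photo of an engine part"},
]
index = build_index(entries, toy_embed)
match, score = best_match("find me the blueprint diagram", index, toy_embed)
print(match["file"], round(score, 3))
```

Building the index once at startup matters more than the similarity function itself: only the query has to be embedded per request.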

Intent Classification
- Rule-based, defined in intents.json.
- Supports custom triggers like "search_local_image", "request_audio_analysis", etc.
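
As an illustration of rule-based matching, the sketch below checks each message against substring patterns. The JSON schema (an `intents` list with `tag` and `patterns` keys) and the sample patterns are assumptions made for this example; only the two tag names come from the list above, and the actual intents.json may be structured differently.

```python
import json

# Hypothetical intents.json content; the real file's schema may differ.
SAMPLE_INTENTS = json.loads("""
{
  "intents": [
    {"tag": "search_local_image", "patterns": ["find me", "show me the image"]},
    {"tag": "request_audio_analysis", "patterns": ["analyze this audio", "transcribe"]}
  ]
}
""")


def classify_intent(message, intents):
    """Return the tag of the first intent with a pattern found in the message."""
    text = message.lower()
    for intent in intents["intents"]:
        if any(pattern.lower() in text for pattern in intent["patterns"]):
            return intent["tag"]
    return None  # no rule matched; the caller can hand the message to the LLM


print(classify_intent("Find me the blueprint diagram.", SAMPLE_INTENTS))
```

Substring rules are easy to audit but order-sensitive: when two intents share a pattern, the one listed first wins.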

Example Usage

General Chat:
- User: “Tell me something interesting.”
- Bot: Generates response via chatbot client.

Search Local Image:
- User: “Find me the blueprint diagram.”
- Bot: Returns matching local image + description.

Image Analysis:
- User uploads engine_part.jpg → Bot analyzes and summarizes.

Audio Analysis:
- User uploads sample.wav → Bot outputs recognized text/prediction.

📦 Requirements

Key dependencies:
- gradio
- gradio_client
- sentence-transformers
- numpy
- requests
- python-dotenv

Full list in requirements.txt.

🔮 Future Improvements
- Add a vector database for scalable image/document retrieval.
- Enhance intent detection with a hybrid approach (rules + semantic matching).
- Extend multimodal support (video, PDFs).
- Dockerize deployment for cloud environments.

👨‍💻 Author

Built with ❤️ by Anvit