# 🧠 Multi-Modal AI Chatbot

A multi-modal conversational assistant that can:

- 💬 Chat naturally using an LLM.
- 🖼️ Analyze uploaded images (with summarization via Groq API).
- 🎵 Analyze audio files and return transcriptions/predictions.
- 🔍 Perform local semantic image search using pre-computed embeddings.
- 🎯 Handle intent classification via rule-based matching with `intents.json`.

Built with Gradio, Sentence Transformers, and Groq API, this project combines text, image, and audio workflows into a unified chat interface.

## 📂 Project Structure

    .
    ├── .env              # Environment variables (must contain GROQ_API_KEY)
    ├── .gitattributes
    ├── .gitignore
    ├── app.py            # Main Gradio application
    ├── image.json        # Metadata for local image descriptions
    ├── intents.json      # Rule-based intent classifier definitions
    ├── requirements.txt  # Python dependencies
    ├── README.md         # Project documentation (this file)
    └── images/           # (Optional) Local image directory

## ⚙️ Setup

### 1. Clone the Repository

    git clone https://github.com//.git
    cd

### 2. Create Virtual Environment

    python -m venv .venv
    source .venv/bin/activate   # Linux/Mac
    .venv\Scripts\activate      # Windows

### 3. Install Dependencies

    pip install -r requirements.txt

### 4. Environment Variables

Create a `.env` file in the root directory:

    GROQ_API_KEY=your_api_key_here

You can obtain an API key from Groq Cloud.

## 🚀 Run the App

    python app.py

By default, Gradio will launch at: http://127.0.0.1:7860

## 🛠 Features

### Chat Mode

- Uses a hosted LLM via `gradio_client`.
- Fallback to rule-based intent classifier for special queries.

### Image Analysis

- Upload `.png`, `.jpg`, `.jpeg`, `.webp` images.
- Analyzed by vision model → summarized using Groq API.

### Audio Analysis

- Upload `.wav`, `.mp3`, `.flac` audio.
- Returns friendly analysis result.

### Local Image Search

- Loads image metadata from `image.json`.
- Embeddings computed with `all-MiniLM-L6-v2`.
- Finds best semantic match from `images/` folder.
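
The matching step of the local image search can be illustrated with a minimal sketch. It is not the code in `app.py`: the entry keys (`file`, `description`, `embedding`), the file names, and the toy 3-dimensional vectors are assumptions made for illustration, and cosine similarity is assumed as the ranking metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vec, entries):
    """Return (entry, score) for the entry whose embedding is closest to query_vec.

    Each entry is a dict with the hypothetical keys 'file', 'description',
    and 'embedding'. Returns (None, -1.0) when entries is empty.
    """
    best, best_score = None, -1.0
    for entry in entries:
        score = cosine_similarity(query_vec, entry["embedding"])
        if score > best_score:
            best, best_score = entry, score
    return best, best_score


# Toy pre-computed embeddings; real ones would have many more dimensions.
ENTRIES = [
    {"file": "images/blueprint.png", "description": "blueprint diagram",
     "embedding": [1.0, 0.0, 0.0]},
    {"file": "images/engine.png", "description": "engine part photo",
     "embedding": [0.0, 1.0, 0.0]},
]

match, score = best_match([0.9, 0.1, 0.0], ENTRIES)
print(match["file"], round(score, 3))
```

In a real run, the query and description vectors would presumably come from `SentenceTransformer("all-MiniLM-L6-v2").encode(...)` in the `sentence-transformers` package rather than from hand-written lists.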
### Intent Classification

- Rule-based, defined in `intents.json`.
- Supports custom triggers like `"search_local_image"`, `"request_audio_analysis"`, etc.

## 📌 Example Usage

- **General Chat:** User: “Tell me something interesting.” Bot: Generates response via chatbot client.
- **Search Local Image:** User: “Find me the blueprint diagram.” Bot: Returns matching local image + description.
- **Image Analysis:** User uploads `engine_part.jpg` → Bot analyzes and summarizes.
- **Audio Analysis:** User uploads `sample.wav` → Bot outputs recognized text/prediction.

## 📦 Requirements

Key dependencies:

- `gradio`
- `gradio_client`
- `sentence-transformers`
- `numpy`
- `requests`
- `python-dotenv`

Full list in `requirements.txt`.

## 🔮 Future Improvements

- Add vector database for scalable image/document retrieval.
- Enhance intent detection with hybrid (rules + semantic).
- Extend multimodal support (video, PDFs).
- Dockerize deployment for cloud environments.

## 👨‍💻 Author

Built with ❤️ by Anvit