Spaces:

Anvit25
/

Orchestrator_final

Runtime error

App Files Files Community

Orchestrator_final / readme

Anvit25

Upload readme

a8cbf37 verified 10 months ago

Raw

History Blame Contribute Delete

3.2 kB

	🧠 Multi-Modal AI Chatbot

	A multi-modal conversational assistant that can:

	💬 Chat naturally using an LLM.

	🖼️ Analyze uploaded images (with summarization via Groq API).

	🎵 Analyze audio files and return transcriptions/predictions.

	🔍 Perform local semantic image search using pre-computed embeddings.

	🎯 Handle intent classification via rule-based matching with intents.json.

	Built with Gradio, Sentence Transformers, and Groq API, this project combines text, image, and audio workflows into a unified chat interface.

	📂 Project Structure
	.
	├── .env # Environment variables (must contain GROQ_API_KEY)
	├── .gitattributes
	├── .gitignore
	├── app.py # Main Gradio application
	├── image.json # Metadata for local image descriptions
	├── intents.json # Rule-based intent classifier definitions
	├── requirements.txt # Python dependencies
	├── README.md # Project documentation (this file)
	└── images/ # (Optional) Local image directory

	⚙️ Setup
	1. Clone the Repository
	git clone https://github.com/<your-username>/<your-repo>.git
	cd <your-repo>

	2. Create Virtual Environment
	python -m venv .venv
	source .venv/bin/activate # Linux/Mac
	.venv\Scripts\activate # Windows

	3. Install Dependencies
	pip install -r requirements.txt

	4. Environment Variables

	Create a .env file in the root directory:

	GROQ_API_KEY=your_api_key_here


	You can obtain an API key from Groq Cloud
	.

	🚀 Run the App
	python app.py


	By default, Gradio will launch at:

	http://127.0.0.1:7860

	🛠 Features

	Chat Mode

	Uses a hosted LLM via gradio_client.

	Fallback to rule-based intent classifier for special queries.

	Image Analysis

	Upload .png, .jpg, .jpeg, .webp images.

	Analyzed by vision model → summarized using Groq API.

	Audio Analysis

	Upload .wav, .mp3, .flac audio.

	Returns friendly analysis result.

	Local Image Search

	Loads image metadata from image.json.

	Embeddings computed with all-MiniLM-L6-v2.

	Finds best semantic match from images/ folder.

	Intent Classification

	Rule-based, defined in intents.json.

	Supports custom triggers like "search_local_image", "request_audio_analysis", etc.

	📌 Example Usage

	General Chat:
	User: “Tell me something interesting.”
	Bot: Generates response via chatbot client.

	Search Local Image:
	User: “Find me the blueprint diagram.”
	Bot: Returns matching local image + description.

	Image Analysis:
	User uploads engine_part.jpg → Bot analyzes and summarizes.

	Audio Analysis:
	User uploads sample.wav → Bot outputs recognized text/prediction.

	📦 Requirements

	Key dependencies:

	gradio

	gradio_client

	sentence-transformers

	numpy

	requests

	python-dotenv

	Full list in requirements.txt
	.

	🔮 Future Improvements

	Add vector database for scalable image/document retrieval.

	Enhance intent detection with hybrid (rules + semantic).

	Extend multimodal support (video, PDFs).

	Dockerize deployment for cloud environments.

	👨‍💻 Author

	Built with ❤️ by Anvit