🧠 Multi-Modal AI Chatbot

A multi-modal conversational assistant that can:

💬 Chat naturally using an LLM.

🖼️ Analyze uploaded images (with summarization via the Groq API).

🎵 Analyze audio files and return transcriptions/predictions.

🔍 Perform local semantic image search using pre-computed embeddings.

🎯 Handle intent classification via rule-based matching with intents.json.

Built with Gradio, Sentence Transformers, and the Groq API, this project combines text, image, and audio workflows in a single chat interface.

📂 Project Structure
.
├── .env                # Environment variables (must contain GROQ_API_KEY)
├── .gitattributes
├── .gitignore
├── app.py              # Main Gradio application
├── image.json          # Metadata for local image descriptions
├── intents.json        # Rule-based intent classifier definitions
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation (this file)
└── images/             # (Optional) Local image directory

⚙️ Setup
1. Clone the Repository
git clone https://github.com/<your-username>/<your-repo>.git
cd <your-repo>

2. Create Virtual Environment
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
.venv\Scripts\activate      # Windows

3. Install Dependencies
pip install -r requirements.txt

4. Environment Variables

Create a .env file in the root directory:

GROQ_API_KEY=your_api_key_here


You can obtain an API key from Groq Cloud.
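As a rough illustration of how the key might be picked up at startup, here is a minimal sketch. The project lists python-dotenv, whose load_dotenv() would normally handle this; the stdlib-only stand-in below exists so the example runs without extra packages. The helper names load_env_file and get_groq_api_key are hypothetical and are not taken from app.py.

```python
import os


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ.

    Hypothetical stand-in for python-dotenv's load_dotenv(); values that
    are already set in the environment are left untouched.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and anything without '='.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_groq_api_key():
    """Return GROQ_API_KEY, or raise a clear error if it is missing."""
    key = os.environ.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error surfacing later from the API call.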

🚀 Run the App
python app.py


By default, Gradio will launch at:

http://127.0.0.1:7860

🛠 Features

Chat Mode

Uses a hosted LLM via gradio_client.

Falls back to the rule-based intent classifier for special queries.

Image Analysis

Upload .png, .jpg, .jpeg, .webp images.

Images are analyzed by a vision model, and the result is summarized using the Groq API.

Audio Analysis

Upload .wav, .mp3, .flac audio.

Returns a user-friendly analysis result.
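The image and audio upload types listed above suggest a simple routing step by file extension. The sketch below shows one way that could look; the function name classify_upload and the "unsupported" label are assumptions for illustration, and the actual handling in app.py may differ.

```python
from pathlib import Path

# Extensions taken from the feature list in this README.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def classify_upload(filename):
    """Return 'image', 'audio', or 'unsupported' based on the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive comparison
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "unsupported"
```

For example, classify_upload("engine_part.jpg") routes to the image path and classify_upload("sample.wav") to the audio path.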

Local Image Search

Loads image metadata from image.json.

Embeddings computed with all-MiniLM-L6-v2.

Finds best semantic match from images/ folder.
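The matching step can be sketched as a cosine-similarity ranking over the stored vectors. In the real app the vectors would come from all-MiniLM-L6-v2 via Sentence Transformers; the example below uses small hand-written toy vectors so it runs with the standard library only. The names cosine_similarity, best_match, and toy_index are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_vec, image_vecs):
    """Return (filename, score) of the stored vector most similar to the query.

    image_vecs maps a filename to its pre-computed embedding.
    """
    if not image_vecs:
        return None, 0.0
    name = max(image_vecs, key=lambda n: cosine_similarity(query_vec, image_vecs[n]))
    return name, cosine_similarity(query_vec, image_vecs[name])


# Toy 3-dimensional vectors standing in for real description embeddings.
toy_index = {
    "blueprint.png": [0.9, 0.1, 0.0],
    "engine_part.jpg": [0.1, 0.8, 0.3],
}

name, score = best_match([1.0, 0.0, 0.0], toy_index)
print(name)
```

A linear scan like this is fine for a small images/ folder; a vector database becomes relevant only once the collection grows.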

Intent Classification

Rule-based, defined in intents.json.

Supports custom triggers like "search_local_image", "request_audio_analysis", etc.
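To make the rule-based approach concrete, here is a minimal sketch of substring matching against intent patterns. The schema of intents.json is not shown in this README, so the structure below (an "intents" list with "tag" and "patterns" keys) and the sample patterns are assumptions; only the two tag names come from the text above.

```python
import json

# Hypothetical intents.json content; the real file's schema may differ.
SAMPLE_INTENTS = json.loads("""
{
  "intents": [
    {"tag": "search_local_image", "patterns": ["find me", "show me the image"]},
    {"tag": "request_audio_analysis", "patterns": ["analyze audio", "transcribe"]}
  ]
}
""")


def classify_intent(message, intents=SAMPLE_INTENTS):
    """Return the tag of the first intent whose pattern occurs in the message.

    Matching is a case-insensitive substring test. None means no rule
    matched, so the caller can hand the message to the LLM instead.
    """
    text = message.lower()
    for intent in intents["intents"]:
        if any(pattern in text for pattern in intent["patterns"]):
            return intent["tag"]
    return None
```

With these sample rules, "Find me the blueprint diagram." maps to search_local_image, while "Tell me something interesting." matches nothing and would go to general chat.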

📌 Example Usage

General Chat:
User: “Tell me something interesting.”
Bot: Generates a response via the chatbot client.

Search Local Image:
User: “Find me the blueprint diagram.”
Bot: Returns the matching local image and its description.

Image Analysis:
User uploads engine_part.jpg → Bot analyzes and summarizes it.

Audio Analysis:
User uploads sample.wav → Bot outputs the recognized text/prediction.

📦 Requirements

Key dependencies:

gradio

gradio_client

sentence-transformers

numpy

requests

python-dotenv

The full list is in requirements.txt.

🔮 Future Improvements

Add a vector database for scalable image/document retrieval.

Enhance intent detection with a hybrid approach (rules + semantic matching).

Extend multimodal support (video, PDFs).

Dockerize deployment for cloud environments.

👨‍💻 Author

Built with ❤️ by Anvit