Spaces:
Runtime error
A newer version of the Gradio SDK is available: 6.14.0
Methodology
The chatbot integrates multiple AI workflows into a single Gradio UI. The process follows these main stages:
Input Handling
Users interact via a multimodal text box (supports text, image, and audio).
The chatbot determines whether the query contains:
Text only
Image file
Audio file
Intent Classification
Text queries are processed through a rule-based intent classifier (intents.json).
Example intents:
"chat" β Send to hosted chatbot LLM.
"search_local_image" β Trigger local semantic image search.
"request_image_analysis" β Ask user to upload an image.
"request_audio_analysis" β Ask user to upload audio.
Local Semantic Search
Metadata from image.json provides descriptions for images in /images/.
Each description is encoded using SentenceTransformers (all-MiniLM-L6-v2).
Query embeddings are compared with stored embeddings using cosine similarity.
If similarity > threshold (0.4), best match image is returned.
Image Analysis Workflow
Uploaded images are passed to the vision model (via gradio_client).
Raw AI output (JSON) is summarized with Groq API (LLaMA-3.3-70B).
Final user-facing response is a friendly explanation.
Audio Analysis Workflow
Uploaded audio is processed via the audio model (Gradio client).
Returns prediction text (e.g., transcription or classification).
Packaged as a human-readable response.
Groq Summarization
Any complex JSON output (e.g., image analysis) is summarized.
A system prompt guides Groq to produce short, user-friendly summaries.
Ensures technical data is explained in simple language.
Conversation Management
All interactions are stored in Chatbot history.
User query + bot response pairs are maintained for continuity.
Multimodal interactions (e.g., image + explanation) are rendered in chat.
Architecture at a Glance
User Input (Text / Image / Audio)
β
βΌ
Intent Classifier βββΊ Rule-based (intents.json)
β
ββ Chat β Chatbot Client (LLM)
ββ Search Local Image β Embedding Match
ββ Image Analysis β Vision Client + Groq Summary
ββ Audio Analysis β Audio Client
βΌ
Response Generator (Groq Narrative + History)
βΌ
Gradio Chat UI