Spaces:
Runtime error
Runtime error
File size: 2,351 Bytes
98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 2bc6924 98df613 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 | # Methodology
The chatbot integrates multiple AI workflows into a single Gradio UI. The process follows these main stages:
## Input Handling
Users interact via a multimodal text box (supports text, image, and audio).
The chatbot determines whether the query contains:
Text only
Image file
Audio file
## Intent Classification
Text queries are processed through a rule-based intent classifier (intents.json).
Example intents:
"chat" β Send to hosted chatbot LLM.
"search_local_image" β Trigger local semantic image search.
"request_image_analysis" β Ask user to upload an image.
"request_audio_analysis" β Ask user to upload audio.
## Local Semantic Search
Metadata from image.json provides descriptions for images in /images/.
Each description is encoded using SentenceTransformers (all-MiniLM-L6-v2).
Query embeddings are compared with stored embeddings using cosine similarity.
If similarity > threshold (0.4), best match image is returned.
## Image Analysis Workflow
Uploaded images are passed to the vision model (via gradio_client).
Raw AI output (JSON) is summarized with Groq API (LLaMA-3.3-70B).
Final user-facing response is a friendly explanation.
## Audio Analysis Workflow
Uploaded audio is processed via the audio model (Gradio client).
Returns prediction text (e.g., transcription or classification).
Packaged as a human-readable response.
## Groq Summarization
Any complex JSON output (e.g., image analysis) is summarized.
A system prompt guides Groq to produce short, user-friendly summaries.
Ensures technical data is explained in simple language.
## Conversation Management
All interactions are stored in Chatbot history.
User query + bot response pairs are maintained for continuity.
Multimodal interactions (e.g., image + explanation) are rendered in chat.
## Architecture at a Glance
User Input (Text / Image / Audio)
β
βΌ
Intent Classifier βββΊ Rule-based (intents.json)
β
ββ Chat β Chatbot Client (LLM)
ββ Search Local Image β Embedding Match
ββ Image Analysis β Vision Client + Groq Summary
ββ Audio Analysis β Audio Client
βΌ
Response Generator (Groq Narrative + History)
βΌ
Gradio Chat UI |