Spaces:

Anvit25
/

Orchestrator_final

Runtime error

App Files Files Community

Create methodology.md

by mandarmgd-03 - opened Sep 29, 2025

base: refs/heads/main

←

from: refs/pr/1

Discussion Files changed

+87

-0

Files changed (1) hide show

methodology.md +87 -0

methodology.md ADDED Viewed

	@@ -0,0 +1,87 @@

+# Methodology
+The chatbot integrates multiple AI workflows into a single Gradio UI. The process follows these main stages:
+## Input Handling
+Users interact via a multimodal text box (supports text, image, and audio).
+The chatbot determines whether the query contains:
+Text only
+Image file
+Audio file
+## Intent Classification
+Text queries are processed through a rule-based intent classifier (intents.json).
+Example intents:
+"chat" → Send to hosted chatbot LLM.
+"search_local_image" → Trigger local semantic image search.
+"request_image_analysis" → Ask user to upload an image.
+"request_audio_analysis" → Ask user to upload audio.
+## Local Semantic Search
+Metadata from image.json provides descriptions for images in /images/.
+Each description is encoded using SentenceTransformers (all-MiniLM-L6-v2).
+Query embeddings are compared with stored embeddings using cosine similarity.
+If similarity > threshold (0.4), best match image is returned.
+## Image Analysis Workflow
+Uploaded images are passed to the vision model (via gradio_client).
+Raw AI output (JSON) is summarized with Groq API (LLaMA-3.3-70B).
+Final user-facing response is a friendly explanation.
+## Audio Analysis Workflow
+Uploaded audio is processed via the audio model (Gradio client).
+Returns prediction text (e.g., transcription or classification).
+Packaged as a human-readable response.
+## Groq Summarization
+Any complex JSON output (e.g., image analysis) is summarized.
+A system prompt guides Groq to produce short, user-friendly summaries.
+Ensures technical data is explained in simple language.
+## Conversation Management
+All interactions are stored in Chatbot history.
+User query + bot response pairs are maintained for continuity.
+Multimodal interactions (e.g., image + explanation) are rendered in chat.
+## Architecture at a Glance
+User Input (Text / Image / Audio)
+        │
+        ▼
+Intent Classifier ──► Rule-based (intents.json)
+        │
+        ├─ Chat → Chatbot Client (LLM)
+        ├─ Search Local Image → Embedding Match
+        ├─ Image Analysis → Vision Client + Groq Summary
+        └─ Audio Analysis → Audio Client
+        ▼
+Response Generator (Groq Narrative + History)
+        ▼
+Gradio Chat UI