Spaces:

Anvit25
/

Orchestrator_final

Runtime error

App Files Files Community

Orchestrator_final / methodology.md

Anvit25

Update methodology.md

2bc6924 verified 8 months ago

preview code

raw

history blame contribute delete

2.35 kB

	# Methodology

	The chatbot integrates multiple AI workflows into a single Gradio UI. The process follows these main stages:

	## Input Handling

	Users interact via a multimodal text box (supports text, image, and audio).

	The chatbot determines whether the query contains:

	Text only

	Image file

	Audio file

	## Intent Classification

	Text queries are processed through a rule-based intent classifier (intents.json).

	Example intents:

	"chat" → Send to hosted chatbot LLM.

	"search_local_image" → Trigger local semantic image search.

	"request_image_analysis" → Ask user to upload an image.

	"request_audio_analysis" → Ask user to upload audio.

	## Local Semantic Search

	Metadata from image.json provides descriptions for images in /images/.

	Each description is encoded using SentenceTransformers (all-MiniLM-L6-v2).

	Query embeddings are compared with stored embeddings using cosine similarity.

	If similarity > threshold (0.4), best match image is returned.

	## Image Analysis Workflow

	Uploaded images are passed to the vision model (via gradio_client).

	Raw AI output (JSON) is summarized with Groq API (LLaMA-3.3-70B).

	Final user-facing response is a friendly explanation.

	## Audio Analysis Workflow

	Uploaded audio is processed via the audio model (Gradio client).

	Returns prediction text (e.g., transcription or classification).

	Packaged as a human-readable response.

	## Groq Summarization

	Any complex JSON output (e.g., image analysis) is summarized.

	A system prompt guides Groq to produce short, user-friendly summaries.

	Ensures technical data is explained in simple language.

	## Conversation Management

	All interactions are stored in Chatbot history.

	User query + bot response pairs are maintained for continuity.

	Multimodal interactions (e.g., image + explanation) are rendered in chat.

	## Architecture at a Glance

	User Input (Text / Image / Audio)


	│

	▼

	Intent Classifier ──► Rule-based (intents.json)

	│

	├─ Chat → Chatbot Client (LLM)

	├─ Search Local Image → Embedding Match

	├─ Image Analysis → Vision Client + Groq Summary

	└─ Audio Analysis → Audio Client

	▼

	Response Generator (Groq Narrative + History)

	▼

	Gradio Chat UI