Create methodology.md

#1
by mandarmgd-03 - opened
Files changed (1) hide show
  1. methodology.md +87 -0
methodology.md ADDED
@@ -0,0 +1,87 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Methodology
2
+
3
+ The chatbot integrates multiple AI workflows into a single Gradio UI. The process follows these main stages:
4
+
5
+ ## Input Handling
6
+
7
+ Users interact via a multimodal text box (supports text, image, and audio).
8
+
9
+ The chatbot determines whether the query contains:
10
+
11
+ Text only
12
+
13
+ Image file
14
+
15
+ Audio file
16
+
17
+ ## Intent Classification
18
+
19
+ Text queries are processed through a rule-based intent classifier (intents.json).
20
+
21
+ Example intents:
22
+
23
+ "chat" β†’ Send to hosted chatbot LLM.
24
+
25
+ "search_local_image" β†’ Trigger local semantic image search.
26
+
27
+ "request_image_analysis" β†’ Ask user to upload an image.
28
+
29
+ "request_audio_analysis" β†’ Ask user to upload audio.
30
+
31
+ ## Local Semantic Search
32
+
33
+ Metadata from image.json provides descriptions for images in /images/.
34
+
35
+ Each description is encoded using SentenceTransformers (all-MiniLM-L6-v2).
36
+
37
+ Query embeddings are compared with stored embeddings using cosine similarity.
38
+
39
+ If similarity > threshold (0.4), best match image is returned.
40
+
41
+ ## Image Analysis Workflow
42
+
43
+ Uploaded images are passed to the vision model (via gradio_client).
44
+
45
+ Raw AI output (JSON) is summarized with Groq API (LLaMA-3.3-70B).
46
+
47
+ Final user-facing response is a friendly explanation.
48
+
49
+ ## Audio Analysis Workflow
50
+
51
+ Uploaded audio is processed via the audio model (Gradio client).
52
+
53
+ Returns prediction text (e.g., transcription or classification).
54
+
55
+ Packaged as a human-readable response.
56
+
57
+ ## Groq Summarization
58
+
59
+ Any complex JSON output (e.g., image analysis) is summarized.
60
+
61
+ A system prompt guides Groq to produce short, user-friendly summaries.
62
+
63
+ Ensures technical data is explained in simple language.
64
+
65
+ ## Conversation Management
66
+
67
+ All interactions are stored in Chatbot history.
68
+
69
+ User query + bot response pairs are maintained for continuity.
70
+
71
+ Multimodal interactions (e.g., image + explanation) are rendered in chat.
72
+
73
+ ## Architecture at a Glance
74
+
75
+ User Input (Text / Image / Audio)
76
+ β”‚
77
+ β–Ό
78
+ Intent Classifier ──► Rule-based (intents.json)
79
+ β”‚
80
+ β”œβ”€ Chat β†’ Chatbot Client (LLM)
81
+ β”œβ”€ Search Local Image β†’ Embedding Match
82
+ β”œβ”€ Image Analysis β†’ Vision Client + Groq Summary
83
+ └─ Audio Analysis β†’ Audio Client
84
+ β–Ό
85
+ Response Generator (Groq Narrative + History)
86
+ β–Ό
87
+ Gradio Chat UI