Zeggai Abdellah committed on
Commit b9bf1c6 · 1 Parent(s): fdc8d14

update the main from test space

data/Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.json CHANGED
@@ -9061,13 +9061,13 @@
9061
  {
9062
  "type": "TableElement",
9063
  "element_id": "chunk-69_table_-1677335896359719811",
9064
- "text": "\n DESCRIPTION:\n Le calendrier vaccinal 2023 comprend les vaccins BCG, HBV, DTCaVPI-Hib-HBV, VPC, VPO, ROR, DTCaVPI et dT, administrés selon l'âge.\n :TABLE DATA:\n Vaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT\n :END TABLE DATA:\n\n ",
9065
  "filename": "Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.pdf",
9066
  "filetype": "application/pdf",
9067
  "elements": {
9068
  "type": "Table",
9069
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
9070
- "text": "Vaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
9071
  "metadata": {
9072
  "category_depth": 1,
9073
  "page_number": 10,
@@ -9128,7 +9128,7 @@
9128
  {
9129
  "type": "Table",
9130
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
9131
- "text": "Vaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
9132
  "metadata": {
9133
  "category_depth": 1,
9134
  "page_number": 10,
 
9061
  {
9062
  "type": "TableElement",
9063
  "element_id": "chunk-69_table_-1677335896359719811",
9064
+ "text": "\n DESCRIPTION:\n Le calendrier vaccinal 2023 comprend les vaccins BCG, HBV, DTCaVPI-Hib-HBV, VPC, VPO, ROR, DTCaVPI et dT, administrés selon l'âge.\n :TABLE DATA:\n * **Naissance (Birth):** BCG, HBV\n* **02 mois (2 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **04 mois (4 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **11 mois (11 months):** ROR\n* **12 mois (12 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **18 mois (18 months):** ROR\n* **6 ans (6 years):** DTCaVPI\n* **11-13 ans (11-13 years):** dT\n* **16-18 ans (16-18 years):** dT\n* **Tous les 10 ans à partir de 18 ans (Every 10 years from age 18):** dT\n\nVaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT\n :END TABLE DATA:\n\n ",
9065
  "filename": "Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.pdf",
9066
  "filetype": "application/pdf",
9067
  "elements": {
9068
  "type": "Table",
9069
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
9070
+ "text": "* **Naissance (Birth):** BCG, HBV\n* **02 mois (2 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **04 mois (4 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **11 mois (11 months):** ROR\n* **12 mois (12 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **18 mois (18 months):** ROR\n* **6 ans (6 years):** DTCaVPI\n* **11-13 ans (11-13 years):** dT\n* **16-18 ans (16-18 years):** dT\n* **Tous les 10 ans à partir de 18 ans (Every 10 years from age 18):** dT\n\nVaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
9071
  "metadata": {
9072
  "category_depth": 1,
9073
  "page_number": 10,
 
9128
  {
9129
  "type": "Table",
9130
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
9131
+ "text": "* **Naissance (Birth):** BCG, HBV\n* **02 mois (2 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **04 mois (4 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **11 mois (11 months):** ROR\n* **12 mois (12 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **18 mois (18 months):** ROR\n* **6 ans (6 years):** DTCaVPI\n* **11-13 ans (11-13 years):** dT\n* **16-18 ans (16-18 years):** dT\n* **Tous les 10 ans à partir de 18 ans (Every 10 years from age 18):** dT\n\nVaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
9132
  "metadata": {
9133
  "category_depth": 1,
9134
  "page_number": 10,
data/section_two_chunks.json CHANGED
@@ -6609,13 +6609,13 @@
6609
  {
6610
  "type": "TableElement",
6611
  "element_id": "chunk-59_table_-3658241022386714145",
6612
- "text": "\n DESCRIPTION:\n Le calendrier vaccinal 2023 débute à la naissance avec BCG et HBV, se poursuit avec des vaccins combinés et des rappels réguliers tout au long de la vie.\n :TABLE DATA:\n Vaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT\n :END TABLE DATA:\n\n ",
6613
  "filename": "Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.pdf",
6614
  "filetype": "application/pdf",
6615
  "elements": {
6616
  "type": "Table",
6617
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
6618
- "text": "Vaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
6619
  "metadata": {
6620
  "category_depth": 1,
6621
  "page_number": 10,
@@ -6676,7 +6676,7 @@
6676
  {
6677
  "type": "Table",
6678
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
6679
- "text": "Vaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
6680
  "metadata": {
6681
  "category_depth": 1,
6682
  "page_number": 10,
 
6609
  {
6610
  "type": "TableElement",
6611
  "element_id": "chunk-59_table_-3658241022386714145",
6612
+ "text": "\n DESCRIPTION:\n Le calendrier vaccinal 2023 comprend les vaccins BCG, HBV, DTCaVPI-Hib-HBV, VPC, VPO, ROR, DTCaVPI et dT, administrés selon l'âge.\n :TABLE DATA:\n * **Naissance (Birth):** BCG, HBV\n* **02 mois (2 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **04 mois (4 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **11 mois (11 months):** ROR\n* **12 mois (12 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **18 mois (18 months):** ROR\n* **6 ans (6 years):** DTCaVPI\n* **11-13 ans (11-13 years):** dT\n* **16-18 ans (16-18 years):** dT\n* **Tous les 10 ans à partir de 18 ans (Every 10 years from age 18):** dT\n\nVaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT\n :END TABLE DATA:\n\n ",
6613
  "filename": "Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.pdf",
6614
  "filetype": "application/pdf",
6615
  "elements": {
6616
  "type": "Table",
6617
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
6618
+ "text": "* **Naissance (Birth):** BCG, HBV\n* **02 mois (2 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **04 mois (4 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **11 mois (11 months):** ROR\n* **12 mois (12 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **18 mois (18 months):** ROR\n* **6 ans (6 years):** DTCaVPI\n* **11-13 ans (11-13 years):** dT\n* **16-18 ans (16-18 years):** dT\n* **Tous les 10 ans à partir de 18 ans (Every 10 years from age 18):** dT\n\nVaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
6619
  "metadata": {
6620
  "category_depth": 1,
6621
  "page_number": 10,
 
6676
  {
6677
  "type": "Table",
6678
  "element_id": "2fbbc8eb32492ddeb5527973a42eeada",
6679
+ "text": "* **Naissance (Birth):** BCG, HBV\n* **02 mois (2 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **04 mois (4 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **11 mois (11 months):** ROR\n* **12 mois (12 months):** DTCaVPI-Hib-HBV, VPC, VPO\n* **18 mois (18 months):** ROR\n* **6 ans (6 years):** DTCaVPI\n* **11-13 ans (11-13 years):** dT\n* **16-18 ans (16-18 years):** dT\n* **Tous les 10 ans à partir de 18 ans (Every 10 years from age 18):** dT\n\nVaccin Âge Naissance 02 mois 04 mois 11 mois 12 mois 18 mois 6 ans 11-13 ans 16-18 ans Tous les 10 ans à partir de 18 ans BCG BCG HBV HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV DTCaVPI-Hib-HBV VPC VPC VPC VPC VPO VPO VPO VPO ROR ROR ROR DTCaVPI DTCaVPI dT dT dT dT",
6680
  "metadata": {
6681
  "category_depth": 1,
6682
  "page_number": 10,
prepare_env.py CHANGED
@@ -17,7 +17,11 @@ from langchain.retrievers.multi_query import MultiQueryRetriever
17
  from langchain_google_genai import ChatGoogleGenerativeAI
18
  from llama_index.core.tools import FunctionTool
19
  from llama_index.core.schema import TextNode
 
 
20
 
 
 
21
 
22
  def extract_source_ids(response_text):
23
  """
@@ -124,21 +128,54 @@ def create_vectorstore_from_json(json_path: str, collection_name: str, embedding
124
  print(f"✅ Vector store created with collection: {collection_name}")
125
  return vectorstore, documents
126
 
127
- def create_retriever(vectorstore, docs, llm):
128
- """Create ensemble retriever with vector and BM25 search"""
 
 
 
 
 
 
 
 
 
 
 
129
  print("🔍 Creating ensemble retriever...")
130
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
131
  # Vector retriever
132
  vector_retriever = vectorstore.as_retriever(
133
  search_type="similarity",
134
- search_kwargs={"k": 6}
135
  )
136
- print("✅ Vector retriever created (k=6)")
137
 
138
  # BM25 retriever
139
  bm25_retriever = BM25Retriever.from_documents(docs)
140
- bm25_retriever.k = 2
141
- print("✅ BM25 retriever created (k=2)")
142
 
143
  # Ensemble retriever
144
  ensemble_retriever = EnsembleRetriever(
@@ -147,10 +184,12 @@ def create_retriever(vectorstore, docs, llm):
147
  )
148
  print("✅ Ensemble retriever created (weights: 0.5, 0.5)")
149
 
150
- # Multi-query expanding retriever
 
151
  expanding_retriever = MultiQueryRetriever.from_llm(
152
  retriever=ensemble_retriever,
153
- llm=llm
 
154
  )
155
  print("✅ Multi-query expanding retriever created")
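The ensemble step merges the vector and BM25 result lists with equal weights (0.5, 0.5). As far as I know, LangChain's `EnsembleRetriever` does this with weighted Reciprocal Rank Fusion; the standalone sketch below illustrates that idea and is not the library's code. The function name `weighted_rrf`, the constant `c=60` (the customary RRF value), and the chunk ids are assumptions.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    Each document scores weight / (rank + c) per list it appears in,
    with ranks starting at 1; c=60 is an assumed, conventional constant.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("one weight per ranked list is required")
    scores = defaultdict(float)
    for doc_ids, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hits from the two retrievers.
vector_hits = ["chunk-a", "chunk-b", "chunk-c"]
bm25_hits = ["chunk-b", "chunk-d"]

fused = weighted_rrf([vector_hits, bm25_hits], weights=[0.5, 0.5])

if __name__ == "__main__":
    print(fused)
```

A document returned by both retrievers accumulates score from each list, which is why `chunk-b` outranks `chunk-a` here even though `chunk-a` was the top vector hit.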
156
 
@@ -219,13 +258,15 @@ def section_tool_wrapper(retriever, section_path_chunks, query):
219
  return f"Error retrieving documents: {str(e)}"
220
 
221
  def create_section_tools(embedding_function, llm):
222
- """Create all section-specific retrieval tools"""
223
- print("🛠️ Creating section-specific retrieval tools...")
224
-
 
 
225
  # Define section paths - Fixed path structure
226
  section_paths = {
227
  'one': './data/section_one_chunks.json',
228
- 'two': './data/section_two_chunks.json',
229
  'three': './data/section_three_chunks.json',
230
  'four': './data/section_four_chunks.json',
231
  'five': './data/section_five_chunks.json',
@@ -235,7 +276,7 @@ def create_section_tools(embedding_function, llm):
235
  'nine': './data/section_nine_chunks.json',
236
  'ten': './data/section_ten_chunks.json'
237
  }
238
-
239
  # Create retrievers for each section
240
  section_retrievers = {}
241
  for section, path in section_paths.items():
@@ -243,7 +284,7 @@ def create_section_tools(embedding_function, llm):
243
  if os.path.exists(path):
244
  print(f"📁 Creating retriever for section {section} from {path}")
245
  vstore, docs = create_vectorstore_from_json(path, f"Guide_2023_{section}", embedding_function)
246
- section_retrievers[section] = create_retriever(vstore, docs, llm)
247
  print(f"✅ Successfully created retriever for section {section}")
248
  else:
249
  print(f"⚠️ Warning: File not found for section {section}: {path}")
@@ -251,7 +292,7 @@ def create_section_tools(embedding_function, llm):
251
  except Exception as e:
252
  print(f"❌ Error creating retriever for section {section}: {e}")
253
  section_retrievers[section] = None
254
-
255
  # Create main guide retriever
256
  guide_path = './data/Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.json'
257
  guide_retriever = None
@@ -284,71 +325,48 @@ def create_section_tools(embedding_function, llm):
284
  except Exception as e:
285
  print(f"❌ Error creating immunization retriever: {e}")
286
 
287
- # General-purpose tool (entire Algerian guide)
288
- def guide_retrieval_tool(query: str) -> str:
289
- """
290
- General-purpose retrieval tool for the entire Algerian National Vaccination Guide (2023).
291
-
292
- Use ONLY when a query spans multiple unrelated sections or cannot be confidently routed
293
- to a more specific tool. This tool provides a fallback when the intent is ambiguous
294
- or multi-topic (e.g., combining schedule, cold chain, and public outreach in one query).
295
 
296
- Do NOT use this tool for clearly scoped questions related to vaccination schedules, disease profiles,
297
- vaccine logistics, or procedural workflows — use the specific section tools instead.
298
-
299
- Secondary source: WHO Immunization Guide (use `immunization_tool` when broader context is required).
 
 
300
 
301
  Args:
302
- query (str): A general, complex, or cross-sectional vaccination query.
303
 
304
  Returns:
305
- str: Synthesized answer from the entire national guide.
306
  """
307
- print(f"🏥 GUIDE TOOL CALLED: {query[:50]}...")
308
  if not guide_retriever:
309
- print("❌ Guide retriever not available - main guide file may be missing")
310
  return "Guide retriever not available - main guide file may be missing"
311
- try:
312
- return section_tool_wrapper(guide_retriever, guide_path, query)
313
- except Exception as e:
314
- print(f"❌ Error accessing guide retriever: {str(e)}")
315
- return f"Error accessing guide retriever: {str(e)}"
316
 
317
- def immunization_tool(query: str) -> str:
318
  """
319
- WHO Immunization in Practice 2015 retrieval tool.
320
-
321
- Use ONLY when global guidance or procedural context is needed that is not covered or
322
- is unclear in the Algerian guide. This is a secondary reference for training standards,
323
- general immunization logistics, and vaccine delivery practices.
324
-
325
- Do NOT use this tool to answer country-specific policy or scheduling questions.
326
 
327
  Args:
328
- query (str): A question seeking global immunization practices.
329
 
330
  Returns:
331
  str: Content from the WHO Immunization in Practice guide.
332
  """
333
  print(f"🌍 WHO TOOL CALLED: {query[:50]}...")
334
  if not immunization_retriever:
335
- print("❌ Immunization in Practice retriever not available - WHO guide file may be missing")
336
  return "Immunization in Practice retriever not available - WHO guide file may be missing"
337
- try:
338
- return section_tool_wrapper(immunization_retriever, immunization_path, query)
339
- except Exception as e:
340
- print(f"❌ Error accessing immunization retriever: {str(e)}")
341
- return f"Error accessing immunization retriever: {str(e)}"
342
 
343
- # Section-Specific Tools - Fixed implementation
344
- def section_one_tool(query: str) -> str:
345
  """
346
- Section 1: Programme Élargi de Vaccination (PEV)
347
-
348
- Use for queries about the national immunization program structure: its objectives,
349
- history, evaluation, and rationale for updates to the Algerian calendar.
350
-
351
- Do NOT use for vaccine schedules, disease information, or administration techniques.
352
 
353
  Args:
354
  query (str): A question about the foundation or evolution of the PEV.
@@ -356,24 +374,16 @@ def create_section_tools(embedding_function, llm):
356
  Returns:
357
  str: Response from Section 1.
358
  """
359
- print(f"📋 SECTION 1 TOOL CALLED: {query[:50]}...")
360
  if not section_retrievers.get('one'):
361
- print("Section 1 retriever not available - file may be missing")
362
- return "Section 1 retriever not available - file may be missing"
363
- try:
364
- return section_tool_wrapper(section_retrievers['one'], section_paths['one'], query)
365
- except Exception as e:
366
- print(f"❌ Error accessing section 1: {str(e)}")
367
- return f"Error accessing section 1: {str(e)}"
368
 
369
- def section_two_tool(query: str) -> str:
370
  """
371
- Section 2: Maladies Ciblées
372
-
373
- Use ONLY for questions about the characteristics of vaccine-preventable diseases:
374
- symptoms, transmission, complications, and prevention.
375
-
376
- Do NOT use for questions about vaccines, administration schedules, or procedures.
377
 
378
  Args:
379
  query (str): A question about a disease covered by the national vaccination program.
@@ -381,99 +391,67 @@ def create_section_tools(embedding_function, llm):
381
  Returns:
382
  str: Disease-specific content from Section 2.
383
  """
384
- print(f"🦠 SECTION 2 TOOL CALLED: {query[:50]}...")
385
  if not section_retrievers.get('two'):
386
- print("Section 2 retriever not available - file may be missing")
387
- return "Section 2 retriever not available - file may be missing"
388
- try:
389
- return section_tool_wrapper(section_retrievers['two'], section_paths['two'], query)
390
- except Exception as e:
391
- print(f"❌ Error accessing section 2: {str(e)}")
392
- return f"Error accessing section 2: {str(e)}"
393
 
394
- def section_three_tool(query: str) -> str:
395
  """
396
- Section 3: Vaccins du Calendrier
397
-
398
- Use ONLY for questions about the vaccines themselves: their types, compositions,
399
- methods of administration, and how they work.
400
-
401
- Do NOT use for schedule timing, catch-up protocols, or disease information.
402
 
403
  Args:
404
- query (str): A question about a vaccine's formulation or method of delivery.
405
 
406
  Returns:
407
- str: Vaccine info from Section 3.
408
  """
409
- print(f"💉 SECTION 3 TOOL CALLED: {query[:50]}...")
410
  if not section_retrievers.get('three'):
411
- print("Section 3 retriever not available - file may be missing")
412
- return "Section 3 retriever not available - file may be missing"
413
- try:
414
- return section_tool_wrapper(section_retrievers['three'], section_paths['three'], query)
415
- except Exception as e:
416
- print(f"❌ Error accessing section 3: {str(e)}")
417
- return f"Error accessing section 3: {str(e)}"
418
 
419
- def section_four_tool(query: str) -> str:
420
  """
421
- Section 4: Rattrapage Vaccinal
422
-
423
- Use ONLY when the question involves missed or delayed vaccinations and how to reschedule them
424
- based on the child's current age.
425
-
426
- Do NOT use for standard schedules (on-time), vaccine properties, or cold chain issues.
427
 
428
  Args:
429
- query (str): A question about catch-up vaccination based on delay or omission.
430
 
431
  Returns:
432
- str: Catch-up guidance from Section 4.
433
  """
434
- print(f"🔄 SECTION 4 TOOL CALLED: {query[:50]}...")
435
  if not section_retrievers.get('four'):
436
- print("Section 4 retriever not available - file may be missing")
437
- return "Section 4 retriever not available - file may be missing"
438
- try:
439
- return section_tool_wrapper(section_retrievers['four'], section_paths['four'], query)
440
- except Exception as e:
441
- print(f"❌ Error accessing section 4: {str(e)}")
442
- return f"Error accessing section 4: {str(e)}"
443
 
444
- def section_five_tool(query: str) -> str:
445
  """
446
- Section 5: Populations Particulières
447
-
448
- Use ONLY for vaccination questions concerning special populations:
449
- preterm infants, immunosuppressed patients, chronic illness, or allergy conditions.
450
-
451
- Do NOT use for general population, standard calendar, or vaccine preparation.
452
 
453
  Args:
454
- query (str): A question about tailored vaccination for vulnerable groups.
455
 
456
  Returns:
457
  str: Custom recommendations from Section 5.
458
  """
459
- print(f"👥 SECTION 5 TOOL CALLED: {query[:50]}...")
460
  if not section_retrievers.get('five'):
461
- print("Section 5 retriever not available - file may be missing")
462
- return "Section 5 retriever not available - file may be missing"
463
- try:
464
- return section_tool_wrapper(section_retrievers['five'], section_paths['five'], query)
465
- except Exception as e:
466
- print(f"❌ Error accessing section 5: {str(e)}")
467
- return f"Error accessing section 5: {str(e)}"
468
 
469
- def section_six_tool(query: str) -> str:
470
  """
471
- Section 6: Chaîne du Froid
472
-
473
- Use ONLY for questions about vaccine storage, transport, cold chain equipment,
474
- temperature monitoring, or cold chain failures.
475
-
476
- Do NOT use for dose timing, administration methods, or disease information.
477
 
478
  Args:
479
  query (str): A logistics-related question about vaccine temperature management.
@@ -481,99 +459,67 @@ def create_section_tools(embedding_function, llm):
481
  Returns:
482
  str: Cold chain instructions from Section 6.
483
  """
484
- print(f"❄️ SECTION 6 TOOL CALLED: {query[:50]}...")
485
  if not section_retrievers.get('six'):
486
- print("Section 6 retriever not available - file may be missing")
487
- return "Section 6 retriever not available - file may be missing"
488
- try:
489
- return section_tool_wrapper(section_retrievers['six'], section_paths['six'], query)
490
- except Exception as e:
491
- print(f"❌ Error accessing section 6: {str(e)}")
492
- return f"Error accessing section 6: {str(e)}"
493
 
494
- def section_seven_tool(query: str) -> str:
495
  """
496
- Section 7: Sécurité des Injections
497
-
498
- Use ONLY for questions related to the safe administration of vaccines:
499
- equipment use, technique, safety precautions, and waste disposal.
500
-
501
- Do NOT use for vaccine types, schedules, or cold chain issues.
502
 
503
  Args:
504
- query (str): A question about how to inject vaccines safely.
505
 
506
  Returns:
507
  str: Best practices from Section 7.
508
  """
509
- print(f"🛡️ SECTION 7 TOOL CALLED: {query[:50]}...")
510
  if not section_retrievers.get('seven'):
511
- print("Section 7 retriever not available - file may be missing")
512
- return "Section 7 retriever not available - file may be missing"
513
- try:
514
- return section_tool_wrapper(section_retrievers['seven'], section_paths['seven'], query)
515
- except Exception as e:
516
- print(f"❌ Error accessing section 7: {str(e)}")
517
- return f"Error accessing section 7: {str(e)}"
518
 
519
- def section_eight_tool(query: str) -> str:
520
  """
521
- Section 8: Séance de Vaccination & Vaccinovigilance
522
-
523
- Use ONLY for questions about running a vaccination session, preparing the setting,
524
- recording injections, and monitoring for adverse events (AEFI).
525
-
526
- Do NOT use for disease, vaccine, or scheduling details.
527
 
528
  Args:
529
- query (str): A question about operational conduct during vaccination.
530
 
531
  Returns:
532
  str: Workflow and safety monitoring details from Section 8.
533
  """
534
- print(f"📊 SECTION 8 TOOL CALLED: {query[:50]}...")
535
  if not section_retrievers.get('eight'):
536
- print("Section 8 retriever not available - file may be missing")
537
- return "Section 8 retriever not available - file may be missing"
538
- try:
539
- return section_tool_wrapper(section_retrievers['eight'], section_paths['eight'], query)
540
- except Exception as e:
541
- print(f"❌ Error accessing section 8: {str(e)}")
542
- return f"Error accessing section 8: {str(e)}"
543
 
544
- def section_nine_tool(query: str) -> str:
545
  """
546
- Section 9: Planification des Séances de Vaccination
547
-
548
- Use ONLY for planning and logistics questions: session scheduling, stock estimation,
549
- and operational preparation at the facility level.
550
-
551
- Do NOT use for vaccine info, schedules, or injection techniques.
552
 
553
  Args:
554
- query (str): A question about how to plan or organize vaccination services.
555
 
556
  Returns:
557
  str: Planning and stock guidance from Section 9.
558
  """
559
- print(f"📅 SECTION 9 TOOL CALLED: {query[:50]}...")
560
  if not section_retrievers.get('nine'):
561
- print("Section 9 retriever not available - file may be missing")
562
- return "Section 9 retriever not available - file may be missing"
563
- try:
564
- return section_tool_wrapper(section_retrievers['nine'], section_paths['nine'], query)
565
- except Exception as e:
566
- print(f"❌ Error accessing section 9: {str(e)}")
567
- return f"Error accessing section 9: {str(e)}"
568
 
569
- def section_ten_tool(query: str) -> str:
570
  """
571
- Section 10: Mobilisation Sociale
572
-
573
- Use ONLY for questions about communication strategies, overcoming vaccine hesitancy,
574
- rumor management, or community outreach.
575
-
576
- Do NOT use for medical, logistical, or procedural topics.
577
 
578
  Args:
579
  query (str): A question about public engagement or communication for vaccination.
@@ -581,34 +527,29 @@ def create_section_tools(embedding_function, llm):
581
  Returns:
582
  str: Public mobilization strategies from Section 10.
583
  """
584
- print(f"📢 SECTION 10 TOOL CALLED: {query[:50]}...")
585
  if not section_retrievers.get('ten'):
586
- print("Section 10 retriever not available - file may be missing")
587
- return "Section 10 retriever not available - file may be missing"
588
- try:
589
- return section_tool_wrapper(section_retrievers['ten'], section_paths['ten'], query)
590
- except Exception as e:
591
- print(f"❌ Error accessing section 10: {str(e)}")
592
- return f"Error accessing section 10: {str(e)}"
593
 
594
- # Create FunctionTool objects
595
  tools = [
596
- FunctionTool.from_defaults(name="Guide_vector_tool", fn=guide_retrieval_tool),
597
- FunctionTool.from_defaults(name="Immunization_in_Practice_tool", fn=immunization_tool),
598
  # Section-specific tools
599
- FunctionTool.from_defaults(name="section_one_vector_query_tool", fn=section_one_tool),
600
- FunctionTool.from_defaults(name="section_two_vector_query_tool", fn=section_two_tool),
601
- FunctionTool.from_defaults(name="section_three_vector_query_tool", fn=section_three_tool),
602
- FunctionTool.from_defaults(name="section_four_vector_query_tool", fn=section_four_tool),
603
- FunctionTool.from_defaults(name="section_five_vector_query_tool", fn=section_five_tool),
604
- FunctionTool.from_defaults(name="section_six_vector_query_tool", fn=section_six_tool),
605
- FunctionTool.from_defaults(name="section_seven_vector_query_tool", fn=section_seven_tool),
606
- FunctionTool.from_defaults(name="section_eight_vector_query_tool", fn=section_eight_tool),
607
- FunctionTool.from_defaults(name="section_nine_vector_query_tool", fn=section_nine_tool),
608
- FunctionTool.from_defaults(name="section_ten_vector_query_tool", fn=section_ten_tool),
609
  ]
610
-
611
- print(f"✅ Created {len(tools)} section tools")
612
  return tools
613
 
614
  def prepare_environment():
 
17
  from langchain_google_genai import ChatGoogleGenerativeAI
18
  from llama_index.core.tools import FunctionTool
19
  from llama_index.core.schema import TextNode
20
+ from langchain.prompts import PromptTemplate
21
+ import logging
22
 
23
+ logging.basicConfig()
24
+ logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
25
 
26
  def extract_source_ids(response_text):
27
  """
 
128
  print(f"✅ Vector store created with collection: {collection_name}")
129
  return vectorstore, documents
130
 
131
+ def create_retriever(vectorstore, docs, llm, bm25_k=3, vector_k=6):
132
+ """Create ensemble retriever with vector and BM25 search
133
+
134
+ Args:
135
+ vectorstore: The vector store for similarity search
136
+ docs: Documents for BM25 retriever
137
+ llm: Language model for multi-query generation
138
+ bm25_k: Number of documents to retrieve with BM25
139
+ vector_k: Number of documents to retrieve with vector search
140
+
141
+ Returns:
142
+ Configured retriever (MultiQueryRetriever or EnsembleRetriever)
143
+ """
144
  print("🔍 Creating ensemble retriever...")
145
 
146
+ # PromptTemplate for Vaccine Assistant MultiQuery Retriever
147
+ VACCINE_MULTIQUERY_PROMPT = PromptTemplate(
148
+ input_variables=["question"],
149
+ template="""You are an AI assistant specialized in vaccine-related medical information retrieval.
150
+ Your task is to generate multiple search queries based on the original question to find relevant information from official vaccine medical documents.
151
+
152
+ IMPORTANT GUIDELINES:
153
+ - Keep all vaccine-specific terminology and medical terms intact
154
+ - Maintain the clinical and medical context
155
+ - Focus on evidence-based vaccine information
156
+ - Preserve any specific vaccine names, diseases, or medical conditions mentioned
157
+ - Generate queries that would help retrieve information about vaccine schedules, dosing, contraindications, adverse events, and disease prevention
158
+
159
+ Original question: {question}
160
+
161
+ Generate 4 different search queries that rephrase the original question while maintaining vaccine terminology and medical accuracy. Each query should approach the topic from a slightly different angle to maximize retrieval from vaccine medical documents.
162
+
163
+ Provide only the alternative questions, one per line."""
164
+ )
165
+
166
+
167
+
168
  # Vector retriever
169
  vector_retriever = vectorstore.as_retriever(
170
  search_type="similarity",
171
+ search_kwargs={"k": vector_k}
172
  )
173
+ print(f"✅ Vector retriever created (k={vector_k})")
174
 
175
  # BM25 retriever
176
  bm25_retriever = BM25Retriever.from_documents(docs)
177
+ bm25_retriever.k = bm25_k
178
+ print(f"✅ BM25 retriever created (k={bm25_k})")
179
 
180
  # Ensemble retriever
181
  ensemble_retriever = EnsembleRetriever(
 
184
  )
185
  print("✅ Ensemble retriever created (weights: 0.5, 0.5)")
186
 
187
+
188
+ # Multi-query expanding retriever (only for filtered mode)
189
  expanding_retriever = MultiQueryRetriever.from_llm(
190
  retriever=ensemble_retriever,
191
+ llm=llm,
192
+ prompt=VACCINE_MULTIQUERY_PROMPT,
193
  )
194
  print("✅ Multi-query expanding retriever created")
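The custom prompt asks the model for alternative questions "one per line", and the multi-query retriever then runs each generated query and merges the results. The sketch below illustrates that flow in plain Python under stated assumptions: `split_queries`, `unique_union`, `multi_query_retrieve`, and the toy index are hypothetical names and data, not LangChain's implementation.

```python
def split_queries(llm_output):
    """Split an LLM reply into one query per non-empty line."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def unique_union(result_lists):
    """Merge result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_retrieve(llm_output, retrieve):
    """Run `retrieve` once per generated query and merge the hits."""
    queries = split_queries(llm_output)
    return unique_union(retrieve(query) for query in queries)


# Toy stand-in for the ensemble retriever: query -> list of chunk ids.
TOY_INDEX = {
    "BCG schedule at birth": ["chunk-69", "chunk-12"],
    "When is BCG given?": ["chunk-12", "chunk-40"],
}


def toy_retrieve(query):
    return TOY_INDEX.get(query, [])


llm_reply = "BCG schedule at birth\n\nWhen is BCG given?\n"
docs = multi_query_retrieve(llm_reply, toy_retrieve)

if __name__ == "__main__":
    print(docs)
```

With the section settings used in this commit (`bm25_k=7`, `vector_k=10`) and four generated queries, the merged list can reach roughly 4 × 17 = 68 documents before duplicates are dropped, so the amount of downstream context grows with both `k` values.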


         return f"Error retrieving documents: {str(e)}"

 def create_section_tools(embedding_function, llm):
+    """
+    Create all section-specific retrieval tools with improved descriptions for accurate routing.
+    """
+    print("🛠️ Creating section-specific retrieval tools with enhanced descriptions...")
+
     # Define section paths - Fixed path structure
     section_paths = {
         'one': './data/section_one_chunks.json',
+        'two': './data/section_two_chunks.json',
         'three': './data/section_three_chunks.json',
         'four': './data/section_four_chunks.json',
         'five': './data/section_five_chunks.json',

         'nine': './data/section_nine_chunks.json',
         'ten': './data/section_ten_chunks.json'
     }
+
     # Create retrievers for each section
     section_retrievers = {}
     for section, path in section_paths.items():

             if os.path.exists(path):
                 print(f"📁 Creating retriever for section {section} from {path}")
                 vstore, docs = create_vectorstore_from_json(path, f"Guide_2023_{section}", embedding_function)
+                section_retrievers[section] = create_retriever(vstore, docs, llm, bm25_k=7, vector_k=10)
                 print(f"✅ Successfully created retriever for section {section}")
             else:
                 print(f"⚠️ Warning: File not found for section {section}: {path}")

         except Exception as e:
             print(f"❌ Error creating retriever for section {section}: {e}")
             section_retrievers[section] = None
+
     # Create main guide retriever
     guide_path = './data/Guide-pratique-de-mise-en-oeuvre-du-calendrier-national-de-vaccination-2023.json'
     guide_retriever = None

     except Exception as e:
         print(f"❌ Error creating immunization retriever: {e}")

+    # --- Tool Definitions with Improved Descriptions ---

+    def general_guide_tool(query: str) -> str:
+        """
+        A general-purpose tool for the Algerian National Vaccination Guide.
+        **Use this tool as a fallback** if no other specific tool seems appropriate, or for very broad, multi-topic questions
+        (e.g., 'Summarize the Algerian vaccination policy and its safety measures').
+        **Always prefer a more specific tool if the query matches its description** (e.g., use 'cold_chain_tool' for temperature questions).

         Args:
+            query (str): A broad or ambiguous question about the Algerian National Vaccination Guide.

         Returns:
+            str: Content retrieved from the entire guide.
         """
+        print(f"🏥 GENERAL GUIDE TOOL CALLED (FALLBACK): {query[:50]}...")
         if not guide_retriever:
             return "Guide retriever not available - main guide file may be missing"
+        return section_tool_wrapper(guide_retriever, guide_path, query)

+    def who_immunization_tool(query: str) -> str:
         """
+        Provides information from the WHO's 'Immunization in Practice' guide. Use this for questions about
+        **global immunization standards**, international best practices, or for comparing Algerian policy to
+        general WHO recommendations on topics like cold chain, safety, and disease control.

         Args:
+            query (str): A question seeking global or general immunization practices.

         Returns:
             str: Content from the WHO Immunization in Practice guide.
         """
         print(f"🌍 WHO TOOL CALLED: {query[:50]}...")
         if not immunization_retriever:
             return "Immunization in Practice retriever not available - WHO guide file may be missing"
+        return section_tool_wrapper(immunization_retriever, immunization_path, query)

+    def program_overview_tool(query: str) -> str:
         """
+        (Section 1) The primary tool for questions about the **history, objectives, and structure** of Algeria's
+        national immunization program (PEV - Programme Élargi de Vaccination). Use this for topics like
+        the program's rationale, key achievements, and the reasons for updates to the vaccination calendar.

         Args:
             query (str): A question about the foundation or evolution of the PEV.

         Returns:
             str: Response from Section 1.
         """
+        print(f"📋 PROGRAM OVERVIEW (S1) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('one'):
+            return "Section 1 retriever not available"
+        return section_tool_wrapper(section_retrievers['one'], section_paths['one'], query)

+    def disease_info_tool(query: str) -> str:
         """
+        (Section 2) The definitive tool for information on **specific vaccine-preventable diseases**.
+        Use this to find details on **symptoms, transmission methods, complications**, and prevention
+        strategies for diseases like Diphtheria, Measles, Polio, Tetanus, etc.

         Args:
             query (str): A question about a disease covered by the national vaccination program.

         Returns:
             str: Disease-specific content from Section 2.
         """
+        print(f"🦠 DISEASE INFO (S2) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('two'):
+            return "Section 2 retriever not available"
+        return section_tool_wrapper(section_retrievers['two'], section_paths['two'], query)

+    def vaccine_properties_tool(query: str) -> str:
         """
+        (Section 3) The specific tool for questions about the **vaccines themselves**: their types (e.g., BCG, ROR,
+        DTCaVPI), composition, whether they are live or inactivated, and the correct **method of administration**
+        (e.g., intradermal, intramuscular, oral).

         Args:
+            query (str): A question about a vaccine's formulation or how it is administered.

         Returns:
+            str: Vaccine-specific info from Section 3.
         """
+        print(f"💉 VACCINE PROPERTIES (S3) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('three'):
+            return "Section 3 retriever not available"
+        return section_tool_wrapper(section_retrievers['three'], section_paths['three'], query)

+    def catch_up_vaccination_tool(query: str) -> str:
         """
+        (Section 4) Specialized tool for **missed or delayed vaccinations (rattrapage vaccinal)**.
+        Use this for questions about creating a **catch-up schedule** for a child who is behind
+        on their shots, based on their age and vaccination history.

         Args:
+            query (str): A question about catch-up vaccination due to a delay or missed dose.

         Returns:
+            str: Catch-up schedule guidance from Section 4.
         """
+        print(f"🔄 CATCH-UP (S4) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('four'):
+            return "Section 4 retriever not available"
+        return section_tool_wrapper(section_retrievers['four'], section_paths['four'], query)

+    def special_populations_tool(query: str) -> str:
         """
+        (Section 5) The designated tool for vaccination guidelines concerning **special populations**.
+        Use for questions about vaccinating preterm infants, allergic children, or patients with
+        immunosuppression, chronic illnesses (cardiac, pulmonary), or other specific health conditions.

         Args:
+            query (str): A question about tailored vaccination for a vulnerable or special group.

         Returns:
             str: Custom recommendations from Section 5.
         """
+        print(f"👥 SPECIAL POPULATIONS (S5) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('five'):
+            return "Section 5 retriever not available"
+        return section_tool_wrapper(section_retrievers['five'], section_paths['five'], query)

+    def cold_chain_tool(query: str) -> str:
         """
+        (Section 6) The definitive tool for all questions about the **cold chain**, including vaccine **storage
+        temperatures**, transport protocols, refrigerators, temperature monitoring (like PCV pastilles),
+        and procedures for handling cold chain failures or power outages.

         Args:
             query (str): A logistics-related question about vaccine temperature management.

         Returns:
             str: Cold chain instructions from Section 6.
         """
+        print(f"❄️ COLD CHAIN (S6) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('six'):
+            return "Section 6 retriever not available"
+        return section_tool_wrapper(section_retrievers['six'], section_paths['six'], query)

+    def injection_safety_tool(query: str) -> str:
         """
+        (Section 7) The primary tool for questions related to the **safe administration of injections**.
+        Use for topics like sterile equipment, proper injection techniques, preventing needlestick injuries,
+        and safe disposal of medical waste (DASRI).

         Args:
+            query (str): A question about how to perform vaccine injections safely.

         Returns:
             str: Best practices from Section 7.
         """
+        print(f"🛡️ INJECTION SAFETY (S7) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('seven'):
+            return "Section 7 retriever not available"
+        return section_tool_wrapper(section_retrievers['seven'], section_paths['seven'], query)

+    def session_management_tool(query: str) -> str:
         """
+        (Section 8) Use this tool for questions about the **operational conduct of a vaccination session**
+        and **vaccinovigilance**. This includes preparing the session, material setup, registering vaccination
+        acts, and monitoring/reporting adverse events post-vaccination (MPVI).

         Args:
+            query (str): A question about running a vaccination session or post-vaccine monitoring.

         Returns:
             str: Workflow and safety monitoring details from Section 8.
         """
+        print(f"📊 SESSION MGMT (S8) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('eight'):
+            return "Section 8 retriever not available"
+        return section_tool_wrapper(section_retrievers['eight'], section_paths['eight'], query)

+    def planning_and_logistics_tool(query: str) -> str:
         """
+        (Section 9) This tool is for **planning vaccination sessions and managing logistics**. Use it for
+        questions about creating operational maps, estimating vaccine and supply needs, managing stock,
+        and reducing vaccine wastage.

         Args:
+            query (str): A question about organizing vaccination services or managing stock.

         Returns:
             str: Planning and stock guidance from Section 9.
         """
+        print(f"📅 PLANNING & LOGISTICS (S9) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('nine'):
+            return "Section 9 retriever not available"
+        return section_tool_wrapper(section_retrievers['nine'], section_paths['nine'], query)

+    def communication_tool(query: str) -> str:
         """
+        (Section 10) The specific tool for **social mobilization and communication**. Use this for
+        questions about communication strategies, addressing **vaccine hesitancy**, managing rumors,
+        and community outreach to promote vaccination.

         Args:
             query (str): A question about public engagement or communication for vaccination.

         Returns:
             str: Public mobilization strategies from Section 10.
         """
+        print(f"📢 COMMUNICATION (S10) TOOL CALLED: {query[:50]}...")
         if not section_retrievers.get('ten'):
+            return "Section 10 retriever not available"
+        return section_tool_wrapper(section_retrievers['ten'], section_paths['ten'], query)

+    # Create FunctionTool objects with new, clearer names
     tools = [
+        FunctionTool.from_defaults(name="general_guide_tool", fn=general_guide_tool),
+        FunctionTool.from_defaults(name="who_immunization_tool", fn=who_immunization_tool),
         # Section-specific tools
+        FunctionTool.from_defaults(name="program_overview_tool", fn=program_overview_tool),
+        FunctionTool.from_defaults(name="disease_info_tool", fn=disease_info_tool),
+        FunctionTool.from_defaults(name="vaccine_properties_tool", fn=vaccine_properties_tool),
+        FunctionTool.from_defaults(name="catch_up_vaccination_tool", fn=catch_up_vaccination_tool),
+        FunctionTool.from_defaults(name="special_populations_tool", fn=special_populations_tool),
+        FunctionTool.from_defaults(name="cold_chain_tool", fn=cold_chain_tool),
+        FunctionTool.from_defaults(name="injection_safety_tool", fn=injection_safety_tool),
+        FunctionTool.from_defaults(name="session_management_tool", fn=session_management_tool),
+        FunctionTool.from_defaults(name="planning_and_logistics_tool", fn=planning_and_logistics_tool),
+        FunctionTool.from_defaults(name="communication_tool", fn=communication_tool),
     ]
+
+    print(f"✅ Created {len(tools)} tools with improved routing descriptions")
     return tools
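
The ten section tools above share one shape: check that the section's retriever exists, then delegate to `section_tool_wrapper`. As a possible alternative (not what this commit does), the same pattern could be produced by a small factory. In the sketch below, `make_section_tool` and `fake_wrapper` are hypothetical names, and the retriever is replaced by a placeholder string so the example runs on its own.

```python
def make_section_tool(name, description, label, retriever, path, wrapper):
    """Return a query function bound to one section's retriever.

    `wrapper` stands in for section_tool_wrapper from the diff; it is passed
    in so that this sketch has no external dependencies.
    """
    def tool(query: str) -> str:
        if not retriever:
            return f"{label} retriever not available"
        return wrapper(retriever, path, query)

    # The name and docstring are what a routing agent would typically read.
    tool.__name__ = name
    tool.__doc__ = description
    return tool


def fake_wrapper(retriever, path, query):
    """Hypothetical stand-in that simply echoes its arguments."""
    return f"{retriever}|{path}|{query}"


special = make_section_tool(
    "special_populations_tool",
    "(Section 5) Vaccination guidelines for special populations.",
    "Section 5",
    "retriever-5",  # placeholder for a real retriever object
    "./data/section_five_chunks.json",
    fake_wrapper,
)
missing = make_section_tool(
    "disease_info_tool",
    "(Section 2) Information on vaccine-preventable diseases.",
    "Section 2",
    None,  # simulates a section whose chunk file failed to load
    "./data/section_two_chunks.json",
    fake_wrapper,
)
```

The trade-off is that the long routing descriptions would move out of docstrings and into data, which makes them easier to review side by side but less visible next to the code they describe.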
554
 
555
  def prepare_environment():
rag_pipeline.py CHANGED
@@ -121,7 +121,7 @@ You provide evidence-based guidance using only information from official vaccine
121
  Answer the doctor's question accurately and concisely using only the provided information.
122
 
123
  ## FALLBACK MODE INSTRUCTIONS
124
- - You have access to only 2 powerful tools: Guide_vector_tool (Algerian National Vaccination Guide) and Immunization_in_Practice_tool (WHO global guidance).
125
  - **MANDATORY TOOL USAGE**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
126
  - Be direct and efficient - search once with each tool if needed, then provide your answer.
127
  - Do not overthink or search repeatedly - these tools are comprehensive.
@@ -132,7 +132,7 @@ Answer the doctor's question accurately and concisely using only the provided in
132
  1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
133
  2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
134
  3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
135
- 4. Do NOT use tool names (like Guide_vector_tool, Immunization_in_Practice_tool) as citations.
136
  5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
137
  6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
138
 
@@ -146,20 +146,18 @@ Answer the doctor's question accurately and concisely using only the provided in
146
 
147
  ### CRITICAL: Efficient Fallback Strategy
148
  1. **MANDATORY SEARCH**: Use each relevant tool at least once to search for information, even if you suspect the information might not be available.
149
- 2. **BREAK DOWN COMPLEX QUERIES**: For comparative or multi-part questions (e.g., comparing Algerian and WHO guidelines), break the query into sub-queries and use the appropriate tool for each part:
150
- - Use Guide_vector_tool for Algerian-specific information (e.g., national schedules, coverage targets).
151
- - Use Immunization_in_Practice_tool for WHO-specific information (e.g., global recommendations, coverage targets).
152
  3. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
153
  4. **BE DECISIVE**: Once you find relevant information for each sub-query, formulate your response immediately.
154
  5. **ANSWER FULLY**: Address all parts of the question, using multiple tools if required by the query.
 
155
 
156
  ### Response Guidelines
157
  - **MANDATORY TOOL SELECTION**:
158
- - For queries mentioning "WHO," "World Health Organization," "international," "global guidance," or WHO documents (e.g., page numbers), use Immunization_in_Practice_tool first.
159
- - For queries mentioning "Algerian," "national guide," or Algerian-specific terms (e.g., page numbers), use Guide_vector_tool first.
160
- - For comparative queries (e.g., Algerian vs. WHO), use both Guide_vector_tool and Immunization_in_Practice_tool, addressing each part systematically.
161
  - **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why, based on the query’s content.
162
- - **Query Decomposition**: Break comparative or multi-part queries into sub-queries (e.g., one for Algerian information, one for WHO information) and use the appropriate tool for each.
163
  - Provide all found information with proper citations using Source IDs only.
164
  - If information is limited, clearly state: "Based on the available documents, I can provide the following information..." and indicate what is not available.
165
 
@@ -178,7 +176,7 @@ Answer the doctor's question accurately and concisely using only the provided in
178
  1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
179
  2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
180
  3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
181
- 4. Do NOT use tool names (like Guide_vector_tool, Immunization_in_Practice_tool) as citations.
182
  5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
183
  6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
184
 
@@ -193,28 +191,23 @@ Answer the doctor's question accurately and concisely using only the provided in
193
  ### CRITICAL: Efficient Response Strategy
194
  1. **MANDATORY SEARCH**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
195
  2. **MANDATORY TOOL SELECTION**:
196
- - For queries mentioning "WHO," "World Health Organization," "international," "global guidance," or WHO documents (e.g., page numbers), use Immunization_in_Practice_tool first.
197
- - For queries mentioning "Algerian," "national guide," or Algerian-specific terms (e.g., page numbers), use Guide_vector_tool first.
198
- - For comparative queries (e.g., Algerian vs. WHO), use both Guide_vector_tool and Immunization_in_Practice_tool, addressing each part systematically.
199
- 3. **Query Decomposition**: Break comparative or multi-part queries into sub-queries (e.g., one for Algerian information, one for WHO information) and use the appropriate tool for each.
200
  4. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
201
- 5. **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why, based on the query’s content.
202
- 6. **BE DECISIVE**: Once you find relevant information for each sub-query, formulate your response immediately.
203
- 7. **ANSWER FULLY**: Address all parts of the question, using multiple tools if required by the query.
204
- 8. **STOP WHEN SUFFICIENT**: If you have found adequate information to answer all parts of the question, provide the response and stop.
 
 
 
205
 
206
  ### Response Guidelines for Complex Questions
207
- - For comparative questions: Break the query into sub-queries (e.g., Algerian vs. WHO), use Guide_vector_tool for Algerian specifics and Immunization_in_Practice_tool for WHO specifics, then provide the comparison.
208
  - For multi-part questions: Address each part systematically, using the appropriate tool for each sub-query.
209
  - If information is not found after using the relevant tool(s): State clearly: "Based on the available documents, I can provide the following information..." and specify what is not available.
210
- - Do not repeatedly search for the same terms or rephrase searches excessively.
211
-
212
- ### When Information is Limited
213
- If you cannot find complete information to fully answer a question:
214
- 1. Provide whatever relevant information you did find with proper citations using Source IDs only.
215
- 2. Clearly state: "Based on the available documents, I can provide the following information..."
216
- 3. Indicate what specific information is not available: "However, information about [specific topic] was not found in the provided documents after searching with the relevant tool(s)."
217
- 4. Do not conclude "no information is available" without attempting a search with the appropriate tool(s).
218
 
219
  ---
220
  """
@@ -243,11 +236,13 @@ If you cannot find complete information to fully answer a question:
243
  print(f"[LOG] ⚠️ Using fallback prompt template for {'fallback' if is_fallback else 'standard'} agent")
244
  return PromptTemplate(template=safe_template)
245
 
 
246
  def create_agent(tools, llm, is_fallback=False):
247
  """Create the ReAct agent with custom prompt"""
248
 
249
  agent_type = "FALLBACK" if is_fallback else "STANDARD"
250
- max_iter = 3 if is_fallback else 8
 
251
 
252
  print(f"[LOG] Creating {agent_type} ReAct agent with {len(tools)} tools and max_iterations={max_iter}")
253
 
@@ -256,14 +251,14 @@ def create_agent(tools, llm, is_fallback=False):
256
  tools,
257
  llm=llm,
258
  verbose=True,
259
- max_iterations=max_iter, # Reduced iterations for fallback agent
260
  )
261
 
262
- # Create and apply appropriate custom prompt
263
  try:
264
  safe_custom_prompt = create_safe_custom_prompt(tools, llm, is_fallback=is_fallback)
265
  agent.update_prompts({"agent_worker:system_prompt": safe_custom_prompt})
266
- print(f"✅ Successfully updated {agent_type} agent with custom prompt")
267
  except Exception as e:
268
  print(f"❌ {agent_type} agent prompt update failed: {e}")
269
  print(f"⚠️ Using original {agent_type} agent without modifications")
@@ -273,16 +268,16 @@ def create_agent(tools, llm, is_fallback=False):
273
 
274
 
275
  def create_fallback_tools(all_tools):
276
- """Extract only the guide_retrieval_tool and immunization_tool for fallback agent"""
277
 
278
- print("[LOG] Creating fallback tools (guide + immunization only)")
279
 
280
  fallback_tools = []
281
  tool_names_found = []
282
 
283
  for tool in all_tools:
284
  tool_name = tool.metadata.name if hasattr(tool, 'metadata') else str(tool)
285
- if tool_name in ["Guide_vector_tool", "Immunization_in_Practice_tool"]:
286
  fallback_tools.append(tool)
287
  tool_names_found.append(tool_name)
288
 
@@ -333,7 +328,14 @@ def initialize_rag_pipeline(tools):
333
 
334
 
335
  def detect_max_iterations_error(response_text):
336
- """Detect if the response indicates a max iterations error"""
 
 
 
 
 
 
 
337
 
338
  max_iteration_indicators = [
339
  "max iterations",
@@ -343,11 +345,10 @@ def detect_max_iterations_error(response_text):
343
  "iteration limit"
344
  ]
345
 
346
- response_lower = response_text.lower()
347
-
348
- # Check for max iterations indicators
349
  for indicator in max_iteration_indicators:
350
  if indicator in response_lower:
 
351
  return True
352
 
353
  # Check for very short or empty responses (often indicates failure)
@@ -388,7 +389,7 @@ def process_question(agents_dict, question: str) -> str:
388
 
389
  # Check if we need to use fallback
390
  if detect_max_iterations_error(response_text):
391
- print("[LOG] 🔄 Max iterations detected, switching to FALLBACK AGENT...")
392
 
393
  if fallback_agent is None:
394
  print("[LOG] ❌ Fallback agent not available, returning error message")
@@ -418,7 +419,7 @@ def process_question(agents_dict, question: str) -> str:
418
 
419
  # Check if fallback also failed
420
  if detect_max_iterations_error(fallback_text):
421
- print("[LOG] ❌ Fallback agent also hit max iterations")
422
  return ("I apologize, but I'm having difficulty finding specific information about your question in the available documents. "
423
  "Please try asking a more specific question or rephrasing your query.")
424
 
@@ -500,16 +501,17 @@ def process_question_with_sequential_citations(agents_dict, question: str, chunk
500
  print(f"[LOG] Chunks directory: {chunks_directory}")
501
  start_time = time.time()
502
 
503
- used_fallback = False
504
 
505
  try:
506
  # Get the response using the enhanced process_question function
507
  response_text = process_question(agents_dict, question)
508
 
509
- # Check if this looks like a fallback was used (simple heuristic)
510
- if "fallback" in response_text.lower() or len(response_text) < 50:
 
511
  used_fallback = True
512
- print("[LOG] 🛡️ Fallback agent was likely used")
513
 
514
  agent_time = time.time() - start_time
515
  print(f"[LOG] Agent processing completed in {agent_time:.2f} seconds")
@@ -533,6 +535,10 @@ def process_question_with_sequential_citations(agents_dict, question: str, chunk
533
 
534
  for json_file in min_chunks_files:
535
  json_path = os.path.join(chunks_directory, json_file)
 
 
 
 
536
  print(f"[LOG] Loading {json_file}...")
537
  try:
538
  with open(json_path, "r", encoding="utf-8") as f:
@@ -548,23 +554,23 @@ def process_question_with_sequential_citations(agents_dict, question: str, chunk
548
  print("[LOG] Finding cited elements...")
549
  cited_elements_ordered = []
550
  for i, source_id in enumerate(unique_ids): # This preserves the order
551
- print(f"[LOG] Looking for source ID {i+1}/{len(unique_ids)}: {source_id}")
552
  found = False
553
  for element in all_chunks_data:
 
554
  if element.get("type") == 'TableElement':
555
- if element.get("elements",{}).get("element_id") == source_id:
556
- cited_elements_ordered.append(element.get("elements",{}))
557
  found = True
558
  break
559
- else:
560
- if "elements" in element:
561
- for nested_element in element["elements"]:
562
- if nested_element.get("element_id") == source_id:
563
- cited_elements_ordered.append(nested_element)
564
- found = True
565
- break
566
- else:
567
- continue
568
  break
569
  if not found:
570
  print(f"[LOG] ⚠️ Source ID {source_id} not found in chunks data")
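
The lookup loop above treats two chunk shapes differently: a `TableElement` keeps a single nested element as a dict under `elements`, while other chunks keep a list of nested elements. A minimal, self-contained sketch of that lookup follows, preceded by one possible way to pull the ordered, de-duplicated Source IDs out of an answer. The regular expression, the function names, and the `TextChunk` type label are assumptions for illustration; the diff does not show how the pipeline extracts `unique_ids`.

```python
import re

# Assumed citation shape: a lowercase hex ID of 8+ characters in square brackets.
CITATION_RE = re.compile(r"\[([0-9a-f]{8,})\]")


def ordered_unique_ids(answer):
    """Return cited IDs in first-appearance order, without duplicates."""
    seen = []
    for source_id in CITATION_RE.findall(answer):
        if source_id not in seen:
            seen.append(source_id)
    return seen


def find_cited_elements(source_ids, chunks):
    """Resolve each ID to its nested element; also report IDs not found."""
    found, missing = [], []
    for source_id in source_ids:
        match = None
        for chunk in chunks:
            nested = chunk.get("elements", [])
            # A table chunk holds one dict; other chunks hold a list of dicts.
            candidates = [nested] if isinstance(nested, dict) else nested
            match = next((e for e in candidates if e.get("element_id") == source_id), None)
            if match is not None:
                break
        if match is None:
            missing.append(source_id)
        else:
            found.append(match)
    return found, missing


chunks = [
    {"type": "TableElement",
     "elements": {"element_id": "2fbbc8eb32492ddeb5527973a42eeada", "type": "Table"}},
    {"type": "TextChunk",  # hypothetical type label
     "elements": [{"element_id": "e795ebd28318886c0b1a5395ac30ad90", "type": "NarrativeText"}]},
]
answer = ("First fact [e795ebd28318886c0b1a5395ac30ad90][2fbbc8eb32492ddeb5527973a42eeada]. "
          "Second fact [e795ebd28318886c0b1a5395ac30ad90]. Third fact [0123456789abcdef].")
ids = ordered_unique_ids(answer)
cited, not_found = find_cited_elements(ids, chunks)
```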
 
121
  Answer the doctor's question accurately and concisely using only the provided information.
122
 
123
  ## FALLBACK MODE INSTRUCTIONS
124
+ - You have access to only 2 powerful tools: general_guide_tool (Algerian National Vaccination Guide) and who_immunization_tool (WHO global guidance).
125
  - **MANDATORY TOOL USAGE**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
126
  - Be direct and efficient - search once with each tool if needed, then provide your answer.
127
  - Do not overthink or search repeatedly - these tools are comprehensive.
 
132
  1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
133
  2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
134
  3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
135
+ 4. Do NOT use tool names (like general_guide_tool, who_immunization_tool) as citations.
136
  5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
137
  6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
138
 
 
146
 
147
  ### CRITICAL: Efficient Fallback Strategy
148
  1. **MANDATORY SEARCH**: Use each relevant tool at least once to search for information, even if you suspect the information might not be available.
149
+ 2. **BREAK DOWN COMPLEX QUERIES**: For comparative or multi-part questions (e.g., comparing Algerian and WHO guidelines), break the query into sub-queries and use the appropriate tool for each part.
 
 
150
  3. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
151
  4. **BE DECISIVE**: Once you find relevant information for each sub-query, formulate your response immediately.
152
  5. **ANSWER FULLY**: Address all parts of the question, using multiple tools if required by the query.
153
+ 6. **FINAL ANSWER**: Once you have your answer, present it directly. Do not output your internal 'thought' or 'action' steps. Your final output must be the synthesized answer itself.
154
 
155
  ### Response Guidelines
156
  - **MANDATORY TOOL SELECTION**:
157
+ - For queries mentioning "WHO," "World Health Organization," "international," "global guidance," or WHO documents, use who_immunization_tool first.
158
+ - For queries mentioning "Algerian," "national guide," or Algerian-specific terms, use general_guide_tool first.
159
+ - For comparative queries (e.g., Algerian vs. WHO), use both tools, addressing each part systematically.
160
  - **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why, based on the query’s content.
 
161
  - Provide all found information with proper citations using Source IDs only.
162
  - If information is limited, clearly state: "Based on the available documents, I can provide the following information..." and indicate what is not available.
163
 
 
176
  1. For each fact in your response, include an inline citation in the format [Source ID] immediately following the information, e.g., [e795ebd28318886c0b1a5395ac30ad90].
177
  2. The Source ID must be the exact alphanumeric identifier from the search results, NOT the tool name or any other text.
178
  3. Do NOT use 'Source:' in the citation format; use only the Source ID in square brackets.
179
+ 4. Do NOT use tool names (like general_guide_tool, cold_chain_tool) as citations.
180
  5. If a fact is supported by multiple sources, use adjacent citations: [e795ebd28318886c0b1a5395ac30ad90][21a932b2340bb16707763f57f0ad2]
181
  6. Use ONLY the provided information from tool outputs and never include facts from your general knowledge.
182
 
 
191
  ### CRITICAL: Efficient Response Strategy
192
  1. **MANDATORY SEARCH**: Always use the relevant tool(s) to search for information before answering, even if you initially think no information is available.
193
  2. **MANDATORY TOOL SELECTION**:
194
+ - For queries about global standards or WHO, use who_immunization_tool.
195
+ - For broad questions about the Algerian guide, use general_guide_tool.
196
+ - For specific topics like cold chain, disease info, etc., use the most specific tool (e.g., cold_chain_tool, disease_info_tool).
197
+ 3. **Query Decomposition**: Break comparative or multi-part queries into sub-queries and use the appropriate tool for each.
198
  4. **DO NOT STOP PREMATURELY**: Do not conclude "no information is available" without using the relevant tool(s) to search for the answer.
199
+ 5. **EXPLICIT REASONING**: Before answering, log your reasoning steps, including which tools you will use and why.
200
+ 6. **BE DECISIVE**: Once you find relevant information, formulate your response.
201
+
202
+ ### Final Answer Generation
203
+ - **STOP WHEN SUFFICIENT**: Once you have gathered enough information from the tools to answer the user's question completely, you MUST stop using tools and formulate a final answer.
204
+ - **SYNTHESIZE THE ANSWER**: Formulate a comprehensive, final answer based ONLY on the observed tool outputs.
205
+ - **PRESENT CLEANLY**: Present this final answer directly to the user. Your final output must be the answer itself, not your internal 'thought' or 'action' steps.
206
 
207
  ### Response Guidelines for Complex Questions
208
+ - For comparative questions: Break the query into sub-queries, use the appropriate tools, then provide the comparison.
209
  - For multi-part questions: Address each part systematically, using the appropriate tool for each sub-query.
210
  - If information is not found after using the relevant tool(s): State clearly: "Based on the available documents, I can provide the following information..." and specify what is not available.
 
 
 
 
 
 
 
 
211
 
212
  ---
213
  """
 
236
  print(f"[LOG] ⚠️ Using fallback prompt template for {'fallback' if is_fallback else 'standard'} agent")
237
  return PromptTemplate(template=safe_template)
238
 
239
+
 def create_agent(tools, llm, is_fallback=False):
     """Create the ReAct agent with custom prompt"""

     agent_type = "FALLBACK" if is_fallback else "STANDARD"
+    # **FIX**: Increased max_iterations to give the agent more steps to reason
+    max_iter = 15

     print(f"[LOG] Creating {agent_type} ReAct agent with {len(tools)} tools and max_iterations={max_iter}")


         tools,
         llm=llm,
         verbose=True,
+        max_iterations=max_iter,
     )

+    # Create and apply safe custom prompt
     try:
         safe_custom_prompt = create_safe_custom_prompt(tools, llm, is_fallback=is_fallback)
         agent.update_prompts({"agent_worker:system_prompt": safe_custom_prompt})
+        print(f"✅ Successfully updated {agent_type} agent with safe custom prompt")
     except Exception as e:
         print(f"❌ {agent_type} agent prompt update failed: {e}")
         print(f"⚠️ Using original {agent_type} agent without modifications")
 
268
 
269
 
270
  def create_fallback_tools(all_tools):
271
+ """Extract only general_guide_tool and who_immunization_tool for the fallback agent."""
272
 
273
+ print("[LOG] Creating fallback tools (guide + WHO only)")
274
 
275
  fallback_tools = []
276
  tool_names_found = []
277
 
278
  for tool in all_tools:
279
  tool_name = tool.metadata.name if hasattr(tool, 'metadata') else str(tool)
280
+ if tool_name in ["general_guide_tool", "who_immunization_tool"]:
281
  fallback_tools.append(tool)
282
  tool_names_found.append(tool_name)
283
 
 
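Review note: the name-based filtering in `create_fallback_tools` can be exercised in isolation. The sketch below is a minimal stand-in, not the repository's code: `select_fallback_tools`, `make_stub`, and the `table_tool` name are hypothetical, and the stub objects only imitate the `tool.metadata.name` attribute that the diff reads.

```python
from types import SimpleNamespace

# Tool names taken from the diff; anything else is filtered out.
FALLBACK_TOOL_NAMES = {"general_guide_tool", "who_immunization_tool"}


def select_fallback_tools(all_tools, allowed=FALLBACK_TOOL_NAMES):
    """Return the tools whose name is in `allowed`, preserving input order."""
    selected = []
    for tool in all_tools:
        metadata = getattr(tool, "metadata", None)
        # Same idea as the diff: fall back to str(tool) when there is no metadata.
        name = getattr(metadata, "name", None) or str(tool)
        if name in allowed:
            selected.append(tool)
    return selected


def make_stub(name):
    """Hypothetical stand-in exposing only `.metadata.name`."""
    return SimpleNamespace(metadata=SimpleNamespace(name=name))


tools = [
    make_stub("general_guide_tool"),
    make_stub("table_tool"),  # hypothetical name, only here to be filtered out
    make_stub("who_immunization_tool"),
]
fallback = select_fallback_tools(tools)
fallback_names = [t.metadata.name for t in fallback]
```

Keeping the allowed names in one module-level set means the docstring, the log message, and the filter cannot drift apart when a tool is renamed.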
328
 
329
 
330
  def detect_max_iterations_error(response_text):
331
+ """Detect if the response indicates a max iterations error OR is an unfinished thought."""
332
+
333
+ response_lower = response_text.lower().strip()
334
+
335
+ # **FIX**: Check if the response is the agent's raw thought process.
336
+ if response_lower.startswith(("a:```thought", "```thought")):
337
+ print("[LOG] Detected unfinished agent thought process.")
338
+ return True
339
 
340
  max_iteration_indicators = [
341
  "max iterations",
 
345
  "iteration limit"
346
  ]
347
 
348
+ # Check for explicit max iterations indicators
 
 
349
  for indicator in max_iteration_indicators:
350
  if indicator in response_lower:
351
+ print(f"[LOG] Detected max iteration indicator: '{indicator}'")
352
  return True
353
 
354
  # Check for very short or empty responses (often indicates failure)
 
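Review note: the three checks in `detect_max_iterations_error` (raw thought prefix, iteration-limit phrases, very short output) can be sketched as a pure function that is easy to unit test. This is an illustration under assumptions: `looks_unfinished` and `min_length` are hypothetical, the indicator tuple contains only the two phrases visible in this hunk, and the length threshold is a guess because the original value is not shown.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Prefixes that indicate the agent returned its raw reasoning instead of an answer.
THOUGHT_PREFIXES = ("a:" + FENCE + "thought", FENCE + "thought")

# Only the two phrases visible in the diff; the full list is elided there.
KNOWN_INDICATORS = ("max iterations", "iteration limit")


def looks_unfinished(response_text, min_length=10):
    """Return True when the text looks like a failed or unfinished agent run.

    `min_length` is an assumed threshold, not the value used in the repository.
    """
    text = (response_text or "").lower().strip()
    if text.startswith(THOUGHT_PREFIXES):
        return True
    if any(indicator in text for indicator in KNOWN_INDICATORS):
        return True
    return len(text) < min_length


raw_thought = FENCE + "thought\nI should call the guide tool next."
checks = {
    "raw_thought": looks_unfinished(raw_thought),
    "limit": looks_unfinished("Reached max iterations."),
    "answer": looks_unfinished("The BCG vaccine is given at birth."),
    "empty": looks_unfinished(""),
}
```

Because the function has no side effects, the logging in the diff could stay in the caller, which keeps the detector reusable for both the standard and the fallback response.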
389
 
390
  # Check if we need to use fallback
391
  if detect_max_iterations_error(response_text):
392
+ print("[LOG] 🔄 Max iterations or unfinished thought detected, switching to FALLBACK AGENT...")
393
 
394
  if fallback_agent is None:
395
  print("[LOG] ❌ Fallback agent not available, returning error message")
 
419
 
420
  # Check if fallback also failed
421
  if detect_max_iterations_error(fallback_text):
422
+ print("[LOG] ❌ Fallback agent also hit max iterations or failed to produce an answer.")
423
  return ("I apologize, but I'm having difficulty finding specific information about your question in the available documents. "
424
  "Please try asking a more specific question or rephrasing your query.")
425
 
 
501
  print(f"[LOG] Chunks directory: {chunks_directory}")
502
  start_time = time.time()
503
 
504
+ used_fallback = False # This flag is a heuristic
505
 
506
  try:
507
  # Get the response using the enhanced process_question function
508
  response_text = process_question(agents_dict, question)
509
 
510
+ # Check if fallback was likely used (simple heuristic based on logs)
511
+ # A more robust way would be for `process_question` to return a tuple (response, used_fallback)
512
+ if "switching to fallback agent" in response_text.lower():
513
  used_fallback = True
514
+ print("[LOG] 🛡️ Fallback agent was likely used based on log indicators.")
515
 
516
  agent_time = time.time() - start_time
517
  print(f"[LOG] Agent processing completed in {agent_time:.2f} seconds")
 
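Review note: the substring check on `response_text` cannot see the `[LOG]` line, because that line is printed rather than returned, so `used_fallback` will almost always stay `False`. The in-code comment already suggests returning a tuple; a minimal sketch of that shape follows. Everything here is hypothetical: `answer_with_fallback`, the callables standing in for the two agents, and the "no fallback" message are illustrations, not the repository's API.

```python
from typing import Callable, Optional, Tuple

APOLOGY = (
    "I apologize, but I'm having difficulty finding specific information about "
    "your question in the available documents. Please try asking a more "
    "specific question or rephrasing your query."
)
NO_FALLBACK = "The fallback agent is not available."  # hypothetical wording


def answer_with_fallback(
    question: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]],
    is_failure: Callable[[str], bool],
) -> Tuple[str, bool]:
    """Return (response_text, used_fallback) instead of inferring the flag from text."""
    text = primary(question)
    if not is_failure(text):
        return text, False
    if fallback is None:
        return NO_FALLBACK, False
    fallback_text = fallback(question)
    if is_failure(fallback_text):
        return APOLOGY, True
    return fallback_text, True


# Stub agents: the primary one fails, the fallback one answers.
failed = lambda q: "Reached max iterations."
works = lambda q: "The first ROR dose is given at 11 months."
is_failure = lambda text: "max iterations" in text.lower()

result_ok = answer_with_fallback("q", works, works, is_failure)
result_fb = answer_with_fallback("q", failed, works, is_failure)
result_both = answer_with_fallback("q", failed, failed, is_failure)
result_none = answer_with_fallback("q", failed, None, is_failure)
```

With this shape the caller can write `response_text, used_fallback = ...` and drop the log-based heuristic entirely.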
535
 
536
  for json_file in min_chunks_files:
537
  json_path = os.path.join(chunks_directory, json_file)
538
+ if not os.path.exists(json_path):
539
+ print(f"[LOG] ⚠️ Skipping non-existent file: {json_path}")
540
+ continue
541
+
542
  print(f"[LOG] Loading {json_file}...")
543
  try:
544
  with open(json_path, "r", encoding="utf-8") as f:
 
554
  print("[LOG] Finding cited elements...")
555
  cited_elements_ordered = []
556
  for i, source_id in enumerate(unique_ids): # This preserves the order
557
+ # print(f"[LOG] Looking for source ID {i+1}/{len(unique_ids)}: {source_id}") # This is too verbose for normal operation
558
  found = False
559
  for element in all_chunks_data:
560
+ # Handle TableElement structure
561
  if element.get("type") == 'TableElement':
562
+ if element.get("elements", {}).get("element_id") == source_id:
563
+ cited_elements_ordered.append(element.get("elements", {}))
564
  found = True
565
  break
566
+ # Handle other element structures
567
+ elif "elements" in element and isinstance(element["elements"], list):
568
+ for nested_element in element["elements"]:
569
+ if isinstance(nested_element, dict) and nested_element.get("element_id") == source_id:
570
+ cited_elements_ordered.append(nested_element)
571
+ found = True
572
+ break
573
+ if found:
 
574
  break
575
  if not found:
576
  print(f"[LOG] ⚠️ Source ID {source_id} not found in chunks data")