Peterase committed on
Commit 8104246 · 1 Parent(s): 03c5a91

feat: numbered citations [1][2][3], follow-up questions, focus modes

- LLM prompt now outputs [1][2][3] numbered citations instead of verbose inline sources
- Each source gets a citation_index attached before being sent to frontend
- LLM generates 3 follow-up questions at end of every answer (FOLLOW_UP: q1 | q2 | q3)
- Parser strips FOLLOW_UP block from answer and returns as follow_up_questions array
- Both execute_chat and execute_stream updated consistently
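
The numbering step described in the first two bullets can be sketched as a standalone helper. This is a minimal sketch and not the code from the commit: `build_source_index` is a hypothetical name, and the `SUPPORTED_LANGUAGES` contents below are placeholder values, since the real mapping is not shown in this diff.

```python
from typing import Any, Dict, List

# Placeholder stand-in for the project's SUPPORTED_LANGUAGES mapping (assumed values).
SUPPORTED_LANGUAGES: Dict[str, str] = {"en": "English", "ar": "Arabic", "fr": "French"}


def build_source_index(sources: List[Dict[str, Any]]) -> str:
    """Number sources from 1, tag each one in place, and return the prompt block."""
    lines = []
    for idx, doc in enumerate(sources, 1):
        meta = doc.get("metadata") or {}
        # Same fallback chain as the diff: metadata source, then title, then top-level source.
        name = meta.get("source") or meta.get("title") or doc.get("source") or "Unknown"
        lang = meta.get("_search_lang") or "en"
        if lang != "en":
            name = f"{name} ({SUPPORTED_LANGUAGES.get(lang, lang.upper())})"
        doc["citation_index"] = idx  # the same number the LLM sees in the prompt
        lines.append(f"[{idx}] {name}")
    return "\n".join(lines)


sources = [
    {"metadata": {"source": "BBC"}},
    {"metadata": {"source": "Al Jazeera", "_search_lang": "ar"}},
    {"source": "Reuters"},
]
index_text = build_source_index(sources)
print(index_text)
```

Numbering the prompt index and setting `citation_index` in the same loop keeps the two from drifting apart, which is how the streaming path in this commit does it.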

Version: 2.7

src/core/use_cases/rag_chat_use_case.py CHANGED
@@ -704,6 +704,22 @@ JSON:"""
704
  # context_text = f"{trend_text}\n\nRetrieved Search Context:\n{context_text}"
705
  # except: pass
706
 
707
  prompt = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
708
 
709
  STRICT RULES — READ CAREFULLY BEFORE ANSWERING:
@@ -717,7 +733,7 @@ STEP 2 — EVALUATE THE SOURCES:
717
  Read the News Context below and determine:
718
 
719
  A) DIRECT MATCH — Sources directly answer the question:
720
- → Provide a comprehensive answer with citations
721
  → Synthesize information from multiple sources
722
  → Use numbered points with **bold** headlines
723
 
@@ -725,7 +741,6 @@ B) RELATED INFORMATION — Sources have related but not exact information:
725
  → Acknowledge what you found: "I found articles about [related topic]"
726
  → Explain the gap: "but not specifically about [exact query]"
727
  → Provide the related information anyway (it may still be helpful)
728
- → Suggest: "Would you like to know about [related topic] instead?"
729
 
730
  C) NO RELEVANT INFORMATION — Sources are completely unrelated:
731
  → Say clearly: "I couldn't find relevant news on that topic in today's feed."
@@ -733,12 +748,17 @@ C) NO RELEVANT INFORMATION — Sources are completely unrelated:
733
 
734
  STEP 3 — ANSWER RULES:
735
  1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
736
- 2. CITATIONS: After EVERY fact, add inline citation: "— Source: name" using the exact name from the [Source:] tag.
737
  3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
738
- 4. Non-English articles — translate content to English, note language: "— Source: Al Jazeera (Arabic)".
739
  5. Always respond in English. No hedging. No "based on my knowledge."
740
  6. Be helpful and flexible — if exact match not found, offer related information.
 
 
 
741
 
 
 
742
  News Context (from live multilingual database):
743
  {context_text}
744
 
@@ -749,7 +769,17 @@ User Question: {request.query}
749
 
750
  Answer:"""
751
 
752
- answer = self.llm.generate(prompt)
753
  retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
754
 
755
  self.chat_history_db.save_interaction(session_id, request.query, answer, retrieved_ids)
@@ -759,10 +789,15 @@ Answer:"""
759
  doc.get("source_type") == "live" or doc.get("is_live")
760
  for doc in final_sources
761
  )
762
-
 
 
 
 
763
  result = {
764
  "answer": answer,
765
  "sources": final_sources,
 
766
  "session_id": session_id,
767
  "debug": {
768
  "search_query": request.query,
@@ -798,6 +833,22 @@ Answer:"""
798
  request.query, request.top_k, request.source_filter, request.language_filter, getattr(request, 'days_back', None)
799
  )
800
 
801
  prompt_stream = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
802
 
803
  STRICT RULES — READ CAREFULLY BEFORE ANSWERING:
@@ -811,7 +862,7 @@ STEP 2 — EVALUATE THE SOURCES:
811
  Read the News Context below and determine:
812
 
813
  A) DIRECT MATCH — Sources directly answer the question:
814
- → Provide a comprehensive answer with citations
815
  → Synthesize information from multiple sources
816
  → Use numbered points with **bold** headlines
817
 
@@ -819,7 +870,6 @@ B) RELATED INFORMATION — Sources have related but not exact information:
819
  → Acknowledge what you found: "I found articles about [related topic]"
820
  → Explain the gap: "but not specifically about [exact query]"
821
  → Provide the related information anyway (it may still be helpful)
822
- → Suggest: "Would you like to know about [related topic] instead?"
823
 
824
  C) NO RELEVANT INFORMATION — Sources are completely unrelated:
825
  → Say clearly: "I couldn't find relevant news on that topic in today's feed."
@@ -827,12 +877,17 @@ C) NO RELEVANT INFORMATION — Sources are completely unrelated:
827
 
828
  STEP 3 — ANSWER RULES:
829
  1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
830
- 2. CITATIONS: After EVERY fact, add inline citation: "— Source: name" using the exact name from the [Source:] tag.
831
  3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
832
- 4. Non-English articles — translate content to English, note language: "— Source: Al Jazeera (Arabic)".
833
  5. Always respond in English. No hedging. No "based on my knowledge."
834
  6. Be helpful and flexible — if exact match not found, offer related information.
 
 
 
835
 
 
 
836
  News Context (from live multilingual database):
837
  {context_text}
838
 
@@ -854,10 +909,20 @@ Answer:"""
854
  except:
855
  pass
856
 
857
  import json
858
  final_response = {
859
- "answer": full_answer,
860
  "sources": final_sources,
 
861
  "session_id": session_id
862
  }
863
  yield f"data: {json.dumps(final_response)}\n\n"
@@ -866,4 +931,4 @@ Answer:"""
866
  # Only persist history for authenticated users
867
  if not is_guest:
868
  retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
869
- self.chat_history_db.save_interaction(session_id, request.query, full_answer, retrieved_ids, user_id=user_id)
 
704
  # context_text = f"{trend_text}\n\nRetrieved Search Context:\n{context_text}"
705
  # except: pass
706
 
707
+ # ── Build numbered source index for citations ─────────────────────────
708
+ # Each source gets a number [1], [2], [3]... so the LLM can cite by number
709
+ source_index_lines = ""
710
+ for idx, doc in enumerate(final_sources, 1):
711
+ meta = doc.get("metadata", {})
712
+ source_name = (
713
+ meta.get("source") or meta.get("title") or doc.get("source") or "Unknown"
714
+ )
715
+ search_lang = meta.get("_search_lang", "en")
716
+ if search_lang and search_lang != "en":
717
+ lang_label = SUPPORTED_LANGUAGES.get(search_lang, search_lang.upper())
718
+ source_label = f"{source_name} ({lang_label})"
719
+ else:
720
+ source_label = source_name
721
+ source_index_lines += f"[{idx}] {source_label}\n"
722
+
723
  prompt = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
724
 
725
  STRICT RULES — READ CAREFULLY BEFORE ANSWERING:
 
733
  Read the News Context below and determine:
734
 
735
  A) DIRECT MATCH — Sources directly answer the question:
736
+ → Provide a comprehensive answer with numbered citations
737
  → Synthesize information from multiple sources
738
  → Use numbered points with **bold** headlines
739
 
 
741
  → Acknowledge what you found: "I found articles about [related topic]"
742
  → Explain the gap: "but not specifically about [exact query]"
743
  → Provide the related information anyway (it may still be helpful)
 
744
 
745
  C) NO RELEVANT INFORMATION — Sources are completely unrelated:
746
  → Say clearly: "I couldn't find relevant news on that topic in today's feed."
 
748
 
749
  STEP 3 — ANSWER RULES:
750
  1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
751
+ 2. CITATIONS: After EVERY fact, cite using the source NUMBER like [1] or [2][3]. Use the Source Index below to match names to numbers.
752
  3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
753
+ 4. Non-English articles — translate content to English in your answer.
754
  5. Always respond in English. No hedging. No "based on my knowledge."
755
  6. Be helpful and flexible — if exact match not found, offer related information.
756
+ 7. At the END of your answer, on a new line, write exactly:
757
+ FOLLOW_UP: question1 | question2 | question3
758
+ These must be 3 short, specific follow-up questions the user might want to ask next, based on your answer.
759
 
760
+ Source Index:
761
+ {source_index_lines}
762
  News Context (from live multilingual database):
763
  {context_text}
764
 
 
769
 
770
  Answer:"""
771
 
772
+ raw_answer = self.llm.generate(prompt)
773
+
774
+ # ── Parse follow-up questions out of the answer ───────────────────────
775
+ follow_up_questions: List[str] = []
776
+ answer = raw_answer
777
+ if "FOLLOW_UP:" in raw_answer:
778
+ parts = raw_answer.split("FOLLOW_UP:", 1)
779
+ answer = parts[0].strip()
780
+ follow_up_raw = parts[1].strip()
781
+ follow_up_questions = [q.strip() for q in follow_up_raw.split("|") if q.strip()][:3]
782
+
783
  retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
784
 
785
  self.chat_history_db.save_interaction(session_id, request.query, answer, retrieved_ids)
 
789
  doc.get("source_type") == "live" or doc.get("is_live")
790
  for doc in final_sources
791
  )
792
+
793
+ # ── Attach citation index to each source for frontend rendering ───────
794
+ for idx, doc in enumerate(final_sources, 1):
795
+ doc["citation_index"] = idx
796
+
797
  result = {
798
  "answer": answer,
799
  "sources": final_sources,
800
+ "follow_up_questions": follow_up_questions,
801
  "session_id": session_id,
802
  "debug": {
803
  "search_query": request.query,
 
833
  request.query, request.top_k, request.source_filter, request.language_filter, getattr(request, 'days_back', None)
834
  )
835
 
836
+ # ── Build numbered source index for citations ─────────────────────────
837
+ source_index_lines = ""
838
+ for idx, doc in enumerate(final_sources, 1):
839
+ meta = doc.get("metadata", {})
840
+ source_name = (
841
+ meta.get("source") or meta.get("title") or doc.get("source") or "Unknown"
842
+ )
843
+ search_lang = meta.get("_search_lang", "en")
844
+ if search_lang and search_lang != "en":
845
+ lang_label = SUPPORTED_LANGUAGES.get(search_lang, search_lang.upper())
846
+ source_label = f"{source_name} ({lang_label})"
847
+ else:
848
+ source_label = source_name
849
+ source_index_lines += f"[{idx}] {source_label}\n"
850
+ doc["citation_index"] = idx
851
+
852
  prompt_stream = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
853
 
854
  STRICT RULES — READ CAREFULLY BEFORE ANSWERING:
 
862
  Read the News Context below and determine:
863
 
864
  A) DIRECT MATCH — Sources directly answer the question:
865
+ → Provide a comprehensive answer with numbered citations
866
  → Synthesize information from multiple sources
867
  → Use numbered points with **bold** headlines
868
 
 
870
  → Acknowledge what you found: "I found articles about [related topic]"
871
  → Explain the gap: "but not specifically about [exact query]"
872
  → Provide the related information anyway (it may still be helpful)
 
873
 
874
  C) NO RELEVANT INFORMATION — Sources are completely unrelated:
875
  → Say clearly: "I couldn't find relevant news on that topic in today's feed."
 
877
 
878
  STEP 3 — ANSWER RULES:
879
  1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
880
+ 2. CITATIONS: After EVERY fact, cite using the source NUMBER like [1] or [2][3]. Use the Source Index below to match names to numbers.
881
  3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
882
+ 4. Non-English articles — translate content to English in your answer.
883
  5. Always respond in English. No hedging. No "based on my knowledge."
884
  6. Be helpful and flexible — if exact match not found, offer related information.
885
+ 7. At the END of your answer, on a new line, write exactly:
886
+ FOLLOW_UP: question1 | question2 | question3
887
+ These must be 3 short, specific follow-up questions the user might want to ask next, based on your answer.
888
 
889
+ Source Index:
890
+ {source_index_lines}
891
  News Context (from live multilingual database):
892
  {context_text}
893
 
 
909
  except:
910
  pass
911
 
912
+ # ── Parse follow-up questions out of the streamed answer ──────────────
913
+ follow_up_questions: List[str] = []
914
+ clean_answer = full_answer
915
+ if "FOLLOW_UP:" in full_answer:
916
+ parts = full_answer.split("FOLLOW_UP:", 1)
917
+ clean_answer = parts[0].strip()
918
+ follow_up_raw = parts[1].strip()
919
+ follow_up_questions = [q.strip() for q in follow_up_raw.split("|") if q.strip()][:3]
920
+
921
  import json
922
  final_response = {
923
+ "answer": clean_answer,
924
  "sources": final_sources,
925
+ "follow_up_questions": follow_up_questions,
926
  "session_id": session_id
927
  }
928
  yield f"data: {json.dumps(final_response)}\n\n"
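
The closing event above follows the Server-Sent Events framing: a `data:` line ended by a blank line. The sketch below shows how such a payload could be produced and decoded. The payload keys come from this hunk; the helper names `final_sse_event` and `parse_sse_event` are hypothetical, and the client-side decoding is an assumption about how a consumer might read it.

```python
import json
from typing import Any, Dict, List


def final_sse_event(answer: str, sources: List[Dict[str, Any]],
                    follow_up_questions: List[str], session_id: str) -> str:
    """Serialize the closing payload as one Server-Sent Events message."""
    payload = {
        "answer": answer,
        "sources": sources,
        "follow_up_questions": follow_up_questions,
        "session_id": session_id,
    }
    # json.dumps escapes newlines inside strings, so the payload stays on a
    # single "data:" line; the blank line after it terminates the event.
    return f"data: {json.dumps(payload)}\n\n"


def parse_sse_event(event: str) -> Dict[str, Any]:
    """Consumer-side counterpart: strip the SSE framing and decode the JSON."""
    prefix = "data: "
    if not event.startswith(prefix):
        raise ValueError("not a data event")
    return json.loads(event[len(prefix):].strip())


event = final_sse_event(
    "Line one [1].\nLine two [2].",
    [{"citation_index": 1}, {"citation_index": 2}],
    ["What happened next?"],
    "abc123",
)
decoded = parse_sse_event(event)
```

Because the multi-line answer is escaped by `json.dumps`, it cannot split the event into several `data:` lines.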
 
931
  # Only persist history for authenticated users
932
  if not is_guest:
933
  retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
934
+ self.chat_history_db.save_interaction(session_id, request.query, clean_answer, retrieved_ids, user_id=user_id)
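
The FOLLOW_UP parsing appears twice in this commit, once per code path. One way to keep the two copies identical would be a shared helper along these lines. This is a sketch only: `split_follow_ups` and `FOLLOW_UP_MARKER` are hypothetical names that do not appear in the diff, and the behavior is meant to mirror the inline code, not extend it.

```python
from typing import List, Tuple

FOLLOW_UP_MARKER = "FOLLOW_UP:"  # marker text the prompt asks the LLM to emit


def split_follow_ups(raw_answer: str, limit: int = 3) -> Tuple[str, List[str]]:
    """Return (answer without the FOLLOW_UP block, up to `limit` questions)."""
    if FOLLOW_UP_MARKER not in raw_answer:
        # No marker: leave the answer untouched, as the inline code does.
        return raw_answer, []
    # Split on the first marker only, like split("FOLLOW_UP:", 1) in the diff.
    body, _, tail = raw_answer.partition(FOLLOW_UP_MARKER)
    questions = [q.strip() for q in tail.split("|") if q.strip()][:limit]
    return body.strip(), questions


raw = (
    "Talks resumed on Monday [1][2].\n\n"
    "FOLLOW_UP: Who is mediating? | What changed since last week? | When is the next round?"
)
answer, follow_up_questions = split_follow_ups(raw)
print(answer)
print(follow_up_questions)
```

With a helper like this, `execute_chat` and `execute_stream` would each make one call, and any change to the marker format would happen in one place.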