feat: numbered citations [1][2][3], follow-up questions, focus modes
- LLM prompt now outputs [1][2][3] numbered citations instead of verbose inline sources
- Each source gets a citation_index attached before being sent to frontend
- LLM generates 3 follow-up questions at end of every answer (FOLLOW_UP: q1 | q2 | q3)
- Parser strips FOLLOW_UP block from answer and returns as follow_up_questions array
- Both execute_chat and execute_stream updated consistently
Version: 2.7
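
The `FOLLOW_UP: q1 | q2 | q3` convention described above can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the code from this commit: `split_follow_ups` and `FOLLOW_UP_MARKER` are hypothetical names, the sample text is made up, and the split happens at the first occurrence of the marker, which appears to match the `split("FOLLOW_UP:", 1)` call in the diff.

```python
from typing import List, Tuple

# Marker the prompt asks the LLM to emit on the last line of its answer.
FOLLOW_UP_MARKER = "FOLLOW_UP:"


def split_follow_ups(raw_answer: str, limit: int = 3) -> Tuple[str, List[str]]:
    """Separate the answer text from a trailing FOLLOW_UP block (hypothetical helper)."""
    answer, marker, tail = raw_answer.partition(FOLLOW_UP_MARKER)
    if not marker:
        # The model did not emit the marker: return the answer unchanged.
        return raw_answer, []
    # Questions are pipe-separated; drop empty fragments and cap the count.
    questions = [q.strip() for q in tail.split("|") if q.strip()]
    return answer.strip(), questions[:limit]


# Made-up sample output, for illustration only.
raw = (
    "Oil prices rose on Monday [1][2].\n"
    "FOLLOW_UP: Why did prices rise? | Which regions are affected? | What happens next?"
)
answer, follow_ups = split_follow_ups(raw)
print(answer)
print(follow_ups)
```

If the model omits the marker, the sketch returns an empty list, so a caller can decide whether to hide the follow-up section entirely.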
src/core/use_cases/rag_chat_use_case.py
CHANGED
|
@@ -704,6 +704,22 @@ JSON:"""
|
|
| 704 |
# context_text = f"{trend_text}\n\nRetrieved Search Context:\n{context_text}"
|
| 705 |
# except: pass
|
| 706 |
|
| 707 |
prompt = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
|
| 708 |
|
| 709 |
STRICT RULES - READ CAREFULLY BEFORE ANSWERING:
|
|
@@ -717,7 +733,7 @@ STEP 2 - EVALUATE THE SOURCES:
|
|
| 717 |
Read the News Context below and determine:
|
| 718 |
|
| 719 |
A) DIRECT MATCH - Sources directly answer the question:
|
| 720 |
-
→ Provide a comprehensive answer with citations
|
| 721 |
→ Synthesize information from multiple sources
|
| 722 |
→ Use numbered points with **bold** headlines
|
| 723 |
|
|
@@ -725,7 +741,6 @@ B) RELATED INFORMATION - Sources have related but not exact information:
|
|
| 725 |
→ Acknowledge what you found: "I found articles about [related topic]"
|
| 726 |
→ Explain the gap: "but not specifically about [exact query]"
|
| 727 |
→ Provide the related information anyway (it may still be helpful)
|
| 728 |
-
→ Suggest: "Would you like to know about [related topic] instead?"
|
| 729 |
|
| 730 |
C) NO RELEVANT INFORMATION - Sources are completely unrelated:
|
| 731 |
→ Say clearly: "I couldn't find relevant news on that topic in today's feed."
|
|
@@ -733,12 +748,17 @@ C) NO RELEVANT INFORMATION - Sources are completely unrelated:
|
|
| 733 |
|
| 734 |
STEP 3 - ANSWER RULES:
|
| 735 |
1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
|
| 736 |
-
2. CITATIONS: After EVERY fact,
|
| 737 |
3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
|
| 738 |
-
4. Non-English articles → translate content to English
|
| 739 |
5. Always respond in English. No hedging. No "based on my knowledge."
|
| 740 |
6. Be helpful and flexible - if exact match not found, offer related information.
|
| 741 |
|
| 742 |
News Context (from live multilingual database):
|
| 743 |
{context_text}
|
| 744 |
|
|
@@ -749,7 +769,17 @@ User Question: {request.query}
|
|
| 749 |
|
| 750 |
Answer:"""
|
| 751 |
|
| 752 |
-
|
| 753 |
retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
|
| 754 |
|
| 755 |
self.chat_history_db.save_interaction(session_id, request.query, answer, retrieved_ids)
|
|
@@ -759,10 +789,15 @@ Answer:"""
|
|
| 759 |
doc.get("source_type") == "live" or doc.get("is_live")
|
| 760 |
for doc in final_sources
|
| 761 |
)
|
| 762 |
-
|
| 763 |
result = {
|
| 764 |
"answer": answer,
|
| 765 |
"sources": final_sources,
|
| 766 |
"session_id": session_id,
|
| 767 |
"debug": {
|
| 768 |
"search_query": request.query,
|
|
@@ -798,6 +833,22 @@ Answer:"""
|
|
| 798 |
request.query, request.top_k, request.source_filter, request.language_filter, getattr(request, 'days_back', None)
|
| 799 |
)
|
| 800 |
|
| 801 |
prompt_stream = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
|
| 802 |
|
| 803 |
STRICT RULES - READ CAREFULLY BEFORE ANSWERING:
|
|
@@ -811,7 +862,7 @@ STEP 2 - EVALUATE THE SOURCES:
|
|
| 811 |
Read the News Context below and determine:
|
| 812 |
|
| 813 |
A) DIRECT MATCH - Sources directly answer the question:
|
| 814 |
-
→ Provide a comprehensive answer with citations
|
| 815 |
→ Synthesize information from multiple sources
|
| 816 |
→ Use numbered points with **bold** headlines
|
| 817 |
|
|
@@ -819,7 +870,6 @@ B) RELATED INFORMATION - Sources have related but not exact information:
|
|
| 819 |
→ Acknowledge what you found: "I found articles about [related topic]"
|
| 820 |
→ Explain the gap: "but not specifically about [exact query]"
|
| 821 |
→ Provide the related information anyway (it may still be helpful)
|
| 822 |
-
→ Suggest: "Would you like to know about [related topic] instead?"
|
| 823 |
|
| 824 |
C) NO RELEVANT INFORMATION - Sources are completely unrelated:
|
| 825 |
→ Say clearly: "I couldn't find relevant news on that topic in today's feed."
|
|
@@ -827,12 +877,17 @@ C) NO RELEVANT INFORMATION - Sources are completely unrelated:
|
|
| 827 |
|
| 828 |
STEP 3 - ANSWER RULES:
|
| 829 |
1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
|
| 830 |
-
2. CITATIONS: After EVERY fact,
|
| 831 |
3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
|
| 832 |
-
4. Non-English articles → translate content to English
|
| 833 |
5. Always respond in English. No hedging. No "based on my knowledge."
|
| 834 |
6. Be helpful and flexible - if exact match not found, offer related information.
|
| 835 |
|
| 836 |
News Context (from live multilingual database):
|
| 837 |
{context_text}
|
| 838 |
|
|
@@ -854,10 +909,20 @@ Answer:"""
|
|
| 854 |
except:
|
| 855 |
pass
|
| 856 |
|
| 857 |
import json
|
| 858 |
final_response = {
|
| 859 |
-
"answer":
|
| 860 |
"sources": final_sources,
|
| 861 |
"session_id": session_id
|
| 862 |
}
|
| 863 |
yield f"data: {json.dumps(final_response)}\n\n"
|
|
@@ -866,4 +931,4 @@ Answer:"""
|
|
| 866 |
# Only persist history for authenticated users
|
| 867 |
if not is_guest:
|
| 868 |
retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
|
| 869 |
-
self.chat_history_db.save_interaction(session_id, request.query,
|
|
|
|
| 704 |
# context_text = f"{trend_text}\n\nRetrieved Search Context:\n{context_text}"
|
| 705 |
# except: pass
|
| 706 |
|
| 707 |
+
# ── Build numbered source index for citations ─────────────────────────
|
| 708 |
+
# Each source gets a number [1], [2], [3]... so the LLM can cite by number
|
| 709 |
+
source_index_lines = ""
|
| 710 |
+
for idx, doc in enumerate(final_sources, 1):
|
| 711 |
+
meta = doc.get("metadata", {})
|
| 712 |
+
source_name = (
|
| 713 |
+
meta.get("source") or meta.get("title") or doc.get("source") or "Unknown"
|
| 714 |
+
)
|
| 715 |
+
search_lang = meta.get("_search_lang", "en")
|
| 716 |
+
if search_lang and search_lang != "en":
|
| 717 |
+
lang_label = SUPPORTED_LANGUAGES.get(search_lang, search_lang.upper())
|
| 718 |
+
source_label = f"{source_name} ({lang_label})"
|
| 719 |
+
else:
|
| 720 |
+
source_label = source_name
|
| 721 |
+
source_index_lines += f"[{idx}] {source_label}\n"
|
| 722 |
+
|
| 723 |
prompt = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
|
| 724 |
|
| 725 |
STRICT RULES - READ CAREFULLY BEFORE ANSWERING:
|
|
|
|
| 733 |
Read the News Context below and determine:
|
| 734 |
|
| 735 |
A) DIRECT MATCH - Sources directly answer the question:
|
| 736 |
+
→ Provide a comprehensive answer with numbered citations
|
| 737 |
→ Synthesize information from multiple sources
|
| 738 |
→ Use numbered points with **bold** headlines
|
| 739 |
|
|
|
|
| 741 |
→ Acknowledge what you found: "I found articles about [related topic]"
|
| 742 |
→ Explain the gap: "but not specifically about [exact query]"
|
| 743 |
→ Provide the related information anyway (it may still be helpful)
|
|
|
|
| 744 |
|
| 745 |
C) NO RELEVANT INFORMATION - Sources are completely unrelated:
|
| 746 |
→ Say clearly: "I couldn't find relevant news on that topic in today's feed."
|
|
|
|
| 748 |
|
| 749 |
STEP 3 - ANSWER RULES:
|
| 750 |
1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
|
| 751 |
+
2. CITATIONS: After EVERY fact, cite using the source NUMBER like [1] or [2][3]. Use the Source Index below to match names to numbers.
|
| 752 |
3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
|
| 753 |
+
4. Non-English articles → translate content to English in your answer.
|
| 754 |
5. Always respond in English. No hedging. No "based on my knowledge."
|
| 755 |
6. Be helpful and flexible - if exact match not found, offer related information.
|
| 756 |
+
7. At the END of your answer, on a new line, write exactly:
|
| 757 |
+
FOLLOW_UP: question1 | question2 | question3
|
| 758 |
+
These must be 3 short, specific follow-up questions the user might want to ask next, based on your answer.
|
| 759 |
|
| 760 |
+
Source Index:
|
| 761 |
+
{source_index_lines}
|
| 762 |
News Context (from live multilingual database):
|
| 763 |
{context_text}
|
| 764 |
|
|
|
|
| 769 |
|
| 770 |
Answer:"""
|
| 771 |
|
| 772 |
+
raw_answer = self.llm.generate(prompt)
|
| 773 |
+
|
| 774 |
+
# ── Parse follow-up questions out of the answer ───────────────────────
|
| 775 |
+
follow_up_questions: List[str] = []
|
| 776 |
+
answer = raw_answer
|
| 777 |
+
if "FOLLOW_UP:" in raw_answer:
|
| 778 |
+
parts = raw_answer.split("FOLLOW_UP:", 1)
|
| 779 |
+
answer = parts[0].strip()
|
| 780 |
+
follow_up_raw = parts[1].strip()
|
| 781 |
+
follow_up_questions = [q.strip() for q in follow_up_raw.split("|") if q.strip()][:3]
|
| 782 |
+
|
| 783 |
retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
|
| 784 |
|
| 785 |
self.chat_history_db.save_interaction(session_id, request.query, answer, retrieved_ids)
|
|
|
|
| 789 |
doc.get("source_type") == "live" or doc.get("is_live")
|
| 790 |
for doc in final_sources
|
| 791 |
)
|
| 792 |
+
|
| 793 |
+
# ── Attach citation index to each source for frontend rendering ───────
|
| 794 |
+
for idx, doc in enumerate(final_sources, 1):
|
| 795 |
+
doc["citation_index"] = idx
|
| 796 |
+
|
| 797 |
result = {
|
| 798 |
"answer": answer,
|
| 799 |
"sources": final_sources,
|
| 800 |
+
"follow_up_questions": follow_up_questions,
|
| 801 |
"session_id": session_id,
|
| 802 |
"debug": {
|
| 803 |
"search_query": request.query,
|
|
|
|
| 833 |
request.query, request.top_k, request.source_filter, request.language_filter, getattr(request, 'days_back', None)
|
| 834 |
)
|
| 835 |
|
| 836 |
+
# ── Build numbered source index for citations ─────────────────────────
|
| 837 |
+
source_index_lines = ""
|
| 838 |
+
for idx, doc in enumerate(final_sources, 1):
|
| 839 |
+
meta = doc.get("metadata", {})
|
| 840 |
+
source_name = (
|
| 841 |
+
meta.get("source") or meta.get("title") or doc.get("source") or "Unknown"
|
| 842 |
+
)
|
| 843 |
+
search_lang = meta.get("_search_lang", "en")
|
| 844 |
+
if search_lang and search_lang != "en":
|
| 845 |
+
lang_label = SUPPORTED_LANGUAGES.get(search_lang, search_lang.upper())
|
| 846 |
+
source_label = f"{source_name} ({lang_label})"
|
| 847 |
+
else:
|
| 848 |
+
source_label = source_name
|
| 849 |
+
source_index_lines += f"[{idx}] {source_label}\n"
|
| 850 |
+
doc["citation_index"] = idx
|
| 851 |
+
|
| 852 |
prompt_stream = f"""You are ARKI AI, a real-time news assistant. Today's date is {datetime.utcnow().strftime("%B %d, %Y")}.
|
| 853 |
|
| 854 |
STRICT RULES - READ CAREFULLY BEFORE ANSWERING:
|
|
|
|
| 862 |
Read the News Context below and determine:
|
| 863 |
|
| 864 |
A) DIRECT MATCH - Sources directly answer the question:
|
| 865 |
+
→ Provide a comprehensive answer with numbered citations
|
| 866 |
→ Synthesize information from multiple sources
|
| 867 |
→ Use numbered points with **bold** headlines
|
| 868 |
|
|
|
|
| 870 |
→ Acknowledge what you found: "I found articles about [related topic]"
|
| 871 |
→ Explain the gap: "but not specifically about [exact query]"
|
| 872 |
→ Provide the related information anyway (it may still be helpful)
|
|
|
|
| 873 |
|
| 874 |
C) NO RELEVANT INFORMATION - Sources are completely unrelated:
|
| 875 |
→ Say clearly: "I couldn't find relevant news on that topic in today's feed."
|
|
|
|
| 877 |
|
| 878 |
STEP 3 - ANSWER RULES:
|
| 879 |
1. Use ONLY facts from the News Context below. NEVER use training data or general knowledge.
|
| 880 |
+
2. CITATIONS: After EVERY fact, cite using the source NUMBER like [1] or [2][3]. Use the Source Index below to match names to numbers.
|
| 881 |
3. Prioritize high-authority sources (BBC, Reuters, Al Jazeera, The Guardian) over others.
|
| 882 |
+
4. Non-English articles → translate content to English in your answer.
|
| 883 |
5. Always respond in English. No hedging. No "based on my knowledge."
|
| 884 |
6. Be helpful and flexible - if exact match not found, offer related information.
|
| 885 |
+
7. At the END of your answer, on a new line, write exactly:
|
| 886 |
+
FOLLOW_UP: question1 | question2 | question3
|
| 887 |
+
These must be 3 short, specific follow-up questions the user might want to ask next, based on your answer.
|
| 888 |
|
| 889 |
+
Source Index:
|
| 890 |
+
{source_index_lines}
|
| 891 |
News Context (from live multilingual database):
|
| 892 |
{context_text}
|
| 893 |
|
|
|
|
| 909 |
except:
|
| 910 |
pass
|
| 911 |
|
| 912 |
+
# ── Parse follow-up questions out of the streamed answer ──────────────
|
| 913 |
+
follow_up_questions: List[str] = []
|
| 914 |
+
clean_answer = full_answer
|
| 915 |
+
if "FOLLOW_UP:" in full_answer:
|
| 916 |
+
parts = full_answer.split("FOLLOW_UP:", 1)
|
| 917 |
+
clean_answer = parts[0].strip()
|
| 918 |
+
follow_up_raw = parts[1].strip()
|
| 919 |
+
follow_up_questions = [q.strip() for q in follow_up_raw.split("|") if q.strip()][:3]
|
| 920 |
+
|
| 921 |
import json
|
| 922 |
final_response = {
|
| 923 |
+
"answer": clean_answer,
|
| 924 |
"sources": final_sources,
|
| 925 |
+
"follow_up_questions": follow_up_questions,
|
| 926 |
"session_id": session_id
|
| 927 |
}
|
| 928 |
yield f"data: {json.dumps(final_response)}\n\n"
|
|
|
|
| 931 |
# Only persist history for authenticated users
|
| 932 |
if not is_guest:
|
| 933 |
retrieved_ids = [str(doc.get("doc_id")) for doc in final_sources]
|
| 934 |
+
self.chat_history_db.save_interaction(session_id, request.query, clean_answer, retrieved_ids, user_id=user_id)
|
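
The diff repeats the source-index loop in both the chat and streaming paths. One way to keep the two in step is a shared helper; the sketch below is an illustration under assumptions, not part of this commit. `build_source_index` and `cited_sources` are hypothetical names, the `SUPPORTED_LANGUAGES` entries are placeholder values (the real mapping is not shown in the diff), and the sample sources are invented. It also treats a `None` metadata value as empty, which the diff's `doc.get("metadata", {})` does not.

```python
import re
from typing import Any, Dict, List

# Placeholder stand-in for the SUPPORTED_LANGUAGES mapping referenced in the diff.
SUPPORTED_LANGUAGES: Dict[str, str] = {"en": "English", "fr": "French", "ar": "Arabic"}


def build_source_index(sources: List[Dict[str, Any]]) -> str:
    """Number the sources in place and return the 'Source Index' prompt text."""
    lines: List[str] = []
    for idx, doc in enumerate(sources, 1):
        meta = doc.get("metadata") or {}  # assumption: tolerate metadata=None
        name = meta.get("source") or meta.get("title") or doc.get("source") or "Unknown"
        lang = meta.get("_search_lang", "en")
        if lang and lang != "en":
            # Label non-English sources with their language name.
            name = f"{name} ({SUPPORTED_LANGUAGES.get(lang, lang.upper())})"
        lines.append(f"[{idx}] {name}")
        doc["citation_index"] = idx  # same number the frontend will render
    return "".join(line + "\n" for line in lines)


def cited_sources(answer: str, sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return only the sources whose [n] marker appears in the answer."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [doc for doc in sources if doc.get("citation_index") in cited]


# Invented sample sources, for illustration only.
sources = [
    {"metadata": {"source": "Reuters"}},
    {"metadata": {"title": "Le Monde", "_search_lang": "fr"}},
    {"source": "BBC"},
]
index_text = build_source_index(sources)
print(index_text)
print([doc["citation_index"] for doc in cited_sources("Talks resumed [1][3].", sources)])
```

Because the helper both numbers the sources and returns the prompt text, the number the LLM cites and the `citation_index` sent to the frontend come from the same loop.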