Space: 10gen/deepsearchitv2

Guiyom committed on Mar 2, 2025

Commit 319411a (verified) · 1 Parent(s): 1bd735b

Update app.py

Files changed (1)

app.py +17 -5

@@ -1232,6 +1232,7 @@ Important:
 - You will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least 20 of such occurences - this is important for the grounding of the report in verifiable facts and sources
 - Don't mention the safe-formatting in the report or at the end, just do it - This is just for regex processing purpose
 // Important
 - Make it real, with anecdotes from the content
@@ -1243,6 +1244,7 @@ Important:
     result = openai_call(prompt, model="o3-mini", max_tokens_param=10000)
     result = result.strip().strip("```").strip()
     result = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', result)
     logging.info(f"The code produced for this focus placeholder:\n{placeholder_text}\n\n {result}\n\n")
     return result
@@ -1450,10 +1452,12 @@ Note: the output will be processed through regex and the identifiers removed, bu
 Important:
 - You will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least 20 of such occurences - this is important for the grounding of the report in verifiable facts and sources
-- Don't mention the safe-formatting in the report or at the end, just do it - This is just for regex processing purpose"""
         )
         summary_chunk = openai_call(prompt=chunk_prompt, model="gpt-4o-mini", max_tokens_param=500, temperature=0.7)
         summary_chunk = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', summary_chunk)
         global SUMMARIZATION_REQUEST_COUNT, TOTAL_SUMMARIZED_WORDS
         SUMMARIZATION_REQUEST_COUNT += 1
         TOTAL_SUMMARIZED_WORDS += len(summary_chunk.split())
@@ -1478,10 +1482,12 @@ Note: the output will be processed through regex and the identifiers removed, bu
 Important:
 - you will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - you must have at least 20 of such occurences
-- don't mention this formatting thing in the report, just do it"""
     )
     final_summary = openai_call(prompt=final_prompt, model="gpt-4o-mini", max_tokens_param=target_length, temperature=0.7)
     final_summary = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', final_summary)
     return final_summary.strip()
@@ -1543,6 +1549,7 @@ Important:
 - you will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least 20 of such occurences - this is important for the grounding of the report in verifiable facts and sources
 - don't mention this formatting thing in the report, just do it
 IMPORTANT: Format your response as a proper JSON object with these fields:
 - "relevant": "yes" or "no"
@@ -1554,6 +1561,7 @@ IMPORTANT: Format your response as a proper JSON object with these fields:
     try:
         response = openai_call(prompt=prompt, model="gpt-4o-mini", max_tokens_param=max_tokens, temperature=temperature)
         response = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', response)
         if not response:
             logging.error("analyze_with_gpt4o: Empty response received from API.")
             return {"relevant": "no", "summary": "", "followups": []}
@@ -1733,6 +1741,7 @@ Important:
 - You will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least {10 * pages} of such occurences scattered around the report - this is important for the grounding of the report in verifiable facts and sources
 - Don't mention the safe-formatting in the report or at the end, just do it - This is just for regex processing purpose
 // Sources
 Use the following learnings and merged reference details from a deep research process on:
@@ -1766,6 +1775,11 @@ Note: Exclude the use of html numbered lists format, they don't get correctly im
   <h4> for bulletpoint title (ex: <h4>item to detail:</h4>details ...)
 - Use inline formatting for the tables with homogeneous border and colors
 - Avoid Chinese characters in the output (use the Pinyin version) since they won't display correcly in the pdf (black boxes)
 --------------- Placeholders -----------
 In order to enrich the content, within the core sections (between introduction and conclusion), you can inject some placeholders that will be developped later on.
@@ -1858,9 +1872,6 @@ with:
 Important note for focus placeholders:
 - after [[ put "Focus Placeholder n:" explicitly (with n as the ref number of the focus box created). This will be used in a regex
 - Do not add a title for the Focus placeholder just before the [[...]], the content that will replace the focus placeholder - generated later on - will already include a title
-- For the Table of contents: do not mention the pages, but make each item on separate line
-- The reference table at the end containing the citations details should have 4 columns: the ref number, the title of the document, the author(s, the URL - with hyperlink)
-the name of the reference table should be: "Reference Summary Table"
 // Report ending required
 End the report with the following sequence:
@@ -1911,6 +1922,7 @@ Important note: placeholders (visual, graph or focus) can only appear in the sec
     report = openai_call(prompt, model="o3-mini", max_tokens_param=tokentarget)
     # Post-processing
     report = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', report)
     # If the report is too long, compress it.
     if len(report) > MAX_MESSAGE_LENGTH:

 - You will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least 20 of such occurences - this is important for the grounding of the report in verifiable facts and sources
 - Don't mention the safe-formatting in the report or at the end, just do it - This is just for regex processing purpose
+- LinkedIn is not a source - you should check the author of the page visited, this is the real source, mention the name of the author and then add "from LinkedIn Pulse"
 // Important
 - Make it real, with anecdotes from the content
     result = openai_call(prompt, model="o3-mini", max_tokens_param=10000)
     result = result.strip().strip("```").strip()
     result = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', result)
+    result = re.sub(r'\[\{(.*?)\}\]', r'\1', result)
     logging.info(f"The code produced for this focus placeholder:\n{placeholder_text}\n\n {result}\n\n")
     return result
 Important:
 - You will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least 20 of such occurences - this is important for the grounding of the report in verifiable facts and sources
+- Don't mention the safe-formatting in the report or at the end, just do it - This is just for regex processing purpose
+- LinkedIn is not a source - you should check the author of the page visited, this is the real source, mention the name of the author and then add "from LinkedIn Pulse""""
         )
         summary_chunk = openai_call(prompt=chunk_prompt, model="gpt-4o-mini", max_tokens_param=500, temperature=0.7)
         summary_chunk = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', summary_chunk)
+        summary_chunk = re.sub(r'\[\{(.*?)\}\]', r'\1', summary_chunk)
         global SUMMARIZATION_REQUEST_COUNT, TOTAL_SUMMARIZED_WORDS
         SUMMARIZATION_REQUEST_COUNT += 1
         TOTAL_SUMMARIZED_WORDS += len(summary_chunk.split())
 Important:
 - you will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - you must have at least 20 of such occurences
+- don't mention this formatting thing in the report, just do it
+- LinkedIn is not a source - you should check the author of the page visited, this is the real source, mention the name of the author and then add "from LinkedIn Pulse""""
     )
     final_summary = openai_call(prompt=final_prompt, model="gpt-4o-mini", max_tokens_param=target_length, temperature=0.7)
     final_summary = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', final_summary)
+    final_summary = re.sub(r'\[\{(.*?)\}\]', r'\1', final_summary)
     return final_summary.strip()
 - you will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least 20 of such occurences - this is important for the grounding of the report in verifiable facts and sources
 - don't mention this formatting thing in the report, just do it
+- LinkedIn is not a source - you should check the author of the page visited, this is the real source, mention the name of the author and then add "from LinkedIn Pulse"
 IMPORTANT: Format your response as a proper JSON object with these fields:
 - "relevant": "yes" or "no"
     try:
         response = openai_call(prompt=prompt, model="gpt-4o-mini", max_tokens_param=max_tokens, temperature=temperature)
         response = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', response)
+        response = re.sub(r'\[\{(.*?)\}\]', r'\1', response)
         if not response:
             logging.error("analyze_with_gpt4o: Empty response received from API.")
             return {"relevant": "no", "summary": "", "followups": []}
 - You will safely mention any of these elements provided you respect the above mentioned formatting {{[{{...}}]}}
 - You must have at least {10 * pages} of such occurences scattered around the report - this is important for the grounding of the report in verifiable facts and sources
 - Don't mention the safe-formatting in the report or at the end, just do it - This is just for regex processing purpose
+- LinkedIn is not a source - you should check the author of the page visited, this is the real source, mention the name of the author and then add "from LinkedIn Pulse"
 // Sources
 Use the following learnings and merged reference details from a deep research process on:
   <h4> for bulletpoint title (ex: <h4>item to detail:</h4>details ...)
 - Use inline formatting for the tables with homogeneous border and colors
 - Avoid Chinese characters in the output (use the Pinyin version) since they won't display correcly in the pdf (black boxes)
+- For the Table of contents: do not mention the pages, but make each item on separate line
+- Put "Table of contents" and "Abstract" title in h1 format.
+- The Table of contents should not mention the abstract and table of contents, the numbering should start from the introduction and end with References Summary Table
+- The reference table at the end containing the citations details should have 4 columns: the ref number, the title of the document, the author(s, the URL - with hyperlink)
+- the name of the reference table should be: "Reference Summary Table"
 --------------- Placeholders -----------
 In order to enrich the content, within the core sections (between introduction and conclusion), you can inject some placeholders that will be developped later on.
 Important note for focus placeholders:
 - after [[ put "Focus Placeholder n:" explicitly (with n as the ref number of the focus box created). This will be used in a regex
 - Do not add a title for the Focus placeholder just before the [[...]], the content that will replace the focus placeholder - generated later on - will already include a title
 // Report ending required
 End the report with the following sequence:
     report = openai_call(prompt, model="o3-mini", max_tokens_param=tokentarget)
     # Post-processing
     report = re.sub(r'\{\[\{(.*?)\}\]\}', r'\1', report)
+    report = re.sub(r'\[\{(.*?)\}\]', r'\1', report)
     # If the report is too long, compress it.
     if len(report) > MAX_MESSAGE_LENGTH:
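
The recurring change in this commit is a second `re.sub` pass that strips `[{...}]` markers left over after the `{[{...}]}` pass. A minimal sketch of that two-pass cleanup follows. The two patterns are copied from the diff; the helper name `strip_safe_markers`, the constant names, and the sample text are hypothetical and do not appear in app.py.

```python
import re

# Patterns copied from the diff; the names below are illustrative only.
FULL_MARKER = re.compile(r'\{\[\{(.*?)\}\]\}')   # matches {[{...}]}
PARTIAL_MARKER = re.compile(r'\[\{(.*?)\}\]')    # matches [{...}]


def strip_safe_markers(text: str) -> str:
    """Remove the safe-formatting wrappers, keeping the wrapped text."""
    # The full pattern runs first so the outer braces are consumed with it.
    text = FULL_MARKER.sub(r'\1', text)
    # The partial pattern then catches markers emitted without outer braces.
    text = PARTIAL_MARKER.sub(r'\1', text)
    return text


sample = "Growth was reported by {[{Acme Corp}]} and by [{Jane Doe}] from LinkedIn Pulse."
print(strip_safe_markers(sample))
```

The order of the two passes matters: running only the partial pattern on `{[{Acme Corp}]}` would leave `{Acme Corp}` behind. Two properties of these patterns may be worth keeping in mind. Because `.` does not match newlines by default, a marker that spans several lines is left untouched. The partial pattern also matches a JSON array holding a single object, such as `[{"a": 1}]`, so applying it to a JSON response before parsing could alter the payload.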