|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
MODEL = "gpt-4.1-nano" |
|
|
|
|
|
MAX_TOKENS = 131072 |
|
|
|
|
|
SEARXNG_ENDPOINT = "https://searx.stream/search" |
|
|
BAIDU_ENDPOINT = "https://www.baidu.com/s" |
|
|
READER_ENDPOINT = "https://r.jina.ai/" |
|
|
REQUEST_TIMEOUT = 300 |
|
|
|
|
|
INSTRUCTIONS_START = """ |
|
|
You are ChatGPT, an AI assistant with mandatory real-time web search, URL content extraction, knowledge validation, and professional summarization capabilities. |
|
|
|
|
|
Your absolute rules: |
|
|
- You must always execute the provided tools (`web_search`, `read_url`) for every single user query or user request, without exception. |
|
|
- You are never allowed to answer directly from your internal knowledge, memory, or training data. Outdated or tool-bypassed answers are strictly forbidden. |
|
|
|
|
|
Core Principles: |
|
|
- Mandatory Tool Invocation: Every query or request, no matter how simple, factual, or complex, must trigger at least one `web_search` or `read_url`. |
|
|
- No Memory Reliance: Do not use prior conversation history, cached context, or built-in knowledge to generate answers. Always re-verify with tools. |
|
|
- Up-to-Date Grounding: All responses must be based only on real-time, verifiable data retrieved through tools. |
|
|
- Cross-Validation: Always compare findings across at least 3 independent, credible sources before producing a final answer. |
|
|
- Professional Output: Responses must be clear, structured, evidence-based, and neutral. |
|
|
|
|
|
Execution Workflow: |
|
|
1. Initial Web Search |
|
|
- Immediately call `web_search` or `read_url` when a query or request arrives. |
|
|
- Use multiple query or request variations and search engines (`google`, `bing`, `baidu`) for broader coverage. |
|
|
|
|
|
2. Result Selection |
|
|
- Select up to 10 of the most relevant, credible, and content-rich results. |
|
|
- Prioritize authoritative sources: academic publications, institutional reports, official documents, expert commentary. |
|
|
- Deprioritize low-credibility, promotional, or unverified sources. |
|
|
- Avoid over-reliance on any single source. |
|
|
|
|
|
3. Content Retrieval |
|
|
- For each selected URL, use `read_url`. |
|
|
- Extract key elements: facts, statistics, data points, expert opinions, and relevant arguments. |
|
|
- Normalize terminology, refine phrasing, and remove redundancies for clarity and consistency. |
|
|
|
|
|
4. Cross-Validation |
|
|
- Compare extracted information across at least 3 distinct sources. |
|
|
- Identify convergences (agreement), divergences (contradictions), and gaps (missing data). |
|
|
- Validate all numerical values, temporal references, and factual claims through multiple corroborations. |
|
|
|
|
|
5. Knowledge Integration |
|
|
- Synthesize findings into a structured hierarchy: |
|
|
- Overview → Key details → Supporting evidence → Citations. |
|
|
- Emphasize the latest developments, trends, and their implications. |
|
|
- Balance depth (for experts) with clarity (for general readers). |
|
|
|
|
|
6. Response Construction |
|
|
- Always cite sources inline using `[Source Title/Article/Tags/Domain](Source URL or Source Links)`. |
|
|
- Maintain a professional, precise, and neutral tone. |
|
|
- Use clear formatting: headings, numbered lists, and bullet points. |
|
|
- Ensure readability, logical progression, and accessibility. |
|
|
|
|
|
7. Ambiguity & Uncertainty Handling |
|
|
- Explicitly flag incomplete, ambiguous, or conflicting data. |
|
|
- Provide possible interpretations with transparent reasoning. |
|
|
- Clearly note limitations where evidence is insufficient or weak. |
|
|
|
|
|
8. Quality & Consistency Assurance |
|
|
- Always base answers strictly on tool-derived evidence. |
|
|
- Guarantee logical flow, factual accuracy, and consistency in terminology. |
|
|
- Maintain neutrality and avoid speculative claims. |
|
|
- Never bypass tool execution for any query or request. |
|
|
|
|
|
Critical Instruction: |
|
|
- Every new query or request must trigger a `web_search` or `read_url`. |
|
|
- You must not generate answers from prior knowledge, conversation history, or cached data. |
|
|
- Always use Markdown format for URL sources with `[Source Title/Article/Tags/Domain](Source URL or Source Links)`. |
|
|
- If tools fail, you must state explicitly that no valid data could be retrieved. |
|
|
\n\n\n |
|
|
""" |
|
|
|
|
|
CONTENT_EXTRACTION = """ |
|
|
<system> |
|
|
- Analyze the retrieved content in detail |
|
|
- Identify all critical facts, arguments, statistics, and relevant data |
|
|
- Collect all URLs, hyperlinks, references, and citations mentioned in the content |
|
|
- Evaluate credibility of sources, highlight potential biases or conflicts |
|
|
- Produce a structured, professional, and comprehensive summary |
|
|
- Emphasize clarity, accuracy, and logical flow |
|
|
- Include all discovered URLs in the final summary as [Source Title](URL) |
|
|
- Mark any uncertainties, contradictions, or missing information clearly |
|
|
</system> |
|
|
\n\n\n |
|
|
""" |
|
|
|
|
|
SEARCH_SELECTION = """ |
|
|
<system> |
|
|
- For each search result, fetch the full content using read_url |
|
|
- Extract key information, main arguments, data points, and statistics |
|
|
- Capture every URL present in the content or references |
|
|
- Create a professional structured summary. |
|
|
- List each source at the end of the summary in the format [Source title](link) |
|
|
- Identify ambiguities or gaps in information |
|
|
- Ensure clarity, completeness, and high information density |
|
|
</system> |
|
|
\n\n\n |
|
|
""" |
|
|
|
|
|
INSTRUCTIONS_END = """ |
|
|
You have just executed tools and obtained results. You MUST now provide a comprehensive answer based ONLY on the tool results. |
|
|
\n\n\n |
|
|
""" |
|
|
|
|
|
REASONING_STEPS = { |
|
|
"web_search": { |
|
|
"parsing": ( |
|
|
"I need to search for information about: {query}<br><br>" |
|
|
"I'm analyzing the user's request and preparing to execute a web search. " |
|
|
"The query I've identified is comprehensive and should yield relevant results. " |
|
|
"I will use the {engine} search engine for this task as it provides reliable and up-to-date information.<br><br>" |
|
|
"I'm now parsing the search parameters to ensure they are correctly formatted. " |
|
|
"The search query has been validated and I'm checking that all required fields are present. " |
|
|
"I need to make sure the search engine parameter is valid and supported by our system.<br><br>" |
|
|
"I'm preparing the search request with the following configuration:<br>" |
|
|
"- Search Query: {query}<br>" |
|
|
"- Search Engine: {engine}<br><br>" |
|
|
"I'm verifying that the network connection is stable and that the search service is accessible. " |
|
|
"All preliminary checks have been completed successfully." |
|
|
), |
|
|
"executing": ( |
|
|
"I'm now executing the web search for: {query}<br><br>" |
|
|
"I'm connecting to the {engine} search service and sending the search request. " |
|
|
"The connection has been established successfully and I'm waiting for the search results. " |
|
|
"I'm processing multiple search result pages to gather comprehensive information.<br><br>" |
|
|
"I'm analyzing the search results to identify the most relevant and authoritative sources. " |
|
|
"The search engine is returning results and I'm filtering them based on relevance scores. " |
|
|
"I'm extracting key information from each search result including titles, snippets, and URLs.<br><br>" |
|
|
"I'm organizing the search results in order of relevance and checking for duplicate content. " |
|
|
"The search process is progressing smoothly and I'm collecting valuable information. " |
|
|
"I'm also verifying the credibility of the sources to ensure high-quality information.<br><br>" |
|
|
"Current status: Processing search results...<br>" |
|
|
"Results found: Multiple relevant sources identified<br>" |
|
|
"Quality assessment: High relevance detected" |
|
|
), |
|
|
"completed": ( |
|
|
"I have successfully completed the web search for: {query}<br><br>" |
|
|
"I've retrieved comprehensive search results from {engine} and analyzed all the information. " |
|
|
"The search yielded multiple relevant results that directly address the user's query. " |
|
|
"I've extracted the most important information and organized it for processing.<br><br>" |
|
|
"I've identified several high-quality sources with authoritative information. " |
|
|
"The search results include recent and up-to-date content that is highly relevant. " |
|
|
"I've filtered out any duplicate or low-quality results to ensure accuracy.<br><br>" |
|
|
"I'm now processing the collected information to formulate a comprehensive response. " |
|
|
"The search results provide sufficient detail to answer the user's question thoroughly. " |
|
|
"I've verified the credibility of the sources and cross-referenced the information.<br><br>" |
|
|
"Search Summary:<br>" |
|
|
"- Total results processed: Multiple pages<br>" |
|
|
"- Relevance score: High<br>" |
|
|
"- Information quality: Verified and accurate<br>" |
|
|
"- Sources: Authoritative and recent<br><br>" |
|
|
"Preview of results:<br>{preview}" |
|
|
), |
|
|
"error": ( |
|
|
"I encountered an issue while attempting to search for: {query}<br><br>" |
|
|
"I tried to execute the web search but encountered an unexpected error. " |
|
|
"The error occurred during the search process and I need to handle it appropriately. " |
|
|
"I'm analyzing the error to understand what went wrong and how to proceed.<br><br>" |
|
|
"Error details: {error}<br><br>" |
|
|
"I'm attempting to diagnose the issue and considering alternative approaches. " |
|
|
"The error might be due to network connectivity, service availability, or parameter issues. " |
|
|
"I will try to recover from this error and provide the best possible response.<br><br>" |
|
|
"I'm evaluating whether I can retry the search with modified parameters. " |
|
|
"If the search cannot be completed, I will use my existing knowledge to help the user. " |
|
|
"I'm committed to providing valuable assistance despite this technical challenge." |
|
|
) |
|
|
}, |
|
|
"read_url": { |
|
|
"parsing": ( |
|
|
"I need to read and extract content from the URL: {url}<br><br>" |
|
|
"I'm analyzing the URL structure to ensure it's valid and accessible. " |
|
|
"The URL appears to be properly formatted and I'm preparing to fetch its content. " |
|
|
"I will extract the main content from this webpage to gather detailed information.<br><br>" |
|
|
"I'm validating the URL protocol and checking if it uses HTTP or HTTPS. " |
|
|
"The domain seems legitimate and I'm preparing the request headers. " |
|
|
"I need to ensure that the website allows automated content extraction.<br><br>" |
|
|
"I'm configuring the content extraction parameters:<br>" |
|
|
"- Target URL: {url}<br>" |
|
|
"- Extraction Method: Full content parsing<br>" |
|
|
"- Content Type: HTML/Text<br>" |
|
|
"- Encoding: Auto-detect<br><br>" |
|
|
"I'm checking if the website requires any special handling or authentication. " |
|
|
"All preliminary validation checks have been completed successfully." |
|
|
), |
|
|
"executing": ( |
|
|
"I'm now accessing the URL: {url}<br><br>" |
|
|
"I'm establishing a connection to the web server and sending the HTTP request. " |
|
|
"The connection is being established and I'm waiting for the server response. " |
|
|
"I'm following any redirects if necessary to reach the final destination.<br><br>" |
|
|
"I'm downloading the webpage content and checking the response status code. " |
|
|
"The server is responding and I'm receiving the HTML content. " |
|
|
"I'm monitoring the download progress and ensuring data integrity.<br><br>" |
|
|
"I'm parsing the HTML structure to extract the main content. " |
|
|
"I'm identifying and removing navigation elements, advertisements, and other non-content sections. " |
|
|
"I'm focusing on extracting the primary article or information content.<br><br>" |
|
|
"Current status: Extracting content...<br>" |
|
|
"Response received: Processing HTML<br>" |
|
|
"Content extraction: In progress" |
|
|
), |
|
|
"completed": ( |
|
|
"I have successfully extracted content from: {url}<br><br>" |
|
|
"I've retrieved the complete webpage content and processed it thoroughly. " |
|
|
"The extraction was successful and I've obtained the main textual content. " |
|
|
"I've cleaned the content by removing unnecessary HTML tags and formatting.<br><br>" |
|
|
"I've identified the main article or information section of the webpage. " |
|
|
"The content has been properly parsed and structured for analysis. " |
|
|
"I've preserved important information while filtering out irrelevant elements.<br><br>" |
|
|
"I'm now analyzing the extracted content to understand its context and relevance. " |
|
|
"The information appears to be comprehensive and directly related to the topic. " |
|
|
"I've verified that the content is complete and hasn't been truncated.<br><br>" |
|
|
"Extraction Summary:<br>" |
|
|
"- Content length: Substantial<br>" |
|
|
"- Extraction quality: High<br>" |
|
|
"- Content type: Article/Information<br>" |
|
|
"- Processing status: Complete<br><br>" |
|
|
"Preview of extracted content:<br>{preview}" |
|
|
), |
|
|
"error": ( |
|
|
"I encountered an issue while trying to access: {url}<br><br>" |
|
|
"I attempted to fetch the webpage content but encountered an error. " |
|
|
"The error prevented me from successfully extracting the information. " |
|
|
"I'm analyzing the error to understand the cause and find a solution.<br><br>" |
|
|
"Error details: {error}<br><br>" |
|
|
"I'm considering possible causes such as network issues, access restrictions, or invalid URLs. " |
|
|
"The website might be blocking automated access or the URL might be incorrect. " |
|
|
"I will try to work around this limitation and provide alternative assistance.<br><br>" |
|
|
"I'm evaluating whether I can access the content through alternative methods. " |
|
|
"If direct access isn't possible, I'll use my knowledge to help with the query. " |
|
|
"I remain committed to providing useful information despite this obstacle." |
|
|
) |
|
|
} |
|
|
} |
|
|
|
|
|
REASONING_DEFAULT = "I'm processing the tool execution request..." |
|
|
|
|
|
REASONING_DELAY = 0.01 |
|
|
|
|
|
OS = [ |
|
|
"Windows NT 10.0; Win64; x64", |
|
|
"Macintosh; Intel Mac OS X 10_15_7", |
|
|
"X11; Linux x86_64", |
|
|
"Windows NT 11.0; Win64; x64", |
|
|
"Macintosh; Intel Mac OS X 11_6_2" |
|
|
] |
|
|
|
|
|
OCTETS = [ |
|
|
1, 2, 3, 4, 5, 8, 12, 13, 14, 15, |
|
|
16, 17, 18, 19, 20, 23, 24, 34, 35, 36, |
|
|
37, 38, 39, 40, 41, 42, 43, 44, 45, 46, |
|
|
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, |
|
|
57, 58, 59, 60, 61, 62, 63, 64, 65, 66, |
|
|
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, |
|
|
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, |
|
|
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, |
|
|
97, 98, 99, 100, 101, 102, 103, 104, 105, 106, |
|
|
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, |
|
|
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, |
|
|
128, 129, 130, 131, 132, 133, 134, 135, 136, 137, |
|
|
138, 139, 140, 141, 142, 143, 144, 145, 146, 147, |
|
|
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, |
|
|
158, 159, 160, 161, 162, 163, 164, 165, 166, 167, |
|
|
168, 170, 171, 172, 173, 174, 175, 176, 177, 178, |
|
|
179, 180, 181, 182, 183, 184, 185, 186, 187, 188, |
|
|
189, 190, 191, 192, 193, 194, 195, 196, 197, 198, |
|
|
199, 200, 201, 202, 203, 204, 205, 206, 207, 208, |
|
|
209, 210, 211, 212, 213, 214, 215, 216, 217, 218, |
|
|
219, 220, 221, 222, 223 |
|
|
] |
|
|
|
|
|
BROWSERS = [ |
|
|
"Chrome", |
|
|
"Firefox", |
|
|
"Safari", |
|
|
"Edge", |
|
|
"Opera" |
|
|
] |
|
|
|
|
|
CHROME_VERSIONS = [ |
|
|
"120.0.0.0", |
|
|
"119.0.0.0", |
|
|
"118.0.0.0", |
|
|
"117.0.0.0", |
|
|
"116.0.0.0" |
|
|
] |
|
|
|
|
|
FIREFOX_VERSIONS = [ |
|
|
"121.0", |
|
|
"120.0", |
|
|
"119.0", |
|
|
"118.0", |
|
|
"117.0" |
|
|
] |
|
|
|
|
|
SAFARI_VERSIONS = [ |
|
|
"17.1", |
|
|
"17.0", |
|
|
"16.6", |
|
|
"16.5", |
|
|
"16.4", |
|
|
] |
|
|
|
|
|
EDGE_VERSIONS = [ |
|
|
"120.0.2210.91", |
|
|
"119.0.2151.97", |
|
|
"118.0.2088.76", |
|
|
"117.0.2045.60", |
|
|
"116.0.1938.81" |
|
|
] |
|
|
|
|
|
DOMAINS = [ |
|
|
"google.com", |
|
|
"bing.com", |
|
|
"yahoo.com", |
|
|
"duckduckgo.com", |
|
|
"baidu.com", |
|
|
"yandex.com", |
|
|
"facebook.com", |
|
|
"twitter.com", |
|
|
"linkedin.com", |
|
|
"reddit.com", |
|
|
"youtube.com", |
|
|
"wikipedia.org", |
|
|
"amazon.com", |
|
|
"github.com", |
|
|
"stackoverflow.com", |
|
|
"medium.com", |
|
|
"quora.com", |
|
|
"pinterest.com", |
|
|
"instagram.com", |
|
|
"tumblr.com" |
|
|
] |
|
|
|
|
|
PROTOCOLS = [ |
|
|
"https://", |
|
|
"https://www." |
|
|
] |
|
|
|
|
|
SEARCH_ENGINES = [ |
|
|
"https://www.google.com/search?q=", |
|
|
"https://www.bing.com/search?q=", |
|
|
"https://search.yahoo.com/search?p=", |
|
|
"https://duckduckgo.com/?q=", |
|
|
"https://www.baidu.com/s?wd=", |
|
|
"https://yandex.com/search/?text=", |
|
|
"https://www.google.co.uk/search?q=", |
|
|
"https://www.google.ca/search?q=", |
|
|
"https://www.google.com.au/search?q=", |
|
|
"https://www.google.de/search?q=", |
|
|
"https://www.google.fr/search?q=", |
|
|
"https://www.google.co.jp/search?q=", |
|
|
"https://www.google.com.br/search?q=", |
|
|
"https://www.google.co.in/search?q=", |
|
|
"https://www.google.ru/search?q=", |
|
|
"https://www.google.it/search?q=" |
|
|
] |
|
|
|
|
|
KEYWORDS = [ |
|
|
"news", |
|
|
"weather", |
|
|
"sports", |
|
|
"technology", |
|
|
"science", |
|
|
"health", |
|
|
"finance", |
|
|
"entertainment", |
|
|
"travel", |
|
|
"food", |
|
|
"education", |
|
|
"business", |
|
|
"politics", |
|
|
"culture", |
|
|
"history", |
|
|
"music", |
|
|
"movies", |
|
|
"games", |
|
|
"books", |
|
|
"art" |
|
|
] |
|
|
|
|
|
COUNTRIES = [ |
|
|
"US", "GB", "CA", "AU", "DE", "FR", "JP", "BR", "IN", "RU", |
|
|
"IT", "ES", "MX", "NL", "SE", "NO", "DK", "FI", "PL", "TR", |
|
|
"KR", "SG", "HK", "TW", "TH", "ID", "MY", "PH", "VN", "AR", |
|
|
"CL", "CO", "PE", "VE", "EG", "ZA", "NG", "KE", "MA", "DZ", |
|
|
"TN", "IL", "AE", "SA", "QA", "KW", "BH", "OM", "JO", "LB" |
|
|
] |
|
|
|
|
|
LANGUAGES = [ |
|
|
"en-US", "en-GB", "en-CA", "en-AU", "de-DE", "fr-FR", "ja-JP", |
|
|
"pt-BR", "hi-IN", "ru-RU", "it-IT", "es-ES", "es-MX", "nl-NL", |
|
|
"sv-SE", "no-NO", "da-DK", "fi-FI", "pl-PL", "tr-TR", "ko-KR", |
|
|
"zh-CN", "zh-TW", "th-TH", "id-ID", "ms-MY", "fil-PH", "vi-VN", |
|
|
"es-AR", "es-CL", "es-CO", "es-PE", "es-VE", "ar-EG", "en-ZA", |
|
|
"en-NG", "sw-KE", "ar-MA", "ar-DZ", "ar-TN", "he-IL", "ar-AE", |
|
|
"ar-SA", "ar-QA", "ar-KW", "ar-BH", "ar-OM", "ar-JO", "ar-LB" |
|
|
] |
|
|
|
|
|
TIMEZONES = [ |
|
|
"America/New_York", |
|
|
"America/Chicago", |
|
|
"America/Los_Angeles", |
|
|
"America/Denver", |
|
|
"Europe/London", |
|
|
"Europe/Paris", |
|
|
"Europe/Berlin", |
|
|
"Europe/Moscow", |
|
|
"Asia/Tokyo", |
|
|
"Asia/Shanghai", |
|
|
"Asia/Hong_Kong", |
|
|
"Asia/Singapore", |
|
|
"Asia/Seoul", |
|
|
"Asia/Mumbai", |
|
|
"Asia/Dubai", |
|
|
"Australia/Sydney", |
|
|
"Australia/Melbourne", |
|
|
"America/Toronto", |
|
|
"America/Vancouver", |
|
|
"America/Mexico_City", |
|
|
"America/Sao_Paulo", |
|
|
"America/Buenos_Aires", |
|
|
"Africa/Cairo", |
|
|
"Africa/Johannesburg", |
|
|
"Africa/Lagos", |
|
|
"Africa/Nairobi", |
|
|
"Pacific/Auckland", |
|
|
"Pacific/Honolulu" |
|
|
] |
|
|
|
|
|
DESCRIPTION = """ |
|
|
<b>SearchGPT</b> is <b>ChatGPT</b> with real-time web search capabilities and the ability to read content directly from a URL. |
|
|
<br><br> |
|
|
This Space implements an agent-based system with <b><a href="https://www.gradio.app" target="_blank">Gradio</a></b>. It is integrated with |
|
|
<b><a href="https://docs.searxng.org" target="_blank">SearXNG</a></b>, which is then converted into a script tool or function for native execution. |
|
|
<br><br> |
|
|
The agent mode is inspired by the <b><a href="https://openwebui.com/t/hadad/deep_research" target="_blank">Deep Research</a></b> from |
|
|
<b><a href="https://docs.openwebui.com" target="_blank">OpenWebUI</a></b> tools script. |
|
|
<br><br> |
|
|
The <b>Deep Research</b> feature is also available on the primary Spaces of <b><a href="https://umint-openwebui.hf.space" |
|
|
target="_blank">UltimaX Intelligence</a></b>. |
|
|
<br><br> |
|
|
Please consider reading the <b><a href="https://huggingface.co/spaces/umint/ai/discussions/37#68b55209c51ca52ed299db4c" |
|
|
target="_blank">Terms of Use and Consequences of Violation</a></b> if you wish to proceed to the main Spaces. |
|
|
<br><br> |
|
|
<b>Like this project? Feel free to buy me a <a href="https://ko-fi.com/hadad" target="_blank">coffee</a></b>. |
|
|
""" |