#
# SPDX-FileCopyrightText: Hadad <hadad@linuxmail.org>
# SPDX-License-Identifier: Apache-2.0
#
# OPENAI_API_BASE_URL  # Endpoint. Not set here -> Hugging Face Spaces secrets
# OPENAI_API_KEY  # API key. Not set here -> Hugging Face Spaces secrets
MODEL = "gpt-4.1-nano"
SEARXNG_ENDPOINT = "https://searx.stream/search"  # See the endpoint list at https://searx.space
BAIDU_ENDPOINT = "https://www.baidu.com/s"
READER_ENDPOINT = "https://r.jina.ai/"
REQUEST_TIMEOUT = 300  # 5 minutes
INSTRUCTIONS_START = """
You are ChatGPT, an AI assistant with mandatory real-time web search, URL content extraction, knowledge validation, and professional summarization capabilities.
Your absolute rules:
- You must always execute the provided tools (`web_search`, `read_url`) for every single user query or request, without exception.
- You are never allowed to answer directly from your internal knowledge, memory, or training data. Outdated or tool-bypassed answers are strictly forbidden.
- You must display all images found in sources using markdown format throughout your response. To obtain images from each source:
- If using only `web_search`:
- After calling `web_search` → extract all URLs → call `read_url` on each → collect all image links from the `read_url` results.
- If using `read_url` directly:
- You only need to execute `read_url`.
- This applies to all queries and all requests.
Core Principles:
- Mandatory Tool Invocation: Every query or request, no matter how simple, factual, or complex, must trigger at least one `web_search` or `read_url`.
- No Memory Reliance: Do not use prior conversation history, cached context, or built-in knowledge to generate answers. Always re-verify with tools.
- Up-to-Date Grounding: All responses must be based only on real-time, verifiable data retrieved through tools.
- Cross-Validation: Always compare findings across at least 3 independent, credible sources before producing a final answer.
- Professional Output: Responses must be clear, structured, evidence-based, and neutral.
- Image Integration: Display all relevant images found in sources within appropriate paragraphs using markdown format.
Execution Workflow:
1. Initial Web Search
- Immediately call `web_search` or `read_url` when a query or request arrives.
- Use multiple query variations and search engines (`google`, `bing`, `baidu`) for broader coverage.
- Then call `read_url` on each retrieved URL to obtain images.
- Issue multiple `read_url` calls when needed.
2. Result Selection
- Select up to 10 of the most relevant, credible, and content-rich results.
- Prioritize authoritative sources: academic publications, institutional reports, official documents, expert commentary.
- Deprioritize low-credibility, promotional, or unverified sources.
- Avoid over-reliance on any single source.
3. Content Retrieval
- For each selected URL, call `read_url`.
- Extract key elements: facts, statistics, data points, expert opinions, and relevant arguments.
- Normalize terminology, refine phrasing, and remove redundancies for clarity and consistency.
- Capture all image URLs present in the content, including those in HTML img tags, image galleries, and embedded media.
4. Cross-Validation
- Compare extracted information across at least 3 distinct sources.
- Identify convergences (agreement), divergences (contradictions), and gaps (missing data).
- Validate all numerical values, temporal references, and factual claims through multiple corroborations.
- Collect and verify all images from different sources.
5. Knowledge Integration
- Synthesize findings into a structured hierarchy:
- Overview → Key details → Supporting evidence → Citations.
- Emphasize the latest developments, trends, and their implications.
- Balance depth (for experts) with clarity (for general readers).
- Integrate relevant images within each section where they add value or illustrate points.
6. Response Construction
- Always cite sources inline using `[Source Title/Article/Tags/Domain](Source URL or Source Links)`.
- Always display and render images inline within relevant paragraphs using `![alt](image_url)`.
- Maintain a professional, precise, and neutral tone.
- Use clear formatting: headings, numbered lists, and bullet points.
- Ensure readability, logical progression, and accessibility.
- Place images contextually near related text for maximum comprehension.
7. Ambiguity and Uncertainty Handling
- Explicitly flag incomplete, ambiguous, or conflicting data.
- Provide possible interpretations with transparent reasoning.
- Clearly note limitations where evidence is insufficient or weak.
8. Quality and Consistency Assurance
- Always base answers strictly on tool-derived evidence.
- Guarantee logical flow, factual accuracy, and consistency in terminology.
- Maintain neutrality and avoid speculative claims.
- Never bypass tool execution for any query or request.
- Verify all image links are properly formatted and functional.
Image Display Requirements:
- You must detect and display all images found in source content.
- You must automatically identify valid image links.
- You must extract image URLs from both HTML and Markdown sources:
- For HTML, extract from `<img>`, `<picture>`, `<source>`, and data attributes.
- For Markdown, extract from image syntax such as `![alt](url)` or `![](url)`.
- The extracted URLs may be absolute or relative, and you must capture them accurately.
- You must display each image using markdown format `![alt](image_url)`.
- You must place images within relevant paragraphs where they provide context or illustration.
- You must include image captions or descriptions when available from the source.
- You must group related images together when they form a sequence or collection.
- You must ensure images are displayed throughout the response, not just at the end.
- The image URL must end with one of these formats:
- `.jpg`
- `.jpeg`
- `.png`
- `.webp`
- `.svg`
- `.ico`
- `.gif`
- `.bmp`
- If the sources do not contain a valid image URL, do not render or display any image.
Critical Image Validation Instructions:
- Step 1: Check if URL ends with image extension
- Before displaying any URL as an image, look at the very end of the URL string.
- The URL must end with one of these exact patterns:
- ends with: `.jpg`
- ends with: `.jpeg`
- ends with: `.png`
- ends with: `.gif`
- ends with: `.webp`
- ends with: `.svg`
- ends with: `.bmp`
- ends with: `.ico`
- Step 2: Examples of valid image URLs (do not render these):
- These are valid because they end with image extensions:
- `https://domain.com/photo.jpg`
- `https://cdn.site.com/image.png`
- `https://example.org/graphic.webp`
- `https://site.net/icon.svg`
- Step 3: Examples of invalid URLs (never display as images):
- These are not images because they don't end with image extensions:
- `https://domain.com/page`
- `https://site.com/article/123`
- `https://example.com/view?id=456`
- `https://cdn.com/image` (no extension)
- `https://site.org/gallery`
- `https://example.net/photo/view`
- Step 4: How to extract from raw HTML
- When you see raw HTML like:
- `<img src="https://example.com/photo.jpg">`
- Extract: `https://example.com/photo.jpg`
- Check: does it end with `.jpg`? Yes, so display it.
- When you see:
- `<img src="https://example.net/images/photo">`
- Extract: `https://example.net/images/photo`
- Check: does it end with an image extension? No, so don't display it.
- Step 5: Final validation before display
- Ask yourself:
- Does this URL end with `.jpg` or `.jpeg` or `.png` or `.gif` or `.webp` or `.svg` or `.bmp` or `.ico`?
- If yes: display as `![alt](url)`
- If no: do not display as image
- Important:
- Never display example URLs in your actual response
- The examples above are only for your understanding
Additional Image Validation Methods:
- Step 1: Alternative validation for modern web images
- Many modern websites serve images through CDNs or APIs without file extensions
- Apply these additional checks if the URL doesn't end with a standard extension:
- Step 2: Check for image extensions anywhere in the URL path
- Look for these patterns anywhere in the URL (not just at the end):
- Contains `.jpg` or `.jpeg` or `.png` or `.gif` or `.webp` or `.svg` or `.bmp` or `.ico` followed by `?` or `&` or `#`
- Example: `https://cdn.example.com/image.jpg?w=800&h=600` (valid, has `.jpg` before parameters)
- Example: `https://api.site.com/render/photo.png&size=large` (valid, has `.png` before parameters)
- Step 3: Identify known image CDN patterns
- URLs from these domains are likely images even without extensions:
- Contains `cloudinary.com` or `cloudflare.com` with `/image/` or `/images/` in path
- Contains `imgur.com` or `imgix.net` or `imagekit.io`
- Contains `googleusercontent.com` or `ggpht.com` (Google image services)
- Contains `fbcdn.net` or `cdninstagram.com` (Facebook/Instagram images)
- Contains `twimg.com` or `pbs.twimg.com` (Twitter images)
- Contains `pinimg.com` (Pinterest images)
- Contains `staticflickr.com` (Flickr images)
- Contains `unsplash.com` with `/photos/` in path
- Contains `pexels.com` with `/photos/` in path
- Step 4: Check for image processing parameters
- URLs with these parameters are likely images:
- Contains `format=jpg` or `format=png` or `format=webp` or `f=auto`
- Contains `type=image` or `mime=image`
- Contains `width=` or `w=` followed by numbers
- Contains `height=` or `h=` followed by numbers
- Contains `resize=` or `size=` or `quality=` or `q=`
- Contains `auto=compress` or `auto=format`
- Step 5: Check URL path patterns
- URLs with these path patterns are likely images:
- Contains `/image/` or `/images/` or `/img/` or `/imgs/`
- Contains `/photo/` or `/photos/` or `/picture/` or `/pictures/`
- Contains `/media/` or `/assets/` or `/static/` or `/content/`
- Contains `/upload/` or `/uploads/` or `/files/`
- Contains `/thumbnail/` or `/thumb/` or `/preview/`
- Step 6: Special case handling
- SVG files: Always display if URL contains `.svg` anywhere
- Base64 images: Display if URL starts with `data:image/`
- Step 7: Final expanded validation
- Apply checks in this order:
- First check: Does URL end with image extension? If yes, display
- Second check: Does URL contain image extension before parameters? If yes, display
- Third check: Is URL from known image CDN? If yes, display
- Fourth check: Does URL have image processing parameters? If yes, display
- Fifth check: Does URL path contain image-related folders? If yes, display
- If none of above: Do not display as image
Critical Instruction:
- Every new query or request must trigger a `web_search` or `read_url`.
- For web search, always call `web_search` → `read_url`. This applies to all queries and requests, in order to obtain image links.
- Only call `read_url` on its own for new queries or requests that contain URLs.
- You must not generate answers from prior knowledge, conversation history, or cached data.
- Always use Markdown format for URL sources with `[source_name_or_title_or_article_or_tags_or_domain](source_url_or_source_link)`.
- Always use Markdown format for images with `![alt](image_url)`.
- Images should be placed within relevant paragraphs.
- Never render example image URLs provided in instructions.
- If tools fail, you must state explicitly that no valid data could be retrieved.
\n\n\n
"""
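The layered validation order that the prompt above spells out (extension at the end, extension before parameters, known CDN, processing parameters, path hints) can be sketched as a small helper. This is an illustrative sketch, not part of the Space's actual code; the function name, CDN list, and path list are abbreviated assumptions drawn from the prompt text.

```python
import re

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".bmp", ".ico")
IMAGE_CDNS = ("imgur.com", "imgix.net", "imagekit.io", "twimg.com", "pinimg.com", "staticflickr.com")
IMAGE_PATH_HINTS = ("/image/", "/images/", "/img/", "/photo/", "/photos/", "/media/", "/uploads/", "/thumb/")

def is_probable_image_url(url: str) -> bool:
    """Apply the validation checks in the order the instructions specify."""
    lower = url.lower()
    # Special case: base64-embedded images.
    if lower.startswith("data:image/"):
        return True
    # 1. URL ends with a known image extension.
    if lower.endswith(IMAGE_EXTS):
        return True
    # 2. Extension appears immediately before query parameters or a fragment.
    if re.search(r"\.(jpe?g|png|gif|webp|svg|bmp|ico)[?&#]", lower):
        return True
    # 3. Known image CDN host.
    if any(cdn in lower for cdn in IMAGE_CDNS):
        return True
    # 4. Image-processing query parameters.
    if re.search(r"[?&](width|height|quality|resize|size|w|h|q)=\d", lower) or "format=" in lower:
        return True
    # 5. Image-related path segment.
    return any(hint in lower for hint in IMAGE_PATH_HINTS)
```

Note that checks 3 to 5 are heuristics and can admit non-image URLs; the prompt accepts that trade-off because the markdown renderer tolerates failed image loads.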
CONTENT_EXTRACTION = """
<system>
- Analyze the retrieved content in detail
- Identify all critical facts, arguments, statistics, and relevant data
- Collect all URLs, hyperlinks, references, and citations mentioned in the content
- Evaluate credibility of sources, highlight potential biases or conflicts
- Produce a structured, professional, and comprehensive summary
- Emphasize clarity, accuracy, and logical flow
- Include all discovered URLs in the final summary as `[source_name_or_title_or_article_or_tags_or_domain](source_url_or_source_link)`
- Mark any uncertainties, contradictions, or missing information clearly
Image extraction from raw content:
- When you see HTML tags like `<img src="URL">`, extract the URL
- Check if the URL ends with: `.jpg` or `.jpeg` or `.png` or `.gif` or `.webp` or `.svg` or `.bmp` or `.ico`
- Only mark as an image if it has a valid extension at the end
- Look for these HTML patterns:
- `<img src="..." />`
- `<img data-src="..." />`
- `<img srcset="..." />`
- `<source srcset="..." />`
- URL must end with an image extension to be valid
Additional image extraction methods:
- Also check for these patterns that indicate images:
- URLs containing image extensions followed by query parameters: `.jpg?` or `.jpeg?` or `.png?` or `.gif?` or `.webp?` or `.svg?` or `.bmp?` or `.ico?`
- URLs from known image CDNs even without extensions
- URLs with image processing parameters like `width=`, `height=`, `format=`
- URLs with paths containing `/images/`, `/img/`, `/media/`, `/assets/`
- Open Graph meta tags: `<meta property="og:image" content="...">`
- Twitter Card images: `<meta name="twitter:image" content="...">`
- Schema.org image properties in JSON-LD
- CSS background images in style attributes
- Picture elements with multiple source tags
- Images in srcset attributes with multiple resolutions
</system>
\n\n\n
"""
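The HTML patterns the extraction prompt lists (`src`, `data-src`, `srcset`, Open Graph and Twitter Card meta tags) can be harvested with a few regular expressions. This is a hypothetical sketch of that harvesting step; the function name is illustrative and real pages may need an HTML parser rather than regexes.

```python
import re

def extract_image_candidates(html: str) -> list[str]:
    """Collect candidate image URLs from the HTML patterns listed above."""
    patterns = [
        r'<img[^>]+(?:src|data-src)=["\']([^"\']+)["\']',   # <img src> / <img data-src>
        r'<(?:img|source)[^>]+srcset=["\']([^"\']+)["\']',  # srcset attributes
        r'<meta[^>]+property=["\']og:image["\'][^>]+content=["\']([^"\']+)["\']',
        r'<meta[^>]+name=["\']twitter:image["\'][^>]+content=["\']([^"\']+)["\']',
    ]
    found: list[str] = []
    for pattern in patterns:
        for match in re.findall(pattern, html, flags=re.IGNORECASE):
            # srcset values may hold several "url width" pairs; keep only the URLs.
            for part in match.split(","):
                url = part.strip().split(" ")[0]
                if url and url not in found:
                    found.append(url)
    return found
```

Each candidate would then still be run through the image-URL validation described in INSTRUCTIONS_START before being rendered.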
SEARCH_SELECTION = """
<system>
- For each search result, fetch the full content using `read_url`
- Extract key information, main arguments, data points, and statistics
- Capture every URL present in the content or references
- Create a professional structured summary
- List each source at the end of the summary in the format `[source_name_or_title_or_article_or_tags_or_domain](source_url_or_source_link)`
- Identify ambiguities or gaps in information
- Ensure clarity, completeness, and high information density
Image identification in raw content:
- The raw HTML will contain many URLs
- Only URLs ending with image extensions are actual images
- Valid image extensions: `.jpg` or `.jpeg` or `.png` or `.gif` or `.webp` or `.svg` or `.bmp` or `.ico`
- If a URL doesn't end with these extensions, it's not an image
- Don't guess or assume - only exact extension matches count
Expanded image identification:
- Also identify as images:
- URLs with image extensions before query parameters (e.g., `image.jpg?size=large`)
- URLs from image CDNs (cloudinary, imgur, imgix, etc.)
- URLs with image processing parameters (width, height, format, quality)
- URLs with image-related paths (/images/, /media/, /assets/)
- Meta tag images (og:image, twitter:image)
- Apply multiple validation methods to catch all legitimate images
</system>
\n\n\n
"""
INSTRUCTIONS_END = """
\n\n\n
You have just executed tools and obtained results. You MUST now provide a comprehensive answer based ONLY on the tool results.
Final image display checklist:
- For each image URL you want to display, verify it ends with: `.jpg` or `.jpeg` or `.png` or `.gif` or `.webp` or `.svg` or `.bmp` or `.ico`
- If it doesn't end with these extensions, do not display it as an image
- Never display URLs without image extensions as images
- Never render example or demonstration image URLs from instructions
- State clearly if no valid images were found in the sources
Expanded final image validation:
- If a URL doesn't end with a standard extension, also check:
- Does it contain an image extension before query parameters?
- Is it from a known image CDN or service?
- Does it have image processing parameters?
- Is the path clearly image-related?
- If any of these secondary checks pass, display the image
- When uncertain but evidence suggests it's an image, attempt to display it
- The markdown renderer will gracefully handle any non-image URLs
Mandatory Ambiguities and Gaps reporting:
- Every final response must include a dedicated section titled "Ambiguities, Contradictions, and Gaps".
- In this section, explicitly list:
- Conflicting claims or data points found across sources
- Missing evidence or areas where sources are silent
- Unclear or weakly supported assertions
- If no ambiguities or gaps are found, you must still include the section and state that no significant ambiguities, contradictions, or gaps were identified.
\n\n\n
"""
REASONING_STEPS = {
    "web_search": {
        "parsing": (
            "I need to search for information about: {query}<br><br>"
            "I'm analyzing the user's request and preparing to execute a web search. "
            "The query I've identified is comprehensive and should yield relevant results. "
            "I will use the {engine} search engine for this task as it provides reliable and up-to-date information.<br><br>"
            "I'm now parsing the search parameters to ensure they are correctly formatted. "
            "The search query has been validated and I'm checking that all required fields are present. "
            "I need to make sure the search engine parameter is valid and supported by our system.<br><br>"
            "I'm preparing the search request with the following configuration:<br>"
            "- Search Query: {query}<br>"
            "- Search Engine: {engine}<br><br>"
            "I'm verifying that the network connection is stable and that the search service is accessible. "
            "All preliminary checks have been completed successfully."
        ),
        "executing": (
            "I'm now executing the web search for: {query}<br><br>"
            "I'm connecting to the {engine} search service and sending the search request. "
            "The connection has been established successfully and I'm waiting for the search results. "
            "I'm processing multiple search result pages to gather comprehensive information.<br><br>"
            "I'm analyzing the search results to identify the most relevant and authoritative sources. "
            "The search engine is returning results and I'm filtering them based on relevance scores. "
            "I'm extracting key information from each search result including titles, snippets, and URLs.<br><br>"
            "I'm organizing the search results in order of relevance and checking for duplicate content. "
            "The search process is progressing smoothly and I'm collecting valuable information. "
            "I'm also verifying the credibility of the sources to ensure high-quality information.<br><br>"
            "Current status: Processing search results...<br>"
            "Results found: Multiple relevant sources identified<br>"
            "Quality assessment: High relevance detected"
        ),
        "completed": (
            "I have successfully completed the web search for: {query}<br><br>"
            "I've retrieved comprehensive search results from {engine} and analyzed all the information. "
            "The search yielded multiple relevant results that directly address the user's query. "
            "I've extracted the most important information and organized it for processing.<br><br>"
            "I've identified several high-quality sources with authoritative information. "
            "The search results include recent and up-to-date content that is highly relevant. "
            "I've filtered out any duplicate or low-quality results to ensure accuracy.<br><br>"
            "I'm now processing the collected information to formulate a comprehensive response. "
            "The search results provide sufficient detail to answer the user's question thoroughly. "
            "I've verified the credibility of the sources and cross-referenced the information.<br><br>"
            "Search Summary:<br>"
            "- Total results processed: Multiple pages<br>"
            "- Relevance score: High<br>"
            "- Information quality: Verified and accurate<br>"
            "- Sources: Authoritative and recent<br><br>"
            "Preview of results:<br>{preview}"
        ),
        "error": (
            "I encountered an issue while attempting to search for: {query}<br><br>"
            "I tried to execute the web search but encountered an unexpected error. "
            "The error occurred during the search process and I need to handle it appropriately. "
            "I'm analyzing the error to understand what went wrong and how to proceed.<br><br>"
            "Error details: {error}<br><br>"
            "I'm attempting to diagnose the issue and considering alternative approaches. "
            "The error might be due to network connectivity, service availability, or parameter issues. "
            "I will try to recover from this error and provide the best possible response.<br><br>"
            "I'm evaluating whether I can retry the search with modified parameters. "
            "If the search cannot be completed, I will use my existing knowledge to help the user. "
            "I'm committed to providing valuable assistance despite this technical challenge."
        )
    },
    "read_url": {
        "parsing": (
            "I need to read and extract content from the URL: {url}<br><br>"
            "I'm analyzing the URL structure to ensure it's valid and accessible. "
            "The URL appears to be properly formatted and I'm preparing to fetch its content. "
            "I will extract the main content from this webpage to gather detailed information.<br><br>"
            "I'm validating the URL protocol and checking if it uses HTTP or HTTPS. "
            "The domain seems legitimate and I'm preparing the request headers. "
            "I need to ensure that the website allows automated content extraction.<br><br>"
            "I'm configuring the content extraction parameters:<br>"
            "- Target URL: {url}<br>"
            "- Extraction Method: Full content parsing<br>"
            "- Content Type: HTML/Text<br>"
            "- Encoding: Auto-detect<br><br>"
            "I'm checking if the website requires any special handling or authentication. "
            "All preliminary validation checks have been completed successfully."
        ),
        "executing": (
            "I'm now accessing the URL: {url}<br><br>"
            "I'm establishing a connection to the web server and sending the HTTP request. "
            "The connection is being established and I'm waiting for the server response. "
            "I'm following any redirects if necessary to reach the final destination.<br><br>"
            "I'm downloading the webpage content and checking the response status code. "
            "The server is responding and I'm receiving the HTML content. "
            "I'm monitoring the download progress and ensuring data integrity.<br><br>"
            "I'm parsing the HTML structure to extract the main content. "
            "I'm identifying and removing navigation elements, advertisements, and other non-content sections. "
            "I'm focusing on extracting the primary article or information content.<br><br>"
            "Current status: Extracting content...<br>"
            "Response received: Processing HTML<br>"
            "Content extraction: In progress"
        ),
        "completed": (
            "I have successfully extracted content from: {url}<br><br>"
            "I've retrieved the complete webpage content and processed it thoroughly. "
            "The extraction was successful and I've obtained the main textual content. "
            "I've cleaned the content by removing unnecessary HTML tags and formatting.<br><br>"
            "I've identified the main article or information section of the webpage. "
            "The content has been properly parsed and structured for analysis. "
            "I've preserved important information while filtering out irrelevant elements.<br><br>"
            "I'm now analyzing the extracted content to understand its context and relevance. "
            "The information appears to be comprehensive and directly related to the topic. "
            "I've verified that the content is complete and hasn't been truncated.<br><br>"
            "Extraction Summary:<br>"
            "- Content length: Substantial<br>"
            "- Extraction quality: High<br>"
            "- Content type: Article/Information<br>"
            "- Processing status: Complete<br><br>"
            "Preview of extracted content:<br>{preview}"
        ),
        "error": (
            "I encountered an issue while trying to access: {url}<br><br>"
            "I attempted to fetch the webpage content but encountered an error. "
            "The error prevented me from successfully extracting the information. "
            "I'm analyzing the error to understand the cause and find a solution.<br><br>"
            "Error details: {error}<br><br>"
            "I'm considering possible causes such as network issues, access restrictions, or invalid URLs. "
            "The website might be blocking automated access or the URL might be incorrect. "
            "I will try to work around this limitation and provide alternative assistance.<br><br>"
            "I'm evaluating whether I can access the content through alternative methods. "
            "If direct access isn't possible, I'll use my knowledge to help with the query. "
            "I remain committed to providing useful information despite this obstacle."
        )
    }
}
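The templates above carry `{query}`, `{engine}`, `{url}`, `{preview}`, and `{error}` placeholders, so a caller presumably fills them with `str.format`. A minimal sketch of that rendering step, using an abbreviated stand-in dict (the helper name and excerpt are illustrative, not the Space's actual code):

```python
# Illustrative excerpt with the same shape as the REASONING_STEPS dict above.
TEMPLATES = {
    "web_search": {
        "parsing": "I need to search for information about: {query}<br><br>Using the {engine} engine.",
    },
}

def render_step(tool: str, phase: str, **fields: str) -> str:
    """Fill a reasoning template's {placeholders} via str.format."""
    return TEMPLATES[tool][phase].format(**fields)

message = render_step("web_search", "parsing", query="latest SearXNG release", engine="google")
```

Each phase's template only references the placeholders relevant to it, so a caller passes `query`/`engine` for search phases and `url`/`preview`/`error` where those appear.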
REASONING_DEFAULT = "I'm processing the tool execution request..."
REASONING_DELAY = 0.01  # 10 ms
OS = [
    "Windows NT 10.0; Win64; x64",
    "Macintosh; Intel Mac OS X 10_15_7",
    "X11; Linux x86_64",
    "Windows NT 11.0; Win64; x64",
    "Macintosh; Intel Mac OS X 11_6_2"
]
OCTETS = [
    1, 2, 3, 4, 5, 8, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 23, 24, 34, 35, 36,
    37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
    47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
    57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
    67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
    77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
    87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
    97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
    107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
    117, 118, 119, 120, 121, 122, 123, 124, 125, 126,
    128, 129, 130, 131, 132, 133, 134, 135, 136, 137,
    138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
    148, 149, 150, 151, 152, 153, 154, 155, 156, 157,
    158, 159, 160, 161, 162, 163, 164, 165, 166, 167,
    168, 170, 171, 172, 173, 174, 175, 176, 177, 178,
    179, 180, 181, 182, 183, 184, 185, 186, 187, 188,
    189, 190, 191, 192, 193, 194, 195, 196, 197, 198,
    199, 200, 201, 202, 203, 204, 205, 206, 207, 208,
    209, 210, 211, 212, 213, 214, 215, 216, 217, 218,
    219, 220, 221, 222, 223
]
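OCTETS appears to hold candidate first octets for plausible public IPv4 addresses: it skips loopback (127), link-local (169), multicast (224 and above), and several other reserved or unallocated blocks. Assuming it feeds randomized request headers, composing an address could look like the sketch below; the function name is illustrative and the list is abbreviated here.

```python
import random

# Same shape as the OCTETS list defined above, abbreviated for the example.
OCTETS = [1, 2, 3, 4, 5, 8, 12, 223]

def random_client_ip() -> str:
    """Compose an IPv4 address whose first octet comes from the curated list."""
    first = random.choice(OCTETS)
    rest = (random.randint(0, 255) for _ in range(3))
    return ".".join(str(n) for n in (first, *rest))
```

Restricting only the first octet keeps the addresses looking routable without maintaining a full allocation table.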
BROWSERS = [
    "Chrome",
    "Firefox",
    "Safari",
    "Edge",
    "Opera"
]
CHROME_VERSIONS = [
    "120.0.0.0",
    "119.0.0.0",
    "118.0.0.0",
    "117.0.0.0",
    "116.0.0.0"
]
FIREFOX_VERSIONS = [
    "121.0",
    "120.0",
    "119.0",
    "118.0",
    "117.0"
]
SAFARI_VERSIONS = [
    "17.1",
    "17.0",
    "16.6",
    "16.5",
    "16.4"
]
EDGE_VERSIONS = [
    "120.0.2210.91",
    "119.0.2151.97",
    "118.0.2088.76",
    "117.0.2045.60",
    "116.0.1938.81"
]
DOMAINS = [
    "google.com",
    "bing.com",
    "yahoo.com",
    "duckduckgo.com",
    "baidu.com",
    "yandex.com",
    "facebook.com",
    "twitter.com",
    "linkedin.com",
    "reddit.com",
    "youtube.com",
    "wikipedia.org",
    "amazon.com",
    "github.com",
    "stackoverflow.com",
    "medium.com",
    "quora.com",
    "pinterest.com",
    "instagram.com",
    "tumblr.com"
]
PROTOCOLS = [
    "https://",
    "https://www."
]
SEARCH_ENGINES = [
    "https://www.google.com/search?q=",
    "https://www.bing.com/search?q=",
    "https://search.yahoo.com/search?p=",
    "https://duckduckgo.com/?q=",
    "https://www.baidu.com/s?wd=",
    "https://yandex.com/search/?text=",
    "https://www.google.co.uk/search?q=",
    "https://www.google.ca/search?q=",
    "https://www.google.com.au/search?q=",
    "https://www.google.de/search?q=",
    "https://www.google.fr/search?q=",
    "https://www.google.co.jp/search?q=",
    "https://www.google.com.br/search?q=",
    "https://www.google.co.in/search?q=",
    "https://www.google.ru/search?q=",
    "https://www.google.it/search?q="
]
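A plausible use of the OS and per-browser version lists is composing randomized User-Agent strings for outgoing requests. The exact templates the Space uses are not shown in this file, so the following is only a sketch of the general shape for the Chrome case, with abbreviated lists:

```python
import random

# Abbreviated copies of the OS and CHROME_VERSIONS lists defined above.
OS = ["Windows NT 10.0; Win64; x64", "Macintosh; Intel Mac OS X 10_15_7"]
CHROME_VERSIONS = ["120.0.0.0", "119.0.0.0"]

def random_chrome_user_agent() -> str:
    """Build a Chrome-style User-Agent from a random OS token and version."""
    os_token = random.choice(OS)
    version = random.choice(CHROME_VERSIONS)
    return (
        f"Mozilla/5.0 ({os_token}) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{version} Safari/537.36"
    )
```

Firefox, Safari, and Edge strings follow different templates, which is presumably why each browser keeps its own version list.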
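Each SEARCH_ENGINES entry ends at the query parameter, so a query string presumably needs URL-encoding before being appended. A minimal sketch (the helper name is illustrative):

```python
from urllib.parse import quote_plus

# First entry of the SEARCH_ENGINES list defined above.
GOOGLE = "https://www.google.com/search?q="

def build_search_url(base: str, query: str) -> str:
    """Append a URL-encoded query to a search-engine prefix."""
    return base + quote_plus(query)
```

`quote_plus` encodes spaces as `+` and escapes reserved characters, which matches the `?q=`-style parameters these prefixes use.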
KEYWORDS = [
    "news",
    "weather",
    "sports",
    "technology",
    "science",
    "health",
    "finance",
    "entertainment",
    "travel",
    "food",
    "education",
    "business",
    "politics",
    "culture",
    "history",
    "music",
    "movies",
    "games",
    "books",
    "art"
]
COUNTRIES = [
    "US", "GB", "CA", "AU", "DE", "FR", "JP", "BR", "IN", "RU",
    "IT", "ES", "MX", "NL", "SE", "NO", "DK", "FI", "PL", "TR",
    "KR", "SG", "HK", "TW", "TH", "ID", "MY", "PH", "VN", "AR",
    "CL", "CO", "PE", "VE", "EG", "ZA", "NG", "KE", "MA", "DZ",
    "TN", "IL", "AE", "SA", "QA", "KW", "BH", "OM", "JO", "LB"
]
LANGUAGES = [
    "en-US", "en-GB", "en-CA", "en-AU", "de-DE", "fr-FR", "ja-JP",
    "pt-BR", "hi-IN", "ru-RU", "it-IT", "es-ES", "es-MX", "nl-NL",
    "sv-SE", "no-NO", "da-DK", "fi-FI", "pl-PL", "tr-TR", "ko-KR",
    "zh-CN", "zh-TW", "th-TH", "id-ID", "ms-MY", "fil-PH", "vi-VN",
    "es-AR", "es-CL", "es-CO", "es-PE", "es-VE", "ar-EG", "en-ZA",
    "en-NG", "sw-KE", "ar-MA", "ar-DZ", "ar-TN", "he-IL", "ar-AE",
    "ar-SA", "ar-QA", "ar-KW", "ar-BH", "ar-OM", "ar-JO", "ar-LB"
]
TIMEZONES = [
    "America/New_York",
    "America/Chicago",
    "America/Los_Angeles",
    "America/Denver",
    "Europe/London",
    "Europe/Paris",
    "Europe/Berlin",
    "Europe/Moscow",
    "Asia/Tokyo",
    "Asia/Shanghai",
    "Asia/Hong_Kong",
    "Asia/Singapore",
    "Asia/Seoul",
    "Asia/Mumbai",
    "Asia/Dubai",
    "Australia/Sydney",
    "Australia/Melbourne",
    "America/Toronto",
    "America/Vancouver",
    "America/Mexico_City",
    "America/Sao_Paulo",
    "America/Buenos_Aires",
    "Africa/Cairo",
    "Africa/Johannesburg",
    "Africa/Lagos",
    "Africa/Nairobi",
    "Pacific/Auckland",
    "Pacific/Honolulu"
]
DESCRIPTION = """
<b>SearchGPT</b> is <b>ChatGPT</b> with real-time web search capabilities and the ability to read content directly from a URL.
<br><br>
This Space implements an agent-based system with <b><a href="https://www.gradio.app" target="_blank">Gradio</a></b>. It is integrated with
<b><a href="https://docs.searxng.org" target="_blank">SearXNG</a></b>, which is wrapped as a tool function the model can execute natively.
<br><br>
The agent mode is inspired by the <b><a href="https://openwebui.com/t/hadad/deep_research" target="_blank">Deep Research</a></b> tool script from
<b><a href="https://docs.openwebui.com" target="_blank">OpenWebUI</a></b>.
<br><br>
The <b>Deep Research</b> feature is also available on the primary Spaces of <b><a href="https://umint-openwebui.hf.space"
target="_blank">UltimaX Intelligence</a></b>.
<br><br>
Please consider reading the <b><a href="https://huggingface.co/spaces/umint/ai/discussions/37#68b55209c51ca52ed299db4c"
target="_blank">Terms of Use and Consequences of Violation</a></b> if you wish to proceed to the main Spaces.
<br><br>
<b>Like this project? Feel free to buy me a <a href="https://ko-fi.com/hadad" target="_blank">coffee</a></b>.
"""  # Rendered in the Gradio UI