linktimecloud committed
Commit 2c5299d · verified · 1 Parent(s): 25bffe0

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ data/README.pdf filter=lfs diff=lfs merge=lfs -text
.github/workflows/update_space.yml ADDED
@@ -0,0 +1,28 @@
+ name: Run Python script
+
+ on:
+   push:
+     branches:
+       - main
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout
+         uses: actions/checkout@v2
+
+       - name: Set up Python
+         uses: actions/setup-python@v2
+         with:
+           python-version: '3.9'
+
+       - name: Install Gradio
+         run: python -m pip install gradio
+
+       - name: Log in to Hugging Face
+         run: python -c 'import huggingface_hub; huggingface_hub.login(token="${{ secrets.hf_token }}")'
+
+       - name: Deploy to Spaces
+         run: gradio deploy
.gitignore CHANGED
@@ -161,4 +161,6 @@ cython_debug/
  # option (not recommended) you can uncomment the following to ignore the entire idea folder.
  #.idea/
 
- .gradio
+ .gradio
+ .DS_Store
+ .env*
README.md CHANGED
@@ -4,34 +4,88 @@ app_file: ask.py
  sdk: gradio
  sdk_version: 5.3.0
  ---
- # ask.py
 
  [![License](https://img.shields.io/github/license/pengfeng/ask.py)](LICENSE)
 
- A single Python program to implement the search-extract-summarize flow, similar to AI search
- engines such as Perplexity.
 
  > [UPDATE]
- >
  > - 2024-10-22: add GradIO integration
  > - 2024-10-21: use DuckDB for the vector search and use API for embedding
  > - 2024-10-20: allow to specify a list of input urls
  > - 2024-10-18: output-language and output-length parameters for LLM
  > - 2024-10-18: date-restrict and target-site parameters for search
 
  > [!NOTE]
- > Our main goal is to illustrate the basic concepts of AI search engines with the raw constructs.
- > Performance or scalability is not in the scope of this program.
 
  ## The search-extract-summarize flow
 
  Given a query, the program will
 
- - search Google for the top 10 web pages
- - crawl and scape the pages for their text content
  - chunk the text content into chunks and save them into a vectordb
- - perform a vector search with the query and find the top 10 matched chunks
- - use the top 10 chunks as the context to ask an LLM to generate the answer
  - output the answer with the references
 
  Of course this flow is a very simplified version of the real AI search engines, but it is a good
@@ -47,33 +101,47 @@ For example, we can:
  - ask LLM to answer with a specific length.
  - crawl a specific list of urls and answer based on those contents only.
 
- ## Quick start
 
- ```bash
- pip install -r requirements.txt
 
  # modify .env file to set the API keys or export them as environment variables as below
 
- # right now we use Google search API
- export SEARCH_API_KEY="your-google-search-api-key"
- export SEARCH_PROJECT_KEY="your-google-cx-key"
 
- # right now we use OpenAI API
- export LLM_API_KEY="your-openai-api-key"
 
- # run the program
- python ask.py -q "What is an LLM agent?"
 
  # we can specify more parameters to control the behavior such as date_restrict and target_site
- python ask.py --help
  Usage: ask.py [OPTIONS]
 
- Search web for the query and summarize the results
 
  Options:
- --web-ui                        Launch the web interface
  -q, --query TEXT                Query to search
  -d, --date-restrict INTEGER     Restrict search results to a specific date
                                  range, default is no restriction
  -s, --target-site TEXT          Restrict search results to a specific site,
@@ -83,130 +151,128 @@ Options:
  --url-list-file TEXT            Instead of doing web search, scrape the
                                  target URL list and answer the query based
                                  on the content
- -m, --model-name TEXT           Model name to use for inference
  -l, --log-level [DEBUG|INFO|WARNING|ERROR]
                                  Set the logging level [default: INFO]
  --help                          Show this message and exit.
  ```
 
- ## Libraries and APIs used
-
- - [Google Search API](https://developers.google.com/custom-search/v1/overview)
- - [OpenAI API](https://beta.openai.com/docs/api-reference/completions/create)
- - [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
- - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- - [DuckDB](https://github.com/duckdb/duckdb)
- - [GradIO](https://grad.io)
 
- ## Screenshot for the GradIO integration
 
- ![image](https://github.com/user-attachments/assets/0483e6a2-75d7-4fbd-813f-bfa13839c836)
 
- ## Sample output
 
- ### General Search
 
  ```
- % python ask.py -q "Why do we need agentic RAG even if we have ChatGPT?"
-
- ✅ Found 10 links for query: Why do we need agentic RAG even if we have ChatGPT?
- ✅ Scraping the URLs ...
- ✅ Scraped 10 URLs ...
- ✅ Chunking the text ...
- ✅ Saving to vector DB ...
- ✅ Querying the vector DB ...
- ✅ Running inference with context ...
-
- # Answer
-
- Agentic RAG (Retrieval-Augmented Generation) is needed alongside ChatGPT for several reasons:
-
- 1. **Precision and Contextual Relevance**: While ChatGPT offers generative responses, it may not
- reliably provide precise answers, especially when specific, accurate information is critical[5].
- Agentic RAG enhances this by integrating retrieval mechanisms that improve response context and
- accuracy, allowing users to access the most relevant and recent data without the need for costly
- model fine-tuning[2].
-
- 2. **Customizability**: RAG allows businesses to create tailored chatbots that can securely
- reference company-specific data[2]. In contrast, ChatGPT’s broader capabilities may not be
- directly suited for specialized, domain-specific questions without comprehensive customization[3].
-
- 3. **Complex Query Handling**: RAG can be optimized for complex queries and can be adjusted to
- work better with specific types of inputs, such as comparing and contrasting information, a task
- where ChatGPT may struggle under certain circumstances[9]. This level of customization can lead to
- better performance in niche applications where precise retrieval of information is crucial.
-
- 4. **Asynchronous Processing Capabilities**: Future agentic systems aim to integrate asynchronous
- handling of actions, allowing for parallel processing and reducing wait times for retrieval and
- computation, which is a limitation in the current form of ChatGPT[7]. This advancement would enhance
- overall efficiency and responsiveness in conversations.
-
- 5. **Incorporating Retrieved Information Effectively**: Using RAG can significantly improve how
- retrieved information is utilized within a conversation. By effectively managing the context and
- relevance of retrieved documents, RAG helps in framing prompts that can guide ChatGPT towards
- delivering more accurate responses[10].
-
- In summary, while ChatGPT excels in generating conversational responses, agentic RAG brings
- precision, customization, and efficiency that can significantly enhance the overall conversational
- AI experience.
-
- # References
-
- [1] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
- [2] https://www.linkedin.com/posts/brianjuliusdc_dax-powerbi-chatgpt-activity-7235953280177041408-wQqq
- [3] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
- [4] https://community.openai.com/t/prompt-engineering-for-rag/621495
- [5] https://www.ben-evans.com/benedictevans/2024/6/8/building-ai-products
- [6] https://community.openai.com/t/prompt-engineering-for-rag/621495
- [7] https://www.linkedin.com/posts/kurtcagle_agentic-rag-personalizing-and-optimizing-activity-7198097129993613312-z7Sm
- [8] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
- [9] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
- [10] https://community.openai.com/t/prompt-engineering-for-rag/621495
- ```
 
- ### Only use the latest information from a specific site
 
- This following query will only use the information from openai.com that are updated in the previous
- day. The behavior is similar to the "site:openai.com" and "date-restrict" search parameters in Google
- search.
 
  ```
- % python ask.py -q "OpenAI Swarm Framework" -d 1 -s openai.com
- Found 10 links for query: OpenAI Swarm Framework
- ✅ Scraping the URLs ...
- Scraped 10 URLs ...
- Chunking the text ...
- Saving to vector DB ...
- Querying the vector DB to get context ...
- ✅ Running inference with context ...
-
- # Answer
-
- OpenAI Swarm Framework is an experimental platform designed for building, orchestrating, and
- deploying multi-agent systems, enabling multiple AI agents to collaborate on complex tasks. It contrasts
- with traditional single-agent models by facilitating agent interaction and coordination, thus enhancing
- efficiency[5][9]. The framework provides developers with a way to orchestrate these agent systems in
- a lightweight manner, leveraging Node.js for scalable applications[1][4].
-
- One implementation of this framework is Swarm.js, which serves as a Node.js SDK, allowing users to
- create and manage agents that perform tasks and hand off conversations. Swarm.js is positioned as
- an educational tool, making it accessible for both beginners and experts, although it may still contain
- bugs and is currently lightweight[1][3][7]. This new approach emphasizes multi-agent collaboration and is
- well-suited for back-end development, requiring some programming expertise for effective implementation[9].
-
- Overall, OpenAI Swarm facilitates a shift in how AI systems can collaborate, differing from existing
- OpenAI tools by focusing on backend orchestration rather than user-interactive front-end applications[9].
-
- # References
-
- [1] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
- [2] https://community.openai.com/t/introducing-swarm-js-a-node-js-implementation-of-openai-swarm/977510
- [3] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
- [4] https://community.openai.com/t/introducing-swarm-js-a-node-js-implementation-of-openai-swarm/977510
- [5] https://community.openai.com/t/swarm-some-initial-insights/976602
- [6] https://community.openai.com/t/swarm-some-initial-insights/976602
- [7] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
- [8] https://community.openai.com/t/introducing-swarm-js-a-node-js-implementation-of-openai-swarm/977510
- [9] https://community.openai.com/t/swarm-some-initial-insights/976602
- [10] https://community.openai.com/t/swarm-some-initial-insights/976602
  ```
 
  sdk: gradio
  sdk_version: 5.3.0
  ---
 
  [![License](https://img.shields.io/github/license/pengfeng/ask.py)](LICENSE)
 
+ - [🚀 **Updates!** 🚀](#-updates-)
+ - [Introduction](#introduction)
+ - [Demo use cases](#demo-use-cases)
+ - [The search-extract-summarize flow](#the-search-extract-summarize-flow)
+ - [Quick start](#quick-start)
+ - [Use Different LLM Endpoints](#use-different-llm-endpoints)
+   - [Use local Ollama inference and embedding models](#use-local-ollama-inference-and-embedding-models)
+   - [Use DeepSeek API inference with OpenAI embedding models](#use-deepseek-api-inference-with-openai-embedding-models)
+ - [GradIO Deployment](#gradio-deployment)
+ - [Community](#community)
+
+
+ # 🚀 **Updates!** 🚀
+
+ A full version with db support and configurable components is open sourced here:
+ [LeetTools](https://github.com/leettools-dev/leettools). A demo web site has been set up
+ [here](https://svc.leettools.com). Please check them out!
+
+ We also added support for local Ollama inference and embedding models, as well as for other API
+ providers such as DeepSeek. Please see the [`Use Different LLM Endpoints`](#use-different-llm-endpoints) section for more details.
 
  > [UPDATE]
+ > - 2025-01-20: add support for separate API endpoints for inference and embedding
+ > - 2025-01-20: add support for .env file switch and Ollama example
+ > - 2025-01-20: add support for default search proxy
+ > - 2024-12-20: add the full function version link
+ > - 2024-11-20: add Docling converter and local mode to query against local files
+ > - 2024-11-10: add Chonkie as the default chunker
+ > - 2024-10-28: add extract function as a new output mode
+ > - 2024-10-25: add hybrid search demo using DuckDB full-text search
  > - 2024-10-22: add GradIO integration
  > - 2024-10-21: use DuckDB for the vector search and use API for embedding
  > - 2024-10-20: allow to specify a list of input urls
  > - 2024-10-18: output-language and output-length parameters for LLM
  > - 2024-10-18: date-restrict and target-site parameters for search
 
+ # Introduction
+
+ A single Python program to implement the search-extract-summarize flow, similar to AI search
+ engines such as Perplexity.
+
+ - You can run it with local Ollama inference and embedding models.
+ - You can run it on the command line or with a GradIO UI.
+ - You can control the output behavior, e.g., extract structured data or change the output language.
+ - You can control the search behavior, e.g., restrict to a specific site or date, or just scrape
+   a specified list of URLs.
+ - You can run it in a cron job or bash script to automate complex search/data extraction tasks.
+ - You can ask questions against local files.
+
+ We have a running UI example [in HuggingFace Spaces](https://huggingface.co/spaces/leettools/AskPy).
+
+ ![image](https://github.com/user-attachments/assets/0483e6a2-75d7-4fbd-813f-bfa13839c836)
+
+ ## Demo use cases
+
+ - [Search like Perplexity](demos/search_and_answer.md)
+ - [Only use the latest information from a specific site](demos/search_on_site_and_date.md)
+ - [Extract information from web search results](demos/search_and_extract.md)
+ - [Ask questions against local files](demos/local_files.md)
+ - [Use Ollama local LLM and Embedding models](demos/run_with_ollama.md)
+
  > [!NOTE]
+ >
+ > - Our main goal is to illustrate the basic concepts of AI search engines with the raw constructs.
+ >   Performance or scalability is not in the scope of this program.
+ > - We are planning to open source a real search-enabled AI toolset with real DB setup, real document
+ >   pipeline, and real query engine soon. Star and watch this repo for updates!
 
  ## The search-extract-summarize flow
 
  Given a query, the program will
 
+ - in search mode: search Google for the top 10 web pages
+ - in local mode: use the local files under the 'data' directory
+ - crawl and scrape the result documents for their text content
  - chunk the text content into chunks and save them into a vectordb
+ - perform a hybrid search (vector and BM25 FTS) with the query and find the top 10 matched chunks
+ - [Optional] use a reranker to re-rank the top chunks
+ - use the top chunks as the context to ask an LLM to generate the answer
  - output the answer with the references
 
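To make these steps concrete, here is a minimal, self-contained sketch of the same flow. This is an illustration, not the actual ask.py code: the helper names are made up, it uses a plain in-memory cosine search instead of the DuckDB hybrid search described above, and it assumes the scraped pages are already in a `docs` dict and `OPENAI_API_KEY` is set.

```python
# Minimal sketch of the search-extract-summarize flow (not the real ask.py code).
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    # naive fixed-size character chunking with overlap
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]


def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)


def answer(query: str, docs: dict[str, str], top_k: int = 10) -> str:
    # docs maps url -> scraped page text (the search/scrape steps are elided here)
    chunks = [(url, c) for url, text in docs.items() for c in chunk(text)]
    vectors = embed([c for _, c in chunks])
    qvec = embed([query])[0]
    ranked = sorted(zip(chunks, vectors), key=lambda p: cosine(qvec, p[1]), reverse=True)
    top = [c for c, _ in ranked[:top_k]]
    context = "\n".join(f"[{i+1}] {c}" for i, (_, c) in enumerate(top))
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Query: {query}\nContext:\n{context}"},
        ],
    )
    refs = "\n".join(f"[{i+1}] {url}" for i, (url, _) in enumerate(top))
    return f"{completion.choices[0].message.content}\n\n# References\n\n{refs}"
```

ask.py itself stores the vectors in DuckDB and adds a BM25 full-text pass on top of this, as the ask.py diff further down shows.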
  Of course this flow is a very simplified version of the real AI search engines, but it is a good
 
  - ask LLM to answer with a specific length.
  - crawl a specific list of urls and answer based on those contents only.
 
+ This program can serve as a playground to understand and experiment with different components in
+ the pipeline.
 
+ # Quick start
 
+ ```bash
+ # recommend to use Python 3.10 or later and use venv or conda to create a virtual environment
+ % pip install -r requirements.txt
 
  # modify .env file to set the API keys or export them as environment variables as below
 
+ # you can use the Google search API; if not set, we provide a default search engine proxy for testing
+ # % export SEARCH_API_KEY="your-google-search-api-key"
+ # % export SEARCH_PROJECT_KEY="your-google-cx-key"
+
+ # right now we use the OpenAI API, defaulting to OpenAI
+ # % export LLM_BASE_URL=https://api.openai.com/v1
+ % export LLM_API_KEY=<your-openai-api-key>
 
+ # By default, the program will start a web UI. See the GradIO Deployment section for more info.
+ # Run the program on the command line with the -c option
+ % python ask.py -c -q "What is an LLM agent?"
 
+ # You can also query your local files under the 'data' directory using the local mode
+ % python ask.py -i local -c -q "How does Ask.py work?"
 
  # we can specify more parameters to control the behavior such as date_restrict and target_site
+ % python ask.py --help
  Usage: ask.py [OPTIONS]
 
+ Search web for the query and summarize the results.
 
  Options:
  -q, --query TEXT                Query to search
+ -i, --input-mode [search|local]
+                                 Input mode for the query, default is search.
+                                 When using local, files under the 'data' folder
+                                 will be used as input.
+ -o, --output-mode [answer|extract]
+                                 Output mode for the answer, default is a
+                                 simple answer
  -d, --date-restrict INTEGER     Restrict search results to a specific date
                                  range, default is no restriction
  -s, --target-site TEXT          Restrict search results to a specific site,
 
  --url-list-file TEXT            Instead of doing web search, scrape the
                                  target URL list and answer the query based
                                  on the content
+ --extract-schema-file TEXT      Pydantic schema for the extract mode
+ --inference-model-name TEXT     Model name to use for inference
+ --vector-search-only            Do not use hybrid search mode, use vector
+                                 search only.
+ -c, --run-cli                   Run as a command line tool instead of
+                                 launching the Gradio UI
+ -e, --env TEXT                  The environment file to use, absolute path
+                                 or relative to the package root.
  -l, --log-level [DEBUG|INFO|WARNING|ERROR]
                                  Set the logging level [default: INFO]
  --help                          Show this message and exit.
  ```
 
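The extract output mode (`-o extract`) reads a Pydantic model from the file given by `--extract-schema-file`; internally, ask.py `exec`s the file content and returns the first `BaseModel` subclass it finds (see `_get_target_class` in the ask.py diff below). A hypothetical schema file could look like this (the class and fields are made up for illustration):

```python
# example_schema.py -- a hypothetical schema for the extract output mode
from pydantic import BaseModel


class Company(BaseModel):
    name: str
    founding_year: int
    headquarters: str


# Hypothetical invocation (the options are real, the query and schema are not):
#   python ask.py -c -o extract --extract-schema-file example_schema.py \
#       -q "AI startups founded in 2023"
```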
+ # Use Different LLM Endpoints
 
 
 
 
 
 
 
168
 
169
+ ## Use local Ollama inference and embedding models
170
+ We can run Ask.py with different env files to use different LLM endpoints and other
171
+ related settings. For example, if you have a local Ollama serving instance, you can set
172
+ to use it as follows:
173
 
174
+ ```bash
175
+ # you may need to pull the models first
176
+ % ollama pull llama3.2
177
+ % ollama pull nomic-embed-text
178
+ % ollama serve
179
+
180
+ % cat > .env.ollama <<EOF
181
+ LLM_BASE_URL=http://localhost:11434/v1
182
+ LLM_API_KEY=dummy-key
183
+ DEFAULT_INFERENCE_MODEL=llama3.2
184
+ EMBEDDING_MODEL=nomic-embed-text
185
+ EMBEDDING_DIMENSIONS=768
186
+ EOF
187
+
188
+ # Then run the command with the -e option to specify the .env file to use
189
+ % python ask.py -e .env.ollama -c -q "How does Ollama work?"
190
+ ```
191
 
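Under the hood, ask.py only ever talks to an OpenAI-compatible endpoint, so switching to Ollama is just a matter of pointing the client at a different base URL. A minimal sketch of what the env switch amounts to (assuming the `.env.ollama` values above are loaded into the environment):

```python
# Sketch: what the env file switch boils down to (see _get_inference_api_client in ask.py).
import os

from openai import OpenAI

# With .env.ollama loaded, these point at the local Ollama OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ.get("LLM_API_KEY", "dummy-key"),
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:11434/v1"),
)
resp = client.chat.completions.create(
    model=os.environ.get("DEFAULT_INFERENCE_MODEL", "llama3.2"),
    messages=[{"role": "user", "content": "How does Ollama work?"}],
)
print(resp.choices[0].message.content)
```

The same mechanism drives the DeepSeek setup below: `LLM_*` selects the inference endpoint, `EMBED_*` the embedding endpoint, and when `EMBED_*` is unset, ask.py reuses the `LLM_*` values (see `read_env_variables` in the ask.py diff).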
+ ## Use DeepSeek API inference with OpenAI embedding models
 
+ We can also use one provider for inference and another for embedding. For example, we can use
+ the DeepSeek API for inference and OpenAI for embedding, since DeepSeek does not provide an embedding
+ endpoint as of Jan 2025:
 
+ ```bash
+ % cat > .env.deepseek <<EOF
+ LLM_BASE_URL=https://api.deepseek.com/v1
+ LLM_API_KEY=<deepseek-api-key>
+ DEFAULT_INFERENCE_MODEL=deepseek-chat
+
+ EMBED_BASE_URL=https://api.openai.com/v1
+ EMBED_API_KEY=<openai-api-key>
+ EMBEDDING_MODEL=text-embedding-3-small
+ EMBEDDING_DIMENSIONS=1536
+ EOF
+
+ % python ask.py -e .env.deepseek -c -q "How does DeepSeek work?"
+ ```
 
+ # GradIO Deployment
+
+ > [!NOTE]
+ > Original GradIO app-sharing document [here](https://www.gradio.app/guides/sharing-your-app).
 
+ **Quick test and sharing**
+
+ By default, the program will start a web UI and share it through GradIO.
+
+ ```bash
+ % python ask.py
+ * Running on local URL:  http://127.0.0.1:7860
+ * Running on public URL: https://77c277af0330326587.gradio.live
+
+ # you can also set SHARE_GRADIO_UI to False to only run locally
+ % export SHARE_GRADIO_UI=False
+ % python ask.py
+ * Running on local URL:  http://127.0.0.1:7860
+ ```
+
+ **To share a more permanent link using HuggingFace Spaces**
+
+ - First, you need to [create a free HuggingFace account](https://huggingface.co/welcome).
+ - Then in your [settings/token page](https://huggingface.co/settings/tokens), create a new token with Write permissions.
+ - In your terminal, run the following commands in your app directory to deploy your program to
+   HuggingFace Spaces:
+
+ ```bash
+ % pip install gradio
+ % gradio deploy
+ Creating new Spaces Repo in '/home/you/ask.py'. Collecting metadata, press Enter to accept default value.
+ Enter Spaces app title [ask.py]: ask.py
+ Enter Gradio app file [ask.py]:
+ Enter Spaces hardware (cpu-basic, cpu-upgrade, t4-small, t4-medium, l4x1, l4x4, zero-a10g, a10g-small, a10g-large, a10g-largex2, a10g-largex4, a100-large, v5e-1x1, v5e-2x2, v5e-2x4) [cpu-basic]:
+ Any Spaces secrets (y/n) [n]: y
+ Enter secret name (leave blank to end): SEARCH_API_KEY
+ Enter secret value for SEARCH_API_KEY: YOUR_SEARCH_API_KEY
+ Enter secret name (leave blank to end): SEARCH_PROJECT_KEY
+ Enter secret value for SEARCH_PROJECT_KEY: YOUR_SEARCH_PROJECT_KEY
+ Enter secret name (leave blank to end): LLM_API_KEY
+ Enter secret value for LLM_API_KEY: YOUR_LLM_API_KEY
+ Enter secret name (leave blank to end):
+ Create Github Action to automatically update Space on 'git push'? [n]: n
+ Space available at https://huggingface.co/spaces/your_user_name/ask.py
+ ```
+
+ Now you can use the HuggingFace space app to run your queries.
+
+
+ # Community
+
+ **License and Acknowledgment**
+
+ The source code is licensed under the MIT license. Thanks to these amazing open-source projects and API
+ providers:
+
+ - [Google Search API](https://developers.google.com/custom-search/v1/overview)
+ - [OpenAI API](https://beta.openai.com/docs/api-reference/completions/create)
+ - [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
+ - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+ - [DuckDB](https://github.com/duckdb/duckdb)
+ - [Docling](https://github.com/DS4SD/docling)
+ - [GradIO](https://github.com/gradio-app/gradio)
+ - [Chonkie](https://github.com/bhavnicksm/chonkie)
ask.py CHANGED
@@ -1,27 +1,62 @@
  import json
  import logging
  import os
  import urllib.parse
  from concurrent.futures import ThreadPoolExecutor
  from functools import partial
- from typing import Any, Dict, List, Optional, Tuple
 
  import click
- import duckdb
  import gradio as gr
  import requests
  from bs4 import BeautifulSoup
  from dotenv import load_dotenv
  from jinja2 import BaseLoader, Environment
  from openai import OpenAI
 
  script_dir = os.path.dirname(os.path.abspath(__file__))
  default_env_file = os.path.abspath(os.path.join(script_dir, ".env"))
 
 
- def get_logger(log_level: str) -> logging.Logger:
      logger = logging.getLogger(__name__)
      logger.setLevel(log_level)
      handler = logging.StreamHandler()
      formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
      handler.setFormatter(formatter)
@@ -29,35 +64,63 @@ def get_logger(log_level: str) -> logging.Logger:
      return logger
 
 
  class Ask:
 
      def __init__(self, logger: Optional[logging.Logger] = None):
-         self.read_env_variables()
-
          if logger is not None:
              self.logger = logger
          else:
-             self.logger = get_logger("INFO")
-
-         self.table_name = "document_chunks"
-         self.db_con = duckdb.connect(":memory:")
 
-         self.db_con.install_extension("vss")
-         self.db_con.load_extension("vss")
-         self.db_con.install_extension("fts")
-         self.db_con.load_extension("fts")
-         self.db_con.sql("CREATE SEQUENCE seq_docid START 1000")
 
-         self.db_con.execute(
-             f"""
-             CREATE TABLE {self.table_name} (
-                 doc_id INTEGER PRIMARY KEY DEFAULT nextval('seq_docid'),
-                 url TEXT,
-                 chunk TEXT,
-                 vec FLOAT[{self.embedding_dimensions}]
-             );
-             """
-         )
 
          self.session = requests.Session()
          user_agent: str = (
@@ -70,22 +133,56 @@ CREATE TABLE {self.table_name} (
      def read_env_variables(self) -> None:
          err_msg = ""
 
-         self.search_api_key = os.environ.get("SEARCH_API_KEY")
-         if self.search_api_key is None:
-             err_msg += "SEARCH_API_KEY env variable not set.\n"
-         self.search_project_id = os.environ.get("SEARCH_PROJECT_KEY")
-         if self.search_project_id is None:
-             err_msg += "SEARCH_PROJECT_KEY env variable not set.\n"
          self.llm_api_key = os.environ.get("LLM_API_KEY")
          if self.llm_api_key is None:
              err_msg += "LLM_API_KEY env variable not set.\n"
 
-         if err_msg != "":
-             raise Exception(f"\n{err_msg}\n")
 
-         self.llm_base_url = os.environ.get("LLM_BASE_URL")
-         if self.llm_base_url is None:
-             self.llm_base_url = "https://api.openai.com/v1"
 
          self.embedding_model = os.environ.get("EMBEDDING_MODEL")
          self.embedding_dimensions = os.environ.get("EMBEDDING_DIMENSIONS")
@@ -94,17 +191,50 @@ CREATE TABLE {self.table_name} (
              self.embedding_model = "text-embedding-3-small"
              self.embedding_dimensions = 1536
 
-     def search_web(self, query: str, date_restrict: int, target_site: str) -> List[str]:
          escaped_query = urllib.parse.quote(query)
          url_base = (
-             f"https://www.googleapis.com/customsearch/v1?key={self.search_api_key}"
              f"&cx={self.search_project_id}&q={escaped_query}"
          )
          url_paras = f"&safe=active"
-         if date_restrict is not None and date_restrict > 0:
-             url_paras += f"&dateRestrict={date_restrict}"
-         if target_site is not None and target_site != "":
-             url_paras += f"&siteSearch={target_site}&siteSearchFilter=i"
          url = f"{url_base}{url_paras}"
 
          self.logger.debug(f"Searching for query: {query}")
@@ -145,6 +275,7 @@ CREATE TABLE {self.table_name} (
          return found_links
 
      def _scape_url(self, url: str) -> Tuple[str, str]:
          try:
              response = self.session.get(url, timeout=10)
              soup = BeautifulSoup(response.content, "lxml", from_encoding="utf-8")
@@ -155,6 +286,9 @@ CREATE TABLE {self.table_name} (
              body_text = " ".join(body_text.split()).strip()
              self.logger.debug(f"Scraped {url}: {body_text}...")
              if len(body_text) > 100:
                  return url, body_text
              else:
                  self.logger.warning(
@@ -182,15 +316,10 @@ CREATE TABLE {self.table_name} (
          return scrape_results
 
-     def chunk_results(
-         self, scrape_results: Dict[str, str], size: int, overlap: int
-     ) -> Dict[str, List[str]]:
          chunking_results: Dict[str, List[str]] = {}
          for url, text in scrape_results.items():
-             chunks = []
-             for pos in range(0, len(text), size - overlap):
-                 chunks.append(text[pos : pos + size])
-             chunking_results[url] = chunks
          return chunking_results
 
      def get_embedding(self, client: OpenAI, texts: List[str]) -> List[List[float]]:
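This commit replaces the hand-rolled sliding-window chunker removed above with Chonkie's `TokenChunker`, initialized in the new `init_chunker` further down. A minimal sketch of the replacement call, using the same settings the new code uses:

```python
# Sketch of the Chonkie-based chunking that replaces the manual loop above.
from chonkie import TokenChunker

chunker = TokenChunker(chunk_size=1000, chunk_overlap=100)  # same settings as ask.py
chunks = chunker.chunk("some long scraped text ..." * 500)
for chunk in chunks:
    # each Chunk carries its text; ask.py stores chunk.text in DuckDB
    print(len(chunk.text))
```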
@@ -221,24 +350,49 @@ CREATE TABLE {self.table_name} (
          embeddings = self.get_embedding(client, texts)
          return chunk_batch, embeddings
 
-     def save_to_db(self, chunking_results: Dict[str, List[str]]) -> None:
-         client = self._get_api_client()
          embed_batch_size = 50
          query_batch_size = 100
          insert_data = []
 
          batches: List[Tuple[str, List[str]]] = []
-         for url, list_chunks in chunking_results.items():
              for i in range(0, len(list_chunks), embed_batch_size):
-                 list_chunks = list_chunks[i : i + embed_batch_size]
-                 batches.append((url, list_chunks))
 
          self.logger.info(f"Embedding {len(batches)} batches of chunks ...")
-         partial_get_embedding = partial(self.batch_get_embedding, client)
          with ThreadPoolExecutor(max_workers=10) as executor:
              all_embeddings = executor.map(partial_get_embedding, batches)
          self.logger.info(f"✅ Finished embedding.")
 
          for chunk_batch, embeddings in all_embeddings:
              url = chunk_batch[0]
              list_chunks = chunk_batch[1]
@@ -250,7 +404,6 @@ CREATE TABLE {self.table_name} (
              )
 
          for i in range(0, len(insert_data), query_batch_size):
-             # insert the batch into DuckDB
              value_str = ", ".join(
                  [
                      f"('{url}', '{chunk}', {embedding})"
@@ -258,13 +411,13 @@ CREATE TABLE {self.table_name} (
                  ]
              )
              query = f"""
-             INSERT INTO {self.table_name} (url, chunk, vec) VALUES {value_str};
              """
              self.db_con.execute(query)
 
          self.db_con.execute(
              f"""
-             CREATE INDEX cos_idx ON {self.table_name} USING HNSW (vec)
              WITH (metric = 'cosine');
              """
          )
@@ -272,19 +425,28 @@ CREATE TABLE {self.table_name} (
          self.db_con.execute(
              f"""
              PRAGMA create_fts_index(
-                 {self.table_name}, 'doc_id', 'chunk'
              );
              """
          )
          self.logger.info(f"✅ Created the full text search index ...")
 
-     def vector_search(self, query: str) -> List[Dict[str, Any]]:
-         client = self._get_api_client()
-         embeddings = self.get_embedding(client, [query])[0]
 
          query_result: duckdb.DuckDBPyRelation = self.db_con.sql(
              f"""
-             SELECT * FROM {self.table_name}
              ORDER BY array_distance(vec, {embeddings}::FLOAT[{self.embedding_dimensions}])
              LIMIT 10;
              """
@@ -292,31 +454,92 @@ CREATE TABLE {self.table_name} (
 
          self.logger.debug(query_result)
 
-         matched_chunks = []
-         for record in query_result.fetchall():
              result_record = {
-                 "url": record[1],
-                 "chunk": record[2],
              }
-             matched_chunks.append(result_record)
 
-         return matched_chunks
 
-     def _get_api_client(self) -> OpenAI:
          return OpenAI(api_key=self.llm_api_key, base_url=self.llm_base_url)
 
      def _render_template(self, template_str: str, variables: Dict[str, Any]) -> str:
          env = Environment(loader=BaseLoader(), autoescape=False)
          template = env.from_string(template_str)
          return template.render(variables)
 
      def run_inference(
          self,
          query: str,
-         model_name: str,
          matched_chunks: List[Dict[str, Any]],
-         output_language: str,
-         output_length: int,
      ) -> str:
          system_prompt = (
              "You are an expert summarizing the answers based on the provided contents."
@@ -343,11 +566,11 @@ Here is the context:
          for i, chunk in enumerate(matched_chunks):
              context += f"[{i+1}] {chunk['chunk']}\n"
 
-         if output_length is None or output_length == 0:
              length_instructions = ""
          else:
              length_instructions = (
-                 f"Please provide the answer in { output_length } words."
              )
@@ -355,17 +578,21 @@ Here is the context:
          user_prompt = self._render_template(
              {
                  "query": query,
                  "context": context,
-                 "language": output_language,
                  "length_instructions": length_instructions,
              },
          )
 
-         self.logger.debug(f"Running inference with model: {model_name}")
          self.logger.debug(f"Final user prompt: {user_prompt}")
 
-         api_client = self._get_api_client()
          completion = api_client.chat.completions.create(
-             model=model_name,
              messages=[
                  {
                      "role": "system",
@@ -383,158 +610,415 @@ Here is the context:
          response_str = completion.choices[0].message.content
          return response_str
 
 
- def _read_url_list(url_list_file: str) -> str:
-     if url_list_file is None:
-         return None
 
-     with open(url_list_file, "r") as f:
-         links = f.readlines()
-     links = [
-         link.strip()
-         for link in links
-         if link.strip() != "" and not link.startswith("#")
-     ]
-     return "\n".join(links)
 
 
- def _run_query(
-     query: str,
-     date_restrict: int,
-     target_site: str,
-     output_language: str,
-     output_length: int,
-     url_list_str: str,
-     model_name: str,
-     log_level: str,
- ) -> str:
-     logger = get_logger(log_level)
 
-     ask = Ask(logger=logger)
 
-     if url_list_str is None or url_list_str.strip() == "":
-         logger.info("Searching the web ...")
-         links = ask.search_web(query, date_restrict, target_site)
-         logger.info(f" Found {len(links)} links for query: {query}")
-         for i, link in enumerate(links):
-             logger.debug(f"{i+1}. {link}")
-     else:
-         links = url_list_str.split("\n")
-
-     logger.info("Scraping the URLs ...")
-     scrape_results = ask.scrape_urls(links)
-     logger.info(f"✅ Scraped {len(scrape_results)} URLs.")
-
-     logger.info("Chunking the text ...")
-     chunking_results = ask.chunk_results(scrape_results, 1000, 100)
-     total_chunks = 0
-     for url, chunks in chunking_results.items():
-         logger.debug(f"URL: {url}")
-         total_chunks += len(chunks)
-         for i, chunk in enumerate(chunks):
-             logger.debug(f"Chunk {i+1}: {chunk}")
-     logger.info(f"✅ Generated {total_chunks} chunks ...")
-
-     logger.info(f"Saving {total_chunks} chunks to DB ...")
-     ask.save_to_db(chunking_results)
-     logger.info(f"✅ Successfully embedded and saved chunks to DB.")
-
-     logger.info("Querying the vector DB to get context ...")
-     matched_chunks = ask.vector_search(query)
-     for i, result in enumerate(matched_chunks):
-         logger.debug(f"{i+1}. {result}")
-     logger.info(f"✅ Got {len(matched_chunks)} matched chunks.")
-
-     logger.info("Running inference with context ...")
-     answer = ask.run_inference(
-         query=query,
-         model_name=model_name,
-         matched_chunks=matched_chunks,
-         output_language=output_language,
-         output_length=output_length,
-     )
-     logger.info("✅ Finished inference API call.")
-     logger.info("generateing output ...")
 
-     answer = f"# Answer\n\n{answer}\n"
-     references = "\n".join(
-         [f"[{i+1}] {result['url']}" for i, result in enumerate(matched_chunks)]
-     )
-     return f"{answer}\n\n# References\n\n{references}"
 
 
  def launch_gradio(
      query: str,
-     date_restrict: int,
-     target_site: str,
-     output_language: str,
-     output_length: int,
-     url_list_str: str,
-     model_name: str,
-     log_level: str,
      share_ui: bool,
  ) -> None:
-     iface = gr.Interface(
-         fn=_run_query,
-         inputs=[
-             gr.Textbox(label="Query", value=query),
-             gr.Number(
-                 label="Date Restrict (Optional) [0 or empty means no date limit.]",
-                 value=date_restrict,
-             ),
-             gr.Textbox(
-                 label="Target Sites (Optional) [Empty means seach the whole web.]",
-                 value=target_site,
-             ),
-             gr.Textbox(
-                 label="Output Language (Optional) [Default is English.]",
-                 value=output_language,
-             ),
-             gr.Number(
-                 label="Output Length in words (Optional) [Default is automatically decided by LLM.]",
-                 value=output_length,
-             ),
-             gr.Textbox(
-                 label="URL List (Optional) [When specified, scrape the urls instead of searching the web.]",
-                 lines=5,
-                 max_lines=20,
-                 value=url_list_str,
-             ),
-         ],
-         additional_inputs=[
-             gr.Textbox(label="Model Name", value=model_name),
-             gr.Textbox(label="Log Level", value=log_level),
-         ],
-         outputs="text",
-         show_progress=True,
-         flagging_options=[("Report Error", None)],
-         title="Ask.py - Web Search-Extract-Summarize",
-         description="Search the web with the query and summarize the results. Source code: https://github.com/pengfeng/ask.py",
-     )
 
-     iface.launch(share=share_ui)
 
 
- @click.command(help="Search web for the query and summarize the results")
  @click.option(
-     "--web-ui",
-     is_flag=True,
-     help="Launch the web interface",
  )
- @click.option("--query", "-q", required=False, help="Query to search")
  @click.option(
      "--date-restrict",
      "-d",
      type=int,
      required=False,
-     default=None,
      help="Restrict search results to a specific date range, default is no restriction",
  )
  @click.option(
      "--target-site",
      "-s",
      required=False,
-     default=None,
      help="Restrict search results to a specific site, default is no restriction",
  )
  @click.option(
@@ -547,24 +1031,50 @@ def launch_gradio(
      "--output-length",
      type=int,
      required=False,
-     default=None,
      help="Output length for the answer",
  )
  @click.option(
      "--url-list-file",
      type=str,
      required=False,
-     default=None,
      show_default=True,
      help="Instead of doing web search, scrape the target URL list and answer the query based on the content",
  )
  @click.option(
-     "--model-name",
-     "-m",
      required=False,
-     default="gpt-4o-mini",
      help="Model name to use for inference",
  )
  @click.option(
      "-l",
      "--log-level",
@@ -575,49 +1085,71 @@ def launch_gradio(
      show_default=True,
  )
  def search_extract_summarize(
-     web_ui: bool,
      query: str,
      date_restrict: int,
      target_site: str,
      output_language: str,
      output_length: int,
      url_list_file: str,
-     model_name: str,
      log_level: str,
  ):
      load_dotenv(dotenv_path=default_env_file, override=False)
 
-     if web_ui or os.environ.get("RUN_GRADIO_UI", "false").lower() != "false":
          if os.environ.get("SHARE_GRADIO_UI", "false").lower() == "true":
              share_ui = True
          else:
              share_ui = False
          launch_gradio(
              query=query,
-             date_restrict=date_restrict,
-             target_site=target_site,
-             output_language=output_language,
-             output_length=output_length,
-             url_list_str=_read_url_list(url_list_file),
-             model_name=model_name,
-             log_level=log_level,
              share_ui=share_ui,
          )
-     else:
-         if query is None:
-             raise Exception("Query is required for the command line mode")
-
-         result = _run_query(
-             query=query,
-             date_restrict=date_restrict,
-             target_site=target_site,
-             output_language=output_language,
-             output_length=output_length,
-             url_list_str=_read_url_list(url_list_file),
-             model_name=model_name,
-             log_level=log_level,
-         )
-         click.echo(result)
 
 
  if __name__ == "__main__":
 
1
+ import csv
2
+ import io
3
  import json
4
  import logging
5
  import os
6
+ import queue
7
  import urllib.parse
8
  from concurrent.futures import ThreadPoolExecutor
9
+ from datetime import datetime
10
+ from enum import Enum
11
  from functools import partial
12
+ from queue import Queue
13
+ from typing import Any, Dict, Generator, List, Optional, Tuple, TypeVar
14
 
15
  import click
 
16
  import gradio as gr
17
  import requests
18
  from bs4 import BeautifulSoup
19
+ from chonkie import Chunk
20
  from dotenv import load_dotenv
21
  from jinja2 import BaseLoader, Environment
22
  from openai import OpenAI
23
+ from pydantic import BaseModel, create_model
24
+
25
+ TypeVar_BaseModel = TypeVar("TypeVar_BaseModel", bound=BaseModel)
26
 
27
  script_dir = os.path.dirname(os.path.abspath(__file__))
28
  default_env_file = os.path.abspath(os.path.join(script_dir, ".env"))
29
 
30
 
31
+ class OutputMode(str, Enum):
32
+ answer = "answer"
33
+ extract = "extract"
34
+
35
+
36
+ class InputMode(str, Enum):
37
+ search = "search"
38
+ local = "local"
39
+
40
+
41
+ class AskSettings(BaseModel):
42
+ date_restrict: int
43
+ target_site: str
44
+ output_language: str
45
+ output_length: int
46
+ url_list: List[str]
47
+ inference_model_name: str
48
+ hybrid_search: bool
49
+ input_mode: InputMode
50
+ output_mode: OutputMode
51
+ extract_schema_str: str
52
+
53
+
54
+ def _get_logger(log_level: str) -> logging.Logger:
55
  logger = logging.getLogger(__name__)
56
  logger.setLevel(log_level)
57
+ if len(logger.handlers) > 0:
58
+ return logger
59
+
60
  handler = logging.StreamHandler()
61
  formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
62
  handler.setFormatter(formatter)
 
64
  return logger
65
 
66
 
67
+ def _read_url_list(url_list_file: str) -> List[str]:
68
+ if not url_list_file:
69
+ return []
70
+
71
+ with open(url_list_file, "r") as f:
72
+ links = f.readlines()
73
+ url_list = [
74
+ link.strip()
75
+ for link in links
76
+ if link.strip() != "" and not link.startswith("#")
77
+ ]
78
+ return url_list
79
+
80
+
81
+ def _read_extract_schema_str(extract_schema_file: str) -> str:
82
+ if not extract_schema_file:
83
+ return ""
84
+
85
+ with open(extract_schema_file, "r") as f:
86
+ schema_str = f.read()
87
+ return schema_str
88
+
89
+
90
+ def _output_csv(result_dict: Dict[str, List[BaseModel]], key_name: str) -> str:
91
+ # generate the CSV content from a Dict of URL and list of extracted items
92
+ output = io.StringIO()
93
+ csv_writer = None
94
+ for src_url, items in result_dict.items():
95
+ for item in items:
96
+ value_dict = item.model_dump()
97
+ item_with_url = {**value_dict, key_name: src_url}
98
+
99
+ if csv_writer is None:
100
+ headers = list(value_dict.keys()) + [key_name]
101
+ csv_writer = csv.DictWriter(output, fieldnames=headers)
102
+ csv_writer.writeheader()
103
+
104
+ csv_writer.writerow(item_with_url)
105
+
106
+ csv_content = output.getvalue()
107
+ output.close()
108
+ return csv_content
109
+
110
+
111
  class Ask:
112
 
113
  def __init__(self, logger: Optional[logging.Logger] = None):
 
 
114
  if logger is not None:
115
  self.logger = logger
116
  else:
117
+ self.logger = _get_logger("INFO")
 
 
 
118
 
119
+ self.read_env_variables()
 
 
 
 
120
 
121
+ self.init_converter()
122
+ self.init_chunker()
123
+ self.init_db()
 
 
 
 
 
 
 
124
 
125
  self.session = requests.Session()
126
  user_agent: str = (
 
133
  def read_env_variables(self) -> None:
134
  err_msg = ""
135
 
136
+ self.search_api_url = os.environ.get("SEARCH_API_URL")
137
+ if self.search_api_url is None:
138
+ self.search_api_key = os.environ.get("SEARCH_API_KEY")
139
+ if self.search_api_key:
140
+ self.search_api_url = "https://www.googleapis.com/customsearch/v1"
141
+ self.search_project_id = os.environ.get("SEARCH_PROJECT_KEY")
142
+ if self.search_project_id is None:
143
+ err_msg += "SEARCH_PROJECT_KEY env variable not set while SEARCH_API_KEY is set.\n"
144
+ else:
145
+ self.logger.info("No SEARCH_API_URL or SEARCH_API_KEYenv variable set.")
146
+ self.logger.info(
147
+ "Using the default proxy at https://svc.leettools.com:8098"
148
+ )
149
+ self.search_api_url = "https://svc.leettools.com:8098/customsearch/v1"
150
+ self.search_api_key = "dummy-search-api-key"
151
+ self.search_project_id = "dummy-search-project-id"
152
+ else:
153
+ self.search_api_key = os.environ.get("SEARCH_API_KEY")
154
+ if self.search_api_key is None:
155
+ err_msg += (
156
+ f"SEARCH_API_KEY env variable not set for {self.search_api_url}.\n"
157
+ )
158
+ self.search_project_id = os.environ.get("SEARCH_PROJECT_KEY")
159
+ if self.search_project_id is None:
160
+ err_msg += f"SEARCH_PROJECT_KEY env variable not set for {self.search_api_url}.\n"
161
+
162
+ self.llm_base_url = os.environ.get("LLM_BASE_URL")
163
+ if self.llm_base_url is None:
164
+ self.llm_base_url = "https://api.openai.com/v1"
165
+
166
  self.llm_api_key = os.environ.get("LLM_API_KEY")
167
  if self.llm_api_key is None:
168
  err_msg += "LLM_API_KEY env variable not set.\n"
169
 
170
+ self.default_inference_model = os.environ.get("DEFAULT_INFERENCE_MODEL")
171
+ if self.default_inference_model is None:
172
+ self.default_inference_model = "gpt-4o-mini"
173
 
174
+ self.embed_base_url = os.environ.get("EMBED_BASE_URL")
175
+ if self.embed_base_url is None:
176
+ self.embed_base_url = self.llm_base_url
177
+
178
+ self.embed_api_key = os.environ.get("EMBED_API_KEY")
179
+ if self.embed_api_key is None:
180
+ if self.embed_base_url == self.llm_base_url:
181
+ self.embed_api_key = self.llm_api_key
182
+ else:
183
+ err_msg += (
184
+ f"EMBED_API_KEY env variable not set for {self.embed_base_url}.\n"
185
+ )
186
 
187
  self.embedding_model = os.environ.get("EMBEDDING_MODEL")
188
  self.embedding_dimensions = os.environ.get("EMBEDDING_DIMENSIONS")
 
191
  self.embedding_model = "text-embedding-3-small"
192
  self.embedding_dimensions = 1536
193
 
194
+ if err_msg != "":
195
+ raise Exception(f"\n{err_msg}\n")
196
+
197
+ def init_converter(self) -> None:
198
+ from docling.document_converter import DocumentConverter
199
+
200
+ self.logger.info("Initializing converter ...")
201
+ self.converter = DocumentConverter()
202
+ self.logger.info("✅ Successfully initialized Docling.")
203
+
204
+ def init_chunker(self) -> None:
205
+ from chonkie import TokenChunker
206
+
207
+ self.logger.info("Initializing chunker ...")
208
+ self.chunker = TokenChunker(chunk_size=1000, chunk_overlap=100)
209
+ self.logger.info("✅ Successfully initialized Chonkie.")
210
+
211
+ def init_db(self) -> None:
212
+ import duckdb
213
+
214
+ self.logger.info("Initializing database ...")
215
+ self.db_con = duckdb.connect(":memory:")
216
+ self.db_con.install_extension("vss")
217
+ self.db_con.load_extension("vss")
218
+ self.db_con.install_extension("fts")
219
+ self.db_con.load_extension("fts")
220
+ self.db_con.sql("CREATE SEQUENCE seq_docid START 1000")
221
+ self.logger.info("✅ Successfully initialized DuckDB.")
222
+
223
+ def convert_file_to_md(self, file_path: str) -> str:
224
+ result = self.converter.convert(file_path)
225
+ return result.document.export_to_markdown()
226
+
227
+ def search_web(self, query: str, settings: AskSettings) -> List[str]:
228
  escaped_query = urllib.parse.quote(query)
229
  url_base = (
230
+ f"{self.search_api_url}?key={self.search_api_key}"
231
  f"&cx={self.search_project_id}&q={escaped_query}"
232
  )
233
  url_paras = f"&safe=active"
234
+ if settings.date_restrict > 0:
235
+ url_paras += f"&dateRestrict={settings.date_restrict}"
236
+ if settings.target_site:
237
+ url_paras += f"&siteSearch={settings.target_site}&siteSearchFilter=i"
238
  url = f"{url_base}{url_paras}"
239
 
240
  self.logger.debug(f"Searching for query: {query}")
 
275
  return found_links
276
 
277
  def _scape_url(self, url: str) -> Tuple[str, str]:
278
+ self.logger.info(f"Scraping {url} ...")
279
  try:
280
  response = self.session.get(url, timeout=10)
281
  soup = BeautifulSoup(response.content, "lxml", from_encoding="utf-8")
 
286
  body_text = " ".join(body_text.split()).strip()
287
  self.logger.debug(f"Scraped {url}: {body_text}...")
288
  if len(body_text) > 100:
289
+ self.logger.info(
290
+ f"✅ Successfully scraped {url} with length: {len(body_text)}"
291
+ )
292
  return url, body_text
293
  else:
294
  self.logger.warning(
 
316
 
317
  return scrape_results
318
 
319
+ def chunk_results(self, scrape_results: Dict[str, str]) -> Dict[str, List[Chunk]]:
 
 
320
  chunking_results: Dict[str, List[str]] = {}
321
  for url, text in scrape_results.items():
322
+ chunking_results[url] = self.chunker.chunk(text)
 
 
 
323
  return chunking_results
324
 
325
  def get_embedding(self, client: OpenAI, texts: List[str]) -> List[List[float]]:
 
350
  embeddings = self.get_embedding(client, texts)
351
  return chunk_batch, embeddings
352
 
353
+ def _create_table(self) -> str:
354
+ # Simple ways to get a unique table name
355
+ timestamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_%f")
356
+ table_name = f"document_chunks_{timestamp}"
357
+
358
+ self.db_con.execute(
359
+ f"""
360
+ CREATE TABLE {table_name} (
361
+ doc_id INTEGER PRIMARY KEY DEFAULT nextval('seq_docid'),
362
+ url TEXT,
363
+ chunk TEXT,
364
+ vec FLOAT[{self.embedding_dimensions}]
365
+ );
366
+ """
367
+ )
368
+ return table_name
369
+
370
+ def save_chunks_to_db(self, all_chunks: Dict[str, List[Chunk]]) -> str:
371
+ """
372
+ The key of chunking_results is the URL and the value is the list of chunks.
373
+ """
374
+ embed_client = self._get_embed_api_client()
375
  embed_batch_size = 50
376
  query_batch_size = 100
377
  insert_data = []
378
 
379
+ table_name = self._create_table()
380
+
381
  batches: List[Tuple[str, List[str]]] = []
382
+ for url, list_chunks in all_chunks.items():
383
  for i in range(0, len(list_chunks), embed_batch_size):
384
+ batch = [chunk.text for chunk in list_chunks[i : i + embed_batch_size]]
385
+ batches.append((url, batch))
386
 
387
  self.logger.info(f"Embedding {len(batches)} batches of chunks ...")
388
+ partial_get_embedding = partial(self.batch_get_embedding, embed_client)
389
  with ThreadPoolExecutor(max_workers=10) as executor:
390
  all_embeddings = executor.map(partial_get_embedding, batches)
391
  self.logger.info(f"✅ Finished embedding.")
392
 
393
+ # We batch the insert data to speed up the insertion operation.
394
+ # Although the DuckDB doc says executeMany is optimized for batch insert,
395
+ # we found that it is faster to batch the insert data and run a single insert.
396
  for chunk_batch, embeddings in all_embeddings:
397
  url = chunk_batch[0]
398
  list_chunks = chunk_batch[1]
 
404
  )
405
 
406
  for i in range(0, len(insert_data), query_batch_size):
 
407
  value_str = ", ".join(
408
  [
409
  f"('{url}', '{chunk}', {embedding})"
 
411
  ]
412
  )
413
  query = f"""
414
+ INSERT INTO {table_name} (url, chunk, vec) VALUES {value_str};
415
  """
416
  self.db_con.execute(query)
417
 
418
  self.db_con.execute(
419
  f"""
420
+ CREATE INDEX {table_name}_cos_idx ON {table_name} USING HNSW (vec)
421
  WITH (metric = 'cosine');
422
  """
423
  )
 
425
  self.db_con.execute(
426
  f"""
427
  PRAGMA create_fts_index(
428
+ {table_name}, 'doc_id', 'chunk'
429
  );
430
  """
431
  )
432
  self.logger.info(f"✅ Created the full text search index ...")
433
+ return table_name
434
+
435
+ def vector_search(
436
+ self, table_name: str, query: str, settings: AskSettings
437
+ ) -> List[Dict[str, Any]]:
438
+ import duckdb
439
 
440
+ """
441
+ The return value is a list of {url: str, chunk: str} records.
442
+ In a real world, we will define a class of Chunk to have more metadata such as offsets.
443
+ """
444
+ embed_client = self._get_embed_api_client()
445
+ embeddings = self.get_embedding(embed_client, [query])[0]
446
 
447
  query_result: duckdb.DuckDBPyRelation = self.db_con.sql(
448
  f"""
449
+ SELECT * FROM {table_name}
450
  ORDER BY array_distance(vec, {embeddings}::FLOAT[{self.embedding_dimensions}])
451
  LIMIT 10;
452
  """
 
454
 
455
  self.logger.debug(query_result)
456
 
457
+ # use a dict to remove duplicates from vector search and full-text search
458
+ matched_chunks_dict = {}
459
+ for vec_result in query_result.fetchall():
460
+ doc_id = vec_result[0]
461
  result_record = {
462
+ "url": vec_result[1],
463
+ "chunk": vec_result[2],
464
  }
465
+ matched_chunks_dict[doc_id] = result_record
466
+
467
+ if settings.hybrid_search:
468
+ self.logger.info("Running full-text search ...")
469
+
470
+ self.db_con.execute(
471
+ f"""
472
+ PREPARE fts_query AS (
473
+ WITH scored_docs AS (
474
+ SELECT *, fts_main_{table_name}.match_bm25(
475
+ doc_id, ?, fields := 'chunk'
476
+ ) AS score FROM {table_name})
477
+ SELECT doc_id, url, chunk, score
478
+ FROM scored_docs
479
+ WHERE score IS NOT NULL
480
+ ORDER BY score DESC
481
+ LIMIT 10)
482
+ """
483
+ )
484
+ self.db_con.execute("PRAGMA threads=4")
485
+
486
+ # You can run more complex query rewrite methods here
487
+ # usually: stemming, stop words, etc.
488
+ escaped_query = query.replace("'", " ")
489
+ fts_result: duckdb.DuckDBPyRelation = self.db_con.execute(
490
+ f"EXECUTE fts_query('{escaped_query}')"
491
+ )
492
 
493
+ index = 0
494
+ for fts_record in fts_result.fetchall():
495
+ index += 1
496
+ self.logger.debug(f"The full text search record #{index}: {fts_record}")
497
+ doc_id = fts_record[0]
498
+ result_record = {
499
+ "url": fts_record[1],
500
+ "chunk": fts_record[2],
501
+ }
502
+
503
+ # You can configure the score threashold and top-k
504
+ if fts_record[3] > 1:
505
+ matched_chunks_dict[doc_id] = result_record
506
+ else:
507
+ break
508
 
509
+ if index >= 10:
510
+ break
511
+
512
+ return matched_chunks_dict.values()
513
+
514
+ def _get_inference_api_client(self) -> OpenAI:
515
  return OpenAI(api_key=self.llm_api_key, base_url=self.llm_base_url)
516
 
517
+ def _get_embed_api_client(self) -> OpenAI:
518
+ return OpenAI(api_key=self.embed_api_key, base_url=self.embed_base_url)
519
+
520
  def _render_template(self, template_str: str, variables: Dict[str, Any]) -> str:
521
  env = Environment(loader=BaseLoader(), autoescape=False)
522
  template = env.from_string(template_str)
523
  return template.render(variables)
524
 
525
+ def _get_target_class(self, extract_schema_str: str) -> TypeVar_BaseModel:
526
+ local_namespace = {"BaseModel": BaseModel}
527
+ exec(extract_schema_str, local_namespace, local_namespace)
528
+ for key, value in local_namespace.items():
529
+ if key == "__builtins__":
530
+ continue
531
+ if key == "BaseModel":
532
+ continue
533
+ if isinstance(value, type):
534
+ if issubclass(value, BaseModel):
535
+ return value
536
+ raise Exception("No Pydantic schema found in the extract schema str.")
537
+
538
  def run_inference(
539
  self,
540
  query: str,
 
541
  matched_chunks: List[Dict[str, Any]],
542
+ settings: AskSettings,
 
543
  ) -> str:
544
  system_prompt = (
545
  "You are an expert summarizing the answers based on the provided contents."
 
566
  for i, chunk in enumerate(matched_chunks):
567
  context += f"[{i+1}] {chunk['chunk']}\n"
568
 
569
+ if not settings.output_length:
570
  length_instructions = ""
571
  else:
572
  length_instructions = (
573
+ f"Please provide the answer in { settings.output_length } words."
574
  )
575
 
576
  user_prompt = self._render_template(
 
578
  {
579
  "query": query,
580
  "context": context,
581
+ "language": settings.output_language,
582
  "length_instructions": length_instructions,
583
  },
584
  )
585
 
586
+ final_inference_model = settings.inference_model_name
587
+ if settings.inference_model_name is None:
588
+ final_inference_model = self.default_inference_model
589
+
590
+ self.logger.debug(f"Running inference with model: {final_inference_model}")
591
  self.logger.debug(f"Final user prompt: {user_prompt}")
592
 
593
+ api_client = self._get_inference_api_client()
594
  completion = api_client.chat.completions.create(
595
+ model=final_inference_model,
596
  messages=[
597
  {
598
  "role": "system",
 
610
  response_str = completion.choices[0].message.content
611
  return response_str
612
 
613
+ def run_extract(
614
+ self,
615
+ query: str,
616
+ extract_schema_str: str,
617
+ target_content: str,
618
+ settings: AskSettings,
619
+ ) -> List[TypeVar_BaseModel]:
620
+ target_class = self._get_target_class(extract_schema_str)
621
+ system_prompt = (
622
+ "You are an expert of extract structual information from the document."
623
+ )
624
+ user_promt_template = """
625
+ Given the provided content, if it contains information about {{ query }}, please extract the
626
+ list of structured data items as defined in the following Pydantic schema:
627
 
628
+ {{ extract_schema_str }}
 
 
629
 
630
+ Below is the provided content:
631
+ {{ content }}
632
+ """
633
+ user_prompt = self._render_template(
634
+ user_prompt_template,
635
+ {
636
+ "query": query,
637
+ "content": target_content,
638
+ "extract_schema_str": extract_schema_str,
639
+ },
640
+ )
641
 
642
+ self.logger.debug(
643
+ f"Running extraction with model: {settings.inference_model_name}"
644
+ )
645
+ self.logger.debug(f"Final user prompt: {user_prompt}")
646
 
647
+ class_name = target_class.__name__
648
+ list_class_name = f"{class_name}_list"
649
+ response_pydantic_model = create_model(
650
+ list_class_name,
651
+ items=(List[target_class], ...),
652
+ )
653
 
654
+ api_client = self._get_inference_api_client()
655
+ completion = api_client.beta.chat.completions.parse(
656
+ model=settings.inference_model_name,
657
+ messages=[
658
+ {
659
+ "role": "system",
660
+ "content": system_prompt,
661
+ },
662
+ {
663
+ "role": "user",
664
+ "content": user_prompt,
665
+ },
666
+ ],
667
+ response_format=response_pydantic_model,
668
+ )
669
+ if completion is None:
670
+ raise Exception("No completion from the API")
671
 
672
+ message = completion.choices[0].message
673
+ if message.refusal:
674
+ raise Exception(
675
+ f"Refused to extract information from the document: {message.refusal}."
676
+ )
677
 
678
+ extract_result = message.parsed
679
+ return extract_result.items
680
+
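`run_extract` relies on OpenAI structured outputs, which expect a single top-level model; the `create_model` call wraps the user's item schema in an ad-hoc list container at runtime. A minimal sketch of just that wrapping, reusing the hypothetical `Company` schema from earlier:

```python
from typing import List
from pydantic import BaseModel, create_model

class Company(BaseModel):
    name: str
    description: str

# Dynamically build a container model with a required "items" list field.
CompanyList = create_model("Company_list", items=(List[Company], ...))
print(CompanyList(items=[{"name": "Acme", "description": "widgets"}]).items)
```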
681
+ def run_query_gradio(
682
+ self,
683
+ query: str,
684
+ date_restrict: int,
685
+ target_site: str,
686
+ output_language: str,
687
+ output_length: int,
688
+ url_list_str: str,
689
+ inference_model_name: str,
690
+ hybrid_search: bool,
691
+ input_mode_str: str,
692
+ output_mode_str: str,
693
+ extract_schema_str: str,
694
+ ) -> Generator[Tuple[str, str], None, Tuple[str, str]]:
695
+ logger = self.logger
696
+ log_queue = Queue()
697
+
698
+ if url_list_str:
699
+ url_list = url_list_str.split("\n")
700
+ else:
701
+ url_list = []
702
+
703
+ settings = AskSettings(
704
+ date_restrict=date_restrict,
705
+ target_site=target_site,
706
+ output_language=output_language,
707
+ output_length=output_length,
708
+ url_list=url_list,
709
+ inference_model_name=inference_model_name,
710
+ hybrid_search=hybrid_search,
711
+ input_mode=InputMode(input_mode_str),
712
+ output_mode=OutputMode(output_mode_str),
713
+ extract_schema_str=extract_schema_str,
714
+ )
715
+
716
+ # add a queue handler to the logger to capture the logs
717
+ queue_handler = logging.Handler()
718
+ formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
719
+ queue_handler.emit = lambda record: log_queue.put(formatter.format(record))
720
+ logger.addHandler(queue_handler)
721
+
722
+ def update_logs():
723
+ logs = []
724
+ while True:
725
+ try:
726
+ log = log_queue.get_nowait()
727
+ logs.append(log)
728
+ except queue.Empty:
729
+ break
730
+ return "\n".join(logs)
731
+
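The handler trick above is what streams logs into the UI: a bare `logging.Handler` has its `emit` overridden per-instance to push formatted records onto a `Queue`, which `update_logs` drains between yields. A self-contained sketch of the pattern:

```python
import logging
from queue import Queue

log_queue: Queue = Queue()
handler = logging.Handler()
formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
# Override emit on the instance so every record lands in the queue.
handler.emit = lambda record: log_queue.put(formatter.format(record))

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("hello")
print(log_queue.get_nowait())  # "... - INFO - hello"
```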
732
+ # wrap the process in a generator to yield the logs to integrate with GradIO
733
+ def process_with_logs():
734
+ # the key is the URI and the value is the scraped text
735
+ target_documents: Dict[str, str] = {}
736
+
737
+ if settings.input_mode == InputMode.search:
738
+ if len(settings.url_list) > 0:
739
+ links = settings.url_list
740
+ else:
741
+ logger.info("Searching the web ...")
742
+ yield "", update_logs()
743
+ links = self.search_web(query, settings)
744
+ logger.info(f"✅ Found {len(links)} links for query: {query}")
745
+ for i, link in enumerate(links):
746
+ logger.debug(f"{i+1}. {link}")
747
+ yield "", update_logs()
748
+
749
+ logger.info("Scraping the URLs ...")
750
+ yield "", update_logs()
751
+ target_documents = self.scrape_urls(links)
752
+ logger.info(f"✅ Scraped {len(target_documents)} URLs.")
753
+ yield "", update_logs()
754
+ elif settings.input_mode == InputMode.local:
755
+ logger.info("Processing the local data directory ...")
756
+ yield "", update_logs()
757
+ # read the files from the data folder
758
+ data_folder = os.path.join(script_dir, "data")
759
+ if not os.path.exists(data_folder):
760
+ raise Exception("Data folder not found.")
761
+ for file_name in os.listdir(data_folder):
762
+ logger.info(f"Processing {file_name} ...")
763
+ yield "", update_logs()
764
+ file_path = os.path.join(data_folder, file_name)
765
+ file_uri = f"file://{file_path}"
766
+ target_documents[file_uri] = self.convert_file_to_md(file_path)
767
+ logger.info(f"✅ Finished processing {file_name}.")
768
+ yield "", update_logs()
769
+ else:
770
+ raise Exception(f"Invalid input mode: {settings.input_mode}")
771
+
772
+ if settings.output_mode == OutputMode.answer:
773
+ logger.info("Chunking the text ...")
774
+ yield "", update_logs()
775
+ all_chunks = self.chunk_results(target_documents)
776
+ chunk_count = 0
777
+ for url, chunks in all_chunks.items():
778
+ logger.debug(f"URL: {url}")
779
+ chunk_count += len(chunks)
780
+ for i, chunk in enumerate(chunks):
781
+ logger.debug(f"Chunk {i+1}: {chunk.text}")
782
+ logger.info(f"✅ Generated {chunk_count} chunks ...")
783
+ yield "", update_logs()
784
+
785
+ logger.info(f"Saving {chunk_count} chunks to DB ...")
786
+ yield "", update_logs()
787
+ table_name = self.save_chunks_to_db(all_chunks)
788
+ logger.info(f"✅ Successfully embedded and saved chunks to DB.")
789
+ yield "", update_logs()
790
+
791
+ logger.info("Querying the vector DB to get context ...")
792
+ matched_chunks = self.vector_search(table_name, query, settings)
793
+ for i, result in enumerate(matched_chunks):
794
+ logger.debug(f"{i+1}. {result}")
795
+ logger.info(f"✅ Got {len(matched_chunks)} matched chunks.")
796
+ yield "", update_logs()
797
+
798
+ logger.info("Running inference with context ...")
799
+ yield "", update_logs()
800
+ answer = self.run_inference(
801
+ query=query,
802
+ matched_chunks=matched_chunks,
803
+ settings=settings,
804
+ )
805
+ logger.info("✅ Finished inference API call.")
806
+ logger.info("Generating output ...")
807
+ yield "", update_logs()
808
+
809
+ answer = f"# Answer\n\n{answer}\n"
810
+ references = "\n".join(
811
+ [
812
+ f"[{i+1}] {result['url']}"
813
+ for i, result in enumerate(matched_chunks)
814
+ ]
815
+ )
816
+ yield f"{answer}\n\n# References\n\n{references}", update_logs()
817
+ elif settings.output_mode == OutputMode.extract:
818
+ logger.info("Extracting structured data ...")
819
+ yield "", update_logs()
820
+
821
+ aggregated_output = {}
822
+ for url, text in target_documents.items():
823
+ items = self.run_extract(
824
+ query=query,
825
+ extract_schema_str=extract_schema_str,
826
+ target_content=text,
827
+ settings=settings,
828
+ )
829
+ self.logger.info(
830
+ f"✅ Finished inference API call. Extracted {len(items)} items from {url}."
831
+ )
832
+ yield "", update_logs()
833
+
834
+ self.logger.debug(items)
835
+ aggregated_output[url] = items
836
+
837
+ logger.info("✅ Finished extraction from all urls.")
838
+ logger.info("Generating output ...")
839
+ yield "", update_logs()
840
+ answer = _output_csv(aggregated_output, "SourceURL")
841
+ yield f"{answer}", update_logs()
842
+ else:
843
+ raise Exception(f"Invalid output mode: {settings.output_mode}")
844
+
845
+ logs = ""
846
+ final_result = ""
847
+
848
+ try:
849
+ for result, log_update in process_with_logs():
850
+ logs += log_update + "\n"
851
+ final_result = result
852
+ yield final_result, logs
853
+ finally:
854
+ logger.removeHandler(queue_handler)
855
+
856
+ return final_result, logs
857
+
858
+ def run_query(
859
+ self,
860
+ query: str,
861
+ settings: AskSettings,
862
+ ) -> str:
863
+ url_list_str = "\n".join(settings.url_list)
864
+
865
+ for result, logs in self.run_query_gradio(
866
+ query=query,
867
+ date_restrict=settings.date_restrict,
868
+ target_site=settings.target_site,
869
+ output_language=settings.output_language,
870
+ output_length=settings.output_length,
871
+ url_list_str=url_list_str,
872
+ inference_model_name=settings.inference_model_name,
873
+ hybrid_search=settings.hybrid_search,
874
+ input_mode_str=settings.input_mode,
875
+ output_mode_str=settings.output_mode,
876
+ extract_schema_str=settings.extract_schema_str,
877
+ ):
878
+ final_result = result
879
+ return final_result
880
 
881
 
882
  def launch_gradio(
883
  query: str,
884
+ init_settings: AskSettings,
885
  share_ui: bool,
886
+ logger: logging.Logger,
887
  ) -> None:
888
+ ask = Ask(logger=logger)
889
 
890
+ def toggle_schema_textbox(option):
891
+ if option == "extract":
892
+ return gr.update(visible=True)
893
+ else:
894
+ return gr.update(visible=False)
895
+
896
+ with gr.Blocks() as demo:
897
+ gr.Markdown("# Ask.py - Web Search-Extract-Summarize")
898
+ gr.Markdown(
899
+ "Search the web with the query and summarize the results. Source code: https://github.com/pengfeng/ask.py"
900
+ )
901
+
902
+ with gr.Row():
903
+ with gr.Column():
904
+
905
+ query_input = gr.Textbox(label="Query", value=query)
906
+ input_mode_input = gr.Radio(
907
+ label="Input Mode [search: from search or url, local: from local data]",
908
+ choices=["search", "local"],
909
+ value=init_settings.input_mode,
910
+ )
911
+ output_mode_input = gr.Radio(
912
+ label="Output Mode [answer: simple answer, extract: get structured data]",
913
+ choices=["answer", "extract"],
914
+ value=init_settings.output_mode,
915
+ )
916
+ extract_schema_input = gr.Textbox(
917
+ label="Extract Pydantic Schema",
918
+ visible=(init_settings.output_mode == "extract"),
919
+ value=init_settings.extract_schema_str,
920
+ lines=5,
921
+ max_lines=20,
922
+ )
923
+ output_mode_input.change(
924
+ fn=toggle_schema_textbox,
925
+ inputs=output_mode_input,
926
+ outputs=extract_schema_input,
927
+ )
928
+ date_restrict_input = gr.Number(
929
+ label="Date Restrict (Optional) [0 or empty means no date limit.]",
930
+ value=init_settings.date_restrict,
931
+ )
932
+ target_site_input = gr.Textbox(
933
+ label="Target Sites (Optional) [Empty means searching the whole web.]",
934
+ value=init_settings.target_site,
935
+ )
936
+ output_language_input = gr.Textbox(
937
+ label="Output Language (Optional) [Default is English.]",
938
+ value=init_settings.output_language,
939
+ )
940
+ output_length_input = gr.Number(
941
+ label="Output Length in words (Optional) [Default is automatically decided by LLM.]",
942
+ value=init_settings.output_length,
943
+ )
944
+ url_list_input = gr.Textbox(
945
+ label="URL List (Optional) [When specified, scrape the urls instead of searching the web.]",
946
+ lines=5,
947
+ max_lines=20,
948
+ value="\n".join(init_settings.url_list),
949
+ )
950
+
951
+ with gr.Accordion("More Options", open=False):
952
+ hybrid_search_input = gr.Checkbox(
953
+ label="Hybrid Search [Use both vector search and full-text search.]",
954
+ value=init_settings.hybrid_search,
955
+ )
956
+ inference_model_name_input = gr.Textbox(
957
+ label="Inference Model Name",
958
+ value=init_settings.inference_model_name,
959
+ )
960
+
961
+ submit_button = gr.Button("Submit")
962
+
963
+ with gr.Column():
964
+ answer_output = gr.Textbox(label="Answer")
965
+ logs_output = gr.Textbox(label="Logs", lines=10)
966
+
967
+ submit_button.click(
968
+ fn=ask.run_query_gradio,
969
+ inputs=[
970
+ query_input,
971
+ date_restrict_input,
972
+ target_site_input,
973
+ output_language_input,
974
+ output_length_input,
975
+ url_list_input,
976
+ inference_model_name_input,
977
+ hybrid_search_input,
978
+ input_mode_input,
979
+ output_mode_input,
980
+ extract_schema_input,
981
+ ],
982
+ outputs=[answer_output, logs_output],
983
+ )
984
+
985
+ demo.queue().launch(share=share_ui)
986
 
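The UI wiring above depends on Gradio's streaming behavior: when the click callback is a generator, `demo.queue()` lets each `yield` re-render the outputs, which is how the answer box and the log box update live. A minimal, hypothetical sketch of that pattern (the component names and steps are made up):

```python
import gradio as gr

def slow_answer(query):
    logs = ""
    for step in ("searching", "scraping", "answering"):
        logs += step + "\n"
        yield "", logs  # update logs while the answer is still empty
    yield f"answer to: {query}", logs

with gr.Blocks() as demo:
    query_box = gr.Textbox(label="Query")
    answer_box = gr.Textbox(label="Answer")
    logs_box = gr.Textbox(label="Logs", lines=5)
    gr.Button("Submit").click(
        fn=slow_answer, inputs=query_box, outputs=[answer_box, logs_box]
    )

# demo.queue().launch()
```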
987
 
988
+ @click.command(help="Search web for the query and summarize the results.")
989
+ @click.option("--query", "-q", required=False, help="Query to search")
990
  @click.option(
991
+ "--input-mode",
992
+ "-i",
993
+ type=click.Choice(["search", "local"], case_sensitive=False),
994
+ default="search",
995
+ required=False,
996
+ help=(
997
+ "Input mode for the query, default is search. "
998
+ "When using local, files under 'data' folder will be used as input."
999
+ ),
1000
+ )
1001
+ @click.option(
1002
+ "--output-mode",
1003
+ "-o",
1004
+ type=click.Choice(["answer", "extract"], case_sensitive=False),
1005
+ default="answer",
1006
+ required=False,
1007
+ help="Output mode for the answer, default is a simple answer",
1008
  )
 
1009
  @click.option(
1010
  "--date-restrict",
1011
  "-d",
1012
  type=int,
1013
  required=False,
1014
+ default=0,
1015
  help="Restrict search results to a specific date range, default is no restriction",
1016
  )
1017
  @click.option(
1018
  "--target-site",
1019
  "-s",
1020
  required=False,
1021
+ default="",
1022
  help="Restrict search results to a specific site, default is no restriction",
1023
  )
1024
  @click.option(
 
1031
  "--output-length",
1032
  type=int,
1033
  required=False,
1034
+ default=0,
1035
  help="Output length for the answer",
1036
  )
1037
  @click.option(
1038
  "--url-list-file",
1039
  type=str,
1040
  required=False,
1041
+ default="",
1042
  show_default=True,
1043
  help="Instead of doing web search, scrape the target URL list and answer the query based on the content",
1044
  )
1045
  @click.option(
1046
+ "--extract-schema-file",
1047
+ type=str,
1048
+ required=False,
1049
+ default="",
1050
+ show_default=True,
1051
+ help="Pydantic schema for the extract mode",
1052
+ )
1053
+ @click.option(
1054
+ "--inference-model-name",
1055
  required=False,
1056
+ default=None,
1057
  help="Model name to use for inference",
1058
  )
1059
+ @click.option(
1060
+ "--vector-search-only",
1061
+ is_flag=True,
1062
+ help="Do not use hybrid search mode, use vector search only.",
1063
+ )
1064
+ @click.option(
1065
+ "--run-cli",
1066
+ "-c",
1067
+ is_flag=True,
1068
+ help="Run as a command line tool instead of launching the Gradio UI",
1069
+ )
1070
+ @click.option(
1071
+ "-e",
1072
+ "--env",
1073
+ "env",
1074
+ default=None,
1075
+ required=False,
1076
+ help="The environment file to use, absolute path or related to package root.",
1077
+ )
1078
  @click.option(
1079
  "-l",
1080
  "--log-level",
 
1085
  show_default=True,
1086
  )
1087
  def search_extract_summarize(
 
1088
  query: str,
1089
+ input_mode: str,
1090
+ output_mode: str,
1091
  date_restrict: int,
1092
  target_site: str,
1093
  output_language: str,
1094
  output_length: int,
1095
  url_list_file: str,
1096
+ extract_schema_file: str,
1097
+ inference_model_name: str,
1098
+ vector_search_only: bool,
1099
+ run_cli: bool,
1100
+ env: str,
1101
  log_level: str,
1102
  ):
1103
  load_dotenv(dotenv_path=default_env_file, override=False)
1104
+ logger = _get_logger(log_level)
1105
+
1106
+ if env:
1107
+ load_dotenv(dotenv_path=env, override=True)
1108
+
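The two `load_dotenv` calls layer environments: the default file only fills in variables not already set (`override=False`), while a file passed via `-e` wins over everything loaded before it (`override=True`). A sketch of the layering, using the `.env.ollama` file from the demo below as the explicit file:

```python
# Layered dotenv loading, as above: the base file never clobbers the existing
# environment; the explicitly chosen file does.
from dotenv import load_dotenv

load_dotenv(dotenv_path=".env", override=False)         # defaults
load_dotenv(dotenv_path=".env.ollama", override=True)   # explicit -e file wins
```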
1109
+ final_inference_model_name = inference_model_name
1110
+ if final_inference_model_name is None:
1111
+ final_inference_model_name = os.environ.get("DEFAULT_INFERENCE_MODEL")
1112
+ if final_inference_model_name is None:
1113
+ final_inference_model_name = "gpt-4o-mini"
1114
+
1115
+ if output_mode == "extract":
1116
+ if not extract_schema_file:
1117
+ raise Exception("Extract mode requires the --extract-schema-file argument.")
1118
+
1119
+ if not final_inference_model_name.lower().startswith("gpt"):
1120
+ raise Exception("Extract mode requires the OpenAI GPT model.")
1121
+
1122
+ settings = AskSettings(
1123
+ date_restrict=date_restrict,
1124
+ target_site=target_site,
1125
+ output_language=output_language,
1126
+ output_length=output_length,
1127
+ url_list=_read_url_list(url_list_file),
1128
+ inference_model_name=final_inference_model_name,
1129
+ hybrid_search=(not vector_search_only),
1130
+ input_mode=InputMode(input_mode),
1131
+ output_mode=OutputMode(output_mode),
1132
+ extract_schema_str=_read_extract_schema_str(extract_schema_file),
1133
+ )
1134
+
1135
+ if run_cli:
1136
+ if query is None:
1137
+ raise Exception("Query is required for the command line mode")
1138
+ ask = Ask(logger=logger)
1139
 
1140
+ final_result = ask.run_query(query=query, settings=settings)
1141
+ click.echo(final_result)
1142
+ else:
1143
  if os.environ.get("SHARE_GRADIO_UI", "false").lower() == "true":
1144
  share_ui = True
1145
  else:
1146
  share_ui = False
1147
  launch_gradio(
1148
  query=query,
1149
+ init_settings=settings,
1150
  share_ui=share_ui,
1151
+ logger=logger,
1152
  )
1153
 
1154
 
1155
  if __name__ == "__main__":
data/README.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7507701ae3d2ee84506216ffca698c59f61bca1df77adb3312919bff1b049cd5
3
+ size 234937
demos/local_files.md ADDED
@@ -0,0 +1,48 @@
1
+ ```bash
2
+ % python ask.py -i local -c -q "How does Ask.py work?"
3
+ 2024-11-20 10:00:09,335 - INFO - Initializing converter ...
4
+ 2024-11-20 10:00:09,335 - INFO - ✅ Successfully initialized Docling.
5
+ 2024-11-20 10:00:09,335 - INFO - Initializing chunker ...
6
+ 2024-11-20 10:00:09,550 - INFO - ✅ Successfully initialized Chonkie.
7
+ 2024-11-20 10:00:09,850 - INFO - Initializing database ...
8
+ 2024-11-20 10:00:09,933 - INFO - ✅ Successfully initialized DuckDB.
9
+ 2024-11-20 10:00:09,933 - INFO - Processing the local data directory ...
10
+ 2024-11-20 10:00:09,933 - INFO - Processing README.pdf ...
11
+ Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 11781.75it/s]
12
+ 2024-11-20 10:00:29,629 - INFO - ✅ Finished processing README.pdf.
13
+ 2024-11-20 10:00:29,629 - INFO - Chunking the text ...
14
+ 2024-11-20 10:00:29,639 - INFO - ✅ Generated 2 chunks ...
15
+ 2024-11-20 10:00:29,639 - INFO - Saving 2 chunks to DB ...
16
+ 2024-11-20 10:00:29,681 - INFO - Embedding 1 batches of chunks ...
17
+ 2024-11-20 10:00:30,337 - INFO - ✅ Finished embedding.
18
+ 2024-11-20 10:00:30,423 - INFO - ✅ Created the vector index ...
19
+ 2024-11-20 10:00:30,483 - INFO - ✅ Created the full text search index ...
20
+ 2024-11-20 10:00:30,483 - INFO - ✅ Successfully embedded and saved chunks to DB.
21
+ 2024-11-20 10:00:30,483 - INFO - Querying the vector DB to get context ...
22
+ 2024-11-20 10:00:30,773 - INFO - Running full-text search ...
23
+ 2024-11-20 10:00:30,796 - INFO - ✅ Got 2 matched chunks.
24
+ 2024-11-20 10:00:30,797 - INFO - Running inference with context ...
25
+ 2024-11-20 10:00:34,939 - INFO - ✅ Finished inference API call.
26
+ 2024-11-20 10:00:34,939 - INFO - Generating output ...
27
+ # Answer
28
+
29
+ Ask.py is a Python program designed to implement a search-extract-summarize flow, similar to AI search engines like Perplexity. It can be run through a command line interface or a GradIO user interface and allows for flexibility in controlling output and search behaviors[1].
30
+
31
+ When a query is executed, Ask.py performs the following steps:
32
+
33
+ 1. Searches Google for the top 10 web pages related to the query.
34
+ 2. Crawls and scrapes the content of these pages.
35
+ 3. Breaks down the scraped text into chunks and saves them in a vector database.
36
+ 4. Conducts a vector search with the initial query to identify the top 10 matched text chunks.
37
+ 5. Optionally integrates full-text search results and uses a reranker to refine the results.
38
+ 6. Utilizes the selected chunks as context to query a language model (LLM) to generate a comprehensive answer.
39
+ 7. Outputs the answer along with references to the sources[1].
40
+
41
+ Moreover, the program allows various configurations such as date restrictions, site targeting, output language, and output length. It can also scrape specified URL lists instead of performing a web search, making it highly versatile for search and data extraction tasks[2].
42
+
43
+
44
+ # References
45
+
46
+ [1] file:///Users/feng/work/github/ask.py/data/README.pdf
47
+ [2] file:///Users/feng/work/github/ask.py/data/README.pdf
48
+ ```
demos/run_with_ollama.md ADDED
@@ -0,0 +1,82 @@
1
+ ```bash
2
+ % python ask.py -e .env.ollama -c -q "How does Ollama work?"
3
+ 2025-01-20 13:36:15,026 - INFO - No SEARCH_API_URL or SEARCH_API_KEYenv variable set.
4
+ 2025-01-20 13:36:15,026 - INFO - Using the default proxy at https://svc.leettools.com:8098
5
+ 2025-01-20 13:36:19,395 - INFO - Initializing converter ...
6
+ 2025-01-20 13:36:19,395 - INFO - ✅ Successfully initialized Docling.
7
+ 2025-01-20 13:36:19,395 - INFO - Initializing chunker ...
8
+ 2025-01-20 13:36:19,614 - INFO - ✅ Successfully initialized Chonkie.
9
+ 2025-01-20 13:36:19,917 - INFO - Initializing database ...
10
+ 2025-01-20 13:36:19,992 - INFO - ✅ Successfully initialized DuckDB.
11
+ 2025-01-20 13:36:19,992 - INFO - Searching the web ...
12
+ 2025-01-20 13:36:20,653 - INFO - ✅ Found 10 links for query: How does Ollama work?
13
+ 2025-01-20 13:36:20,653 - INFO - Scraping the URLs ...
14
+ 2025-01-20 13:36:20,653 - INFO - Scraping https://www.reddit.com/r/ollama/comments/197thp1/does_anyone_know_how_ollama_works_under_the_hood/ ...
15
+ 2025-01-20 13:36:20,654 - INFO - Scraping https://medium.com/@mauryaanoop3/ollama-a-deep-dive-into-running-large-language-models-locally-part-1-0a4b70b30982 ...
16
+ 2025-01-20 13:36:20,655 - INFO - Scraping https://www.reddit.com/r/LocalLLaMA/comments/1dhyxq8/why_use_ollama/ ...
17
+ 2025-01-20 13:36:20,656 - INFO - Scraping https://github.com/jmorganca/ollama/issues/1014 ...
18
+ 2025-01-20 13:36:20,657 - INFO - Scraping https://www.listedai.co/ai/ollama ...
19
+ 2025-01-20 13:36:20,657 - INFO - Scraping https://www.andreagrandi.it/posts/ollama-running-llm-locally/ ...
20
+ 2025-01-20 13:36:20,658 - INFO - Scraping https://itsfoss.com/ollama/ ...
21
+ 2025-01-20 13:36:20,659 - INFO - Scraping https://community.n8n.io/t/ollama-embedding-does-not-accept-the-model-but-using-it-with-http-request-works/64457 ...
22
+ 2025-01-20 13:36:20,659 - INFO - Scraping https://community.frame.work/t/ollama-framework-13-amd/53848 ...
23
+ 2025-01-20 13:36:20,660 - INFO - Scraping https://abvijaykumar.medium.com/ollama-brings-runtime-to-serve-llms-everywhere-8a23b6f6a1b4 ...
24
+ 2025-01-20 13:36:20,802 - INFO - ✅ Successfully scraped https://abvijaykumar.medium.com/ollama-brings-runtime-to-serve-llms-everywhere-8a23b6f6a1b4 with length: 6408
25
+ 2025-01-20 13:36:20,861 - INFO - ✅ Successfully scraped https://www.andreagrandi.it/posts/ollama-running-llm-locally/ with length: 10535
26
+ 2025-01-20 13:36:20,891 - INFO - ✅ Successfully scraped https://itsfoss.com/ollama/ with length: 8772
27
+ 2025-01-20 13:36:20,969 - INFO - ✅ Successfully scraped https://community.frame.work/t/ollama-framework-13-amd/53848 with length: 4434
28
+ 2025-01-20 13:36:21,109 - WARNING - Body text too short for url: https://github.com/jmorganca/ollama/issues/1014, length: 9
29
+ 2025-01-20 13:36:21,370 - INFO - ✅ Successfully scraped https://www.reddit.com/r/ollama/comments/197thp1/does_anyone_know_how_ollama_works_under_the_hood/ with length: 2116
30
+ 2025-01-20 13:36:21,378 - INFO - ✅ Successfully scraped https://medium.com/@mauryaanoop3/ollama-a-deep-dive-into-running-large-language-models-locally-part-1-0a4b70b30982 with length: 6594
31
+ 2025-01-20 13:36:21,432 - INFO - ✅ Successfully scraped https://www.reddit.com/r/LocalLLaMA/comments/1dhyxq8/why_use_ollama/ with length: 2304
32
+ 2025-01-20 13:36:21,734 - INFO - ✅ Successfully scraped https://community.n8n.io/t/ollama-embedding-does-not-accept-the-model-but-using-it-with-http-request-works/64457 with length: 2875
33
+ 2025-01-20 13:36:21,776 - INFO - ✅ Successfully scraped https://www.listedai.co/ai/ollama with length: 5516
34
+ 2025-01-20 13:36:21,776 - INFO - ✅ Scraped 9 URLs.
35
+ 2025-01-20 13:36:21,776 - INFO - Chunking the text ...
36
+ 2025-01-20 13:36:21,784 - INFO - ✅ Generated 18 chunks ...
37
+ 2025-01-20 13:36:21,784 - INFO - Saving 18 chunks to DB ...
38
+ 2025-01-20 13:36:21,807 - INFO - Embedding 9 batches of chunks ...
39
+ 2025-01-20 13:36:40,752 - INFO - ✅ Finished embedding.
40
+ 2025-01-20 13:36:40,930 - INFO - ✅ Created the vector index ...
41
+ 2025-01-20 13:36:41,010 - INFO - ✅ Created the full text search index ...
42
+ 2025-01-20 13:36:41,010 - INFO - ✅ Successfully embedded and saved chunks to DB.
43
+ 2025-01-20 13:36:41,011 - INFO - Querying the vector DB to get context ...
44
+ 2025-01-20 13:36:41,091 - INFO - Running full-text search ...
45
+ 2025-01-20 13:36:41,118 - INFO - ✅ Got 10 matched chunks.
46
+ 2025-01-20 13:36:41,118 - INFO - Running inference with context ...
47
+ 2025-01-20 13:37:59,233 - INFO - ✅ Finished inference API call.
48
+ 2025-01-20 13:37:59,234 - INFO - Generating output ...
49
+ # Answer
50
+
51
+ Here is the reformatted output:
52
+
53
+ **Conclusion**
54
+
55
+ Though there are plenty of similar tools, Ollama has become the most popular tool to run LLMs locally. The ease of use in installing different LLMs quickly make it ideal for beginners who want to use local AI.
56
+
57
+ **Dealing with Issues**
58
+
59
+ If you still have some questions, please feel free to ask in the comment section.
60
+
61
+ **AI Tools**
62
+
63
+ Here are some additional resources:
64
+
65
+ * 20 Jan 2025 7 Raspberry Pi-Based Laptops and Tablets for Tinkerers
66
+ * 17 Jan 2025 Adding Grouped Items in Waybar
67
+ * Become a Better Linux User With the FOSS Weekly Newsletter, you learn useful Linux tips, discover applications, explore new distros and stay updated with the latest from Linux world,
68
+ * I Ran the Famed SmolLM on Raspberry Pi TEN AI: Open Source Framework for Quickly Creating Real-Time Multimodal AI Agents
69
+
70
+ # References
71
+
72
+ [1] https://www.reddit.com/r/ollama/comments/197thp1/does_anyone_know_how_ollama_works_under_the_hood/
73
+ [2] https://community.frame.work/t/ollama-framework-13-amd/53848
74
+ [3] https://www.reddit.com/r/LocalLLaMA/comments/1dhyxq8/why_use_ollama/
75
+ [4] https://itsfoss.com/ollama/
76
+ [5] https://abvijaykumar.medium.com/ollama-brings-runtime-to-serve-llms-everywhere-8a23b6f6a1b4
77
+ [6] https://www.listedai.co/ai/ollama
78
+ [7] https://community.n8n.io/t/ollama-embedding-does-not-accept-the-model-but-using-it-with-http-request-works/64457
79
+ [8] https://medium.com/@mauryaanoop3/ollama-a-deep-dive-into-running-large-language-models-locally-part-1-0a4b70b30982
80
+ [9] https://itsfoss.com/ollama/
81
+ [10] https://itsfoss.com/ollama
82
+ ```
demos/search_and_answer.md ADDED
@@ -0,0 +1,71 @@
1
+ ```bash
2
+ % python ask.py -c -q "Why do we need agentic RAG even if we have ChatGPT?"
3
+ 2024-11-20 10:03:49,810 - INFO - Initializing converter ...
4
+ 2024-11-20 10:03:49,810 - INFO - ✅ Successfully initialized Docling.
5
+ 2024-11-20 10:03:49,810 - INFO - Initializing chunker ...
6
+ 2024-11-20 10:03:50,052 - INFO - ✅ Successfully initialized Chonkie.
7
+ 2024-11-20 10:03:50,414 - INFO - Initializing database ...
8
+ 2024-11-20 10:03:50,544 - INFO - ✅ Successfully initialized DuckDB.
9
+ 2024-11-20 10:03:50,545 - INFO - Searching the web ...
10
+ 2024-11-20 10:03:51,239 - INFO - ✅ Found 10 links for query: Why do we need agentic RAG even if we have ChatGPT?
11
+ 2024-11-20 10:03:51,239 - INFO - Scraping the URLs ...
12
+ 2024-11-20 10:03:51,239 - INFO - Scraping https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204 ...
13
+ 2024-11-20 10:03:51,240 - INFO - Scraping https://www.reddit.com/r/LangChain/comments/1ey94rs/is_rag_still_a_thing/ ...
14
+ 2024-11-20 10:03:51,242 - INFO - Scraping https://community.openai.com/t/prompt-engineering-for-rag/621495 ...
15
+ 2024-11-20 10:03:51,242 - INFO - Scraping https://www.linkedin.com/posts/elijahbutler_can-you-use-chat-gpt-as-a-data-analyst-activity-7227666801688461312-qk6v ...
16
+ 2024-11-20 10:03:51,243 - INFO - Scraping https://www.reddit.com/r/ChatGPTCoding/comments/1cft751/my_experience_with_github_copilot_vs_cursor/ ...
17
+ 2024-11-20 10:03:51,244 - INFO - Scraping https://www.ben-evans.com/benedictevans/2024/6/8/building-ai-products ...
18
+ 2024-11-20 10:03:51,244 - INFO - Scraping https://news.ycombinator.com/item?id=40739982 ...
19
+ 2024-11-20 10:03:51,245 - INFO - Scraping https://www.linkedin.com/posts/andrewyng_github-andrewyngtranslation-agent-activity-7206347897938866176-5tDJ ...
20
+ 2024-11-20 10:03:51,247 - INFO - Scraping https://medium.com/@sandyshah1990/starting-to-learn-agentic-rag-e7ec916c83a2 ...
21
+ 2024-11-20 10:03:51,248 - INFO - Scraping https://www.linkedin.com/posts/kurtcagle_agentic-rag-personalizing-and-optimizing-activity-7198097129993613312-z7Sm ...
22
+ 2024-11-20 10:03:51,836 - INFO - ✅ Successfully scraped https://www.ben-evans.com/benedictevans/2024/6/8/building-ai-products with length: 8824
23
+ 2024-11-20 10:03:51,839 - INFO - ✅ Successfully scraped https://medium.com/@sandyshah1990/starting-to-learn-agentic-rag-e7ec916c83a2 with length: 18260
24
+ 2024-11-20 10:03:51,852 - INFO - ✅ Successfully scraped https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204 with length: 9895
25
+ 2024-11-20 10:03:51,869 - INFO - ✅ Successfully scraped https://community.openai.com/t/prompt-engineering-for-rag/621495 with length: 21898
26
+ 2024-11-20 10:03:52,038 - INFO - ✅ Successfully scraped https://news.ycombinator.com/item?id=40739982 with length: 122350
27
+ 2024-11-20 10:03:52,227 - INFO - ✅ Successfully scraped https://www.linkedin.com/posts/andrewyng_github-andrewyngtranslation-agent-activity-7206347897938866176-5tDJ with length: 35845
28
+ 2024-11-20 10:03:52,425 - INFO - ✅ Successfully scraped https://www.linkedin.com/posts/kurtcagle_agentic-rag-personalizing-and-optimizing-activity-7198097129993613312-z7Sm with length: 24524
29
+ 2024-11-20 10:03:52,480 - INFO - ✅ Successfully scraped https://www.linkedin.com/posts/elijahbutler_can-you-use-chat-gpt-as-a-data-analyst-activity-7227666801688461312-qk6v with length: 25621
30
+ 2024-11-20 10:03:52,949 - INFO - ✅ Successfully scraped https://www.reddit.com/r/ChatGPTCoding/comments/1cft751/my_experience_with_github_copilot_vs_cursor/ with length: 5138
31
+ 2024-11-20 10:03:52,996 - INFO - ✅ Successfully scraped https://www.reddit.com/r/LangChain/comments/1ey94rs/is_rag_still_a_thing/ with length: 2486
32
+ 2024-11-20 10:03:52,996 - INFO - ✅ Scraped 10 URLs.
33
+ 2024-11-20 10:03:52,996 - INFO - Chunking the text ...
34
+ 2024-11-20 10:03:53,044 - INFO - ✅ Generated 75 chunks ...
35
+ 2024-11-20 10:03:53,044 - INFO - Saving 75 chunks to DB ...
36
+ 2024-11-20 10:03:53,065 - INFO - Embedding 10 batches of chunks ...
37
+ 2024-11-20 10:03:54,563 - INFO - ✅ Finished embedding.
38
+ 2024-11-20 10:03:55,583 - INFO - ✅ Created the vector index ...
39
+ 2024-11-20 10:03:55,677 - INFO - ✅ Created the full text search index ...
40
+ 2024-11-20 10:03:55,679 - INFO - ✅ Successfully embedded and saved chunks to DB.
41
+ 2024-11-20 10:03:55,679 - INFO - Querying the vector DB to get context ...
42
+ 2024-11-20 10:03:56,092 - INFO - Running full-text search ...
43
+ 2024-11-20 10:03:56,118 - INFO - ✅ Got 15 matched chunks.
44
+ 2024-11-20 10:03:56,118 - INFO - Running inference with context ...
45
+ 2024-11-20 10:04:00,968 - INFO - ✅ Finished inference API call.
46
+ 2024-11-20 10:04:00,969 - INFO - Generating output ...
47
+ # Answer
48
+
49
+ Agentic RAG (Retrieval-Augmented Generation) is necessary even with the existence of ChatGPT due to its multi-faceted capabilities that enhance the overall processing and retrieval of information. Specifically, Agentic RAG employs multiple agents that can manage retrieval tasks, document comparisons, and even perform specific operations like calculations, which are not inherently available in a single model like ChatGPT. This allows for a more streamlined and efficient process when addressing complex queries that require synthesis from various data points, ensuring that no critical context is lost during retrieval and generation processes[1][4]. Additionally, RAG's framework allows for greater flexibility and precision in handling varied types of queries, especially those that require comparative analysis or handling large volumes of data that exceed typical model limitations[2][5][6]. Furthermore, it enables the use of specialized agents that can focus on unique tasks, making the whole system more dynamic and capable of tackling intricate demands in real-time applications[4][6].
50
+
51
+ In short, while ChatGPT offers robust conversational capabilities, the agentic approach of RAG significantly broadens the scope and effectiveness of information processing for complex tasks.
52
+
53
+
54
+ # References
55
+
56
+ [1] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
57
+ [2] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
58
+ [3] https://community.openai.com/t/prompt-engineering-for-rag/621495
59
+ [4] https://community.openai.com/t/how-to-use-rag-properly-and-what-types-of-query-it-is-good-at/658204
60
+ [5] https://news.ycombinator.com/item?id=40739982
61
+ [6] https://www.linkedin.com/posts/elijahbutler_can-you-use-chat-gpt-as-a-data-analyst-activity-7227666801688461312-qk6v
62
+ [7] https://community.openai.com/t/prompt-engineering-for-rag/621495
63
+ [8] https://news.ycombinator.com/item?id=40739982
64
+ [9] https://www.linkedin.com/posts/elijahbutler_can-you-use-chat-gpt-as-a-data-analyst-activity-7227666801688461312-qk6v
65
+ [10] https://www.linkedin.com/posts/elijahbutler_can-you-use-chat-gpt-as-a-data-analyst-activity-7227666801688461312-qk6v
66
+ [11] https://community.openai.com/t/prompt-engineering-for-rag/621495
67
+ [12] https://news.ycombinator.com/item?id=40739982
68
+ [13] https://news.ycombinator.com/item?id=40739982
69
+ [14] https://news.ycombinator.com/item?id=40739982
70
+ [15] https://news.ycombinator.com/item?id=40739982
71
+ ```
demos/search_and_extract.md ADDED
@@ -0,0 +1,291 @@
1
+ ```bash
2
+ python ask.py -c -q "LLM Gen-AI Startups" -o extract --extract-schema-file instructions/extract_example.txt
3
+ 2024-11-20 10:06:34,308 - INFO - Initializing converter ...
4
+ 2024-11-20 10:06:34,308 - INFO - ✅ Successfully initialized Docling.
5
+ 2024-11-20 10:06:34,308 - INFO - Initializing chunker ...
6
+ 2024-11-20 10:06:34,546 - INFO - ✅ Successfully initialized Chonkie.
7
+ 2024-11-20 10:06:34,902 - INFO - Initializing database ...
8
+ 2024-11-20 10:06:35,047 - INFO - ✅ Successfully initialized DuckDB.
9
+ 2024-11-20 10:06:35,047 - INFO - Searching the web ...
10
+ 2024-11-20 10:06:35,409 - INFO - ✅ Found 10 links for query: LLM Gen-AI Startups
11
+ 2024-11-20 10:06:35,409 - INFO - Scraping the URLs ...
12
+ 2024-11-20 10:06:35,409 - INFO - Scraping https://www.ycombinator.com/companies/industry/generative-ai ...
13
+ 2024-11-20 10:06:35,409 - INFO - Scraping https://app.dealroom.co/lists/33530 ...
14
+ 2024-11-20 10:06:35,410 - INFO - Scraping https://explodingtopics.com/blog/generative-ai-startups ...
15
+ 2024-11-20 10:06:35,410 - INFO - Scraping https://www.reddit.com/r/Startup_Ideas/comments/1djstai/thoughts_on_llm_based_startups/ ...
16
+ 2024-11-20 10:06:35,411 - INFO - Scraping https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc ...
17
+ 2024-11-20 10:06:35,413 - INFO - Scraping https://www.reddit.com/r/learnprogramming/comments/1e0gzbo/are_most_ai_startups_these_days_just_openai/ ...
18
+ 2024-11-20 10:06:35,414 - INFO - Scraping https://a16z.com/ai/ ...
19
+ 2024-11-20 10:06:35,415 - INFO - Scraping https://praful-krishna.medium.com/thinking-of-an-llm-based-project-or-startup-dont-dd92c1a54237 ...
20
+ 2024-11-20 10:06:35,415 - INFO - Scraping https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9 ...
21
+ 2024-11-20 10:06:35,416 - INFO - Scraping https://www.eweek.com/artificial-intelligence/generative-ai-startups/ ...
22
+ 2024-11-20 10:06:35,636 - INFO - ✅ Successfully scraped https://explodingtopics.com/blog/generative-ai-startups with length: 17632
23
+ 2024-11-20 10:06:35,992 - INFO - ✅ Successfully scraped https://praful-krishna.medium.com/thinking-of-an-llm-based-project-or-startup-dont-dd92c1a54237 with length: 8612
24
+ 2024-11-20 10:06:36,133 - INFO - ✅ Successfully scraped https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9 with length: 3649
25
+ 2024-11-20 10:06:36,608 - INFO - ✅ Successfully scraped https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc with length: 13736
26
+ 2024-11-20 10:06:36,675 - INFO - ✅ Successfully scraped https://app.dealroom.co/lists/33530 with length: 208
27
+ 2024-11-20 10:06:36,934 - INFO - ✅ Successfully scraped https://a16z.com/ai/ with length: 14737
28
+ 2024-11-20 10:06:37,217 - INFO - ✅ Successfully scraped https://www.reddit.com/r/learnprogramming/comments/1e0gzbo/are_most_ai_startups_these_days_just_openai/ with length: 2069
29
+ 2024-11-20 10:06:37,314 - INFO - ✅ Successfully scraped https://www.reddit.com/r/Startup_Ideas/comments/1djstai/thoughts_on_llm_based_startups/ with length: 3112
30
+ 2024-11-20 10:06:37,556 - INFO - ✅ Successfully scraped https://www.ycombinator.com/companies/industry/generative-ai with length: 53344
31
+ 2024-11-20 10:06:37,582 - INFO - ✅ Successfully scraped https://www.eweek.com/artificial-intelligence/generative-ai-startups/ with length: 69127
32
+ 2024-11-20 10:06:37,582 - INFO - ✅ Scraped 10 URLs.
33
+ 2024-11-20 10:06:37,582 - INFO - Extracting structured data ...
34
+ 2024-11-20 10:06:59,368 - INFO - ✅ Finished inference API call. Extracted 99 items from https://www.ycombinator.com/companies/industry/generative-ai.
35
+ 2024-11-20 10:06:59,869 - INFO - ✅ Finished inference API call. Extracted 0 items from https://app.dealroom.co/lists/33530.
36
+ 2024-11-20 10:07:07,198 - INFO - ✅ Finished inference API call. Extracted 33 items from https://explodingtopics.com/blog/generative-ai-startups.
37
+ 2024-11-20 10:07:08,094 - INFO - ✅ Finished inference API call. Extracted 1 items from https://www.reddit.com/r/Startup_Ideas/comments/1djstai/thoughts_on_llm_based_startups/.
38
+ 2024-11-20 10:07:12,658 - INFO - ✅ Finished inference API call. Extracted 20 items from https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc.
39
+ 2024-11-20 10:07:13,667 - INFO - ✅ Finished inference API call. Extracted 0 items from https://www.reddit.com/r/learnprogramming/comments/1e0gzbo/are_most_ai_startups_these_days_just_openai/.
40
+ 2024-11-20 10:07:15,321 - INFO - ✅ Finished inference API call. Extracted 6 items from https://a16z.com/ai/.
41
+ 2024-11-20 10:07:17,139 - INFO - ✅ Finished inference API call. Extracted 3 items from https://praful-krishna.medium.com/thinking-of-an-llm-based-project-or-startup-dont-dd92c1a54237.
42
+ 2024-11-20 10:07:19,724 - INFO - ✅ Finished inference API call. Extracted 7 items from https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9.
43
+ 2024-11-20 10:07:39,284 - INFO - ✅ Finished inference API call. Extracted 75 items from https://www.eweek.com/artificial-intelligence/generative-ai-startups/.
44
+ 2024-11-20 10:07:39,284 - INFO - ✅ Finished extraction from all urls.
45
+ 2024-11-20 10:07:39,284 - INFO - Generating output ...
46
+ name,description,SourceURL
47
+ Humanloop,"Humanloop is the LLM evals platform for enterprises. Teams at Gusto, Vanta and Duolingo use Humanloop to ship reliable AI products. We enable you to adopt best practices for prompt management, evaluation and observability.",https://www.ycombinator.com/companies/industry/generative-ai
48
+ Truewind,"Truewind is AI-powered bookkeeping and finance software for startups. Using GPT-3, Truewind captures the business context that only founders have, making accounting easier and more accurate.",https://www.ycombinator.com/companies/industry/generative-ai
49
+ Shepherd,"Shepherd is a Learning assistant for schools to provide to their students. Shepherd seamlessly combines AI-enabled self-study, affordable tutoring, peer collaboration, and analytics for a personalized learning experience.",https://www.ycombinator.com/companies/industry/generative-ai
50
+ Remy,"Use Remy to discover upcoming engineering work, perform automatic triage and speed up your design reviews.",https://www.ycombinator.com/companies/industry/generative-ai
51
+ Hyperbound,Hyperbound is a simulated AI sales roleplay platform that turns ICP descriptions into interactive AI buyers in less than 2 minutes.,https://www.ycombinator.com/companies/industry/generative-ai
52
+ AI.Fashion,AI.Fashion is the AI creative suite for the fashion industry - modernizing the traditional design and go to market fashion processes with our advanced AI platform and design tools.,https://www.ycombinator.com/companies/industry/generative-ai
53
+ Infobot,"By using LLMs to generate news content, we reduce the cost of generating an article by over 1000x.",https://www.ycombinator.com/companies/industry/generative-ai
54
+ Magic Loops,Magic Loops are the fastest way to automate (almost) anything by combining generative AI with code.,https://www.ycombinator.com/companies/industry/generative-ai
55
+ Humanlike,"A better alternative to outsourcing accounts payable and receivable, using human-like AI to process invoices more efficiently.",https://www.ycombinator.com/companies/industry/generative-ai
56
+ Atla,"Atla helps developers find AI mistakes at scale, so they can build more reliable GenAI applications.",https://www.ycombinator.com/companies/industry/generative-ai
57
+ Contour,"Contour is building next-generation quality assurance to free engineering time and test products, end-to-end.",https://www.ycombinator.com/companies/industry/generative-ai
58
+ Mandel AI,Mandel surfaces supply chain disruptions and supplier updates with email AI.,https://www.ycombinator.com/companies/industry/generative-ai
59
+ Aqua Voice,Aqua is a voice-driven text editor that lets you speak naturally and writes down what you meant.,https://www.ycombinator.com/companies/industry/generative-ai
60
+ Sapling.ai,Sapling offers an API and SDK to help businesses integrate language models into their applications.,https://www.ycombinator.com/companies/industry/generative-ai
61
+ askLio,"askLio builds AI Copilots to help procurement teams at enterprises, reducing the procurement process from weeks to hours.",https://www.ycombinator.com/companies/industry/generative-ai
62
+ Marblism,"Marblism helps user describe their app, generating the database, back-end, and front-end.",https://www.ycombinator.com/companies/industry/generative-ai
63
+ Lumona,Lumona is an AI-enabled search engine featuring perspectives from social media to help understand search results.,https://www.ycombinator.com/companies/industry/generative-ai
64
+ DraftWise,DraftWise harnesses the power of AI for drafting and negotiation in the legal industry.,https://www.ycombinator.com/companies/industry/generative-ai
65
+ Montrey AI,Montrey AI helps companies analyze qualitative feedback and user engagement data.,https://www.ycombinator.com/companies/industry/generative-ai
66
+ Synch,Your Sales and Sales Ops team in a unified platform.,https://www.ycombinator.com/companies/industry/generative-ai
67
+ Tegon,Tegon is an open-source issue tracking tool designed for engineering teams.,https://www.ycombinator.com/companies/industry/generative-ai
68
+ Empower,Empower is a developer platform for fine-tuned LLMs.,https://www.ycombinator.com/companies/industry/generative-ai
69
+ Spine AI,Spine AI effectively translates business context and data schema into an AI analyst.,https://www.ycombinator.com/companies/industry/generative-ai
70
+ TruthSuite,TruthSuite provides a platform to enhance due diligence and research processes.,https://www.ycombinator.com/companies/industry/generative-ai
71
+ Senso,Senso is building an AI-powered knowledge base for customer support.,https://www.ycombinator.com/companies/industry/generative-ai
72
+ Parea AI,Parea AI is the essential developer platform for debugging and monitoring LLM applications.,https://www.ycombinator.com/companies/industry/generative-ai
73
+ Shasta Health,Shasta Health enables physical therapists to go independent using AI agents.,https://www.ycombinator.com/companies/industry/generative-ai
74
+ Arcimus,Arcimus uses LLMs to automate insurance premium audits.,https://www.ycombinator.com/companies/industry/generative-ai
75
+ Tavus,"At Tavus, we're building the human layer of AI for natural interaction.",https://www.ycombinator.com/companies/industry/generative-ai
76
+ Leena AI,"Leena AI answers employee questions automatically, streamlining HR processes.",https://www.ycombinator.com/companies/industry/generative-ai
77
+ Vocode,Vocode is an open-source voice AI platform.,https://www.ycombinator.com/companies/industry/generative-ai
78
+ OfOne,OfOne builds software to automate order taking at fast-food drive-thrus.,https://www.ycombinator.com/companies/industry/generative-ai
79
+ Spellbrush,Spellbrush is the world's leading generative AI studio.,https://www.ycombinator.com/companies/industry/generative-ai
80
+ VetRec,VetRec automates the process of taking clinical notes for veterinarians.,https://www.ycombinator.com/companies/industry/generative-ai
81
+ Orangewood Labs,Orangewood Labs creates affordable AI-powered industrial robotic arms.,https://www.ycombinator.com/companies/industry/generative-ai
82
+ Credal.ai,Credal.ai allows any employee to build AI Assistants for enterprise.,https://www.ycombinator.com/companies/industry/generative-ai
83
+ Diffuse Bio,Diffuse is building generative AI for protein design.,https://www.ycombinator.com/companies/industry/generative-ai
84
+ RenderNet,RenderNet transforms imaginative concepts into high-quality images.,https://www.ycombinator.com/companies/industry/generative-ai
85
+ Reworkd,Reworkd works on multimodal LLM agents to extract web data at scale.,https://www.ycombinator.com/companies/industry/generative-ai
86
+ Maven Bio,Maven Bio empowers business development teams with AI for BioPharma.,https://www.ycombinator.com/companies/industry/generative-ai
87
+ Mathos,Mathos AI is the leading AI math solver for educational productivity.,https://www.ycombinator.com/companies/industry/generative-ai
88
+ Traceloop,Traceloop monitors the quality of LLM applications in production.,https://www.ycombinator.com/companies/industry/generative-ai
89
+ MediSearch,MediSearch provides direct answers to medical questions.,https://www.ycombinator.com/companies/industry/generative-ai
90
+ Syncly,Syncly helps product teams analyze communications to prevent churn.,https://www.ycombinator.com/companies/industry/generative-ai
91
+ Magic Patterns,Magic Patterns helps software teams prototype product ideas.,https://www.ycombinator.com/companies/industry/generative-ai
92
+ Glade,Glade uses AI to create a new genre of video games.,https://www.ycombinator.com/companies/industry/generative-ai
93
+ Pyq AI,Pyq AI builds automations to streamline information extraction.,https://www.ycombinator.com/companies/industry/generative-ai
94
+ Indexical,Indexical is a developer tool for SaaS and B2B.,https://www.ycombinator.com/companies/industry/generative-ai
95
+ Kobalt Labs,Kobalt automates manual risk and compliance operations.,https://www.ycombinator.com/companies/industry/generative-ai
96
+ Khoj,Khoj is an open-source AI application for personalized assistance.,https://www.ycombinator.com/companies/industry/generative-ai
97
+ Flint,Flint is an AI platform for K-12 education.,https://www.ycombinator.com/companies/industry/generative-ai
98
+ Reforged Labs,Reforged Labs launches AI-powered video creation service.,https://www.ycombinator.com/companies/industry/generative-ai
99
+ Unsloth AI,Unsloth helps builders create custom models better and faster.,https://www.ycombinator.com/companies/industry/generative-ai
100
+ Rosebud AI,Rosebud builds the AI Roblox for easy game creation.,https://www.ycombinator.com/companies/industry/generative-ai
101
+ VectorShift,VectorShift is an AI automations platform for knowledge generation.,https://www.ycombinator.com/companies/industry/generative-ai
102
+ Inari,Inari surfaces customer insights from feedback automatically.,https://www.ycombinator.com/companies/industry/generative-ai
103
+ VideoGen,"VideoGen makes it easy to create professional, copyright-free videos.",https://www.ycombinator.com/companies/industry/generative-ai
104
+ Infeedo AI,Infeedo AI helps enhance employee experience with conversational AI.,https://www.ycombinator.com/companies/industry/generative-ai
105
+ sudocode,sudocode lets users code in plain English.,https://www.ycombinator.com/companies/industry/generative-ai
106
+ ideate.xyz,ideate.xyz is a graphics design as API platform.,https://www.ycombinator.com/companies/industry/generative-ai
107
+ PlayHT,Play is a Voice AI company specializing in conversational voice models.,https://www.ycombinator.com/companies/industry/generative-ai
108
+ Inventive AI,Inventive is an AI-powered platform for managing RFP & questionnaire responses.,https://www.ycombinator.com/companies/industry/generative-ai
109
+ Proxis,Proxis is dedicated to LLM distillation unlock production ready models.,https://www.ycombinator.com/companies/industry/generative-ai
110
+ Zuni,Zuni is an AI productivity tool.,https://www.ycombinator.com/companies/industry/generative-ai
111
+ reworks,reworks helps integrate agentic AI companies with external software.,https://www.ycombinator.com/companies/industry/generative-ai
112
+ Kalam Labs,Kalam Labs is creating a space for kids to participate in ambitious space missions.,https://www.ycombinator.com/companies/industry/generative-ai
113
+ Passage,Passage is a co-pilot for the customs brokering space.,https://www.ycombinator.com/companies/industry/generative-ai
114
+ camfer,camfer helps mechanical engineers collaborate on design tasks.,https://www.ycombinator.com/companies/industry/generative-ai
115
+ Pibit.ai,Pibit transforms loss run files into comprehensive reports.,https://www.ycombinator.com/companies/industry/generative-ai
116
+ Merse,Merse builds visual stories like comics but with voices and sound effects.,https://www.ycombinator.com/companies/industry/generative-ai
117
+ Letterdrop,Letterdrop helps understand what content drives revenue.,https://www.ycombinator.com/companies/industry/generative-ai
118
+ Pulse AI,Pulse automates procurement with AI.,https://www.ycombinator.com/companies/industry/generative-ai
119
+ Tara AI,Tara AI measures and improves engineering efficiency.,https://www.ycombinator.com/companies/industry/generative-ai
+ Jasper.ai,Jasper is an AI content platform for creators and companies.,https://www.ycombinator.com/companies/industry/generative-ai
+ Ego,Ego is a generative AI-powered simulation engine for creators.,https://www.ycombinator.com/companies/industry/generative-ai
+ Sameday,Sameday's AI Sales Agent answers calls for home service businesses.,https://www.ycombinator.com/companies/industry/generative-ai
+ dmodel,dmodel lets companies manipulate AI model thoughts in real time.,https://www.ycombinator.com/companies/industry/generative-ai
+ Playground,Playground combines AI research and product design.,https://www.ycombinator.com/companies/industry/generative-ai
+ Hypotenuse AI,Hypotenuse turns keywords into blog articles and copywriting.,https://www.ycombinator.com/companies/industry/generative-ai
+ Simplify,Simplify is re-imagining the job-searching process.,https://www.ycombinator.com/companies/industry/generative-ai
+ Mem0,Mem0 provides a memory layer for LLM applications.,https://www.ycombinator.com/companies/industry/generative-ai
+ Benchify,Benchify is a code review tool that tests code rigorously.,https://www.ycombinator.com/companies/industry/generative-ai
+ Saturn,Saturn is an AI-powered operating system for wealth management.,https://www.ycombinator.com/companies/industry/generative-ai
+ MagiCode,MagiCode automates testing code in the frontend.,https://www.ycombinator.com/companies/industry/generative-ai
+ Redouble AI,Redouble AI scales human-in-the-loop for AI workflows.,https://www.ycombinator.com/companies/industry/generative-ai
+ Ankr Health,Ankr uses generative AI to recreate clinic functions.,https://www.ycombinator.com/companies/industry/generative-ai
+ innkeeper,innkeeper provides dynamic pricing and other automations for hotels.,https://www.ycombinator.com/companies/industry/generative-ai
+ AlphaWatch AI,AlphaWatch AI improves research for hedge funds using LLMs.,https://www.ycombinator.com/companies/industry/generative-ai
+ D-ID,D-ID generates realistic high-quality AI personas using deep-learning.,https://www.ycombinator.com/companies/industry/generative-ai
+ iollo,iollo is an at-home metabolomics test for health optimization.,https://www.ycombinator.com/companies/industry/generative-ai
+ Unify,Unify allows building evals for LLMs for production.,https://www.ycombinator.com/companies/industry/generative-ai
+ Activeloop,Activeloop provides APIs for collaborative AI datasets.,https://www.ycombinator.com/companies/industry/generative-ai
+ Moonvalley,Moonvalley is building a creative studio powered by generative AI.,https://www.ycombinator.com/companies/industry/generative-ai
+ Kura AI,Kura is SOTA for giving AI agents the tools for website interactions.,https://www.ycombinator.com/companies/industry/generative-ai
+ MixerBox,MixerBox helps people live easier through mobile apps.,https://www.ycombinator.com/companies/industry/generative-ai
+ SchemeFlow,SchemeFlow automates approvals for construction projects.,https://www.ycombinator.com/companies/industry/generative-ai
+ ZOKO,Zoko facilitates business communication on WhatsApp.,https://www.ycombinator.com/companies/industry/generative-ai
+ Praxos,Praxos allows insurance professionals to automate their operations.,https://www.ycombinator.com/companies/industry/generative-ai
+ Odo,Odo helps companies win government contracts using AI.,https://www.ycombinator.com/companies/industry/generative-ai
+ Cohere,Cohere is an AI startup that builds multilingual LLMs for enterprise businesses to streamline tasks.,https://explodingtopics.com/blog/generative-ai-startups
+ Hugging Face,"Hugging Face is a collaborative AI community that creates tools for developers, with over 61,000 pre-trained models and 7,000 datasets.",https://explodingtopics.com/blog/generative-ai-startups
+ Tabnine,Tabnine is an AI assistant for software developers that uses generative AI to predict or suggest the next lines of code.,https://explodingtopics.com/blog/generative-ai-startups
+ Soundraw,Soundraw is a royalty-free AI music generator that allows creators to make original songs and retain ownership.,https://explodingtopics.com/blog/generative-ai-startups
+ Tome.app,Tome is an AI-powered storytelling platform that facilitates the creation of presentations using generative AI.,https://explodingtopics.com/blog/generative-ai-startups
+ AssemblyAI,AssemblyAI is an AI-as-a-service startup providing APIs for automated speech transcription and advanced content moderation.,https://explodingtopics.com/blog/generative-ai-startups
+ Promptbase,Promptbase is a marketplace for buying and selling prompts to generate predictive results using generative AI tools.,https://explodingtopics.com/blog/generative-ai-startups
+ PhotoRoom,PhotoRoom is an AI-powered photo editing tool that blends generative AI with traditional editing tools.,https://explodingtopics.com/blog/generative-ai-startups
+ Taskade,"Taskade is a generative AI productivity tool focused on task management, note-taking, and team collaboration.",https://explodingtopics.com/blog/generative-ai-startups
+ Synthesia,Synthesia AI is a generative AI video maker that creates videos from text inputs.,https://explodingtopics.com/blog/generative-ai-startups
+ Humata AI,Humata AI integrates with desktop to let users ask questions and get answers about specific documents.,https://explodingtopics.com/blog/generative-ai-startups
+ Chatbase,Chatbase is an integrated chatbot for websites that provides instant answers to customer inquiries.,https://explodingtopics.com/blog/generative-ai-startups
+ Stability AI,Stability AI is the creator of Stable Diffusion and develops open-source models for image generation.,https://explodingtopics.com/blog/generative-ai-startups
+ Anyword,Anyword is a generative AI content generation platform using natural language processing to write copy.,https://explodingtopics.com/blog/generative-ai-startups
+ Rephrase AI,Rephrase AI is a text-to-video generation platform allowing customers to create videos with customizable avatars.,https://explodingtopics.com/blog/generative-ai-startups
+ Inworld AI,Inworld AI implements AI-powered character generation for video games using natural language processing.,https://explodingtopics.com/blog/generative-ai-startups
+ Runway,Runway is a generative AI video editing platform that creates video clips based on text prompts.,https://explodingtopics.com/blog/generative-ai-startups
+ Sudowrite,Sudowrite is an AI writing assistant specifically designed for novel writing and storytelling.,https://explodingtopics.com/blog/generative-ai-startups
+ Steve.ai,Steve.ai is an online video creation platform that turns text prompts into animated videos.,https://explodingtopics.com/blog/generative-ai-startups
+ PlayHT,PlayHT is a text-to-speech software using generative AI to convert written text into human-like audio.,https://explodingtopics.com/blog/generative-ai-startups
+ Elicit,Elicit is a generative AI research tool for analyzing and summarizing academic papers.,https://explodingtopics.com/blog/generative-ai-startups
+ TalkPal,TalkPal is an AI-powered language learning platform offering personalized tutor sessions in 57 languages.,https://explodingtopics.com/blog/generative-ai-startups
+ Dubverse,Dubverse is an AI video dubbing platform that translates videos into multiple languages.,https://explodingtopics.com/blog/generative-ai-startups
+ Codeium,Codeium is an AI-powered toolkit for developers to assist with code creation and translation.,https://explodingtopics.com/blog/generative-ai-startups
+ Fliki,Fliki is an AI video and audio generation platform allowing for quick video creation from text prompts.,https://explodingtopics.com/blog/generative-ai-startups
+ LOVO AI,LOVO is an AI voice generator capable of creating realistic voice cloning and text-to-speech functionality.,https://explodingtopics.com/blog/generative-ai-startups
+ Decktopus,Decktopus helps users create presentations from prompts by generating personalized slide content.,https://explodingtopics.com/blog/generative-ai-startups
+ Character.ai,Character AI is a generative AI platform for creating animated 3D characters that interact in conversations.,https://explodingtopics.com/blog/generative-ai-startups
+ Descript,Descript is a generative AI video and audio editing application designed for podcasters and videographers.,https://explodingtopics.com/blog/generative-ai-startups
+ Papercup,Papercup uses machine learning to translate speech and create voiceovers for video content.,https://explodingtopics.com/blog/generative-ai-startups
+ Vizcom,Vizcom is a generative AI tool that assists designers by turning sketches into 3D concept drawings.,https://explodingtopics.com/blog/generative-ai-startups
+ Vidnoz,Vidnoz is a free AI video platform enabling users to create videos with various AI features.,https://explodingtopics.com/blog/generative-ai-startups
+ Scalenut,Scalenut is a generative AI-powered SEO and content marketing platform useful for content creation and optimization.,https://explodingtopics.com/blog/generative-ai-startups
+ Autonomous Agents,"Startups focused on autonomous agents, which have potential for genuine problem-solving using AI.",https://www.reddit.com/r/Startup_Ideas/comments/1djstai/thoughts_on_llm_based_startups/
+ Huma.AI,"A generative AI for life sciences SaaS platform, recognized by Gartner, following its collaboration with OpenAI to deploy a validated GenAI solution for medical affairs.",https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Viz.ai,"A medical imaging startup specializing in stroke care, using LLMs for early disease detection.",https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Arionkoder,"A product development studio and AI lab service company with expertise in AI, computer vision, and natural language processing.",https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ HeHealth,Employs AI and LLM technologies to deliver efficient recommendations for male care.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ HOPPR,A multimodal imaging platform that facilitates deep image analysis and improves medical processes.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Medical IP,A medical metaverse solution utilizing generative AI for streamlined medical imaging segmentation.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ NexusMD,An LLM-powered medical imaging platform that automates medical imaging data capture.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Abridge,A generative AI for clinical documentation that converts patient-clinician conversations into structured clinical notes.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Autonomize AI,A healthcare-optimized AI platform utilizing several LLMs for various operational efficiencies.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ DeepScribe,A med-tech firm leveraging LLMs to automate clinical documentation.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ HiLabs,Works with major health plans to refine dirty data using advanced AI and LLMs.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Nabla,"Offers Copilot, an ambient AI solution for clinical note generation.",https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ AgentifAI,A voice-first AI assistant for healthcare that enhances patient customer experience.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Artisight,Deployed in hospitals with an end-to-end sensor fusion platform solution leveraging an encoder LLM.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ dacadoo,A digital health platform connecting to various devices and integrating an LLM-based streaming model.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Hippocratic AI,Developing the healthcare industry’s first safety-focused LLM for patient-facing applications.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Idoven,"Developed Willem-AI, an AI-powered cardiology platform for identifying and diagnosing patients.",https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Inference Analytics,A generative AI healthcare platform trained on 450M+ medical records with applications for healthcare parties.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Pingoo,An AI health chatbot that provides personalized health education and engagement for diabetes patients.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ Talkie.ai,Automates patient phone interactions using AI voice and LLM technology.,https://www.linkedin.com/pulse/20-gen-ai-healthcare-startups-shaping-future-recap-from-renee-yao-q7lkc
+ LLMflation,LLM inference cost is going down fast.,https://a16z.com/ai/
+ How to Build a Thriving AI Ecosystem,Insights on building a successful AI ecosystem.,https://a16z.com/ai/
+ The Economic Case for Generative AI and Foundation Models,Exploring the financial implications and advantages of generative AI.,https://a16z.com/ai/
+ Emerging Architectures for LLM Applications,Discussing new architectural models for LLM applications.,https://a16z.com/ai/
+ How Generative AI Is Remaking UI/UX Design,Impact of generative AI on user interface and user experience design.,https://a16z.com/ai/
+ The Top 100 Gen AI Consumer Apps,Analyzing the most popular generative AI consumer applications.,https://a16z.com/ai/
+ OpenAI,"Developer of ChatGPT and GPT-4, providing LLM APIs with functionalities like plugins, function calling, and integration with Whisper models.",https://praful-krishna.medium.com/thinking-of-an-llm-based-project-or-startup-dont-dd92c1a54237
+ Coseer,A startup that faced challenges in convincing the market to adopt its LLM-based solutions.,https://praful-krishna.medium.com/thinking-of-an-llm-based-project-or-startup-dont-dd92c1a54237
+ Anthropic,An LLM provider known for having reasonable and transparent security policies.,https://praful-krishna.medium.com/thinking-of-an-llm-based-project-or-startup-dont-dd92c1a54237
+ beautiful.ai,A startup creating innovative tools for presentations.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ Tome,A startup providing a platform for presentations.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ Rows,A startup offering tools for spreadsheets.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ mem,A startup focused on note-taking solutions.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ Clio,A practice management solution for law firms that has access to extensive data.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ Bench,An accounting service that exemplifies the auto-pilot business model.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ Pilot,A recent accounting service exploring the auto-pilot approach.,https://medium.com/point-nine-news/where-are-the-opportunities-for-new-startups-in-generative-ai-f48068b5f8f9
+ OpenAI,"OpenAI is the highest profile company in the generative AI space, known for its prebuilt AI solutions and API and application development support for developers.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Anthropic,"Anthropic’s Claude platform focuses on content generation, providing a customizable chatbot experience.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Cohere,Cohere offers NLP solutions designed to support business operations through its conversational AI agent.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Glean,Glean is an enterprise search company that uses deep-learning models to understand and answer natural language queries.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Jasper,"Jasper's core product is designed for marketing content generation, helping users create social media, advertising, and blog content.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Hugging Face,"Hugging Face is a community forum for AI and ML model development, known for its open-source LLM that generates content in multiple languages.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Inflection AI,"Inflection AI focuses on personal AI tools, including Pi, which emphasizes colloquial conversation.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Stability AI,"Stability AI is known for its popular app Stable Diffusion, a tool for image and video content generation.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ MOSTLY AI,"MOSTLY AI’s platform balances data democratization with data security, specializing in synthetic data generation.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Lightricks,"Lightricks creates AI-powered apps for media editing, including notable products like Facetune.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ AI21 Labs,AI21 Labs creates tools for contextual natural language processing and offers third-party developers access to its language models.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Tabnine,"Tabnine offers generative AI code assistance for software development, focusing on code completion and automation.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Mistral AI,Mistral AI provides access to open generative AI models and developer-friendly resources.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Codeium,Codeium provides resources for generating logical code and autocompletion for users.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Clarifai,"Clarifai's platform supports AI-driven data labeling and preparation, alongside model building capabilities.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Gong,Gong offers revenue intelligence solutions using AI to support customer service and sales effectiveness.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Twain,Twain is an AI writing assistant aimed at helping sales professionals generate effective outreach content.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Bertha.ai,Bertha.ai is a content generation application specifically designed for WordPress and similar platforms.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Tome,"Tome creates a versatile platform for AI-based presentations, helping users generate insightful content.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ CopyAI,CopyAI focuses on enabling go-to-market workflows through generative content creation and task automation.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Narrative BI,Narrative BI turns business intelligence data into understandable narratives for decision-making.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Anyword,Anyword is a writing solution that optimizes content performance for marketing.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Synthesia,"Synthesia specializes in AI video production, allowing users to create videos from text inputs.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Midjourney,Midjourney is known for generating high-quality images based on natural language prompts.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ MURF.AI,MURF.AI is a voice AI generation company with multilingual capabilities.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ PlayHT,PlayHT specializes in AI-generated voice content and podcast production.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ ElevenLabs,"ElevenLabs produces high-quality voice generation technology, offering features for text to speech.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Colossyan,Colossyan is focused on creating high-quality corporate training videos using AI.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ AssemblyAI,AssemblyAI provides speech-to-text models tailored for enterprise usage.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Plask,"Plask offers tools for automated animation, making motion design easier.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ LOVO,LOVO is a comprehensive AI platform for video and voice generation.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ DeepBrain AI,DeepBrain AI focuses on video generation and interactive human avatars.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Elai.io,Elai.io provides AI video generation tools designed for the business sector.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Sudowrite,"Sudowrite is a writing support tool for authors, enhancing creativity and storytelling.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Tavus,Tavus personalizes video content for different viewer requirements through generative technology.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Hippocratic AI,"Hippocratic AI develops AI solutions for healthcare, ensuring compliance with privacy standards.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Paige AI,Paige AI optimizes cancer diagnostics using advanced machine learning techniques.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Iambic Therapeutics,Iambic focuses on drug discovery and development using advanced AI methodologies.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Insilico Medicine,Insilico utilizes generative AI for drug development and research in various medical fields.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Etcembly,Etcembly focuses on improving immunotherapies using machine learning.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Biomatter,Biomatter uses its Intelligent Architecture platform for protein design and manufacturing.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Activ Surgical,Activ Surgical enhances surgical intelligence with real-time data visualization.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Kaliber Labs,Kaliber develops AI-powered surgical software solutions for improved medical procedures.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Osmo,"Osmo applies machine learning to olfactory science, aiming to predict scents.",https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Aqemia,Aqemia leverages AI for faster drug discovery and development.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Synthetaic,Synthetaic generates AI models for analyzing unstructured datasets.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Synthesis AI,Synthesis AI specializes in synthetic data generation targeted for various industries.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Syntho,Syntho provides synthesized data generation and analytics solutions.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ GenRocket,GenRocket emphasizes dynamic and automated test data generation.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Gridspace,Gridspace offers AI solutions to optimize customer interaction in contact centers.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Revery AI,Revery AI focuses on creating virtual try-on experiences in fashion.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Veesual,Veesual enables virtual try-ons through deep learning and image generation.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Frame AI,Frame AI uses AI to provide audience analytics and insights.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Zowie,Zowie produces AI-driven customer service solutions for e-commerce.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Forethought,Forethought develops generative AI technology for improved customer service.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Lily AI,Lily AI uses AI for product management and enhancing customer experiences.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Runway,Runway produces AI-powered video content creation tools.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Latitude.io,Latitude.io is known for creating AI-driven gaming experiences.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Character.AI,Character.AI allows users to interact with conversational characters.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Charisma Entertainment,Charisma offers tools for developing interactive storytelling in various mediums.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Replika,Replika creates AI companions for personal conversations and interactions.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Aimi.fm,Aimi.fm generates music content for various media and users.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Inworld AI,Inworld AI develops realistic NPC characters for gaming and training.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ SOUNDRAW,SOUNDRAW offers music composition tools for content generation.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Notion,Notion provides a collaborative workspace solution with AI-enhanced features.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Harvey,Harvey offers legal AI solutions for document handling and services.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Ironclad,Ironclad focuses on AI contract management across various sectors.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Taskade,Taskade uses AI to aid in task and project management.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Humata,Humata offers AI-powered tools to extract insights from dense documents.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Simplifai,Simplifai provides automation tools for highly regulated industries.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ PatentPal,PatentPal streamlines patent application processes with AI-generated content.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Adept AI,Adept AI automates workplace interactions with generative AI tools.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Perplexity AI,Perplexity AI is an AI search engine focused on providing personalized results.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ Andi,Andi is a generative AI search bot designed for user-friendly information retrieval.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ You.com,You.com is a secure search engine that personalizes results with generative AI.,https://www.eweek.com/artificial-intelligence/generative-ai-startups/
+ ```
demos/search_on_site_and_date.md ADDED
@@ -0,0 +1,85 @@
+ The following query only uses information from openai.com that was updated within
+ the previous day. The behavior is similar to the "site:openai.com" and
+ "date-restrict" search parameters in Google Search.
+
+ ```bash
+ python ask.py -c -q "OpenAI Swarm Framework" -d 1 -s openai.com
+ 2024-11-20 10:05:45,949 - INFO - Initializing converter ...
+ 2024-11-20 10:05:45,949 - INFO - ✅ Successfully initialized Docling.
+ 2024-11-20 10:05:45,949 - INFO - Initializing chunker ...
+ 2024-11-20 10:05:46,185 - INFO - ✅ Successfully initialized Chonkie.
+ 2024-11-20 10:05:46,499 - INFO - Initializing database ...
+ 2024-11-20 10:05:46,591 - INFO - ✅ Successfully initialized DuckDB.
+ 2024-11-20 10:05:46,591 - INFO - Searching the web ...
+ 2024-11-20 10:05:47,055 - INFO - ✅ Found 10 links for query: OpenAI Swarm Framework
+ 2024-11-20 10:05:47,055 - INFO - Scraping the URLs ...
+ 2024-11-20 10:05:47,055 - INFO - Scraping https://community.openai.com/t/agent-swarm-what-actually-is-the-point/578347 ...
+ 2024-11-20 10:05:47,056 - INFO - Scraping https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510 ...
+ 2024-11-20 10:05:47,056 - INFO - Scraping https://community.openai.com/t/openai-swarm-for-agents-and-agent-handoffs/976579 ...
+ 2024-11-20 10:05:47,057 - INFO - Scraping https://cookbook.openai.com/examples/orchestrating_agents ...
+ 2024-11-20 10:05:47,058 - INFO - Scraping https://community.openai.com/t/swarm-some-initial-insights/976602 ...
+ 2024-11-20 10:05:47,059 - INFO - Scraping https://community.openai.com/t/how-to-use-async-functions-with-swarm/994569 ...
+ 2024-11-20 10:05:47,060 - INFO - Scraping https://community.openai.com/t/messages-i-o-growing-now-what/990194 ...
+ 2024-11-20 10:05:47,061 - INFO - Scraping https://forum.openai.com/public/events/virtual-event-technical-success-office-hours-gwpi7fv9mz ...
+ 2024-11-20 10:05:47,062 - INFO - Scraping https://community.openai.com/t/new-reasoning-models-openai-o1-preview-and-o1-mini/938081?page=3 ...
+ 2024-11-20 10:05:47,063 - INFO - Scraping https://forum.openai.com/public/videos/technical-success-office-hours-swam-11-14-2024 ...
+ 2024-11-20 10:05:47,358 - INFO - ✅ Successfully scraped https://community.openai.com/t/how-to-use-async-functions-with-swarm/994569 with length: 781
+ 2024-11-20 10:05:47,540 - INFO - ✅ Successfully scraped https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510 with length: 3081
+ 2024-11-20 10:05:47,625 - INFO - ✅ Successfully scraped https://community.openai.com/t/swarm-some-initial-insights/976602 with length: 5786
+ 2024-11-20 10:05:47,662 - INFO - ✅ Successfully scraped https://community.openai.com/t/messages-i-o-growing-now-what/990194 with length: 12642
+ 2024-11-20 10:05:47,664 - INFO - ✅ Successfully scraped https://community.openai.com/t/openai-swarm-for-agents-and-agent-handoffs/976579 with length: 6016
+ 2024-11-20 10:05:47,666 - INFO - ✅ Successfully scraped https://community.openai.com/t/agent-swarm-what-actually-is-the-point/578347 with length: 11872
+ 2024-11-20 10:05:47,670 - INFO - ✅ Successfully scraped https://community.openai.com/t/new-reasoning-models-openai-o1-preview-and-o1-mini/938081?page=3 with length: 13588
+ 2024-11-20 10:05:47,778 - INFO - ✅ Successfully scraped https://forum.openai.com/public/events/virtual-event-technical-success-office-hours-gwpi7fv9mz with length: 3655
+ 2024-11-20 10:05:48,018 - INFO - ✅ Successfully scraped https://forum.openai.com/public/videos/technical-success-office-hours-swam-11-14-2024 with length: 47441
+ 2024-11-20 10:05:48,334 - INFO - ✅ Successfully scraped https://cookbook.openai.com/examples/orchestrating_agents with length: 18586
+ 2024-11-20 10:05:48,334 - INFO - ✅ Scraped 10 URLs.
+ 2024-11-20 10:05:48,335 - INFO - Chunking the text ...
+ 2024-11-20 10:05:48,356 - INFO - ✅ Generated 37 chunks ...
+ 2024-11-20 10:05:48,356 - INFO - Saving 37 chunks to DB ...
+ 2024-11-20 10:05:48,376 - INFO - Embedding 10 batches of chunks ...
+ 2024-11-20 10:05:49,796 - INFO - ✅ Finished embedding.
+ 2024-11-20 10:05:50,338 - INFO - ✅ Created the vector index ...
+ 2024-11-20 10:05:50,409 - INFO - ✅ Created the full text search index ...
+ 2024-11-20 10:05:50,410 - INFO - ✅ Successfully embedded and saved chunks to DB.
+ 2024-11-20 10:05:50,410 - INFO - Querying the vector DB to get context ...
+ 2024-11-20 10:05:50,621 - INFO - Running full-text search ...
+ 2024-11-20 10:05:50,644 - INFO - ✅ Got 13 matched chunks.
+ 2024-11-20 10:05:50,644 - INFO - Running inference with context ...
+ 2024-11-20 10:05:56,986 - INFO - ✅ Finished inference API call.
+ 2024-11-20 10:05:56,986 - INFO - Generating output ...
+ # Answer
+
+ OpenAI Swarm is an experimental framework designed to create, manage, and deploy multi-agent systems. It allows multiple AI agents to collaborate on complex tasks, differing significantly from traditional single-agent models and other OpenAI tools like Custom GPTs, API Completions, Functions, and Assistants.
+
+ Key differentiators of Swarm include:
+
+ 1. **Multi-Agent Collaboration**: Swarm enables agents to interact and coordinate, enhancing efficiency in problem-solving. Traditional models typically operate with single-agent interactions[1].
+
+ 2. **Orchestration and Coordination**: The framework provides mechanisms for task delegation, synchronization, and result aggregation essential for handling the complexity of multi-agent scenarios. Existing APIs primarily function within a single agent’s context without such coordination[1].
+
+ 3. **Scalability and Flexibility**: Swarm is designed to easily scale by adding specialized agents, offering customization for roles within the system. In contrast, existing APIs usually focus on increasing the capacity of a single model rather than expanding agent collaboration[1].
+
+ 4. **Ideal Use Cases**: Swarm is particularly useful for tasks that benefit from parallel processing and specialization, like complex simulations and large-scale data analysis. Other models are more suited to tasks manageable by single agents, such as content generation[1].
+
+ 5. **Back-End Integration**: Swarm is primarily tailored for back-end development, allowing integration into applications via programming languages like Python using APIs[1]. In contrast, other tools allow for more direct user interactions through front-end interfaces like ChatGPT[1].
+
+ It should be noted that Swarm is an educational resource for exploring multi-agent orchestration and not intended for production-ready applications, highlighting the significance of programming expertise for its implementation[3][5][11].
+
+
+ # References
+
+ [1] https://community.openai.com/t/swarm-some-initial-insights/976602
+ [2] https://community.openai.com/t/introducing-swarm-js-node-js-implementation-of-openai-swarm/977510
+ [3] https://community.openai.com/t/openai-swarm-for-agents-and-agent-handoffs/976579
+ [4] https://community.openai.com/t/swarm-some-initial-insights/976602
+ [5] https://community.openai.com/t/openai-swarm-for-agents-and-agent-handoffs/976579
+ [6] https://community.openai.com/t/how-to-use-async-functions-with-swarm/994569
+ [7] https://forum.openai.com/public/videos/technical-success-office-hours-swam-11-14-2024
+ [8] https://community.openai.com/t/agent-swarm-what-actually-is-the-point/578347
+ [9] https://forum.openai.com/public/videos/technical-success-office-hours-swam-11-14-2024
+ [10] https://community.openai.com/t/agent-swarm-what-actually-is-the-point/578347
+ [11] https://forum.openai.com/public/events/virtual-event-technical-success-office-hours-gwpi7fv9mz
+ [12] https://forum.openai.com/public/videos/technical-success-office-hours-swam-11-14-2024
+ [13] https://forum.openai.com/public/videos/technical-success-office-hours-swam-11-14-2024
+ ```
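The run above walks through the whole pipeline: scrape, chunk (Docling + Chonkie), embed, build both a vector index and a full-text index in DuckDB, and merge the two result sets as the LLM context. As a minimal sketch only (the `chunks` table, column names, and top-k values are illustrative assumptions, not the actual ask.py code), the hybrid retrieval step can be expressed in DuckDB roughly like this:

```python
# Hypothetical sketch of DuckDB hybrid retrieval (vector + BM25 full-text).
# Table name, columns, and top-k values are illustrative only.
import duckdb

con = duckdb.connect()  # in-memory database
for stmt in ("INSTALL vss", "LOAD vss", "INSTALL fts", "LOAD fts"):
    con.execute(stmt)
con.execute("CREATE TABLE chunks (id INTEGER, text VARCHAR, embedding FLOAT[1536])")
# ... embed and insert the scraped chunks here ...
con.execute("CREATE INDEX vec_idx ON chunks USING HNSW (embedding)")  # vector index
con.execute("PRAGMA create_fts_index('chunks', 'id', 'text')")        # full-text index

query_vec = [0.0] * 1536  # placeholder: embed the query with the same embedding model
vector_hits = con.execute(
    "SELECT id, text FROM chunks "
    "ORDER BY array_distance(embedding, ?::FLOAT[1536]) LIMIT 10",
    [query_vec],
).fetchall()
fts_hits = con.execute(
    "SELECT id, text, fts_main_chunks.match_bm25(id, ?) AS score "
    "FROM chunks WHERE score IS NOT NULL ORDER BY score DESC LIMIT 10",
    ["OpenAI Swarm Framework"],
).fetchall()
context = {row[0]: row[1] for row in vector_hits + fts_hits}  # union by chunk id
```

Taking the union of both result sets, as the "Got 13 matched chunks" log line suggests, lets keyword matches rescue chunks the embedding model ranks poorly and vice versa.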
env.deepseek.tpl ADDED
@@ -0,0 +1,8 @@
+ LLM_BASE_URL=https://api.deepseek.com/v1
+ LLM_API_KEY=<deepseek-api-key>
+ DEFAULT_INFERENCE_MODEL=deepseek-chat
+
+ EMBED_BASE_URL=https://api.openai.com/v1
+ EMBED_API_KEY=<openai-api-key>
+ EMBEDDING_MODEL=text-embedding-3-small
+ EMBEDDING_DIMENSIONS=1536
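Since this template points inference and embeddings at different OpenAI-compatible endpoints, the natural consumption pattern is two separate clients. A minimal sketch (not necessarily how ask.py wires it up internally):

```python
# Minimal sketch: two OpenAI-compatible clients driven by the template above.
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # e.g. after `cp env.deepseek.tpl .env` and filling in the keys

llm = OpenAI(base_url=os.environ["LLM_BASE_URL"], api_key=os.environ["LLM_API_KEY"])
embedder = OpenAI(base_url=os.environ["EMBED_BASE_URL"], api_key=os.environ["EMBED_API_KEY"])

# Inference goes to DeepSeek's OpenAI-compatible endpoint ...
reply = llm.chat.completions.create(
    model=os.environ["DEFAULT_INFERENCE_MODEL"],
    messages=[{"role": "user", "content": "Say hello."}],
)

# ... while embeddings still go to OpenAI.
emb = embedder.embeddings.create(
    model=os.environ["EMBEDDING_MODEL"],
    input=["a chunk of text to embed"],
    dimensions=int(os.environ["EMBEDDING_DIMENSIONS"]),
)
```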
env.ollama.tpl ADDED
@@ -0,0 +1,6 @@
+ LLM_BASE_URL=http://localhost:11434/v1
+ LLM_API_KEY=dummy-api-key
+
+ DEFAULT_INFERENCE_MODEL=llama3.2
+ EMBEDDING_MODEL=nomic-embed-text
+ EMBEDDING_DIMENSIONS=768
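With the Ollama template, both inference and embeddings go through the local OpenAI-compatible endpoint, and the API key is a placeholder that Ollama does not check. Assuming a standard local Ollama install, the two models named above likely need to be pulled first:

```bash
ollama pull llama3.2
ollama pull nomic-embed-text
```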
env.tpl ADDED
@@ -0,0 +1,16 @@
+ # right now we use Google search API as the default search engine
+ SEARCH_API_URL=https://www.googleapis.com/customsearch/v1
+ SEARCH_API_KEY=<your-google-search-api-key>
+ SEARCH_PROJECT_KEY=<your-google-cx-key>
+
+ # right now we use OpenAI API as the default LLM inference engine and embedding model
+ LLM_BASE_URL=https://api.openai.com/v1
+ LLM_API_KEY=<your-openai-api-key>
+ DEFAULT_INFERENCE_MODEL=gpt-4o-mini
+ EMBEDDING_MODEL=text-embedding-3-small
+ EMBEDDING_DIMENSIONS=1536
+
+ # Run and share Gradio UI
+ RUN_GRADIO_UI=False
+ SHARE_GRADIO_UI=False
+
instructions/extract_example.txt ADDED
@@ -0,0 +1,3 @@
+ class CompanyInfo(BaseModel):
+     name: str
+     description: str
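This schema is presumably what the extract flow hands to the LLM so that matching records come back as typed objects. A sketch using the OpenAI SDK's structured-output helper, which exists in the pinned openai version; the model choice and prompt here are assumptions, not ask.py's exact call:

```python
# Illustrative sketch: structured extraction into the CompanyInfo schema.
from openai import OpenAI
from pydantic import BaseModel


class CompanyInfo(BaseModel):
    name: str
    description: str


client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the company name and a one-line description."},
        {"role": "user", "content": "Perplexity AI is an AI search engine focused on personalized results."},
    ],
    response_format=CompanyInfo,
)
info = completion.choices[0].message.parsed  # a CompanyInfo instance
print(info.name, "-", info.description)
```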
instructions/links.txt CHANGED
@@ -1,3 +1,4 @@
- # we will crawl these pages and answer the question based on their contents
+ # you can specify a --url-list-file argument with links similar to the ones below
+ # ask.py will crawl these pages and answer the question based on their contents
  https://en.wikipedia.org/wiki/Large_language_model
  https://en.wikipedia.org/wiki/Retrieval-augmented_generation
requirements.txt CHANGED
@@ -1,9 +1,11 @@
  click==8.1.7
- requests==2.31.0
- openai==1.40.2
+ requests==2.32.3
+ numpy==1.26.4
  jinja2==3.1.3
  bs4==0.0.2
- lxml==4.8.0
  python-dotenv==1.0.1
+ openai==1.57.2
  duckdb==1.1.2
- gradio==5.3.0
+ gradio==5.3.0
+ chonkie==0.1.2
+ docling==2.5.2
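The two new dependencies line up with the demo log above: Docling handles document conversion ("Initializing converter") and Chonkie handles chunking ("Initializing chunker"). A rough sketch of how they fit together, following each library's documented usage rather than ask.py's exact code; the chunk sizes are assumptions:

```python
# Illustrative sketch of the new dependencies: Docling converts a source
# document to markdown, Chonkie splits the text into token-based chunks.
from chonkie import TokenChunker
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("https://en.wikipedia.org/wiki/Large_language_model")
text = result.document.export_to_markdown()

chunker = TokenChunker(chunk_size=512, chunk_overlap=64)
chunks = chunker.chunk(text)
print(f"Generated {len(chunks)} chunks")
```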
scripts/draw_flow.py ADDED
@@ -0,0 +1,68 @@
+ # Draw the ask.py query-processing flowchart with matplotlib.
+ import matplotlib.patches as patches
+ import matplotlib.pyplot as plt
+
+ # Create a figure
+ fig, ax = plt.subplots(figsize=(10, 14))
+
+
+ # Helper function to create a box
+ def create_box(text, x, y, width=2.5, height=0.8, color="lightblue"):
+     ax.add_patch(
+         patches.Rectangle(
+             (x, y), width, height, edgecolor="black", facecolor=color, lw=1.5
+         )
+     )
+     ax.text(x + width / 2, y + height / 2, text, ha="center", va="center", fontsize=10)
+
+
+ # Helper function to create an arrow
+ def create_arrow(x_start, y_start, x_end, y_end):
+     ax.annotate(
+         "",
+         xy=(x_end, y_end),
+         xytext=(x_start, y_start),
+         arrowprops=dict(facecolor="black", shrink=0.05, width=1.5, headwidth=8),
+     )
+
+
+ # Draw the flowchart components
+ create_box("Start", 4, 12)
+ create_box("Query Input", 4, 10.5)
+ create_box("Mode Selection", 4, 9)
+ create_box("Search Mode", 1.5, 7.5)
+ create_box("Local Mode", 6.5, 7.5)
+ create_box("Search Google", 1.5, 6)
+ create_box("Crawl and Scrape Results", 1.5, 4.5)
+ create_box("Use Local Files", 6.5, 6)
+ create_box("Extract Text Content", 6.5, 4.5)
+ create_box("Chunk Text Content", 4, 3)
+ create_box("Save to VectorDB", 4, 1.5)
+ create_box("Perform Hybrid Search", 4, 0)
+ create_box("[Optional] Re-rank Results", 4, -1.5)
+ create_box("Use Top Chunks as Context", 4, -3)
+ create_box("Generate Answer with References", 4, -4.5)
+ create_box("Output Answer", 4, -6)
+
+ # Draw the arrows
+ create_arrow(5.25, 12, 5.25, 11.3)
+ create_arrow(5.25, 10.5, 5.25, 9.8)
+ create_arrow(5.25, 9, 3.5, 8.3)  # to Search Mode
+ create_arrow(5.25, 9, 6.5, 8.3)  # to Local Mode
+ create_arrow(2.75, 7.5, 2.75, 6.8)  # to Search Google
+ create_arrow(7.75, 7.5, 7.75, 6.8)  # to Use Local Files
+ create_arrow(2.75, 6, 2.75, 5.3)  # to Crawl and Scrape Results
+ create_arrow(7.75, 6, 7.75, 5.3)  # to Extract Text Content
+ create_arrow(2.75, 4.5, 4, 3.8)  # to Chunk Text Content
+ create_arrow(7.75, 4.5, 6, 3.8)  # to Chunk Text Content
+ create_arrow(5.25, 3, 5.25, 2.3)  # to Save to VectorDB
+ create_arrow(5.25, 1.5, 5.25, 0.8)  # to Perform Hybrid Search
+ create_arrow(5.25, 0, 5.25, -0.8)  # to Optional Re-rank Results
+ create_arrow(5.25, -1.5, 5.25, -2.3)  # to Use Top Chunks as Context
+ create_arrow(5.25, -3, 5.25, -3.8)  # to Generate Answer with References
+ create_arrow(5.25, -4.5, 5.25, -5.3)  # to Output Answer
+
+ # Final touches
+ ax.axis("off")
+ plt.title("Flowchart of Query Processing System", fontsize=14)
+ plt.show()
svc.leettools.com ADDED
File without changes