	---
	title: ask.py
	app_file: ask.py
	sdk: gradio
	sdk_version: 5.3.0
	---

	[![License](https://img.shields.io/github/license/pengfeng/ask.py)](LICENSE)

	- [🚀 Updates! 🚀](#-updates-)
	- [Introduction](#introduction)
	  - [Demo use cases](#demo-use-cases)
	  - [The search-extract-summarize flow](#the-search-extract-summarize-flow)
	- [Quick start](#quick-start)
	- [Use Different LLM Endpoints](#use-different-llm-endpoints)
	  - [Use local Ollama inference and embedding models](#use-local-ollama-inference-and-embedding-models)
	  - [Use DeepSeek API inference with OpenAI embedding models](#use-deepseek-api-inference-with-openai-embedding-models)
	- [GradIO Deployment](#gradio-deployment)
	- [Community](#community)


	# 🚀 Updates! 🚀

	A full version with db support and configurable components is open sourced here:
	[LeetTools](https://github.com/leettools-dev/leettools). Please check it out!

	We also added support for local Ollama inference and embedding models, as well as for other API
	providers such as DeepSeek. Please see the [`Use Different LLM Endpoints`](#use-different-llm-endpoints) section for more details.

	> [UPDATE]
	> - 2025-01-20: add support for separate API endpoints for inference and embedding
	> - 2025-01-20: add support for .env file switch and Ollama example
	> - 2025-01-20: add support for default search proxy
	> - 2024-12-20: add the full function version link
	> - 2024-11-20: add Docling converter and local mode to query against local files
	> - 2024-11-10: add Chonkie as the default chunker
	> - 2024-10-28: add extract function as a new output mode
	> - 2024-10-25: add hybrid search demo using DuckDB full-text search
	> - 2024-10-22: add GradIO integration
	> - 2024-10-21: use DuckDB for the vector search and use API for embedding
	> - 2024-10-20: allow specifying a list of input URLs
	> - 2024-10-18: output-language and output-length parameters for LLM
	> - 2024-10-18: date-restrict and target-site parameters for search

	# Introduction

	A single Python program to implement the search-extract-summarize flow, similar to AI search
	engines such as Perplexity.

	- You can run it with local Ollama inference and embedding models.
	- You can run it from the command line or with a GradIO UI.
	- You can control the output behavior, e.g., extract structured data or change the output language.
	- You can control the search behavior, e.g., restrict to a specific site or date, or just scrape
	a specified list of URLs.
	- You can run it in a cron job or bash script to automate complex search/data extraction tasks.
	- You can ask questions against local files.

	We have a running UI example [in HuggingFace Spaces](https://huggingface.co/spaces/LeetTools/ask.py).

	![image](https://github.com/user-attachments/assets/0483e6a2-75d7-4fbd-813f-bfa13839c836)

	## Demo use cases

	- [Search like Perplexity](demos/search_and_answer.md)
	- [Only use the latest information from a specific site](demos/search_on_site_and_date.md)
	- [Extract information from web search results](demos/search_and_extract.md)
	- [Ask questions against local files](demos/local_files.md)
	- [Use Ollama local LLM and Embedding models](demos/run_with_ollama.md)

	> [!NOTE]
	>
	> - Our main goal is to illustrate the basic concepts of AI search engines with the raw constructs.
	> Performance or scalability is not in the scope of this program.
	> - We are planning to open source a real search-enabled AI toolset with real DB setup, real document
	> pipeline, and real query engine soon. Star and watch this repo for updates!

	## The search-extract-summarize flow

	Given a query, the program will

	- in search mode: search Google for the top 10 web pages
	- in local mode: use the local files under the 'data' directory
	- crawl and scrape the result documents for their text content
	- split the text content into chunks and save them into a vector DB
	- perform a hybrid search (vector and BM25 FTS) with the query and find the top 10 matched chunks
	- [Optional] use a reranker to re-rank the top chunks
	- use the top chunks as the context to ask an LLM to generate the answer
	- output the answer with the references

	Of course, this flow is a much simplified version of what real AI search engines do, but it is a good starting point for understanding the basic concepts.
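To make the chunk, hybrid-search, and prompt steps above more concrete, here is a minimal, dependency-free sketch. It is not the code in ask.py: the bag-of-words cosine score stands in for embedding similarity, the term-count score stands in for DuckDB's BM25 full-text search, and reciprocal rank fusion is our assumption for how the two rankings could be combined. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (a stand-in for a real chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine_score(query: str, chunk: str) -> float:
    """Bag-of-words cosine similarity, standing in for embedding similarity."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    dot = sum(q[t] * c[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in c.values()))
    return dot / norm if norm else 0.0


def keyword_score(query: str, chunk: str) -> float:
    """Count of query-term occurrences, standing in for a BM25 full-text score."""
    c = Counter(tokenize(chunk))
    return float(sum(c[t] for t in set(tokenize(query))))


def hybrid_search(query: str, chunks: list[str], top_k: int = 10, k: int = 60) -> list[str]:
    """Rank chunks with both scorers and fuse the rankings (reciprocal rank fusion)."""
    fused: Counter = Counter()
    for score_fn in (cosine_score, keyword_score):
        ranked = sorted(range(len(chunks)), key=lambda i: score_fn(query, chunks[i]), reverse=True)
        for rank, idx in enumerate(ranked, start=1):
            fused[idx] += 1.0 / (k + rank)
    best = sorted(fused, key=lambda i: fused[i], reverse=True)[:top_k]
    return [chunks[i] for i in best]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Number the chunks so the LLM can cite them as references in its answer."""
    context = "\n".join(f"[{n}] {c}" for n, c in enumerate(context_chunks, start=1))
    return f"Answer the question using only the context.\n\nContext:\n{context}\n\nQuestion: {query}"
```

In a real run, the two scorers would be replaced by an embedding-based vector search and a full-text index, and the prompt would be sent to the configured LLM endpoint.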

	One benefit is that we can manipulate the search function and output format.

	For example, we can:

	- search with date-restrict to retrieve only the latest information.
	- search within a target-site so that the answer is based only on content from that site.
	- ask the LLM to answer in a specific language.
	- ask the LLM to answer with a specific length.
	- crawl a specific list of URLs and answer based on their contents only.

	This program can serve as a playground to understand and experiment with different components in
	the pipeline.
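As an illustration of how the date and site restrictions could reach the search backend, the sketch below builds a Google Custom Search JSON API request URL. The `dateRestrict`, `siteSearch`, and `siteSearchFilter` parameters belong to that API; the function name is hypothetical, and treating the integer as a number of days is our assumption, not something this README specifies.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, cx, date_restrict=None, target_site=None, num=10):
    """Build a Custom Search request URL; no network call is made here."""
    params = {"key": api_key, "cx": cx, "q": query, "num": num}
    if date_restrict is not None:
        # Assumption: the integer is a number of days, e.g. 7 -> "d7".
        params["dateRestrict"] = f"d{date_restrict}"
    if target_site:
        params["siteSearch"] = target_site
        params["siteSearchFilter"] = "i"  # "i" includes only results from the site
    return f"{GOOGLE_SEARCH_ENDPOINT}?{urlencode(params)}"
```

For example, `build_search_url("llm agent", key, cx, date_restrict=7, target_site="openai.com")` yields a URL restricted to one site and, under the days assumption, to the past week.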

	# Quick start

	```bash
	# We recommend using uv as the virtual environment manager
	# First install uv if you haven't:
	% curl -LsSf https://astral.sh/uv/install.sh | sh

	# Create a new virtual environment and install dependencies
	% uv venv
	% source .venv/bin/activate # On Windows use: .venv\Scripts\activate
	% uv pip install -e .

	# Alternatively, if you prefer not to install in editable mode, you can use:
	% uv pip install .

	# modify .env file to set the API keys or export them as environment variables as below

	# you need to set the Google search API
	% export SEARCH_API_KEY="your-google-search-api-key"
	% export SEARCH_PROJECT_KEY="your-google-cx-key"

	# the LLM endpoint defaults to the OpenAI API; uncomment the next line to override it
	# % export LLM_BASE_URL=https://api.openai.com/v1
	% export LLM_API_KEY="your-openai-api-key"

	# By default, the program will start a web UI. See GradIO Deployment section for more info.
	# Run the program on command line with -c option
	% python ask.py -c -q "What is an LLM agent?"

	# You can also query your local files under the 'data' directory using the local mode
	% python ask.py -i local -c -q "How does Ask.py work?"

	# we can specify more parameters to control the behavior such as date_restrict and target_site
	% python ask.py --help
	Usage: ask.py [OPTIONS]

	  Search web for the query and summarize the results.

	Options:
	  -q, --query TEXT                Query to search
	  -i, --input-mode [search|local]
	                                  Input mode for the query, default is search.
	                                  When using local, files under 'data' folder
	                                  will be used as input.
	  -o, --output-mode [answer|extract]
	                                  Output mode for the answer, default is a
	                                  simple answer
	  -d, --date-restrict INTEGER     Restrict search results to a specific date
	                                  range, default is no restriction
	  -s, --target-site TEXT          Restrict search results to a specific site,
	                                  default is no restriction
	  --output-language TEXT          Output language for the answer
	  --output-length INTEGER         Output length for the answer
	  --url-list-file TEXT            Instead of doing web search, scrape the
	                                  target URL list and answer the query based
	                                  on the content
	  --extract-schema-file TEXT      Pydantic schema for the extract mode
	  --inference-model-name TEXT     Model name to use for inference
	  --vector-search-only            Do not use hybrid search mode, use vector
	                                  search only.
	  -c, --run-cli                   Run as a command line tool instead of
	                                  launching the Gradio UI
	  -e, --env TEXT                  The environment file to use, absolute path
	                                  or related to package root.
	  -l, --log-level [DEBUG|INFO|WARNING|ERROR]
	                                  Set the logging level  [default: INFO]
	  --help                          Show this message and exit.
	```
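For automation (for example, a cron job that runs a batch of queries), the command line can be assembled programmatically. The helper below is a hypothetical sketch that only uses flags listed in the help output above; it builds the argument list and does not run anything itself.

```python
import sys


def build_ask_command(query, output_mode="answer", target_site=None, date_restrict=None,
                      output_language=None, env_file=None, script="ask.py"):
    """Return the argv list for one CLI run of ask.py (-c disables the web UI)."""
    cmd = [sys.executable, script, "-c", "-q", query, "-o", output_mode]
    if target_site:
        cmd += ["-s", target_site]
    if date_restrict is not None:
        cmd += ["-d", str(date_restrict)]
    if output_language:
        cmd += ["--output-language", output_language]
    if env_file:
        cmd += ["-e", env_file]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)`; passing a list rather than a shell string avoids quoting problems with queries that contain spaces or quotes.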

	# Use Different LLM Endpoints

	## Use local Ollama inference and embedding models

	We can run Ask.py with different env files to use different LLM endpoints and other
	related settings. For example, if you have a local Ollama instance running, you can
	configure Ask.py to use it as follows:

	```bash
	# you may need to pull the models first
	% ollama pull llama3.2
	% ollama pull nomic-embed-text
	% ollama serve

	% cat > .env.ollama <<EOF
	LLM_BASE_URL=http://localhost:11434/v1
	LLM_API_KEY=dummy-key
	DEFAULT_INFERENCE_MODEL=llama3.2
	EMBEDDING_MODEL=nomic-embed-text
	EMBEDDING_DIMENSIONS=768
	EOF

	# Then run the command with the -e option to specify the .env file to use
	% python ask.py -e .env.ollama -c -q "How does Ollama work?"
	```
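The env file is a plain list of `KEY=VALUE` lines. The sketch below parses that format with the standard library only, to show what the settings look like once loaded. It is an illustration of the file format, not the loader ask.py uses; the program may well rely on a library such as python-dotenv instead.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Same content as the .env.ollama file created above.
OLLAMA_ENV = """\
LLM_BASE_URL=http://localhost:11434/v1
LLM_API_KEY=dummy-key
DEFAULT_INFERENCE_MODEL=llama3.2
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIMENSIONS=768
"""

settings = parse_env_text(OLLAMA_ENV)
```

Note that every value is read as a string, so a numeric setting such as `EMBEDDING_DIMENSIONS` would still need an `int(...)` conversion before use.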

	## Use DeepSeek API inference with OpenAI embedding models

	We can also use one provider for inference and another for embedding. For example, we can use
	DeepSeek API for inference and OpenAI for embedding since DeepSeek does not provide an embedding
	endpoint as of Jan 2025:

	```bash
	% cat > .env.deepseek <<EOF
	LLM_BASE_URL=https://api.deepseek.com/v1
	LLM_API_KEY=<deepseek-api-key>
	DEFAULT_INFERENCE_MODEL=deepseek-chat

	EMBED_BASE_URL=https://api.openai.com/v1
	EMBED_API_KEY=<openai-api-key>
	EMBEDDING_MODEL=text-embedding-3-small
	EMBEDDING_DIMENSIONS=1536
	EOF

	% python ask.py -e .env.deepseek -c -q "How does DeepSeek work?"
	```
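One way to think about the separate inference and embedding settings is as two endpoint records, where the embedding endpoint falls back to the inference endpoint when no `EMBED_*` values are given. The fallback rule is our assumption, inferred from the Ollama example that sets only `LLM_BASE_URL`; the `Endpoint` class and `resolve_endpoints` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    api_key: str
    model: str


def resolve_endpoints(env: dict[str, str]) -> tuple[Endpoint, Endpoint]:
    """Return (inference, embedding) endpoints from env-style settings."""
    inference = Endpoint(
        # The quick start shows the OpenAI URL as the default LLM_BASE_URL.
        base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        api_key=env["LLM_API_KEY"],
        model=env["DEFAULT_INFERENCE_MODEL"],
    )
    embedding = Endpoint(
        # Assumption: embedding settings fall back to the inference endpoint.
        base_url=env.get("EMBED_BASE_URL", inference.base_url),
        api_key=env.get("EMBED_API_KEY", inference.api_key),
        model=env["EMBEDDING_MODEL"],
    )
    return inference, embedding
```

With the `.env.deepseek` values above, this yields a DeepSeek inference endpoint and an OpenAI embedding endpoint; with the `.env.ollama` values, both endpoints point at the local Ollama server.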


	# GradIO Deployment

	> [!NOTE]
	> Original GradIO app-sharing document [here](https://www.gradio.app/guides/sharing-your-app).

	## Quick test and sharing

	By default, the program will start a web UI and share through GradIO.

	```bash
	% python ask.py
	* Running on local URL: http://127.0.0.1:7860
	* Running on public URL: https://77c277af0330326587.gradio.live

	# set SHARE_GRADIO_UI to False to run locally only, without a public link
	% export SHARE_GRADIO_UI=False
	% python ask.py
	* Running on local URL: http://127.0.0.1:7860
	```
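The `SHARE_GRADIO_UI` switch can be pictured as a boolean read from the environment and handed to Gradio's `launch(share=...)`. The sketch below shows only the parsing step; the function name and the set of accepted "false" spellings are assumptions, and the default of sharing follows the behavior described above.

```python
import os


def should_share(env=None) -> bool:
    """Return True unless SHARE_GRADIO_UI is set to a false-like value."""
    env = os.environ if env is None else env
    value = env.get("SHARE_GRADIO_UI", "True")
    # Assumption: these spellings all mean "do not create a public link".
    return value.strip().lower() not in {"false", "0", "no", "off"}


# The result would typically be passed on as: demo.launch(share=should_share())
```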

	## To share a more permanent link using HuggingFace Spaces

	- First, you need to [create a free HuggingFace account](https://huggingface.co/welcome).
	- Then in your [settings/token page](https://huggingface.co/settings/tokens), create a new token with Write permissions.
	- In your terminal, run the following commands in your app directory to deploy your program to
	HuggingFace Spaces:

	```bash
	% pip install gradio
	% gradio deploy
	Creating new Spaces Repo in '/home/you/ask.py'. Collecting metadata, press Enter to accept default value.
	Enter Spaces app title [ask.py]: ask.py
	Enter Gradio app file [ask.py]:
	Enter Spaces hardware (cpu-basic, cpu-upgrade, t4-small, t4-medium, l4x1, l4x4, zero-a10g, a10g-small, a10g-large, a10g-largex2, a10g-largex4, a100-large, v5e-1x1, v5e-2x2, v5e-2x4) [cpu-basic]:
	Any Spaces secrets (y/n) [n]: y
	Enter secret name (leave blank to end): SEARCH_API_KEY
	Enter secret value for SEARCH_API_KEY: YOUR_SEARCH_API_KEY
	Enter secret name (leave blank to end): SEARCH_PROJECT_KEY
	Enter secret value for SEARCH_PROJECT_KEY: YOUR_SEARCH_PROJECT_KEY
	Enter secret name (leave blank to end): LLM_API_KEY
	Enter secret value for LLM_API_KEY: YOUR_LLM_API_KEY
	Enter secret name (leave blank to end):
	Create Github Action to automatically update Space on 'git push'? [n]: n
	Space available at https://huggingface.co/spaces/your_user_name/ask.py
	```

	Now you can use the HuggingFace space app to run your queries.


	# Community

	## License and Acknowledgment

	The source code is licensed under the MIT license. Thanks to these amazing open-source projects and API
	providers:

	- [Google Search API](https://developers.google.com/custom-search/v1/overview)
	- [OpenAI API](https://beta.openai.com/docs/api-reference/completions/create)
	- [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
	- [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
	- [DuckDB](https://github.com/duckdb/duckdb)
	- [Docling](https://github.com/DS4SD/docling)
	- [GradIO](https://github.com/gradio-app/gradio)
	- [Chonkie](https://github.com/bhavnicksm/chonkie)