Usage Guide

Quick Start (5 minutes)

Step 1: Install Dependencies

pip install -r requirements.txt

Step 2: Configure API Keys

Copy .env.example to .env and fill in your API keys:

cp .env.example .env
# Edit .env file with your keys

Minimum required:

  • HUGGINGFACEHUB_API_TOKEN (for LLM)
  • TAVILY_API_KEY (for web search)
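Filled in, a minimal .env might look like this (placeholder values; `GROQ_API_KEY` and `GOOGLE_API_KEY` are assumed variable names, only needed if you later switch providers):

```shell
# Required
HUGGINGFACEHUB_API_TOKEN=hf_xxxxxxxxxxxx
TAVILY_API_KEY=tvly-xxxxxxxxxxxx

# Optional (assumed names for the alternative LLM providers)
GROQ_API_KEY=gsk_xxxxxxxxxxxx
GOOGLE_API_KEY=xxxxxxxxxxxx
```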

Step 3: Setup Vector Database

python setup_chromadb.py

This will:

  • Download embedding model (~90MB)
  • Load metadata.jsonl
  • Create local ChromaDB database

Step 4: Run the Agent

Option A: Gradio UI (Recommended)

python app.py

Then:

  1. Log in with your HuggingFace account
  2. Click "Run Evaluation & Submit All Answers"
  3. Wait for results

Option B: Command Line

python agent.py

Edit the question in agent.py line 208.

Understanding the Output

Agent Messages

The agent produces several message types:

  1. SystemMessage: Instructions and guidelines
  2. HumanMessage: Your question + similar example (from RAG)
  3. AIMessage: Agent's reasoning and tool calls
  4. ToolMessage: Tool execution results
  5. Final AIMessage: The answer

Example:

SystemMessage: You are Alfred, an intelligent research assistant...
HumanMessage: What is quantum computing?
HumanMessage: Here is a similar question... (from vector DB)
AIMessage: I will use deep_research tool
   tool_calls: [{"name": "deep_research", "args": {"query": "..."}}]
ToolMessage: DEEP RESEARCH REPORT: ...
AIMessage: FINAL ANSWER: Quantum computing is...

Deep Research Report Structure

DEEP RESEARCH REPORT: [Query]
=====================================

📊 OVERVIEW
- Total sources: X
- Wikipedia: Y articles
- Web: Z pages
- Academic: W papers

📚 WIKIPEDIA FINDINGS
[Source 1] ...
[Source 2] ...

🌐 WEB FINDINGS
[Source 1] ...
[Source 2] ...

🎓 ACADEMIC FINDINGS
[Source 1] ...
[Source 2] ...

📋 ALL SOURCES
[1] https://...
[2] https://...
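Because the report is plain text, it is easy to post-process. A minimal sketch that pulls the numbered URLs out of the ALL SOURCES section (the `[n] URL` line format is taken from the layout above; `extract_sources` is a hypothetical helper, not part of the codebase):

```python
import re

def extract_sources(report: str) -> list[str]:
    """Return the URLs from lines like '[1] https://...' in a report."""
    return re.findall(r"^\[\d+\]\s+(https?://\S+)", report, flags=re.MULTILINE)

report = """DEEP RESEARCH REPORT: quantum computing
📋 ALL SOURCES
[1] https://en.wikipedia.org/wiki/Quantum_computing
[2] https://arxiv.org/abs/2101.00001
"""
print(extract_sources(report))
```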

Customization

Changing LLM Provider

Edit agent.py line 217:

# Option 1: HuggingFace (Free, slower)
graph = build_graph(provider="huggingface")

# Option 2: Groq (Fast, free tier available)
graph = build_graph(provider="groq")

# Option 3: Google Gemini (Balanced, requires payment)
graph = build_graph(provider="google")

Adjusting Deep Research Behavior

Edit deep_research_tool.py:

# Line 54: Wikipedia results
WikipediaLoader(query=query, load_max_docs=2)  # Change to 1-5

# Line 68: Web results
TavilySearchResults(max_results=10)  # Change to 3-15

# Line 85: Academic results
ArxivLoader(query=query, load_max_docs=5)  # Change to 1-10

# Line 58, 75, 89: Content truncation
"content": doc.page_content[:2000]  # Change to 500-3000

Modifying System Prompt

Edit system_prompt.txt to:

  • Change tool selection strategy
  • Adjust reasoning guidelines
  • Modify output format
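For example, a tool-selection section of the prompt might read like this (illustrative wording only, not the shipped system_prompt.txt; the tool names and the FINAL ANSWER convention come from elsewhere in this guide):

```
Tool selection:
- Use multiply, add, or subtract for arithmetic.
- Use wiki_search for simple factual lookups.
- Use web_search for recent events.
- Use deep_research for broad, multi-source questions.
Always end your reply with "FINAL ANSWER: ...".
```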

Testing Different Question Types

Mathematical Questions

question = "What is 15 multiplied by 23?"
# Expected: Uses multiply tool → Direct answer

Simple Factual Questions

question = "Who invented the telephone?"
# Expected: Uses wiki_search → Quick answer

Complex Conceptual Questions

question = "Explain quantum entanglement and its applications"
# Expected: Uses deep_research → Comprehensive answer

Recent Events

question = "What are the latest AI developments in 2025?"
# Expected: Uses web_search or deep_research

Troubleshooting

Issue: "No module named 'sentence_transformers'"

pip install sentence-transformers

Issue: "TAVILY_API_KEY not found"

Make sure .env file exists and contains:

TAVILY_API_KEY=tvly-xxxxx

Issue: ChromaDB not working

Delete and recreate:

rm -rf chroma_db/
python setup_chromadb.py

Issue: Agent not using deep_research

Check system_prompt.txt and make sure it explains when to use deep_research.

Or be explicit in your question:

question = "Use deep research to analyze quantum computing"

Performance Optimization

Speed vs Quality Trade-offs

Faster (for testing):

# deep_research_tool.py
WikipediaLoader(load_max_docs=1)
TavilySearchResults(max_results=3)
ArxivLoader(load_max_docs=1)
"content": doc.page_content[:500]

Balanced (recommended):

WikipediaLoader(load_max_docs=2)
TavilySearchResults(max_results=10)
ArxivLoader(load_max_docs=5)
"content": doc.page_content[:2000]

Comprehensive (slower but thorough):

WikipediaLoader(load_max_docs=5)
TavilySearchResults(max_results=15)
ArxivLoader(load_max_docs=10)
"content": doc.page_content[:5000]
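Rather than editing four separate lines each time, one option is to group the three profiles into a dict and read from it inside deep_research_tool.py. This is a sketch, not current code; the key names are made up for illustration:

```python
# Hypothetical presets mirroring the three profiles above.
PRESETS = {
    "fast":          {"wiki_docs": 1, "web_results": 3,  "arxiv_docs": 1,  "max_chars": 500},
    "balanced":      {"wiki_docs": 2, "web_results": 10, "arxiv_docs": 5,  "max_chars": 2000},
    "comprehensive": {"wiki_docs": 5, "web_results": 15, "arxiv_docs": 10, "max_chars": 5000},
}

cfg = PRESETS["balanced"]
# Inside the tool you would then write e.g.
# WikipediaLoader(query=query, load_max_docs=cfg["wiki_docs"])
# and doc.page_content[:cfg["max_chars"]].
print(cfg["web_results"])
```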

Advanced Usage

Adding Custom Tools

  1. Define tool in agent.py:
@tool
def my_custom_tool(query: str) -> str:
    """Description of what this tool does."""
    # Your implementation
    return result
  2. Add to tools list:
tools = [
    multiply, add, subtract,
    wiki_search, web_search,
    deep_research,
    my_custom_tool,  # Add here
]
  3. Update system_prompt.txt to mention when to use it.

Batch Processing

from agent import build_graph
from langchain_core.messages import HumanMessage

graph = build_graph(provider="huggingface")

questions = [
    "Question 1",
    "Question 2",
    "Question 3",
]

for q in questions:
    result = graph.invoke({"messages": [HumanMessage(content=q)]})
    answer = result["messages"][-1].content
    print(f"Q: {q}\nA: {answer}\n")

Logging and Debugging

Add logging to track agent behavior:

# In agent.py, modify assistant function:
def assistant(state: MessagesState):
    """Assistant node"""
    print(f"\n{'='*60}")
    print("Assistant Input:")
    for msg in state["messages"]:
        print(f"  - {type(msg).__name__}: {msg.content[:100]}...")

    result = llm_with_tools.invoke(state["messages"])

    print("\nAssistant Output:")
    if hasattr(result, "tool_calls") and result.tool_calls:
        print(f"  Tool calls: {[tc['name'] for tc in result.tool_calls]}")
    print(f"{'='*60}\n")

    return {"messages": [result]}

FAQ

Q: How do I know which tool was used?

A: Check the tool_calls field on the AIMessage, or add logging as shown above.

Q: Can I use it without a Tavily API key?

A: Yes, but web_search and deep_research will partially fail. Consider removing them from the tools list.

Q: How long does setup take?

A: ~5 minutes (mostly downloading the embedding model).

Q: Can I run this offline?

A: No, it requires API calls to LLM and search services.

Q: How much does it cost?

A: Using HuggingFace Inference API is free (with rate limits). Tavily has a free tier (1000 queries/month).

Next Steps

  1. Test with different question types
  2. Optimize performance for your use case
  3. Customize system prompt
  4. Add domain-specific tools
  5. Integrate with your application

For more details, see the full documentation in the main repository.