metadata
title: Multi-Agent Research Generator
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit

🔬 Multi-Agent Research & Report Generator

Orchestrates multiple specialized AI agents to autonomously research, analyze, fact-check, and produce structured professional reports with cited sources.

Python LangGraph Groq Streamlit


🧠 What This Project Demonstrates

Most AI demos make a single LLM call and call it "AI research." This project takes a different approach: it separates research, analysis, fact-checking, and writing into specialized agents that communicate through a shared state graph.

Naive approach:    prompt → LLM → output
This project:      orchestrated multi-agent pipeline with 
                   conditional routing, critic patterns, 
                   and real source verification

Shared state, conditional routing, and a critic loop are a pattern commonly used in production multi-agent systems.


🏗️ Agent Architecture

User Input: Research Topic
                ↓
    Orchestrator (LangGraph StateGraph)
                ↓
    ┌─────────────────────────────┐
    │      Research Agent         │
    │  Tavily web search          │
    │  3 targeted queries         │
    │  Real-time sources          │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────┐
    │      Analyst Agent          │
    │  Synthesizes findings       │
    │  Identifies patterns        │
    │  Flags contradictions       │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────┐
    │       Critic Agent          │
    │  Cross-references claims    │
    │  against Tavily sources     │  ← actual ground truth
    │  Assigns confidence score   │
    │  Flags unsupported claims   │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────────────────┐
    │         Conditional Router              │
    │  needs_revision AND iterations < 2      │
    │    → back to Analyst (max 2 cycles)     │
    │  else → Writer Agent                    │
    └──────────┬──────────────────────────────┘
               ↓
    ┌─────────────────────────────┐
    │       Writer Agent          │
    │  Structured report          │
    │  Cited sources              │
    │  Confidence score disclosed │
    └──────────┬──────────────────┘
               ↓
    Professional Report + Confidence Score

🔑 Key Technical Decisions

Why LangGraph State?

Each agent reads from and writes to a shared ResearchState TypedDict. No agent needs to know about other agents; each one interacts only with the state. This makes the pipeline modular, debuggable, and extensible.

from typing import List, TypedDict


class ResearchState(TypedDict):
    topic: str
    search_results: List[dict]   # Research Agent writes
    analysis: str                 # Analyst Agent writes
    critic_feedback: str          # Critic Agent writes
    confidence_scores: dict       # Critic writes, UI reads
    final_report: str             # Writer Agent writes
    current_step: str             # Orchestrator tracks
    iteration_count: int          # Prevents infinite loops
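As a minimal sketch of how a node might interact with this state, the example below reads the keys it needs and returns only the key it owns. The function name `analyst_node`, the `"content"` field on each search result, and the placeholder summary are illustrative assumptions, not the project's code; the real Analyst Agent calls an LLM at that step.

```python
from typing import List, TypedDict


class ResearchState(TypedDict, total=False):
    topic: str
    search_results: List[dict]
    analysis: str
    iteration_count: int


def analyst_node(state: ResearchState) -> dict:
    """Hypothetical node: reads search_results, writes only `analysis`."""
    # Assumption: each search result is a dict with a "content" field.
    snippets = [r.get("content", "") for r in state.get("search_results", [])]
    # Placeholder for the LLM call the real Analyst Agent would make.
    summary = f"{len(snippets)} sources reviewed for '{state['topic']}'"
    return {"analysis": summary}


state: ResearchState = {
    "topic": "solid-state batteries",
    "search_results": [{"content": "a"}, {"content": "b"}],
    "iteration_count": 0,
}
# Merge the partial update back into the shared state, the way a
# graph runtime would after a node finishes.
state = {**state, **analyst_node(state)}
print(state["analysis"])
```

Because the node returns a partial update instead of mutating the state, it can be tested on its own with a hand-built dictionary.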

Why Tavily for Critic Agent?

A naive critic agent evaluates analyst output using only LLM knowledge, which means one LLM instance ends up confirming another's biases. Our Critic Agent cross-references every claim against actual Tavily search results, giving it real ground truth to check against. This is the architectural difference between a critic that finds real issues and one that just agrees.
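To make the idea of checking claims against retrieved sources concrete, here is a deliberately simplified sketch. It swaps the LLM comparison for a plain word-overlap test, so it is a stand-in for the technique, not the project's Critic Agent. The names `check_claims` and `threshold`, and the choice to score as the percentage of supported claims, are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Set


def _words(text: str) -> Set[str]:
    # Lowercased tokens longer than 3 characters, to skip most filler words.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def check_claims(claims: List[str], sources: List[str],
                 threshold: float = 0.5) -> Dict:
    """Hypothetical check: a claim counts as supported when at least
    `threshold` of its words appear in a single source snippet."""
    source_words = [_words(s) for s in sources]
    unsupported = []
    for claim in claims:
        words = _words(claim)
        # Best overlap ratio of this claim against any one source.
        best = 0.0
        if words:
            best = max((len(words & sw) / len(words) for sw in source_words),
                       default=0.0)
        if best < threshold:
            unsupported.append(claim)
    supported = len(claims) - len(unsupported)
    score = round(100 * supported / len(claims)) if claims else 0
    return {"score": score, "unsupported": unsupported}


sources = ["Global battery production capacity doubled between 2020 and 2023."]
claims = [
    "Battery production capacity doubled after 2020.",
    "Every automaker abandoned lithium cells.",
]
result = check_claims(claims, sources)
print(result["score"], result["unsupported"])
```

Word overlap cannot judge meaning, so a real critic would still need an LLM or a human to compare each claim with its closest source. The point of the sketch is that the evidence comes from the search results, not from the model's memory.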

Why iteration_count?

Without a cycle limit, Critic → Analyst → Critic can loop indefinitely and exhaust free-tier rate limits. The Orchestrator increments this counter in a separate node (single responsibility), and the router caps revision cycles at 2 before forcing the pipeline on to the Writer Agent.

Why separate Orchestrator node for iteration?

Mixing counter increments into the Critic Agent would violate single responsibility. Each node does exactly one thing: the Critic evaluates, the iterate node increments, the router decides. This makes debugging straightforward: if routing fails, only one node is responsible.
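The split between incrementing and routing can be sketched in plain Python, without LangGraph. The names `iterate_node`, `route_after_critic`, and `simulate` are hypothetical, and the sketch assumes the counter is bumped on the way back to the Analyst; the project may order these steps differently. In LangGraph, a function like `route_after_critic` would typically be registered with `StateGraph.add_conditional_edges`.

```python
from typing import List, TypedDict

MAX_REVISIONS = 2  # revision cap described in this README


class LoopState(TypedDict):
    needs_revision: bool
    iteration_count: int


def iterate_node(state: LoopState) -> dict:
    # Only responsibility: bump the revision counter.
    return {"iteration_count": state["iteration_count"] + 1}


def route_after_critic(state: LoopState) -> str:
    # Only responsibility: choose the name of the next node.
    if state["needs_revision"] and state["iteration_count"] < MAX_REVISIONS:
        return "analyst"
    return "writer"


def simulate(always_revise: bool = True) -> List[str]:
    """Trace the nodes visited when the critic's verdict never changes."""
    state: LoopState = {"needs_revision": always_revise, "iteration_count": 0}
    visited = ["analyst", "critic"]
    while route_after_critic(state) == "analyst":
        # Assumption: the counter is incremented before re-running the Analyst.
        state = {**state, **iterate_node(state)}
        visited += ["iterate", "analyst", "critic"]
    visited.append("writer")
    return visited


path = simulate()
print(" -> ".join(path))
```

Even with a critic that always asks for changes, the trace reaches the Writer after two revisions, and the router can be unit-tested with nothing more than a two-key dictionary.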


📊 Confidence Score: What It Actually Means

High score (75-100): Most claims in analysis are directly 
                     supported by Tavily search results

Low score (40-65):   Topic is speculative or emerging;
                     fewer verifiable claims in sources

Example:
  Pakistan job market report → 80/100  (established research exists)
  Iran-US war economic impact → 60/100 (speculative, fewer sources)

Honest limitation: The Critic verifies logical consistency and cross-references against search results. It cannot replace domain expert fact-checking for critical decisions.
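A small helper can turn the bands above into a label for the UI. This is only a sketch: `describe_score` is a hypothetical name, and scores that fall outside the two documented bands (for example 66-74) are reported as undocumented instead of being guessed at.

```python
def describe_score(score: int) -> str:
    """Map a 0-100 confidence score to the bands documented above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 75:
        return "high: most claims directly supported by search results"
    if 40 <= score <= 65:
        return "low: speculative or emerging topic, fewer verifiable claims"
    # The README defines no band for the remaining values.
    return "undocumented band"


for example in (80, 60, 70):
    print(example, describe_score(example))
```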


🛠️ Tech Stack

| Tool | Purpose | Why Free |
|------|---------|----------|
| LangGraph | Agent orchestration | Open source |
| Groq API | LLM inference (Llama 3.3 70B) | Free tier, fast inference |
| Tavily API | Real-time web search | Free tier, 1000 searches/month |
| LangChain | Tool definitions | Open source |
| Streamlit | UI | Open source |

🚀 Setup & Installation

Prerequisites

Installation

# Clone repo
git clone https://github.com/yourusername/multi-agent-research
cd multi-agent-research

# Create virtual environment
python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Add your API keys to .env

Environment Variables

GROQ_API_KEY=your_groq_key_here
TAVILY_API_KEY=your_tavily_key_here

Hugging Face Space users: Set GROQ_API_KEY and TAVILY_API_KEY in your Space's Settings → Variables and Secrets. Do NOT commit a .env file.
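Since both keys are required, a startup check can fail fast with a clear message instead of erroring in the middle of a run. The sketch below is an assumption about how such a check could look; `missing_keys` is a hypothetical helper and is not part of the project.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Demonstration with an explicit mapping instead of the real environment.
print(missing_keys({"GROQ_API_KEY": "example-value", "TAVILY_API_KEY": " "}))
```

Calling `missing_keys()` with no argument checks the real environment, which works the same way locally (after loading `.env`) and on a Space that injects secrets as environment variables.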

Run

# Streamlit UI
streamlit run app.py

# Terminal only
python main.py

📁 Project Structure

multi_agent_research/
├── agents/
│   ├── research_agent.py      # Tavily web search (3 queries)
│   ├── analyst_agent.py       # Synthesizes findings via LLM
│   ├── critic_agent.py        # Cross-references vs sources
│   └── writer_agent.py        # Produces final report
├── graph/
│   └── research_graph.py      # LangGraph StateGraph + routing
├── state/
│   └── research_state.py      # Shared state TypedDict
├── app.py                     # Streamlit UI (HF Space entrypoint)
├── main.py                    # Pipeline runner
├── requirements.txt           # Python dependencies
├── .env.example               # API key template
└── README.md

⚡ Performance

| Metric | Value |
|--------|-------|
| Average report time | ~30 seconds to 1 minute |
| Tavily searches per run | 9 (3 queries × 3 results) |
| Max revision cycles | 2 |
| Token usage per run | ~8,000-11,000 tokens |
⚠️ Known Limitations

  • Critic cannot verify all hallucinations: it cross-references against Tavily results but cannot catch confidently stated errors absent from search results
  • Groq free tier: the 12,000 TPM limit may cause rate limiting on complex topics
  • Tavily free tier: 1,000 searches/month; each run uses 9 searches
  • Report quality depends on search result quality: niche or poorly documented topics produce lower confidence scores
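The free-tier figures quoted in this README translate into a rough usage budget. The arithmetic below uses only those figures; the constant names are illustrative.

```python
# Figures taken from the Performance and Known Limitations sections.
TAVILY_MONTHLY_SEARCHES = 1_000
SEARCHES_PER_RUN = 9
GROQ_TOKENS_PER_MINUTE = 12_000
TOKENS_PER_RUN = (8_000, 11_000)  # low and high estimates

# Whole runs that fit in the monthly Tavily quota.
runs_per_month = TAVILY_MONTHLY_SEARCHES // SEARCHES_PER_RUN

# Whole runs that fit inside one minute of the Groq token budget.
runs_per_minute = [GROQ_TOKENS_PER_MINUTE // t for t in TOKENS_PER_RUN]

print(runs_per_month)   # 111
print(runs_per_minute)  # [1, 1]
```

In practice this means roughly 111 reports per month, and starting a second report within the same minute is likely to hit the token limit.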

🗺️ What I Learned

  • LangGraph state management and conditional routing
  • Critic pattern in multi-agent systems, and its honest limitations
  • Why splitting work across specialized agents can outperform a single general-purpose agent
  • Token optimization for free-tier LLM APIs
  • Separation of concerns in agent design (single responsibility per node)

🔮 Future Improvements

  • Parallel agent execution (Research + Analyst simultaneously)
  • Vector store memory for cross-session topic persistence
  • PDF export for reports
  • Domain-specific agent personas (legal, medical, financial)
  • Human-in-the-loop approval before Writer Agent runs

Built as Project 4 of an AI Engineering portfolio. Part of a progression from model fine-tuning → RAG systems → single agents → multi-agent orchestration.