title: Multi-Agent Research Generator
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit

🔬 Multi-Agent Research & Report Generator

Orchestrates multiple specialized AI agents to autonomously research, analyze, fact-check, and produce structured professional reports with cited sources.

🧠 What This Project Demonstrates

Most AI demos make a single LLM call and label the result "AI research." This project instead splits research, analysis, fact-checking, and writing across specialized agents that communicate through a shared state graph.

Naive approach:    prompt → LLM → output
This project:      orchestrated multi-agent pipeline with 
                   conditional routing, critic patterns, 
                   and real source verification

This orchestrated, multi-agent pattern is a common choice for production AI systems.

🏗️ Agent Architecture

User Input: Research Topic
                ↓
    Orchestrator (LangGraph StateGraph)
                ↓
    ┌─────────────────────────────┐
    │      Research Agent         │
    │  Tavily web search          │
    │  3 targeted queries         │
    │  Real-time sources          │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────┐
    │      Analyst Agent          │
    │  Synthesizes findings       │
    │  Identifies patterns        │
    │  Flags contradictions       │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────┐
    │       Critic Agent          │
    │  Cross-references claims    │
    │  against Tavily sources     │  ← actual ground truth
    │  Assigns confidence score   │
    │  Flags unsupported claims   │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────────────────┐
    │         Conditional Router              │
    │  needs_revision AND iterations < 2      │
    │    → back to Analyst (max 2 cycles)     │
    │  else → Writer Agent                    │
    └──────────┬──────────────────────────────┘
               ↓
    ┌─────────────────────────────┐
    │       Writer Agent          │
    │  Structured report          │
    │  Cited sources              │
    │  Confidence score disclosed │
    └──────────┬──────────────────┘
               ↓
    Professional Report + Confidence Score

🔑 Key Technical Decisions

Why LangGraph State?

Each agent reads from and writes to a shared ResearchState TypedDict. No agent needs to know about other agents — they only interact with state. This makes the pipeline modular, debuggable, and extensible.

from typing import List, TypedDict

class ResearchState(TypedDict):
    topic: str
    search_results: List[dict]   # Research Agent writes
    analysis: str                 # Analyst Agent writes
    critic_feedback: str          # Critic Agent writes
    confidence_scores: dict       # Critic writes, UI reads
    final_report: str             # Writer Agent writes
    current_step: str             # Orchestrator tracks
    iteration_count: int          # Prevents infinite loops
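
As a rough illustration of "agents only interact with state", the sketch below shows a node that reads the keys it needs and returns only the keys it writes. It uses plain Python rather than LangGraph itself; `analyst_node`, `apply_update`, and the trimmed-down `DemoState` are hypothetical names for this example, not the project's actual code. LangGraph merges a node's returned dict into the shared state in a similar way.

```python
from typing import List, TypedDict


class DemoState(TypedDict, total=False):
    # Trimmed-down stand-in for ResearchState (assumption for this sketch).
    topic: str
    search_results: List[dict]
    analysis: str


def analyst_node(state: DemoState) -> dict:
    """Hypothetical node: reads topic and search_results, writes only `analysis`."""
    titles = [r.get("title", "untitled") for r in state.get("search_results", [])]
    return {"analysis": f"Findings on {state['topic']}: " + "; ".join(titles)}


def apply_update(state: DemoState, update: dict) -> DemoState:
    """Merge a node's partial update into a copy of the shared state."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


state: DemoState = {
    "topic": "solar storage",
    "search_results": [{"title": "Grid batteries"}, {"title": "Cost trends"}],
}
state = apply_update(state, analyst_node(state))
print(state["analysis"])
```

Because the node never touches keys owned by other agents, it can be tested in isolation with a hand-built state dict.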

Why Tavily for Critic Agent?

A naive critic agent evaluates analyst output using only LLM knowledge, which amounts to one LLM instance confirming another's biases. The Critic Agent here cross-references each claim against the actual Tavily search results, giving it retrieved evidence to check against rather than model memory alone. That is the architectural difference between a critic that can find real issues and one that just agrees.
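
To make the idea of checking claims against retrieved sources concrete, here is a deliberately simple sketch that flags claims with little word overlap against any source snippet. This is not how the project's Critic Agent works (that step is LLM-driven); `find_unsupported`, the `min_overlap` threshold, and the use of a `content` key on each source dict are assumptions made for the example.

```python
import re
from typing import Dict, List, Set

_WORD = re.compile(r"[a-z0-9]+")


def _tokens(text: str) -> Set[str]:
    """Lowercased set of alphanumeric words in the text."""
    return set(_WORD.findall(text.lower()))


def find_unsupported(
    claims: List[str],
    sources: List[Dict[str, str]],
    min_overlap: float = 0.5,
) -> List[str]:
    """Return claims that no single source covers well enough.

    A claim counts as supported when at least `min_overlap` of its distinct
    words appear in one source's `content` field (hypothetical key).
    """
    unsupported = []
    for claim in claims:
        words = _tokens(claim)
        if not words:
            continue  # nothing to check
        best = max(
            (len(words & _tokens(src.get("content", ""))) / len(words) for src in sources),
            default=0.0,
        )
        if best < min_overlap:
            unsupported.append(claim)
    return unsupported


sources = [{"content": "Battery storage costs fell sharply between 2015 and 2023."}]
claims = ["Battery storage costs fell sharply", "Fusion reactors power most homes"]
print(find_unsupported(claims, sources))
```

Word overlap is a crude proxy for support; the point is only that the check consults retrieved text instead of the model's own memory.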

Why iteration_count?

Without a cycle limit, the Critic → Analyst → Critic loop can run indefinitely and exhaust free-tier rate limits. The Orchestrator increments this counter in a separate iterate node (single responsibility), and the router caps revision cycles at 2 before forcing the pipeline on to the Writer Agent.

Why a separate Orchestrator node for iteration?

Mixing counter increments into the Critic Agent violates single responsibility. Each node does exactly one thing — the Critic evaluates, the iterate node increments, the router decides. This makes debugging straightforward: if routing fails, only one node is responsible.
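
The split between "iterate node increments" and "router decides" can be sketched in plain Python as follows. The function names, the `needs_revision` flag on the state, and the choice to increment on the path back to the Analyst are assumptions for illustration; in LangGraph the router would typically be passed to `add_conditional_edges`.

```python
from typing import TypedDict

MAX_REVISIONS = 2  # revision cap stated in this README


class LoopState(TypedDict):
    needs_revision: bool   # assumed flag written by the Critic
    iteration_count: int   # written only by the iterate node


def increment_iteration(state: LoopState) -> dict:
    """Iterate node: its only job is to bump the counter."""
    return {"iteration_count": state["iteration_count"] + 1}


def route_after_critic(state: LoopState) -> str:
    """Router: return the name of the next node, without mutating state."""
    if state["needs_revision"] and state["iteration_count"] < MAX_REVISIONS:
        return "analyst"
    return "writer"


def count_revisions(critic_always_unhappy: bool) -> int:
    """Simulate the loop with a fixed Critic verdict; return revision cycles taken."""
    state: LoopState = {"needs_revision": critic_always_unhappy, "iteration_count": 0}
    while route_after_critic(state) == "analyst":
        state.update(increment_iteration(state))  # revision path: iterate, then Analyst
    return state["iteration_count"]


print(count_revisions(True))   # Critic never satisfied: stops at the cap
print(count_revisions(False))  # Critic satisfied immediately: no revisions
```

Even with a Critic that always requests changes, the simulation reaches the Writer after two revision cycles.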

📊 Confidence Score — What It Actually Means

High score (75-100): Most claims in analysis are directly 
                     supported by Tavily search results

Low score (40-65):   Topic is speculative or emerging —
                     fewer verifiable claims in sources

Example:
  Pakistan job market report → 80/100  (established research exists)
  Iran-US war economic impact → 60/100 (speculative, fewer sources)

Honest limitation: The Critic verifies logical consistency and cross-references against search results. It cannot replace domain expert fact-checking for critical decisions.
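
One way to read these bands is as the share of claims that the sources support. The sketch below encodes that reading; the formula in `confidence_score` is an assumption (this README does not state how the Critic derives its number), and `score_band` returns "unspecified" for ranges the README does not define.

```python
def confidence_score(supported: int, total: int) -> int:
    """Percentage of claims backed by at least one source (assumed formula)."""
    if total <= 0 or not 0 <= supported <= total:
        raise ValueError("need total > 0 and 0 <= supported <= total")
    return round(100 * supported / total)


def score_band(score: int) -> str:
    """Map a score to the bands listed above."""
    if 75 <= score <= 100:
        return "high"
    if 40 <= score <= 65:
        return "low"
    return "unspecified"  # ranges not described in this README


score = confidence_score(supported=8, total=10)
print(score, score_band(score))
```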

🛠️ Tech Stack

| Tool | Purpose | Why Free |
| --- | --- | --- |
| LangGraph | Agent orchestration | Open source |
| Groq API | LLM inference (Llama 3.3 70B) | Free tier, fast inference |
| Tavily API | Real-time web search | Free tier, 1,000 searches/month |
| LangChain | Tool definitions | Open source |
| Streamlit | UI | Open source |

🚀 Setup & Installation

Prerequisites

Python 3.10+
Groq API key — console.groq.com
Tavily API key — tavily.com

Installation

# Clone repo
git clone https://github.com/yourusername/multi-agent-research
cd multi-agent-research

# Create virtual environment
python -m venv venv

# Activate it (run only the line for your OS)
venv\Scripts\activate      # Windows
source venv/bin/activate   # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Add your API keys to .env

Environment Variables

GROQ_API_KEY=your_groq_key_here
TAVILY_API_KEY=your_tavily_key_here

Hugging Face Space users: Set GROQ_API_KEY and TAVILY_API_KEY in your Space's Settings → Variables and Secrets — do NOT commit a .env file.
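
A small fail-fast check at startup can make a missing key obvious before any agent runs. The sketch below assumes the two variables are already present in the process environment (for example set as Space secrets, or loaded from .env by a helper such as python-dotenv, which is an assumption about this project); `load_api_keys` is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or raise with the names of any that are missing."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


# Example with an explicit mapping; call load_api_keys() to read os.environ instead.
keys = load_api_keys({"GROQ_API_KEY": "gsk-demo", "TAVILY_API_KEY": "tvly-demo"})
print(sorted(keys))
```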

Run

# Streamlit UI
streamlit run app.py

# Terminal only
python main.py

📁 Project Structure

multi_agent_research/
├── agents/
│   ├── research_agent.py      # Tavily web search (3 queries)
│   ├── analyst_agent.py       # Synthesizes findings via LLM
│   ├── critic_agent.py        # Cross-references vs sources
│   └── writer_agent.py        # Produces final report
├── graph/
│   └── research_graph.py      # LangGraph StateGraph + routing
├── state/
│   └── research_state.py      # Shared state TypedDict
├── app.py                     # Streamlit UI (HF Space entrypoint)
├── main.py                    # Pipeline runner
├── requirements.txt           # Python dependencies
├── .env.example               # API key template
└── README.md

⚡ Performance

| Metric | Value |
| --- | --- |
| Average report time | ~30 seconds to 1 minute |
| Tavily results per run | 9 (3 queries × 3 results) |
| Max revision cycles | 2 |
| Token usage per run | ~8,000-11,000 tokens |

⚠️ Known Limitations

Critic cannot verify all hallucinations — it cross-references against Tavily results but cannot catch confidently stated errors absent from search results
Groq free tier — 12,000 TPM limit may cause rate limiting on complex topics
Tavily free tier: 1,000 searches/month; each run issues 3 queries returning 9 results
Report quality depends on search result quality — niche or poorly documented topics produce lower confidence scores
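
The free-tier figures above translate into a rough usage budget. The sketch below does that arithmetic with the quota and token numbers from this README; the per-run search count is left as a parameter because it depends on whether usage is metered per query (3) or per result (9), and both helper names are hypothetical.

```python
def max_runs_per_month(monthly_quota: int, searches_per_run: int) -> int:
    """Whole pipeline runs that fit in a monthly search quota."""
    if searches_per_run <= 0:
        raise ValueError("searches_per_run must be positive")
    return monthly_quota // searches_per_run


def fits_in_one_minute(tokens_needed: int, tpm_limit: int = 12_000) -> bool:
    """True if the token count stays within a tokens-per-minute limit."""
    return tokens_needed <= tpm_limit


print(max_runs_per_month(1000, 9))      # metered per result
print(max_runs_per_month(1000, 3))      # metered per query
print(fits_in_one_minute(11_000))       # one run at the upper token estimate
print(fits_in_one_minute(2 * 8_000))    # two runs started in the same minute
```

With these inputs a single run fits under the 12,000 TPM limit, while two runs launched within the same minute would not, which matches the rate-limiting caveat above.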

🗺️ What I Learned

LangGraph state management and conditional routing
Critic pattern in multi-agent systems — and its honest limitations
Why splitting work across specialized agents can outperform a single agent
Token optimization for free-tier LLM APIs
Separation of concerns in agent design (single responsibility per node)

🔮 Future Improvements

Parallel agent execution (Research + Analyst simultaneously)
Vector store memory for cross-session topic persistence
PDF export for reports
Domain-specific agent personas (legal, medical, financial)
Human-in-the-loop approval before Writer Agent runs

Built as Project 4 of an AI Engineering portfolio. Part of a progression from model fine-tuning → RAG systems → single agents → multi-agent orchestration.