title: Multi-Agent Research Generator
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
🔬 Multi-Agent Research & Report Generator
Orchestrates multiple specialized AI agents to autonomously research, analyze, fact-check, and produce structured professional reports with cited sources.
🧠 What This Project Demonstrates
Most AI demos make a single LLM call and label it "AI research." This project takes a different approach: it separates research, analysis, fact-checking, and writing into specialized agents that communicate through a shared state graph.
    Naive approach:  prompt → LLM → output
    This project:    orchestrated multi-agent pipeline with
                     conditional routing, critic patterns,
                     and real source verification
Separating roles behind a shared state graph is an architecture pattern also used in production multi-agent systems.
🏗️ Agent Architecture
    User Input: Research Topic
               ↓
    Orchestrator (LangGraph StateGraph)
               ↓
    ┌─────────────────────────────┐
    │ Research Agent              │
    │ Tavily web search           │
    │ 3 targeted queries          │
    │ Real-time sources           │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────┐
    │ Analyst Agent               │
    │ Synthesizes findings        │
    │ Identifies patterns         │
    │ Flags contradictions        │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────┐
    │ Critic Agent                │
    │ Cross-references claims     │
    │ against Tavily sources      │ ← actual ground truth
    │ Assigns confidence score    │
    │ Flags unsupported claims    │
    └──────────┬──────────────────┘
               ↓
    ┌─────────────────────────────────────────┐
    │ Conditional Router                      │
    │ needs_revision AND iterations < 2       │
    │ → back to Analyst (max 2 cycles)        │
    │ else → Writer Agent                     │
    └──────────┬──────────────────────────────┘
               ↓
    ┌─────────────────────────────┐
    │ Writer Agent                │
    │ Structured report           │
    │ Cited sources               │
    │ Confidence score disclosed  │
    └──────────┬──────────────────┘
               ↓
    Professional Report + Confidence Score
Key Technical Decisions
Why LangGraph State?
Each agent reads from and writes to a shared ResearchState TypedDict. No agent needs to know about the other agents; each one interacts only with the state. This makes the pipeline modular, debuggable, and extensible.
    from typing import List, TypedDict


    class ResearchState(TypedDict):
        topic: str
        search_results: List[dict]  # Research Agent writes
        analysis: str               # Analyst Agent writes
        critic_feedback: str        # Critic Agent writes
        confidence_scores: dict     # Critic writes, UI reads
        final_report: str           # Writer Agent writes
        current_step: str           # Orchestrator tracks
        iteration_count: int        # Prevents infinite loops
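To illustrate the idea, here is a minimal sketch of an agent written as a function over the shared state. It relies on the LangGraph convention that a node receives the current state and returns only the keys it changes. The `analyst_node` body and the `apply_update` helper are hypothetical stand-ins (there is no LLM call), not this project's actual code, and the trimmed-down `ResearchState` keeps only the fields the example needs.

```python
from typing import List, TypedDict


class ResearchState(TypedDict, total=False):
    # Reduced copy of the shared state, limited to what this sketch uses.
    topic: str
    search_results: List[dict]
    analysis: str
    iteration_count: int


def analyst_node(state: ResearchState) -> dict:
    """Hypothetical analyst: reads search_results, writes only `analysis`."""
    titles = [r.get("title", "untitled") for r in state.get("search_results", [])]
    # A real agent would call an LLM here; this placeholder just lists the titles.
    return {"analysis": f"{state['topic']}: {len(titles)} sources ({', '.join(titles)})"}


def apply_update(state: ResearchState, update: dict) -> ResearchState:
    """Merge a node's partial update into a copy of the state (overwrite semantics)."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


state: ResearchState = {
    "topic": "solar adoption",
    "search_results": [{"title": "IEA report"}, {"title": "Grid study"}],
    "iteration_count": 0,
}
state = apply_update(state, analyst_node(state))
print(state["analysis"])
```

Because the node never touches keys it does not own, swapping in a different analyst only requires honoring the same input and output keys.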
Why Tavily for Critic Agent?
A naive critic agent evaluates the analyst's output using only LLM knowledge, which means one LLM instance is confirming another's biases. Our Critic Agent cross-references every claim against actual Tavily search results, giving it external source material to check against instead of its own priors. This is the architectural difference between a critic that finds real issues and one that just agrees.
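The cross-referencing idea can be sketched without any LLM at all. The example below is a deliberately crude stand-in, assuming each search result is a dict with a `content` text field. It marks a claim as supported when enough of its words appear in a single source. The function names, the 0.6 threshold, and the token-overlap heuristic are all assumptions for illustration; the project's Critic Agent is described as LLM-based, so this only shows the shape of "check claims against retrieved text".

```python
import re
from typing import Dict, List, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def is_supported(claim: str, sources: List[Dict[str, str]], threshold: float = 0.6) -> bool:
    """True if one source contains at least `threshold` of the claim's tokens."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    for source in sources:
        overlap = claim_tokens & _tokens(source.get("content", ""))
        if len(overlap) / len(claim_tokens) >= threshold:
            return True
    return False


def flag_unsupported(claims: List[str], sources: List[Dict[str, str]]) -> List[str]:
    """Return the claims that no source supports."""
    return [c for c in claims if not is_supported(c, sources)]


# Hypothetical data, for illustration only.
sources = [
    {
        "url": "https://example.com/a",
        "content": "Remote work listings grew sharply during 2023 across the technology sector.",
    }
]
claims = [
    "Remote work listings grew during 2023.",
    "Average salaries doubled within twelve months.",
]
unsupported = flag_unsupported(claims, sources)
print(unsupported)
```

The second claim is flagged because nothing in the retrieved text mentions it, which is exactly the case an LLM-only critic tends to wave through.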
Why iteration_count?
Without a cycle limit, Critic → Analyst → Critic can loop indefinitely and exhaust free-tier rate limits. The Orchestrator increments this counter in a separate node (single responsibility), and the router caps revision cycles at 2 before forcing the pipeline on to the Writer Agent.
Why separate Orchestrator node for iteration?
Mixing counter increments into the Critic Agent would violate single responsibility. Each node does exactly one thing: the Critic evaluates, the iterate node increments, the router decides. This keeps debugging straightforward, because if routing fails, only one node is responsible.
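The split between the iterate node and the router can be sketched in plain Python. This is a simulation under stated assumptions, not the project's graph code: the node names, the `needs_revision` key (taken from the router diagram), and the choice to increment right after each Critic pass are all illustrative. In LangGraph, a function shaped like `route_after_critic` is the kind of callable passed to `add_conditional_edges`.

```python
MAX_REVISIONS = 2  # cap taken from the router rule: iterations < 2


def increment_iteration(state: dict) -> dict:
    """Iterate node: its only job is to bump the counter."""
    return {"iteration_count": state.get("iteration_count", 0) + 1}


def route_after_critic(state: dict) -> str:
    """Router: pick the next node name without mutating state."""
    if state.get("needs_revision", False) and state.get("iteration_count", 0) < MAX_REVISIONS:
        return "analyst"
    return "writer"


def run_revision_loop(state: dict) -> list:
    """Simulate Critic -> iterate -> router until the router picks the writer."""
    visited = []
    while True:
        visited.append("critic")
        state.update(increment_iteration(state))
        next_node = route_after_critic(state)
        visited.append(next_node)
        if next_node == "writer":
            return visited


# Worst case: a critic that always asks for another revision.
path = run_revision_loop({"needs_revision": True, "iteration_count": 0})
print(path)
```

Even with a critic that never approves, the loop ends after the second Critic pass, because the router's decision depends on the counter and not on the critic alone.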
Confidence Score: What It Actually Means
    High score (75-100):  Most claims in the analysis are directly
                          supported by Tavily search results
    Low score (40-65):    Topic is speculative or emerging, with
                          fewer verifiable claims in sources

    Example:
    Pakistan job market report   → 80/100 (established research exists)
    Iran-US war economic impact  → 60/100 (speculative, fewer sources)
Honest limitation: The Critic verifies logical consistency and cross-references against search results. It cannot replace domain expert fact-checking for critical decisions.
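One simple way to read these numbers is as the share of claims that the sources back up. The sketch below assumes that interpretation; the README does not state the formula the Critic uses, so `confidence_score` is a hypothetical definition. `score_band` only encodes the two ranges listed above and reports anything outside them as unlabeled.

```python
def confidence_score(supported: int, total: int) -> int:
    """Assumed formula: percentage of claims backed by sources, as an integer 0-100."""
    if total <= 0:
        return 0
    return round(100 * supported / total)


def score_band(score: int) -> str:
    """Map a score to the bands described above; other values are left unlabeled."""
    if 75 <= score <= 100:
        return "high"
    if 40 <= score <= 65:
        return "low"
    return "unlabeled"


established = confidence_score(supported=8, total=10)
speculative = confidence_score(supported=3, total=5)
print(established, score_band(established))
print(speculative, score_band(speculative))
```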
🛠️ Tech Stack
| Tool | Purpose | Why Free |
|---|---|---|
| LangGraph | Agent orchestration | Open source |
| Groq API | LLM inference (Llama 3.3 70B) | Free tier, low-latency inference |
| Tavily API | Real-time web search | Free tier, 1000 searches/month |
| LangChain | Tool definitions | Open source |
| Streamlit | UI | Open source |
Setup & Installation
Prerequisites
- Python 3.10+
- Groq API key from console.groq.com
- Tavily API key from tavily.com
Installation
    # Clone repo
    git clone https://github.com/yourusername/multi-agent-research
    cd multi-agent-research

    # Create virtual environment
    python -m venv venv
    venv\Scripts\activate      # Windows
    source venv/bin/activate   # Mac/Linux

    # Install dependencies
    pip install -r requirements.txt

    # Configure environment
    cp .env.example .env
    # Add your API keys to .env
Environment Variables
    GROQ_API_KEY=your_groq_key_here
    TAVILY_API_KEY=your_tavily_key_here
Hugging Face Space users: Set GROQ_API_KEY and TAVILY_API_KEY in your Space's Settings → Variables and Secrets. Do NOT commit a .env file.
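Whichever way the keys reach the process (a local .env file loaded at startup, or Space secrets exposed as environment variables), it helps to fail early with a clear message when one is missing. The helper below is a minimal sketch; `load_api_keys` and `REQUIRED_KEYS` are hypothetical names, not part of this repository.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required keys, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}


# Passing a mapping explicitly keeps the example independent of the real environment.
keys = load_api_keys({"GROQ_API_KEY": "demo-groq", "TAVILY_API_KEY": "demo-tavily"})
print(sorted(keys))
```

Calling `load_api_keys()` with no argument reads from `os.environ`, which covers both the local and the Space setup.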
Run
    # Streamlit UI
    streamlit run app.py

    # Terminal only
    python main.py
Project Structure
    multi_agent_research/
    ├── agents/
    │   ├── research_agent.py    # Tavily web search (3 queries)
    │   ├── analyst_agent.py     # Synthesizes findings via LLM
    │   ├── critic_agent.py      # Cross-references vs sources
    │   └── writer_agent.py      # Produces final report
    ├── graph/
    │   └── research_graph.py    # LangGraph StateGraph + routing
    ├── state/
    │   └── research_state.py    # Shared state TypedDict
    ├── app.py                   # Streamlit UI (HF Space entrypoint)
    ├── main.py                  # Pipeline runner
    ├── requirements.txt         # Python dependencies
    ├── .env.example             # API key template
    └── README.md
⚡ Performance
| Metric | Value |
|---|---|
| Average report time | ~30-60 seconds |
| Tavily searches per run | 9 (3 queries × 3 results) |
| Max revision cycles | 2 |
| Token usage per run | ~8,000-11,000 tokens |
⚠️ Known Limitations
- Critic cannot verify all hallucinations: it cross-references against Tavily results but cannot catch confidently stated errors absent from search results
- Groq free tier: 12,000 TPM limit may cause rate limiting on complex topics
- Tavily free tier: 1,000 searches/month; each run uses 9 searches
- Report quality depends on search result quality: niche or poorly documented topics produce lower confidence scores
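The free-tier figures above translate into a rough usage budget. The arithmetic below uses only the numbers quoted in this README and assumes, pessimistically, that all tokens of a run land inside a single one-minute rate-limit window.

```python
TAVILY_MONTHLY_SEARCHES = 1_000   # free-tier quota quoted above
SEARCHES_PER_RUN = 9              # per-run usage quoted above
GROQ_TPM_LIMIT = 12_000           # tokens per minute, free tier
TOKENS_PER_RUN = (8_000, 11_000)  # approximate low/high token usage per run

# Whole runs that fit in one month of Tavily quota.
runs_per_month = TAVILY_MONTHLY_SEARCHES // SEARCHES_PER_RUN

# Whole runs that fit in one minute of Groq quota, for cheap and expensive runs.
runs_per_minute_low_usage = GROQ_TPM_LIMIT // TOKENS_PER_RUN[0]
runs_per_minute_high_usage = GROQ_TPM_LIMIT // TOKENS_PER_RUN[1]

# Tokens left in the window after one expensive run.
headroom_after_heavy_run = GROQ_TPM_LIMIT - TOKENS_PER_RUN[1]

print(runs_per_month, runs_per_minute_low_usage, runs_per_minute_high_usage)
print(headroom_after_heavy_run)
```

Under these assumptions, the budget is about 111 reports per month and one report per minute, with little token headroom left for an extra revision cycle on a heavy run.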
🗺️ What I Learned
- LangGraph state management and conditional routing
- Critic pattern in multi-agent systems, and its honest limitations
- Why splitting work across specialized agents can outperform a single general-purpose agent
- Token optimization for free-tier LLM APIs
- Separation of concerns in agent design (single responsibility per node)
🔮 Future Improvements
- Parallel agent execution (Research + Analyst simultaneously)
- Vector store memory for cross-session topic persistence
- PDF export for reports
- Domain-specific agent personas (legal, medical, financial)
- Human-in-the-loop approval before Writer Agent runs
Built as Project 4 of an AI Engineering portfolio. Part of a progression from model fine-tuning → RAG systems → single agents → multi-agent orchestration.