	---
	title: Multi-Agent Research Generator
	emoji: 🔬
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	sdk_version: 1.44.1
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🔬 Multi-Agent Research & Report Generator

	> Orchestrates multiple specialized AI agents to autonomously research, analyze, fact-check, and produce structured professional reports with cited sources.

	![Python](https://img.shields.io/badge/Python-3.10+-blue) ![LangGraph](https://img.shields.io/badge/LangGraph-Latest-green) ![Groq](https://img.shields.io/badge/Groq-Llama3.3-orange) ![Streamlit](https://img.shields.io/badge/UI-Streamlit-red)

	---

	## 🧠 What This Project Demonstrates

	Most AI demos make a single LLM call and call it "AI research." This project does something fundamentally different — it separates research, analysis, fact-checking, and writing into specialized agents that communicate through a shared state graph.

	```
	Naive approach:  prompt → LLM → output
	This project:    orchestrated multi-agent pipeline with
	                 conditional routing, critic patterns,
	                 and real source verification
	```

	Separating roles around a shared state graph is a pattern commonly used in production multi-agent systems.

	---

	## 🏗️ Agent Architecture

	```
	User Input: Research Topic
	           ↓
	Orchestrator (LangGraph StateGraph)
	           ↓
	┌─────────────────────────────┐
	│ Research Agent              │
	│ Tavily web search           │
	│ 3 targeted queries          │
	│ Real-time sources           │
	└──────────┬──────────────────┘
	           ↓
	┌─────────────────────────────┐
	│ Analyst Agent               │
	│ Synthesizes findings        │
	│ Identifies patterns         │
	│ Flags contradictions        │
	└──────────┬──────────────────┘
	           ↓
	┌─────────────────────────────┐
	│ Critic Agent                │
	│ Cross-references claims     │
	│ against Tavily sources      │ ← actual ground truth
	│ Assigns confidence score    │
	│ Flags unsupported claims    │
	└──────────┬──────────────────┘
	           ↓
	┌─────────────────────────────────────────┐
	│ Conditional Router                      │
	│ needs_revision AND iterations < 2       │
	│   → back to Analyst (max 2 cycles)      │
	│   else → Writer Agent                   │
	└──────────┬──────────────────────────────┘
	           ↓
	┌─────────────────────────────┐
	│ Writer Agent                │
	│ Structured report           │
	│ Cited sources               │
	│ Confidence score disclosed  │
	└──────────┬──────────────────┘
	           ↓
	Professional Report + Confidence Score
	```
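
The Research Agent box can be sketched as a small fan-out: build three queries from the topic, run each through a search function, and merge the hits. The query templates, the `search_fn` parameter, the `fake_search` stand-in, and the `url`/`content` result keys are all assumptions for illustration, not the project's actual code; in the real agent `search_fn` would wrap the Tavily client.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str, int], List[Dict[str, str]]]


def build_queries(topic: str) -> List[str]:
    """Return three targeted queries for a topic (hypothetical templates)."""
    return [
        f"{topic} overview",
        f"{topic} latest data and statistics",
        f"{topic} criticism and open questions",
    ]


def run_research(topic: str, search_fn: SearchFn, per_query: int = 3) -> List[Dict[str, str]]:
    """Run every query and merge the results, dropping duplicate URLs."""
    seen = set()
    merged = []
    for query in build_queries(topic):
        for hit in search_fn(query, per_query):
            if hit["url"] not in seen:
                seen.add(hit["url"])
                merged.append(hit)
    return merged


def fake_search(query: str, max_results: int) -> List[Dict[str, str]]:
    """Offline stand-in for a real search client, used only for this demo."""
    slug = query.replace(" ", "-")
    return [
        {"url": f"https://example.com/{slug}/{i}", "content": f"snippet {i} for {query}"}
        for i in range(max_results)
    ]


results = run_research("remote work", fake_search)
print(len(results))  # 9 distinct URLs: 3 queries x 3 results
```

Passing the search function in as a parameter keeps the agent testable without network access or an API key.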

	---

	## 🔑 Key Technical Decisions

	### Why LangGraph State?
	Each agent reads from and writes to a shared `ResearchState` TypedDict. No agent needs to know about other agents — they only interact with state. This makes the pipeline modular, debuggable, and extensible.

	```python
	from typing import List, TypedDict


	class ResearchState(TypedDict):
	    topic: str
	    search_results: List[dict]   # Research Agent writes
	    analysis: str                # Analyst Agent writes
	    critic_feedback: str         # Critic Agent writes
	    confidence_scores: dict      # Critic writes, UI reads
	    final_report: str            # Writer Agent writes
	    current_step: str            # Orchestrator tracks
	    iteration_count: int         # Prevents infinite loops
	```
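
To illustrate "agents only interact with state", here is a minimal sketch of a node as a plain function: it reads the keys it needs and returns only the key it owns. The trimmed `ResearchState`, the `analyst_node` body, and the `apply_update` helper are hypothetical stand-ins; the real analyst calls an LLM, and LangGraph performs the merge itself.

```python
from typing import Dict, List, TypedDict


class ResearchState(TypedDict, total=False):
    # Trimmed copy of the shared state, just enough for this example.
    topic: str
    search_results: List[dict]
    analysis: str


def analyst_node(state: ResearchState) -> Dict[str, str]:
    """Read search_results and return only the key this agent owns."""
    snippets = [r["content"] for r in state["search_results"]]
    # Placeholder for the LLM call: join the snippets into one summary string.
    return {"analysis": f"{state['topic']}: " + " | ".join(snippets)}


def apply_update(state: ResearchState, update: dict) -> ResearchState:
    """Merge a node's partial update into a new state dict (hypothetical helper)."""
    return {**state, **update}


state: ResearchState = {
    "topic": "solar",
    "search_results": [{"content": "costs fell"}, {"content": "capacity grew"}],
}
state = apply_update(state, analyst_node(state))
print(state["analysis"])  # solar: costs fell | capacity grew
```

Because the node never touches keys it does not own, it can be unit-tested with a hand-built dict and swapped out without changes to the other agents.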

	### Why Tavily for Critic Agent?
	A naive critic agent evaluates analyst output using only LLM knowledge — which means it's one LLM instance confirming another's biases. Our Critic Agent cross-references every claim against actual Tavily search results, giving it real ground truth to check against. This is the architectural difference between a critic that finds real issues vs one that just agrees.
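
The README does not show how the cross-referencing is implemented, so the following is only a toy illustration of the idea: a claim is checked against retrieved text rather than against the model's own memory. It uses plain word overlap as a stand-in for whatever the Critic actually does; the stopword list, the `min_overlap` threshold, and the `content` key are assumptions.

```python
import re
from typing import Dict, List, Tuple

STOPWORDS = {"the", "a", "an", "of", "in", "is", "are", "to", "and", "by", "has", "have"}


def content_words(text: str) -> set:
    """Lowercased word set with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_claims(
    claims: List[str], sources: List[Dict[str, str]], min_overlap: float = 0.6
) -> Tuple[List[str], List[str]]:
    """Partition claims into (supported, unsupported) by word overlap with sources."""
    source_words = [content_words(s["content"]) for s in sources]
    supported, unsupported = [], []
    for claim in claims:
        words = content_words(claim)
        # Share of the claim's words found in each source; best source wins.
        overlaps = [len(words & sw) / len(words) for sw in source_words] if words else []
        best = max(overlaps, default=0.0)
        (supported if best >= min_overlap else unsupported).append(claim)
    return supported, unsupported


sources = [{"content": "Solar module prices fell 80 percent between 2010 and 2020."}]
claims = ["Solar module prices fell 80 percent", "Wind power is banned in Norway"]
ok, flagged = split_claims(claims, sources)
print(flagged)  # ['Wind power is banned in Norway']
```

A lexical check like this misses paraphrases and negations, which is one reason an LLM prompted with the source snippets is the more plausible choice for the real Critic.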

	### Why iteration_count?
	Without a cycle limit, Critic → Analyst → Critic can loop indefinitely whenever the Critic keeps requesting revisions, which exhausts free-tier rate limits. The Orchestrator increments this counter in a separate node (single responsibility), and the router caps revision cycles at 2 before forcing the pipeline to the Writer Agent.

	### Why separate Orchestrator node for iteration?
	Mixing counter increments into the Critic Agent violates single responsibility. Each node does exactly one thing — the Critic evaluates, the iterate node increments, the router decides. This makes debugging straightforward: if routing fails, only one node is responsible.
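
The split between "iterate" and "route" can be sketched with two small functions. This is an assumption-laden sketch, not the project's code: the `needs_revision` key appears in the router diagram but not in the `ResearchState` shown above, so its location is a guess, and the counter is assumed to be incremented once after each Critic pass.

```python
from typing import TypedDict

MAX_REVISIONS = 2  # cap taken from the router description


class LoopState(TypedDict):
    needs_revision: bool   # assumed key; not listed in the README's TypedDict
    iteration_count: int


def iterate_node(state: LoopState) -> dict:
    """Only job: bump the counter after each Critic pass."""
    return {"iteration_count": state["iteration_count"] + 1}


def route_after_critic(state: LoopState) -> str:
    """Return the next node name: 'analyst' to revise, otherwise 'writer'."""
    if state["needs_revision"] and state["iteration_count"] < MAX_REVISIONS:
        return "analyst"
    return "writer"


# Simulate a Critic that is never satisfied; the cap still ends the loop.
state: LoopState = {"needs_revision": True, "iteration_count": 0}
path = []
while True:
    state = {**state, **iterate_node(state)}
    next_node = route_after_critic(state)
    path.append(next_node)
    if next_node == "writer":
        break
print(path)  # ['analyst', 'writer']
```

Under these assumptions the Critic runs at most twice before the Writer takes over. In LangGraph, a router like this is typically attached with `add_conditional_edges`, mapping the returned strings to node names.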

	---

	## 📊 Confidence Score — What It Actually Means

	```
	High score (75-100): Most claims in the analysis are directly
	                     supported by Tavily search results

	Low score (40-65):   Topic is speculative or emerging;
	                     fewer verifiable claims in sources

	Example:
	  Pakistan job market report  → 80/100 (established research exists)
	  Iran-US war economic impact → 60/100 (speculative, fewer sources)
	```

	Honest limitation: The Critic verifies logical consistency and cross-references against search results. It cannot replace domain expert fact-checking for critical decisions.
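
One way to read the score bands is as the share of claims the Critic could tie to a source. The formula below is an assumption (the README does not state how the score is computed), and both function names are hypothetical; only the band boundaries come from the text above.

```python
def confidence_score(supported: int, total: int) -> int:
    """Percentage of claims backed by sources (assumed formula); 0 if no claims."""
    if total <= 0:
        return 0
    return round(100 * supported / total)


def score_band(score: int) -> str:
    """Map a 0-100 score to the bands listed in the README."""
    if 75 <= score <= 100:
        return "high"
    if 40 <= score <= 65:
        return "low"
    return "unspecified"  # the README does not define 66-74 or below 40


print(confidence_score(8, 10), score_band(80))  # 80 high
print(confidence_score(6, 10), score_band(60))  # 60 low
```

The two example reports quoted above (80/100 and 60/100) land in the "high" and "low" bands respectively.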

	---

	## 🛠️ Tech Stack

	| Tool | Purpose | Why Free |
	|------|---------|----------|
	| LangGraph | Agent orchestration | Open source |
	| Groq API | LLM inference (Llama 3.3 70B) | Free tier, fastest inference |
	| Tavily API | Real-time web search | Free tier, 1000 searches/month |
	| LangChain | Tool definitions | Open source |
	| Streamlit | UI | Open source |

	---

	## 🚀 Setup & Installation

	### Prerequisites
	- Python 3.10+
	- Groq API key — [console.groq.com](https://console.groq.com)
	- Tavily API key — [tavily.com](https://tavily.com)

	### Installation

	```bash
	# Clone repo
	git clone https://github.com/yourusername/multi-agent-research
	cd multi-agent-research

	# Create virtual environment
	python -m venv venv
	# Activate it (run only the line for your OS)
	venv\Scripts\activate        # Windows
	source venv/bin/activate     # Mac/Linux

	# Install dependencies
	pip install -r requirements.txt

	# Configure environment
	cp .env.example .env
	# Add your API keys to .env
	```

	### Environment Variables

	```
	GROQ_API_KEY=your_groq_key_here
	TAVILY_API_KEY=your_tavily_key_here
	```

	> Hugging Face Space users: Set `GROQ_API_KEY` and `TAVILY_API_KEY` in your Space's Settings → Variables and Secrets — do NOT commit a `.env` file.
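
Whether the keys come from a local `.env` file or from Space secrets, they end up as environment variables, so a startup check can fail fast with a readable message. This is a minimal sketch using only the standard library; `missing_keys` and `check_environment` are hypothetical helpers, not functions from this repository.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


def check_environment(env: Mapping[str, str] = os.environ) -> None:
    """Raise a readable error before any agent tries to call an API."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


print(missing_keys({"GROQ_API_KEY": "abc"}))  # ['TAVILY_API_KEY']
```

Calling `check_environment()` at the top of `app.py` or `main.py` would surface a missing key before the first search or LLM request is made.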

	### Run

	```bash
	# Streamlit UI
	streamlit run app.py

	# Terminal only
	python main.py
	```

	---

	## 📁 Project Structure

	```
	multi_agent_research/
	├── agents/
	│   ├── research_agent.py    # Tavily web search (3 queries)
	│   ├── analyst_agent.py     # Synthesizes findings via LLM
	│   ├── critic_agent.py      # Cross-references vs sources
	│   └── writer_agent.py      # Produces final report
	├── graph/
	│   └── research_graph.py    # LangGraph StateGraph + routing
	├── state/
	│   └── research_state.py    # Shared state TypedDict
	├── app.py                   # Streamlit UI (HF Space entrypoint)
	├── main.py                  # Pipeline runner
	├── requirements.txt         # Python dependencies
	├── .env.example             # API key template
	└── README.md
	```

	---

	## ⚡ Performance

	| Metric | Value |
	|--------|-------|
	| Average report time | ~30-60 seconds |
	| Tavily queries per run | 3 (3 results each, 9 results total) |
	| Max revision cycles | 2 |
	| Token usage per run | ~8,000-11,000 tokens |

	---

	## ⚠️ Known Limitations

	- Critic cannot verify all hallucinations: it can only check claims against the Tavily results it has, so confidently stated errors on points the sources do not cover can go undetected
	- Groq free tier: the 12,000 TPM limit may cause rate limiting on complex topics
	- Tavily free tier: 1,000 searches/month; each run issues 3 queries returning 9 results in total
	- Report quality depends on search result quality: niche or poorly documented topics produce lower confidence scores
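
The free-tier limits translate into a rough run budget with simple integer division. The quota figures come from this README; `searches_per_run` is left as a parameter because the cost per run depends on whether the quota counts queries (3) or returned results (9), and both function names are hypothetical.

```python
def runs_per_month(monthly_searches: int, searches_per_run: int) -> int:
    """Whole pipeline runs that fit in a monthly search quota."""
    return monthly_searches // searches_per_run


def runs_per_minute(tokens_per_minute: int, tokens_per_run: int) -> int:
    """Whole pipeline runs that fit in a tokens-per-minute limit."""
    return tokens_per_minute // tokens_per_run


print(runs_per_month(1000, 9))           # 111 (conservative: 9 units per run)
print(runs_per_month(1000, 3))           # 333 (if only the 3 queries are billed)
print(runs_per_minute(12_000, 11_000))   # 1
print(runs_per_minute(12_000, 8_000))    # 1
```

Even at the low end of the token estimate, only one full run fits inside a single minute of the 12,000 TPM allowance, which is consistent with the rate limiting noted above.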

	---

	## 🗺️ What I Learned

	- LangGraph state management and conditional routing
	- Critic pattern in multi-agent systems — and its honest limitations
	- Why a multi-agent design can outperform a single agent when each role is specialized
	- Token optimization for free-tier LLM APIs
	- Separation of concerns in agent design (single responsibility per node)

	---

	## 🔮 Future Improvements

	- Parallel agent execution (Research + Analyst simultaneously)
	- Vector store memory for cross-session topic persistence
	- PDF export for reports
	- Domain-specific agent personas (legal, medical, financial)
	- Human-in-the-loop approval before Writer Agent runs

	---

	Built as Project 4 of an AI Engineering portfolio. Part of a progression from model fine-tuning → RAG systems → single agents → multi-agent orchestration.