| title: Multi-Agent Research Generator | |
| emoji: 🔬 | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| sdk_version: 1.44.1 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| # 🔬 Multi-Agent Research & Report Generator | |
| > Orchestrates multiple specialized AI agents to autonomously research, analyze, fact-check, and produce structured professional reports with cited sources. | |
| --- | |
| ## 🧠 What This Project Demonstrates | |
| Most AI demos make a single LLM call and call it "AI research." This project does something fundamentally different: it separates research, analysis, fact-checking, and writing into specialized agents that communicate through a shared state graph. | |
| ``` | |
| Naive approach:  prompt → LLM → output | |
| This project:    orchestrated multi-agent pipeline with | |
|                  conditional routing, critic patterns, | |
|                  and real source verification | |
| ``` | |
| Separating roles over a shared state graph is an architecture pattern commonly used in production multi-agent systems. | |
| --- | |
| ## Agent Architecture | |
| ``` | |
| User Input: Research Topic | |
|            ↓ | |
| Orchestrator (LangGraph StateGraph) | |
|            ↓ | |
| ┌─────────────────────────────┐ | |
| │ Research Agent              │ | |
| │ Tavily web search           │ | |
| │ 3 targeted queries          │ | |
| │ Real-time sources           │ | |
| └──────────┬──────────────────┘ | |
|            ↓ | |
| ┌─────────────────────────────┐ | |
| │ Analyst Agent               │ | |
| │ Synthesizes findings        │ | |
| │ Identifies patterns         │ | |
| │ Flags contradictions        │ | |
| └──────────┬──────────────────┘ | |
|            ↓ | |
| ┌─────────────────────────────┐ | |
| │ Critic Agent                │ | |
| │ Cross-references claims     │ | |
| │ against Tavily sources      │ ← actual ground truth | |
| │ Assigns confidence score    │ | |
| │ Flags unsupported claims    │ | |
| └──────────┬──────────────────┘ | |
|            ↓ | |
| ┌─────────────────────────────────────┐ | |
| │ Conditional Router                  │ | |
| │ needs_revision AND iterations < 2   │ | |
| │ → back to Analyst (max 2 cycles)    │ | |
| │ else → Writer Agent                 │ | |
| └──────────┬──────────────────────────┘ | |
|            ↓ | |
| ┌─────────────────────────────┐ | |
| │ Writer Agent                │ | |
| │ Structured report           │ | |
| │ Cited sources               │ | |
| │ Confidence score disclosed  │ | |
| └──────────┬──────────────────┘ | |
|            ↓ | |
| Professional Report + Confidence Score | |
| ``` | |
| --- | |
| ## Key Technical Decisions | |
| ### Why LangGraph State? | |
| Each agent reads from and writes to a shared `ResearchState` TypedDict. No agent needs to know about the other agents; each one interacts only with state. This makes the pipeline modular, debuggable, and extensible. | |
| ```python | |
| from typing import List, TypedDict | |
| | |
| class ResearchState(TypedDict): | |
|     topic: str                  # Research topic supplied by the user | |
|     search_results: List[dict]  # Research Agent writes | |
|     analysis: str               # Analyst Agent writes | |
|     critic_feedback: str        # Critic Agent writes | |
|     confidence_scores: dict     # Critic writes, UI reads | |
|     final_report: str           # Writer Agent writes | |
|     current_step: str           # Orchestrator tracks | |
|     iteration_count: int        # Prevents infinite loops | |
| ``` | |
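To make the "agents only interact with state" idea concrete, here is a minimal sketch of a single node. The name `analyst_node` and the string-joining logic are assumptions for illustration only (the real Analyst Agent calls an LLM), and the state type is trimmed to the keys this example touches. The point is the shape: a node receives the state and returns only the keys it owns.

```python
from typing import List, TypedDict


class ResearchState(TypedDict, total=False):
    # Trimmed copy of the shared state, just the keys used below.
    topic: str
    search_results: List[dict]
    analysis: str
    current_step: str


def analyst_node(state: ResearchState) -> dict:
    """Hypothetical analyst node: reads search_results, writes analysis."""
    # Stand-in for the LLM call: join source titles into a one-line summary.
    titles = [r.get("title", "untitled") for r in state.get("search_results", [])]
    summary = f"{state['topic']}: " + "; ".join(titles)
    # Return a partial update; the caller merges it into the shared state.
    return {"analysis": summary, "current_step": "analyst"}


state: ResearchState = {
    "topic": "solar adoption",
    "search_results": [{"title": "Report A"}, {"title": "Report B"}],
}
state.update(analyst_node(state))
print(state["analysis"])  # solar adoption: Report A; Report B
```

Because the node never imports or calls another agent, it can be unit-tested with a hand-built dictionary, as above.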
| ### Why Tavily for the Critic Agent? | |
| A naive critic agent evaluates analyst output using only LLM knowledge, which means one LLM instance is confirming another's biases. Our Critic Agent cross-references every claim against actual Tavily search results, giving it real ground truth to check against. This is the architectural difference between a critic that finds real issues and one that just agrees. | |
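As a rough illustration of what "cross-referencing against sources" can mean, the sketch below flags claims whose words do not appear in any retrieved source. This is a deliberately simple lexical-overlap stand-in, not the project's Critic (which presumably hands the sources to an LLM); the helper names, the `content` key, and the 0.6 threshold are all assumptions.

```python
import re
from typing import List


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens longer than three characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def is_supported(claim: str, sources: List[dict], threshold: float = 0.6) -> bool:
    """True if enough of the claim's tokens appear in any single source."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    for source in sources:
        # "content" is an assumed key holding the source text.
        overlap = claim_tokens & _tokens(source.get("content", ""))
        if len(overlap) / len(claim_tokens) >= threshold:
            return True
    return False


sources = [{"content": "Solar capacity in the region doubled between 2020 and 2023."}]
claims = [
    "Solar capacity doubled between 2020 and 2023.",
    "Wind power overtook coal last year.",
]
unsupported = [c for c in claims if not is_supported(c, sources)]
print(unsupported)  # ['Wind power overtook coal last year.']
```

Even this crude check has a property an LLM-only critic lacks: a claim with no counterpart in the search results cannot pass just because it sounds plausible.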
| ### Why iteration_count? | |
| Without a cycle limit, the Critic → Analyst → Critic loop can repeat indefinitely and exhaust free-tier rate limits. The Orchestrator increments this counter in a separate node (single responsibility), and the router caps revision cycles at 2 before forcing the pipeline on to the Writer Agent. | |
| ### Why a separate Orchestrator node for iteration? | |
| Mixing counter increments into the Critic Agent violates single responsibility. Each node does exactly one thing: the Critic evaluates, the iterate node increments, and the router decides. This makes debugging straightforward: if routing fails, only one node is responsible. | |
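The split between incrementing and routing can be sketched in plain Python. The function names, the `needs_revision` flag as a state key, and the order in which the two functions run are assumptions for illustration; only the cap of 2 comes from this README. The loop at the bottom simulates a Critic that always asks for a revision, to show that the pipeline still reaches the Writer.

```python
MAX_REVISIONS = 2  # revision cap stated in this README


def iterate_node(state: dict) -> dict:
    """Only job: bump the revision counter."""
    return {"iteration_count": state.get("iteration_count", 0) + 1}


def route_after_critic(state: dict) -> str:
    """Only job: choose the name of the next node."""
    if state.get("needs_revision") and state.get("iteration_count", 0) < MAX_REVISIONS:
        return "analyst"
    return "writer"


# Worst case: the critic never stops asking for revisions.
state = {"needs_revision": True, "iteration_count": 0}
path = []
while True:
    next_node = route_after_critic(state)
    path.append(next_node)
    if next_node == "writer":
        break
    state.update(iterate_node(state))

print(path)  # ['analyst', 'analyst', 'writer']
```

Both functions are pure and a few lines long, so a routing bug can be reproduced with a two-key dictionary instead of a full pipeline run.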
| --- | |
| ## Confidence Score: What It Actually Means | |
| ``` | |
| High score (75-100): Most claims in the analysis are directly | |
|                      supported by Tavily search results | |
| Low score (40-65):   Topic is speculative or emerging; | |
|                      fewer verifiable claims in sources | |
| Example: | |
|   Pakistan job market report  → 80/100 (established research exists) | |
|   Iran-US war economic impact → 60/100 (speculative, fewer sources) | |
| ``` | |
| **Honest limitation:** The Critic verifies logical consistency and cross-references against search results. It cannot replace domain expert fact-checking for critical decisions. | |
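One way to read the bands above is as the share of claims that a source supports. The sketch below uses that reading; the formula is an assumption, since this README does not say how the Critic derives its number, and both function names are hypothetical. Scores outside the two documented ranges are returned unlabeled rather than guessed.

```python
from typing import Optional


def confidence_score(supported: int, total: int) -> int:
    """Assumed formula: percentage of claims backed by a source (0-100)."""
    if total <= 0:
        return 0
    return round(100 * supported / total)


def band(score: int) -> Optional[str]:
    """Map a score to the documented bands; None where no band is defined."""
    if 75 <= score <= 100:
        return "high"
    if 40 <= score <= 65:
        return "low"
    return None  # 66-74 and below 40 are not defined in this README


score = confidence_score(supported=8, total=10)
print(score, band(score))  # 80 high
```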
| --- | |
| ## 🛠️ Tech Stack | |
| | Tool | Purpose | Why Free | | |
| |------|---------|---------| | |
| | LangGraph | Agent orchestration | Open source | | |
| Groq API | LLM inference (Llama 3.3 70B) | Free tier, fast inference | |
| | Tavily API | Real-time web search | Free tier, 1000 searches/month | | |
| | LangChain | Tool definitions | Open source | | |
| | Streamlit | UI | Open source | | |
| --- | |
| ## Setup & Installation | |
| ### Prerequisites | |
| - Python 3.10+ | |
| - Groq API key: [console.groq.com](https://console.groq.com) | |
| - Tavily API key: [tavily.com](https://tavily.com) | |
| ### Installation | |
| ```bash | |
| # Clone repo | |
| git clone https://github.com/yourusername/multi-agent-research | |
| cd multi-agent-research | |
| # Create virtual environment | |
| python -m venv venv | |
| source venv/bin/activate # macOS/Linux | |
| # venv\Scripts\activate # Windows: run this instead of the line above | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Configure environment | |
| cp .env.example .env | |
| # Add your API keys to .env | |
| ``` | |
| ### Environment Variables | |
| ``` | |
| GROQ_API_KEY=your_groq_key_here | |
| TAVILY_API_KEY=your_tavily_key_here | |
| ``` | |
| > **Hugging Face Space users:** Set `GROQ_API_KEY` and `TAVILY_API_KEY` in your Space's **Settings → Variables and Secrets**. Do NOT commit a `.env` file. | |
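Since the keys may come from a local `.env` file or from Space secrets, a small startup check helps surface a missing key before the first API call fails. This is a sketch, not code from the project; `missing_keys` is a hypothetical helper, and it only inspects environment variables that are already loaded.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=os.environ) -> list:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Demonstration with an explicit mapping instead of the real environment.
print(missing_keys({"GROQ_API_KEY": "abc"}))  # ['TAVILY_API_KEY']
```

Calling `missing_keys()` with no argument checks the real process environment, which covers both the local and the Space setup.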
| ### Run | |
| ```bash | |
| # Streamlit UI | |
| streamlit run app.py | |
| # Terminal only | |
| python main.py | |
| ``` | |
| --- | |
| ## Project Structure | |
| ``` | |
| multi_agent_research/ | |
| ├── agents/ | |
| │   ├── research_agent.py    # Tavily web search (3 queries) | |
| │   ├── analyst_agent.py     # Synthesizes findings via LLM | |
| │   ├── critic_agent.py      # Cross-references vs sources | |
| │   └── writer_agent.py      # Produces final report | |
| ├── graph/ | |
| │   └── research_graph.py    # LangGraph StateGraph + routing | |
| ├── state/ | |
| │   └── research_state.py    # Shared state TypedDict | |
| ├── app.py                   # Streamlit UI (HF Space entrypoint) | |
| ├── main.py                  # Pipeline runner | |
| ├── requirements.txt         # Python dependencies | |
| ├── .env.example             # API key template | |
| └── README.md | |
| ``` | |
| --- | |
| ## ⚡ Performance | |
| | Metric | Value | | |
| |--------|-------| | |
| Average report time | ~30-60 seconds | |
| Tavily searches per run | 9 (3 queries × 3 results) | |
| | Max revision cycles | 2 | | |
| | Token usage per run | ~8,000-11,000 tokens | | |
| --- | |
| ## ⚠️ Known Limitations | |
| - **Critic cannot verify all hallucinations:** it cross-references against Tavily results but cannot catch confidently stated errors that are absent from the search results | |
| - **Groq free tier:** the 12,000 TPM limit may cause rate limiting on complex topics | |
| - **Tavily free tier:** 1,000 searches/month; each run uses 9 searches | |
| - **Report quality depends on search result quality:** niche or poorly documented topics produce lower confidence scores | |
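The free-tier figures above translate into a concrete run budget. The arithmetic below uses only numbers quoted in this README; the variable names are illustrative, and it ignores any usage outside this project.

```python
TAVILY_MONTHLY_SEARCHES = 1_000
SEARCHES_PER_RUN = 9
GROQ_TOKENS_PER_MINUTE = 12_000
TOKENS_PER_RUN_LOW, TOKENS_PER_RUN_HIGH = 8_000, 11_000

# Whole runs that fit in the monthly Tavily quota.
runs_per_month = TAVILY_MONTHLY_SEARCHES // SEARCHES_PER_RUN

# Whole runs that fit in one minute of Groq token budget.
runs_per_minute_best = GROQ_TOKENS_PER_MINUTE // TOKENS_PER_RUN_LOW
runs_per_minute_worst = GROQ_TOKENS_PER_MINUTE // TOKENS_PER_RUN_HIGH

print(runs_per_month)                               # 111
print(runs_per_minute_best, runs_per_minute_worst)  # 1 1
```

In other words, the Tavily quota allows roughly 111 reports a month, and the Groq limit leaves room for about one run per minute, which is why back-to-back runs on complex topics can hit rate limiting.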
| --- | |
| ## 🗺️ What I Learned | |
| - LangGraph state management and conditional routing | |
| - Critic pattern in multi-agent systems, and its honest limitations | |
| - Why splitting work across specialized agents can outperform a single general-purpose agent | |
| - Token optimization for free-tier LLM APIs | |
| - Separation of concerns in agent design (single responsibility per node) | |
| --- | |
| ## 🔮 Future Improvements | |
| - Parallel agent execution (Research + Analyst simultaneously) | |
| - Vector store memory for cross-session topic persistence | |
| - PDF export for reports | |
| - Domain-specific agent personas (legal, medical, financial) | |
| - Human-in-the-loop approval before Writer Agent runs | |
| --- | |
| *Built as Project 4 of an AI Engineering portfolio. Part of a progression from model fine-tuning → RAG systems → single agents → multi-agent orchestration.* | |