Spaces:
Runtime error
Runtime error
A newer version of the Streamlit SDK is available:
1.55.0
π Detailed Working of the Web Research Agent
The Web Research Agent follows a modular, multi-step pipeline that transforms a user query into a reliable, summarized report. It leverages Google Gemini for natural language processing, Google Custom Search for information retrieval, and BeautifulSoup for scraping.
πΉ Step-by-Step Flow
1. User Input
- User enters a natural language research query in the Streamlit interface.
- Example: "India US trade deal 2025"
2. Query Analyzer (query_analyzer.py)
- Gemini 2.5 Pro model is used to extract structured metadata from the input query.
- Returns:
- Intent (e.g., news, opinion, analysis)
- Keywords (extracted for search)
- Information Types (e.g., statistics, policy summaries)
- Time Range (e.g., "last year")
3. Google Search Tool (search_tool.py)
- Uses the
GOOGLE_CSE_API_KEYandGOOGLE_CSE_CXto perform a search via Google Custom Search API. - Pulls top
nresults (default = 15) based on relevance to query keywords. - Returns list of dictionaries with:
- Title
- URL
- Snippet
4. Web Scraper Tool (scraper_tool.py)
- Visits each URL and extracts readable
<p>tags using BeautifulSoup. - Clips long text to 5000 characters for optimal LLM processing.
- Returns:
- Page content
- URL
5. Content Analyzer (content_analyzer.py)
- Uses Gemini 2.5 to summarize scraped content.
- Adds metadata such as:
- Summary (in bullet points)
- Content Type (e.g., "news report")
- Relevance Rating (high, medium, low)
6. Synthesizer (synthesizer.py)
- Receives all article summaries and the original query.
- Synthesizes content using Gemini with the following logic:
- Group similar insights across sources.
- Highlight contradictions.
- End with a unified "Final Takeaway".
- Returns the final report in Markdown format.
7. Streamlit Output (app.py)
- Displays the following to the user:
- Sidebar: Query analysis & article summaries.
- Main view: Top links and final synthesized report.
π οΈ Tech Stack & Tools
| Component | Technology/Library |
|---|---|
| UI | Streamlit |
| LLM | Google Gemini 2.5 Pro |
| Search API | Google Custom Search (CSE) |
| Scraper | BeautifulSoup, Requests |
| Config Management | Python-dotenv |
π Example End-to-End Flow
Query: "Electric vehicle subsidies in Europe 2024"
Result:
- 15 relevant articles scraped
- 12 summaries processed
- Synthesized final markdown with policy trends, contradictions in subsidy effectiveness, and a closing insight.