Punit1 committed on
Commit f50852a · 1 Parent(s): 52e9d16

Add Dockerfile and switch to Docker SDK

Files changed (4):
  1. DEPLOY_TO_HF.md +0 -53
  2. Dockerfile +28 -0
  3. PROJECT_README.md +0 -187
  4. walkthrough.md.resolved +0 -256
DEPLOY_TO_HF.md DELETED
@@ -1,53 +0,0 @@
- # Quick Deployment to Hugging Face Spaces
-
- ## TL;DR - Fast Deployment Steps
-
- ### 1. Get API Keys
- - Groq: https://console.groq.com/
- - Tavily: https://tavily.com/
-
- ### 2. Create HF Space
- 1. Go to: https://huggingface.co/new-space
- 2. Choose: **Streamlit** SDK
- 3. Name it: `research-agent`
- 4. Create Space
-
- ### 3. Upload Files
-
- **Using Web Interface:**
- - Upload: `main.py`, `requirements.txt`, entire `src/` folder, `.streamlit/` folder
- - **Rename** `HF_README.md` to `README.md` before uploading
-
- **Using Git:**
- ```bash
- git init
- git add .
- git commit -m "Deploy to HF Spaces"
- git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
- git push hf main
- ```
-
- ### 4. Add Secrets
- In your Space → Settings → Repository secrets:
- - `GROQ_API_KEY` = your Groq API key
- - `TAVILY_API_KEY` = your Tavily API key
-
- ### 5. Done!
- Your app will be live at: `https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME`
-
- ---
-
- ## Files Checklist
-
- ✅ All files are ready in your project:
-
- - [x] `main.py` - Main app
- - [x] `requirements.txt` - Dependencies
- - [x] `src/` - Source code
- - [x] `.streamlit/config.toml` - HF configuration
- - [x] `HF_README.md` - Space README (rename to README.md)
- - [x] `.gitignore` - Ignore unnecessary files
-
- **Your project is deployment-ready!** 🚀
-
- For detailed instructions, see: `hf_deployment_guide.md`
Dockerfile ADDED
@@ -0,0 +1,28 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     curl \
+     software-properties-common \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose port 7860 (HF Spaces default)
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK CMD curl --fail http://localhost:7860/_stcore/health
+
+ # Run the application
+ CMD ["streamlit", "run", "main.py", "--server.port=7860", "--server.address=0.0.0.0"]
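With the Docker SDK, HF Spaces builds and runs this image automatically. To sanity-check it locally before pushing, a typical build-and-run sequence looks like this (the `research-agent` tag and the key values are placeholders, not part of the commit):

```shell
# Build the image from the repository root (where the Dockerfile lives)
docker build -t research-agent .

# Run it, passing the API keys the app reads from the environment
docker run --rm -p 7860:7860 \
  -e GROQ_API_KEY=your_groq_api_key_here \
  -e TAVILY_API_KEY=your_tavily_api_key_here \
  research-agent

# The Streamlit UI should then be reachable at http://localhost:7860
```

On the Space itself, the same keys come from Settings → Repository secrets rather than `-e` flags.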
PROJECT_README.md DELETED
@@ -1,187 +0,0 @@
- # Autonomous Research Agent with LangGraph, Groq, and Streamlit
-
- This repository contains the complete source code for an **autonomous AI research agent**. The agent takes a user-defined topic, performs web searches to gather information, evaluates and summarizes relevant sources, and compiles the findings into a comprehensive report.
-
- The project is built using a modern AI stack, showcasing a stateful, cyclic architecture that enables complex, multi-step reasoning and execution, all presented through an interactive web interface.
-
- ---
-
- ## Core Technologies
-
- - **Orchestration:** `LangGraph` – Builds stateful, multi-actor applications with cycles, enabling complex agentic behaviors.
- - **LLM:** `Groq (Llama 3.3 70B)` – High-speed inference using a Language Processing Unit (LPU) for fast and responsive AI reasoning.
- - **Web Interface:** `Streamlit` – Interactive and user-friendly chat-based web application built entirely in Python.
- - **Search Tool:** `Tavily AI` – AI-optimized search engine to gather accurate and relevant information from the web.
- - **Core Framework:** `LangChain` – Provides foundational components, tools, and integrations.
-
- ---
-
- ## Key Features
-
- - **Stateful, Cyclic Architecture:**
-   Uses LangGraph loops to iteratively search, evaluate, and decide whether to continue researching or compile findings, mimicking a human research process.
-
- - **High-Performance LLM:**
-   Leverages Groq LPU with Llama 3.3 70B for reasoning and content generation at extremely high speeds for a seamless user experience.
-
- - **Fault Tolerance and Persistence:**
-   Saves the agent's state at every step using the `SqliteSaver` checkpointer, allowing long-running tasks to resume from the exact point of failure.
-
- - **Interactive Web UI:**
-   The Streamlit-based chat interface lets users input topics, monitor progress in real time, and receive the final report directly in the app.
-
- - **Deep Observability with LangSmith:**
-   Provides detailed traces of every agent step for debugging and understanding complex behavior (optional).
-
- ---
-
- ## Setup Instructions
-
- ### Prerequisites
-
- You will need two API keys:
-
- 1. **Groq API Key** - Sign up at [console.groq.com](https://console.groq.com/)
- 2. **Tavily API Key** - Sign up at [tavily.com](https://tavily.com/)
-
- ### Installation
-
- 1. **Clone the repository** (or navigate to the project directory)
-
-    ```bash
-    cd "Research Agent with LangGraph"
-    ```
-
- 2. **Create and activate a virtual environment**
-
-    ```powershell
-    # Create virtual environment
-    python -m venv venv
-
-    # Activate it (Windows PowerShell)
-    .\venv\Scripts\Activate.ps1
-    ```
-
- 3. **Install dependencies**
-
-    ```powershell
-    .\venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
-    .\venv\Scripts\python.exe -m pip install -r requirements.txt
-    ```
-
- 4. **Configure environment variables**
-
-    Create or edit the `.env` file in the root directory and add your API keys:
-
-    ```env
-    GROQ_API_KEY=your_groq_api_key_here
-    TAVILY_API_KEY=your_tavily_api_key_here
-    ```
-
-    You can use `.env.example` as a template.
-
- ---
-
- ## Running the Application
-
- Run the Streamlit app with:
-
- ```powershell
- .\venv\Scripts\python.exe -m streamlit run main.py
- ```
-
- The app will automatically open in your browser at `http://localhost:8501`.
-
- ---
-
- ## Usage
-
- 1. Open the application in your browser
- 2. Enter a research topic in the chat input (e.g., "Recent advances in AI agents")
- 3. Watch the agent work:
-    - 🔍 Search for relevant articles
-    - 📄 Scrape content from URLs
-    - 🤖 Evaluate relevance using the LLM
-    - 📝 Summarize useful information
-    - 📊 Compile a comprehensive report
- 4. Review the final research report
-
- ---
-
- ## Project Structure
-
- ```
- Research Agent with LangGraph/
- ├── main.py              # Streamlit UI and application entry point
- ├── src/
- │   ├── graph.py         # LangGraph workflow and node definitions
- │   ├── agent_state.py   # Agent state schema
- │   └── tools.py         # Search and scraping tools
- ├── requirements.txt     # Python dependencies
- ├── .env                 # API keys (create this file)
- ├── .env.example         # Template for environment variables
- └── checkpoints.sqlite   # SQLite database for state persistence
- ```
-
- ---
-
- ## Troubleshooting
-
- ### Issue: "streamlit.exe not found" or Import Errors
-
- **Solution:** Recreate the virtual environment from scratch:
-
- ```powershell
- # Delete old venv
- Remove-Item -Recurse -Force venv
-
- # Create fresh venv
- python -m venv venv
-
- # Upgrade pip
- .\venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
-
- # Install dependencies
- .\venv\Scripts\python.exe -m pip install -r requirements.txt
- ```
-
- ### Issue: API Key Errors
-
- **Solution:** Ensure your `.env` file contains valid API keys and is in the project root directory.
-
- ---
-
- ## How It Works
-
- The agent uses a **cyclic LangGraph workflow**:
-
- 1. **Search Node** → Searches the web using the Tavily API
- 2. **Scrape & Summarize Node** → Scrapes URLs one by one, evaluates relevance, and summarizes
- 3. **Router** → Decides whether to continue scraping or compile the report
- 4. **Compile Report Node** → Synthesizes all summaries into a final report
-
- Each step's state is saved to SQLite, enabling fault tolerance.
-
- ---
-
- ## Optional: LangSmith Tracing
-
- To enable detailed tracing and debugging, add to your `.env`:
-
- ```env
- LANGCHAIN_TRACING_V2=true
- LANGCHAIN_API_KEY=your_langsmith_api_key
- LANGCHAIN_PROJECT=research-agent
- ```
-
- ---
-
- ## License
-
- MIT License - Feel free to use and modify this project.
-
- ---
-
- ## Contributing
-
- Contributions are welcome! Feel free to open issues or submit pull requests.
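The search → scrape/summarize → route → compile cycle that this deleted README describes can be sketched without the LangGraph dependency as a plain-Python loop over the same state fields. The node functions and the `urls`/`summaries` field names below are illustrative stand-ins for the real Tavily/Groq-backed nodes, not the project's actual code:

```python
"""Dependency-free sketch of the agent's cyclic workflow."""

def run_agent(topic, search, scrape_and_summarize, compile_report):
    # The state mirrors the fields a LangGraph TypedDict state might hold.
    state = {"topic": topic, "urls": [], "summaries": []}

    # Search node: populate the URL queue.
    state["urls"] = search(topic)

    # Router loop: keep scraping while URLs remain, then compile.
    while state["urls"]:
        url = state["urls"].pop(0)
        summary = scrape_and_summarize(url)  # returns None for an irrelevant page
        if summary is not None:
            state["summaries"].append(summary)

    return compile_report(state["summaries"])


if __name__ == "__main__":
    # Stub nodes so the control flow can be exercised without any API keys.
    fake_search = lambda topic: ["u1", "u2", "u3"]
    fake_scrape = lambda url: None if url == "u2" else f"summary of {url}"
    fake_compile = lambda summaries: " | ".join(summaries)

    print(run_agent("Benefits of LangGraph", fake_search, fake_scrape, fake_compile))
```

The router here is just the `while` condition; LangGraph expresses the same decision as a conditional edge that loops back to the scrape node or exits to the compile node.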
walkthrough.md.resolved DELETED
@@ -1,256 +0,0 @@
- # Research Agent Project - Analysis & Setup Guide
-
- ## What This Project Does
-
- This is an **Autonomous Research Agent** built with a modern AI stack that:
-
- 1. 🔍 **Searches** the web for articles on a given topic (using Tavily AI)
- 2. 📄 **Scrapes** content from the discovered URLs
- 3. 🤖 **Evaluates** each article for relevance using an LLM
- 4. 📝 **Summarizes** relevant content
- 5. 📊 **Compiles** a comprehensive research report
-
- ### Architecture
-
- The agent uses **LangGraph** to create a stateful, cyclic workflow:
-
- ```mermaid
- graph LR
-     A[User Input Topic] --> B[Search Node]
-     B --> C[Scrape & Summarize Node]
-     C --> D{More URLs?}
-     D -->|Yes| C
-     D -->|No| E[Compile Report Node]
-     E --> F[Final Report]
- ```
-
- ### Technology Stack
-
- - **LangGraph**: Orchestration of the stateful workflow
- - **Groq**: High-speed LLM inference (Llama 3.3 70B)
- - **Streamlit**: Interactive web interface
- - **Tavily AI**: AI-optimized web search
- - **SQLite Checkpointer**: Fault-tolerant state persistence
-
- ---
-
- ## Enhancements Made
-
- Since this project was built 5 months ago, I made the following updates:
-
- ### 1. Updated LLM Model
- **[src/graph.py](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/graph.py#L14-L18)**
-
- Changed from `openai/gpt-oss-120b` (outdated/unavailable) to `llama-3.3-70b-versatile`:
-
- ```diff
- llm = ChatGroq(
- -    model="openai/gpt-oss-120b",
- +    model="llama-3.3-70b-versatile",
-      temperature=0,
-      api_key=os.getenv("GROQ_API_KEY")
- )
- ```
-
- ### 2. Created Environment Configuration Template
- **[.env.example](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/.env.example)**
-
- Added a template to help configure the required API keys.
-
- ### 3. Fixed Dependency Installation Issues
-
- **Problem:** The initial virtual environment had corrupted dependencies causing import errors.
-
- **Solution:** Recreated the virtual environment from scratch:
- 1. Deleted the old `venv` folder
- 2. Created a fresh virtual environment
- 3. Upgraded pip, setuptools, and wheel
- 4. Installed all dependencies from [requirements.txt](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/requirements.txt)
-
- ---
-
- ## How to Run
-
- ### Prerequisites
-
- You need two API keys:
- 1. **Groq API Key** - Get from [console.groq.com](https://console.groq.com/)
- 2. **Tavily API Key** - Get from [tavily.com](https://tavily.com/)
-
- ### Step 1: Configure API Keys
-
- Edit your [.env](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/.env) file and add:
-
- ```env
- GROQ_API_KEY=your_groq_api_key_here
- TAVILY_API_KEY=your_tavily_api_key_here
- ```
-
- > [!IMPORTANT]
- > The [.env](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/.env) file already exists in the project but needs to be configured with valid API keys.
-
- ### Step 2: Run the Application
-
- Use this command to run the application:
-
- ```powershell
- .\venv\Scripts\python.exe -m streamlit run main.py
- ```
-
- > [!TIP]
- > **Alternative command** (if the above doesn't work):
- > ```powershell
- > python -m streamlit run main.py
- > ```
-
- The app will start and automatically open in your browser at `http://localhost:8501`.
-
- ### Step 3: Use the Agent
-
- 1. Enter a research topic (e.g., "LangGraph features" or "AI agents in 2026")
- 2. Watch the agent:
-    - Search for articles
-    - Evaluate each URL for relevance
-    - Summarize relevant content
-    - Compile the final report
- 3. Review the comprehensive research report
-
- ---
-
- ## Troubleshooting
-
- ### Issue: "streamlit.exe not found"
-
- **Cause:** Dependencies weren't properly installed in the virtual environment.
-
- **Solution:** Recreate the virtual environment:
-
- ```powershell
- # Delete old venv
- Remove-Item -Recurse -Force venv
-
- # Create new venv
- python -m venv venv
-
- # Upgrade pip
- .\venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
-
- # Install dependencies
- .\venv\Scripts\python.exe -m pip install -r requirements.txt
- ```
-
- ### Issue: Import errors (pydantic, zstandard, etc.)
-
- **Cause:** Corrupted package installations.
-
- **Solution:** Follow the steps above to recreate the virtual environment completely.
-
- ### Issue: "GROQ_API_KEY not set"
-
- **Cause:** Missing or improperly configured [.env](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/.env) file.
-
- **Solution:** Ensure your [.env](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/.env) file contains valid API keys.
-
- ---
-
- ## Project Evaluation
-
- ### ✅ Strengths
-
- - **Well-architected**: Clean separation of concerns (state, graph, tools)
- - **Fault-tolerant**: SQLite checkpointer saves state at every step
- - **Modern stack**: Uses cutting-edge tools (LangGraph, Groq LPU)
- - **User-friendly**: Streamlit provides excellent UX with real-time progress tracking
-
- ### 🔄 Potential Enhancements
-
- While the project is solid, here are some optional improvements:
-
- 1. **Error Handling**
-    - Add retry logic for failed web requests
-    - Handle rate limits from the Groq/Tavily APIs
-
- 2. **Content Quality**
-    - Implement a scoring system for source credibility
-    - Add citation tracking in the final report
-
- 3. **Performance**
-    - Parallelize URL scraping (currently sequential)
-    - Add caching for previously scraped URLs
-
- 4. **Features**
-    - Export reports to PDF/Markdown
-    - Save research history
-    - Allow users to specify the number of sources to research
-
- 5. **Observability**
-    - Enable LangSmith tracing for debugging (already supported, just needs env vars)
-    - Add a metrics dashboard (search count, success rate, etc.)
-
- 6. **Testing**
-    - Add unit tests for individual nodes
-    - Create integration tests for the full workflow
-
- ---
-
- ## Technical Deep Dive
-
- ### Key Files
-
- | File | Purpose |
- |------|---------|
- | [main.py](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/main.py) | Streamlit UI and session management |
- | [src/graph.py](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/graph.py) | LangGraph workflow definition and node functions |
- | [src/agent_state.py](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/agent_state.py) | TypedDict defining the agent's state schema |
- | [src/tools.py](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/tools.py) | Search and scraping tools |
-
- ### How the Workflow Works
-
- 1. **Search Node** ([graph.py:L23-L30](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/graph.py#L23-L30))
-    - Invokes Tavily search
-    - Extracts URLs from results
-    - Updates state with the URLs list
-
- 2. **Scrape & Summarize Node** ([graph.py:L32-L69](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/graph.py#L32-L69))
-    - Pops one URL from the list
-    - Scrapes content using BeautifulSoup
-    - Asks the LLM to summarize if relevant (or return "IRRELEVANT")
-    - Adds the summary to state if relevant
-
- 3. **Routing Logic** ([graph.py:L91-L98](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/graph.py#L91-L98))
-    - If URLs remain → loop back to scrape another
-    - If no URLs → proceed to compile the report
-
- 4. **Compile Report Node** ([graph.py:L71-L87](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/src/graph.py#L71-L87))
-    - Takes all summaries
-    - Synthesizes them into a coherent report
-    - Returns the final report to the user
-
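The fault tolerance claimed above rests on checkpointing the state after every node. A minimal stdlib-only illustration of that idea follows; LangGraph's `SqliteSaver` handles this internally, and the `checkpoints` table schema and `thread_id` key here are illustrative, not the library's actual layout:

```python
import json
import sqlite3

def _ensure_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(thread_id TEXT, step INTEGER, state TEXT)"
    )

def save_checkpoint(conn, thread_id, step, state):
    """Persist the full agent state after a node finishes."""
    _ensure_table(conn)
    conn.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        (thread_id, step, json.dumps(state)),
    )
    conn.commit()

def load_latest(conn, thread_id):
    """Resume from the most recent checkpoint, or start fresh at step 0."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT step, state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1",
        (thread_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else (0, None)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # the project uses checkpoints.sqlite on disk
    save_checkpoint(conn, "run-1", 1, {"urls": ["u1", "u2"], "summaries": []})
    save_checkpoint(conn, "run-1", 2, {"urls": ["u2"], "summaries": ["s1"]})
    step, state = load_latest(conn, "run-1")
    print(step, state["summaries"])  # resumes at the last completed step
```

Because every node's output is written before the next node runs, a crash mid-run loses at most the node in flight, which is exactly the resume-from-failure behavior the walkthrough describes.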
- ---
-
- ## Example Usage
-
- **Topic:** "Benefits of LangGraph"
-
- **Agent Process:**
- 1. Searches Tavily → finds 5 candidate articles
- 2. Scrapes Article 1 → relevant → summarizes
- 3. Scrapes Article 2 → not relevant → skips
- 4. Scrapes Article 3 → relevant → summarizes
- 5. Scrapes Article 4 → relevant → summarizes
- 6. Scrapes Article 5 → relevant → summarizes
- 7. Compiles final report from 4 summaries
-
- **Result:** A comprehensive report covering LangGraph's benefits, compiled from 4 high-quality sources.
-
- ---
-
- ## Summary
-
- ✅ **Project is now fully functional!**
-
- - Updated LLM model to `llama-3.3-70b-versatile`
- - Fixed all dependency installation issues
- - Application running successfully on `http://localhost:8501`
-
- **Next steps:** Configure your API keys in the [.env](file:///c:/Users/punit/Desktop/project/GenAI/Research%20Agent%20with%20LangGraph/.env) file and start researching!