Rishabh2095 committed on
Commit eb36d35 · 2 Parent(s): 40ea651 c7be661

Merged conflict resolved
.gitignore CHANGED
@@ -46,6 +46,7 @@ requirements.txt
 docker-compose.override.example.yml
 DOCKERFILE_EXPLANATION.md
 DEPLOYMENT_GUIDE.md
+<<<<<<< HEAD
 ./src/job_writing_agent/logs/*.log
 
 # Binary files (PDFs, images, etc.)
@@ -56,4 +57,6 @@ DEPLOYMENT_GUIDE.md
 *.gif
 *.zip
 *.tar
-*.gz
+*.gz
+=======
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
DEPLOYMENT_GUIDE.md ADDED
@@ -0,0 +1,303 @@
# Deployment Guide for Job Application Agent

## Option 1: LangGraph Cloud (Easiest & Recommended)

### Prerequisites
- LangGraph CLI installed (`langgraph-cli` in requirements.txt)
- `langgraph.json` already configured ✅

### Steps

1. **Install the LangGraph CLI** (if not already installed):
```powershell
pip install langgraph-cli
```

2. **Log in to LangGraph Cloud**:
```powershell
langgraph login
```

3. **Deploy your agent**:
```powershell
langgraph deploy
```

4. **Get your API endpoint** - LangGraph Cloud provides a REST API automatically.
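Deployment reads your `langgraph.json`, so a quick offline sanity check of that file's shape can save a failed deploy. A minimal sketch — the sample config below is illustrative only; this repo's actual `langgraph.json` entries may differ:

```python
import json

# Illustrative config; this repo's real langgraph.json may differ.
sample = """
{
  "dependencies": ["."],
  "graphs": {
    "job_app_graph": "./src/job_writing_agent/workflow.py:job_app_graph"
  },
  "env": ".env"
}
"""

config = json.loads(sample)

# `langgraph deploy` reads these keys, so check they exist before deploying.
missing = {"dependencies", "graphs"} - config.keys()
assert not missing, f"langgraph.json is missing: {missing}"
assert all(":" in target for target in config["graphs"].values()), \
    "each graph must be in 'module_path:attribute' form"
```

Running this against the real file (via `json.load(open("langgraph.json"))`) catches malformed graph entries before the CLI does.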

### Cost
- **Free tier**: Limited requests/month
- **Paid**: Pay-per-use pricing

### Pros
- ✅ Zero infrastructure management
- ✅ Built-in state persistence
- ✅ Automatic API generation
- ✅ LangSmith integration
- ✅ Perfect for LangGraph apps

### Cons
- ⚠️ Vendor lock-in
- ⚠️ Limited customization

---

## Option 2: Railway.app (Simple & Cheap)

### Steps

1. **Create a FastAPI wrapper** (create `api.py`):
```python
from fastapi import FastAPI, File, Form, UploadFile
from job_writing_agent.workflow import JobWorkflow
import tempfile
import os

app = FastAPI()

@app.post("/generate")
async def generate_application(
    resume: UploadFile = File(...),
    job_description: str = Form(...),
    content_type: str = Form("cover_letter"),
):
    # Save resume temporarily
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(await resume.read())
        resume_path = tmp.name

    try:
        workflow = JobWorkflow(
            resume=resume_path,
            job_description_source=job_description,
            content=content_type,
        )
        result = await workflow.run()
        return {"result": result}
    finally:
        os.unlink(resume_path)
```

2. **Create `Procfile`**:
```
web: uvicorn api:app --host 0.0.0.0 --port $PORT
```

3. **Deploy to Railway**:
   - Sign up at [railway.app](https://railway.app)
   - Connect GitHub repo
   - Railway auto-detects Python and runs `Procfile`

### Cost
- **Free tier**: $5 credit/month
- **Hobby**: $5/month for 512MB RAM
- **Pro**: $20/month for 2GB RAM

### Pros
- ✅ Very simple deployment
- ✅ Auto-scaling
- ✅ Free tier available
- ✅ Automatic HTTPS

### Cons
- ⚠️ Need to add FastAPI wrapper
- ⚠️ State management needs Redis/Postgres

---

## Option 3: Render.com (Similar to Railway)

### Steps

1. **Create `render.yaml`**:
```yaml
services:
  - type: web
    name: job-writer-api
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: uvicorn api:app --host 0.0.0.0 --port $PORT
    envVars:
      - key: OPENROUTER_API_KEY
        sync: false
      - key: TAVILY_API_KEY
        sync: false
```

2. **Deploy**:
   - Connect GitHub repo to Render
   - Render auto-detects `render.yaml`

### Cost
- **Free tier**: 750 hours/month (sleeps after 15min inactivity)
- **Starter**: $7/month (always on)

### Pros
- ✅ Free tier for testing
- ✅ Simple YAML config
- ✅ Auto-deploy from Git

### Cons
- ⚠️ Free tier sleeps (cold starts)
- ⚠️ Need FastAPI wrapper

---

## Option 4: Fly.io (Good Free Tier)

### Steps

1. **Install the Fly CLI**:
```powershell
iwr https://fly.io/install.ps1 -useb | iex
```

2. **Create a `Dockerfile`**:
```dockerfile
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8080"]
```

3. **Deploy**:
```powershell
fly launch
fly deploy
```

### Cost
- **Free tier**: 3 shared-cpu VMs, 3GB storage
- **Paid**: $1.94/month per VM

### Pros
- ✅ Generous free tier
- ✅ Global edge deployment
- ✅ Docker-based (flexible)

### Cons
- ⚠️ Need Docker knowledge
- ⚠️ Need FastAPI wrapper

---

## Option 5: AWS Lambda (Serverless - Pay Per Use)

### Steps

1. **Create Lambda handler** (`lambda_handler.py`):
```python
import json
from job_writing_agent.workflow import JobWorkflow

def lambda_handler(event, context):
    # Parse the API Gateway event
    body = json.loads(event['body'])

    workflow = JobWorkflow(
        resume=body['resume_path'],
        job_description_source=body['job_description'],
        content=body.get('content_type', 'cover_letter')
    )

    # If JobWorkflow.run is a coroutine function, wrap it: asyncio.run(workflow.run())
    result = workflow.run()

    return {
        'statusCode': 200,
        'body': json.dumps({'result': result})
    }
```

2. **Package and deploy** using AWS SAM or the Serverless Framework
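Before reaching for SAM, the handler's event parsing can be smoke-tested locally. A minimal sketch, assuming a handler shaped like the one above — `FakeWorkflow` is a hypothetical stand-in for `JobWorkflow` so nothing heavy is imported:

```python
import json

class FakeWorkflow:
    """Hypothetical stand-in for JobWorkflow; only exercises the handler shape."""
    def __init__(self, resume, job_description_source, content):
        self.content = content

    def run(self):
        return f"generated {self.content}"

def lambda_handler(event, context, workflow_cls=FakeWorkflow):
    # Same parsing logic as the real handler above
    body = json.loads(event["body"])
    workflow = workflow_cls(
        resume=body["resume_path"],
        job_description_source=body["job_description"],
        content=body.get("content_type", "cover_letter"),
    )
    return {"statusCode": 200, "body": json.dumps({"result": workflow.run()})}

# Simulate the event shape API Gateway delivers
event = {"body": json.dumps({
    "resume_path": "/tmp/resume.pdf",
    "job_description": "Senior Python developer at Acme",
})}
response = lambda_handler(event, None)
```

Swapping `workflow_cls` back to `JobWorkflow` restores the real handler; the test event above doubles as a sample payload for `sam local invoke`.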

### Cost
- **Free tier**: 1M requests/month
- **Paid**: $0.20 per 1M requests + compute time

### Pros
- ✅ Pay only for usage
- ✅ Auto-scaling
- ✅ Very cheap for low traffic

### Cons
- ⚠️ 15min timeout limit
- ⚠️ Cold starts
- ⚠️ Complex setup
- ⚠️ Need to handle state externally

---

## Recommendation

**For your use case, I recommend:**

1. **Start with LangGraph Cloud** - Easiest, built for your stack
2. **If you need more control → Railway** - Simple, good free tier
3. **If you need serverless → AWS Lambda** - Cheapest for low traffic

---

## Quick Start: FastAPI Wrapper (for Railway/Render/Fly.io)

Create `api.py` in your project root:

```python
from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from fastapi.responses import JSONResponse
from job_writing_agent.workflow import JobWorkflow
import tempfile
import os
import asyncio

app = FastAPI(title="Job Application Writer API")

@app.get("/")
def health():
    return {"status": "ok"}

@app.post("/generate")
async def generate_application(
    resume: UploadFile = File(...),
    job_description: str = Form(...),
    content_type: str = Form("cover_letter"),
):
    """Generate job application material."""
    # Save resume temporarily
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        content = await resume.read()
        tmp.write(content)
        resume_path = tmp.name

    try:
        workflow = JobWorkflow(
            resume=resume_path,
            job_description_source=job_description,
            content=content_type,
        )

        # Run workflow in a thread (assuming JobWorkflow.run is synchronous)
        result = await asyncio.to_thread(workflow.run)

        return JSONResponse({
            "status": "success",
            "result": result,
        })
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    finally:
        # Cleanup
        if os.path.exists(resume_path):
            os.unlink(resume_path)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Then update `requirements.txt` to ensure FastAPI and uvicorn are included (they already are ✅).
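Clients call `/generate` with a multipart form: one file plus two text fields. As a sketch of what's on the wire, here is that body assembled with the standard library only — nothing is sent, and the field names simply mirror the wrapper above:

```python
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Assemble a multipart/form-data body by hand (stdlib only)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The file part carries a filename and its own Content-Type
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart(
    {"job_description": "Senior Python developer at Acme",
     "content_type": "cover_letter"},
    file_field="resume", filename="resume.pdf", file_bytes=b"%PDF-1.4 fake",
)
```

In practice a client library (`httpx`, `requests`) does this for you via its `files=`/`data=` parameters; the point here is just the shape FastAPI's `File(...)` and `Form(...)` parameters expect.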
DOCKERFILE_EXPLANATION.md ADDED
@@ -0,0 +1,147 @@
# Dockerfile Explanation

This Dockerfile is specifically designed for **LangGraph Cloud/LangServe deployment**. It uses the official LangGraph API base image and configures your agent graphs to be served as REST APIs.

## Line-by-Line Breakdown

### 1. Base Image (Line 1)
```dockerfile
FROM langchain/langgraph-api:3.12
```
- **Purpose**: Uses the official LangGraph API base image with Python 3.12
- **What it includes**: Pre-configured LangGraph runtime, LangServe server, and all LangGraph dependencies
- **Why**: This image already has everything needed to serve LangGraph workflows as REST APIs

---

### 2. Install Node Dependencies (Line 9)
```dockerfile
RUN PYTHONDONTWRITEBYTECODE=1 uv pip install --system --no-cache-dir -c /api/constraints.txt nodes
```
- **Purpose**: Installs the `nodes` package (likely a dependency from your `langgraph.json`)
- **`PYTHONDONTWRITEBYTECODE=1`**: Prevents creating `.pyc` files (smaller image)
- **`uv pip`**: Uses `uv` (a fast Python package installer) instead of regular `pip`
- **`--system`**: Installs into the system Python (no virtual env)
- **`--no-cache-dir`**: Doesn't cache pip downloads (smaller image)
- **`-c /api/constraints.txt`**: Uses the constraint file from the base image (ensures compatible versions)

---

### 3. Copy Your Code (Line 14)
```dockerfile
ADD . /deps/job_writer
```
- **Purpose**: Copies your entire project into `/deps/job_writer` in the container
- **Why `/deps/`**: The LangGraph API expects dependencies in this directory
- **What gets copied**: All your source code, `pyproject.toml`, `requirements.txt`, etc.

---

### 4. Install Your Package (Lines 19-21)
```dockerfile
RUN for dep in /deps/*; do
    if [ -d "$dep" ]; then
        echo "Installing $dep";
        (cd "$dep" && PYTHONDONTWRITEBYTECODE=1 uv pip install --system --no-cache-dir -c /api/constraints.txt -e .);
    fi;
done
```
- **Purpose**: Installs your `job_writer` package in editable mode (`-e`)
- **How it works**:
  - Loops through all directories in `/deps/`
  - For each directory, changes into it and runs `pip install -e .`
  - The `-e` flag installs in "editable" mode (changes to code are reflected)
- **Why**: Makes your package importable as `job_writing_agent` inside the container

---

### 5. Register Your Graphs (Line 25)
```dockerfile
ENV LANGSERVE_GRAPHS='{"job_app_graph": "/deps/job_writer/src/job_writing_agent/workflow.py:job_app_graph", ...}'
```
- **Purpose**: Tells LangServe which graphs to expose as REST APIs
- **Format**: JSON mapping of `graph_name` → `module_path:attribute_name`
- **What it does**:
  - `job_app_graph` → Exposes the `JobWorkflow.job_app_graph` property as an API endpoint
  - `research_workflow` → Exposes the research subgraph
  - `data_loading_workflow` → Exposes the data-loading subgraph
- **Result**: Each graph becomes a REST API endpoint like `/invoke/job_app_graph`

---
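Since `LANGSERVE_GRAPHS` is plain JSON, its entries can be checked before a build. A small sketch — only the `job_app_graph` entry is spelled out in the Dockerfile; the others are elided there and therefore omitted here:

```python
import json

# Only the first mapping entry appears in the Dockerfile; the rest are elided.
langserve_graphs = (
    '{"job_app_graph": '
    '"/deps/job_writer/src/job_writing_agent/workflow.py:job_app_graph"}'
)

graphs = json.loads(langserve_graphs)
endpoints = {}
for name, target in graphs.items():
    # Split "module_path:attribute" on the last colon
    module_path, _, attribute = target.rpartition(":")
    assert module_path and attribute, f"malformed graph entry: {name}"
    # LangServe imports `attribute` from `module_path` and serves it under <name>
    endpoints[name] = (module_path, attribute)
```

A malformed value (say, a missing `:attribute` suffix) fails here in seconds instead of at container start.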

### 6. Protect LangGraph API (Lines 33-35)
```dockerfile
RUN mkdir -p /api/langgraph_api /api/langgraph_runtime /api/langgraph_license && \
    touch /api/langgraph_api/__init__.py /api/langgraph_runtime/__init__.py /api/langgraph_license/__init__.py
RUN PYTHONDONTWRITEBYTECODE=1 uv pip install --system --no-cache-dir --no-deps -e /api
```
- **Purpose**: Prevents your dependencies from accidentally overwriting LangGraph API packages
- **How**:
  1. Creates placeholder `__init__.py` files for the LangGraph packages
  2. Reinstalls the LangGraph API (without dependencies) to ensure it's not overwritten
- **Why**: If your `requirements.txt` has conflicting versions, this ensures the LangGraph API stays intact

---

### 7. Cleanup Build Tools (Lines 37-41)
```dockerfile
RUN pip uninstall -y pip setuptools wheel
RUN rm -rf /usr/local/lib/python*/site-packages/pip* ...
RUN uv pip uninstall --system pip setuptools wheel && rm /usr/bin/uv /usr/bin/uvx
```
- **Purpose**: Removes all build tools to make the image smaller and more secure
- **What gets removed**:
  - `pip`, `setuptools`, `wheel` (Python build tools)
  - `uv` and `uvx` (package installers)
- **Why**: These tools aren't needed at runtime, only during build
- **Security**: Smaller attack surface (can't install malicious packages at runtime)

---

### 8. Set Working Directory (Line 45)
```dockerfile
WORKDIR /deps/job_writer
```
- **Purpose**: Sets the default directory when the container starts
- **Why**: Makes it easier to reference files relative to your project root

---

## How It Works at Runtime

When this container runs:

1. **LangServe starts automatically** (from the base image)
2. **Reads the `LANGSERVE_GRAPHS`** environment variable
3. **Imports your graphs** from the specified paths
4. **Exposes REST API endpoints**:
   - `POST /invoke/job_app_graph` - Main workflow
   - `POST /invoke/research_workflow` - Research subgraph
   - `POST /invoke/data_loading_workflow` - Data loading subgraph
5. **Handles state management** automatically (checkpointing, persistence)

## Example API Usage

Once deployed, you can call your agent like this:

```bash
curl -X POST http://your-deployment/invoke/job_app_graph \
  -H "Content-Type: application/json" \
  -d '{
    "resume_path": "...",
    "job_description_source": "...",
    "content": "cover_letter"
  }'
```
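The same call from Python with only the standard library. The request is constructed but not sent; the host and the `"..."` payload values are the same placeholders as in the curl example:

```python
import json
import urllib.request

payload = {
    "resume_path": "...",             # placeholder, as in the curl example
    "job_description_source": "...",  # placeholder
    "content": "cover_letter",
}

req = urllib.request.Request(
    "http://your-deployment/invoke/job_app_graph",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
```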

## Key Points

✅ **Optimized for LangGraph Cloud** - Uses the official base image
✅ **Automatic API generation** - No need to write FastAPI code
✅ **State management** - Built-in checkpointing and persistence
✅ **Security** - Removes build tools from the final image
✅ **Small image** - No-cache installs, no bytecode files

This is the **easiest deployment option** for LangGraph apps - just build and push this Docker image!
Dockerfile CHANGED
@@ -34,6 +34,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
 # Install Playwright system dependencies (after playwright package is installed)
 RUN playwright install-deps chromium
 
+<<<<<<< HEAD
 # Create user's cache directory for Playwright browsers (BEFORE installing browsers)
 # This ensures browsers are installed to the correct location that persists in the image
 RUN mkdir -p /home/hf_user/.cache/ms-playwright && \
@@ -47,6 +48,11 @@ RUN --mount=type=cache,target=/root/.cache/ms-playwright \
     playwright install chromium && \
     # Fix ownership after installation (browsers are installed as root)
     chown -R hf_user:hf_user /home/hf_user/.cache/ms-playwright
+=======
+# Install Playwright browser binaries (with cache mount)
+RUN --mount=type=cache,target=/root/.cache/ms-playwright \
+    playwright install chromium
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 
 # Create API directories and install langgraph-api as ROOT
 RUN mkdir -p /api/langgraph_api /api/langgraph_runtime /api/langgraph_license && \
@@ -81,9 +87,13 @@ ENV HOME=/home/hf_user \
     # Package-specific cache directories (for packages that don't fully respect XDG)
     TIKTOKEN_CACHE_DIR=/home/hf_user/.cache/tiktoken \
     HF_HOME=/home/hf_user/.cache/huggingface \
+<<<<<<< HEAD
     TORCH_HOME=/home/hf_user/.cache/torch \
     # Playwright browsers path (so it knows where to find browsers at runtime)
     PLAYWRIGHT_BROWSERS_PATH=/home/hf_user/.cache/ms-playwright
+=======
+    TORCH_HOME=/home/hf_user/.cache/torch
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 
 WORKDIR /deps/job_writer
 
docker-compose.override.example.yml ADDED
@@ -0,0 +1,21 @@
# Example override file for local development
# Copy this to docker-compose.override.yml to customize settings
# docker-compose automatically loads override files

version: "3.9"
services:
  redis:
    # Override Redis port for local development
    ports:
      - "6380:6379"  # Use a different port if 6379 is already in use

  postgres:
    # Override Postgres port for local development
    ports:
      - "5433:5432"  # Use a different port if 5432 is already in use
    environment:
      # Override credentials for local dev
      - POSTGRES_USER=dev_user
      - POSTGRES_PASSWORD=dev_password
      - POSTGRES_DB=job_app_dev
src/job_writing_agent/nodes/resume_loader.py CHANGED
@@ -7,6 +7,7 @@ the resume file and returning the resume in the required format.
 """
 
 import logging
+<<<<<<< HEAD
 from pathlib import Path
 from typing import Any, Callable, Optional
 
@@ -14,6 +15,11 @@ from job_writing_agent.utils.document_processing import (
     get_resume as get_resume_docs,
     parse_resume,
 )
+=======
+from typing import Callable, Any, Optional
+
+from job_writing_agent.utils.document_processing import parse_resume
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 from job_writing_agent.utils.logging.logging_decorators import (
     log_async,
     log_errors,
@@ -59,8 +65,13 @@ class ResumeLoader:
         Parameters
         ----------
         resume_source: Any
+<<<<<<< HEAD
             Path, URL, or file-like object. Supports local paths, HTTP/HTTPS URLs,
            and HuggingFace Hub dataset references (e.g., "username/dataset::resume.pdf").
+=======
+            Path or file-like object accepted by the parser function.
+            Can be a file path, URL, or file-like object.
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 
         Returns
         -------
@@ -78,10 +89,14 @@ class ResumeLoader:
         resume_text = ""
         assert resume_source is not None, "resume_source cannot be None"
 
+<<<<<<< HEAD
         if isinstance(resume_source, (str, Path)):
             resume_chunks = await get_resume_docs(resume_source)
         else:
             resume_chunks = self._parser(resume_source)
+=======
+        resume_chunks = self._parser(resume_source)
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 
         for chunk in resume_chunks:
             if hasattr(chunk, "page_content") and chunk.page_content:
src/job_writing_agent/utils/document_processing.py CHANGED
@@ -3,6 +3,7 @@ Document processing utilities for parsing resumes and job descriptions.
 """
 
 # Standard library imports
+<<<<<<< HEAD
 import asyncio
 import logging
 import os
@@ -10,12 +11,21 @@ import re
 import tempfile
 from pathlib import Path
 from typing import Optional
+=======
+import logging
+import os
+import re
+from pathlib import Path
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 from urllib.parse import urlparse
 
 # Third-party imports
 import dspy
+<<<<<<< HEAD
 import httpx
 from huggingface_hub import hf_hub_download
+=======
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 from langchain_community.document_loaders import PyPDFLoader, AsyncChromiumLoader
 from langchain_community.document_transformers import Html2TextTransformer
 from langchain_core.documents import Document
@@ -28,12 +38,16 @@ from pydantic import BaseModel, Field
 from typing_extensions import Any
 
 # Local imports
+<<<<<<< HEAD
 from .errors import (
     JobDescriptionParsingError,
     LLMProcessingError,
     ResumeDownloadError,
     URLExtractionError,
 )
+=======
+from .errors import JobDescriptionParsingError, LLMProcessingError, URLExtractionError
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 
 # Set up logging
 logger = logging.getLogger(__name__)
@@ -268,6 +282,7 @@ def _is_heading(line: str) -> bool:
     return line.isupper() and len(line.split()) <= 5 and not re.search(r"\d", line)
 
 
+<<<<<<< HEAD
 def _is_huggingface_hub_url(url: str) -> tuple[bool, Optional[str], Optional[str]]:
     """
     Detect if URL or string is a HuggingFace Hub reference and extract repo_id and filename.
@@ -424,6 +439,8 @@ async def download_file_from_url(
         raise ResumeDownloadError(f"Could not save file from {url}: {e}") from e
 
 
+=======
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 def parse_resume(file_path: str | Path) -> list[Document]:
     """
     Load a résumé from PDF or TXT file → list[Document] chunks
@@ -472,6 +489,7 @@ def parse_resume(file_path: str | Path) -> list[Document]:
     return chunks
 
 
+<<<<<<< HEAD
 async def get_resume(file_path_or_url: str | Path) -> list[Document]:
     """
     Load a résumé from a local file path or URL.
@@ -514,6 +532,8 @@ async def get_resume(file_path_or_url: str | Path) -> list[Document]:
     )
 
 
+=======
+>>>>>>> 64d45e6aae112e37b1f8aa7e8180959a0b9cac27
 async def get_job_description(file_path_or_url: str) -> Document:
     """Parse a job description from a file or URL into chunks.