Sajil Awale committed on
Commit 629d435 · 1 Parent(s): 568c442

Initial commit: ResFit - AI Resume Tailor
.dockerignore ADDED
@@ -0,0 +1,23 @@
+ __pycache__
+ *.pyc
+ *.pyo
+ *.pyd
+ .Python
+ env/
+ venv/
+ .venv
+ *.egg-info/
+ dist/
+ build/
+ .git
+ .gitignore
+ .DS_Store
+ .env.local
+ .cache
+ .resume_cache
+ output/*.pdf
+ output/*.tex
+ output/*.log
+ output/*.aux
+ .streamlit/
+ notebooks/
.gitignore ADDED
@@ -0,0 +1,7 @@
+ .env
+ __pycache__/
+ *.pyc
+ .DS_Store
+ output/
+ venv/
+ .vscode/
.streamlit/config.toml ADDED
@@ -0,0 +1,18 @@
+ [client]
+ showErrorDetails = true
+ toolbarMode = "viewer"
+
+ [logger]
+ level = "info"
+
+ [theme]
+ primaryColor = "#FF6B6B"
+ backgroundColor = "#FFFFFF"
+ secondaryBackgroundColor = "#F0F2F6"
+ textColor = "#31333F"
+ font = "sans serif"
+
+ [server]
+ maxUploadSize = 200
+ maxMessageSize = 200
+ runOnSave = true
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ FROM python:3.12-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     texlive-latex-base \
+     texlive-latex-extra \
+     texlive-fonts-recommended \
+     texlive-fonts-extra \
+     texlive-xetex \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose Streamlit port
+ EXPOSE 8501
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1
+ ENV STREAMLIT_SERVER_PORT=8501
+ ENV STREAMLIT_SERVER_ADDRESS=0.0.0.0
+
+ # Run the app
+ CMD ["streamlit", "run", "app.py"]
README.md ADDED
@@ -0,0 +1,152 @@
+ # ResFit: Resume Tailor AI 📄
+
+ ResFit is a Streamlit application that uses Large Language Models (LLMs) to tailor your resume to a specific job description. It analyzes your existing resume against the target job's requirements, rewrites the content to highlight relevant skills and experience, and generates a professionally formatted PDF via LaTeX.
+
+ **Why ResFit?**
+ The main motivation behind this project was to solve a common problem with existing resume-tailoring tools: they often strip out or break hyperlinks. **ResFit is designed specifically to preserve all the links** (portfolio, LinkedIn, GitHub, etc.) that you've carefully added to your original resume.
+
+ ## 🚀 Features
+
+ - **Link Preservation**: Unlike many other tools, ResFit ensures all your hyperlinks remain intact in the final PDF.
+ - **Multi-Provider Support**: Choose your preferred AI model from **Google Gemini**, **Anthropic Claude**, or **OpenAI**.
+ - **Intelligent Tailoring**: Uses structured prompting to rewrite resume sections (Summary, Experience, Skills, Projects) specifically for the target role.
+ - **High Performance**: Built with `asyncio` and parallel processing to tailor multiple sections concurrently.
+ - **Professional Output**: Generates high-quality, ATS-friendly PDFs from LaTeX templates.
+ - **Live Feedback**: A real-time logging interface shows exactly what the AI is working on.
+ - **Dual Export**: Download both the final **PDF** and the raw **LaTeX (.tex)** source for further manual editing.
+ - **Dockerized**: Ready-to-deploy container with all dependencies, including a full LaTeX environment.
+
+ ## 🛠️ Tech Stack
+
+ - **Frontend**: [Streamlit](https://streamlit.io/)
+ - **LLM Orchestration**: [Instructor](https://python.useinstructor.com/)
+ - **PDF Processing**: [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/)
+ - **Document Generation**: LaTeX (via `pdflatex`) & Jinja2 templating
+ - **Concurrency**: Python `asyncio` & semaphores
+
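The concurrency approach above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual API: `tailor_all` is a hypothetical helper, and `asyncio.sleep` stands in for a real LLM call.

```python
import asyncio

async def tailor_all(sections, max_concurrency=2):
    """Tailor several resume sections concurrently, capped by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def tailor(name):
        async with sem:             # at most max_concurrency requests in flight
            await asyncio.sleep(0)  # stand-in for an async LLM request
            return f"tailored {name}"

    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*(tailor(s) for s in sections))

results = asyncio.run(tailor_all(["Summary", "Experience", "Skills", "Projects"]))
print(results)
```

The semaphore keeps the number of simultaneous API calls bounded, which matters when providers enforce per-minute rate limits.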
+ ## 🏗️ Architecture
+
+ ```mermaid
+ graph TD
+     User([User]) -->|Uploads PDF & Job Desc| UI[Streamlit UI]
+     UI -->|Selects Provider| LLM["LLM Client<br/>(Gemini/Claude/OpenAI)"]
+     UI -->|Starts| Pipeline[ResumeTailorPipeline]
+
+     subgraph "Input Processing"
+         Pipeline -->|PyMuPDF4LLM| Parser[Resume Parser]
+         Pipeline -->|Scraper| JD[Job Description Processor]
+     end
+
+     subgraph "AI Orchestration (Async)"
+         Parser & JD -->|Context| Extractor[Data Extractor]
+         Extractor -->|Structured Data| Planner[Section Planner]
+         Planner -->|Distribute| Workers{Async Workers}
+
+         Workers -->|Tailor| S1[Summary]
+         Workers -->|Tailor| S2[Experience]
+         Workers -->|Tailor| S3[Skills]
+         Workers -->|Tailor| S4[Projects]
+     end
+
+     subgraph "Generation"
+         S1 & S2 & S3 & S4 -->|Merge| Context[Jinja2 Context]
+         Context -->|Render| Template[LaTeX Template]
+         Template -->|pdflatex| Compiler[PDF Compiler]
+     end
+
+     Compiler -->|Returns| Artifacts[(PDF & .tex Files)]
+     Artifacts -->|Download| UI
+ ```
+
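The Generation stage above amounts to merging the tailored sections into a template context, rendering a LaTeX body, and compiling it with `pdflatex`. A dependency-free sketch of the render step — the real pipeline uses Jinja2 templates from `resumer/templates/`; the stdlib `string.Template` merely stands in here:

```python
from string import Template

# Minimal stand-in for the template-render step: merge tailored sections
# into a LaTeX body. (Illustrative only; the actual project renders Jinja2
# templates and then compiles the result with pdflatex.)
latex_template = Template(r"""\section*{Summary}
$summary

\section*{Skills}
$skills
""")

context = {
    "summary": "Machine Learning Engineer with 4+ years of experience...",
    "skills": "Python, PyTorch, Transformers",
}
tex_body = latex_template.substitute(context)
print(tex_body)
```

Jinja2 is the better fit in the real pipeline because LaTeX sources are full of `{}` and `%`, so its configurable delimiters avoid escaping headaches that plain `str.format` would cause.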
+ ## 📋 Prerequisites
+
+ - **API Keys**: You will need an API key from at least one of the supported providers:
+   - [Google AI Studio](https://aistudio.google.com/) (Gemini)
+   - [Anthropic Console](https://console.anthropic.com/) (Claude)
+   - [OpenAI Platform](https://platform.openai.com/) (GPT)
+
+ ## 🐳 Quick Start with Docker (Recommended)
+
+ The easiest way to run the application is with Docker, which handles the complex LaTeX dependencies automatically.
+
+ 1. **Clone the repository**
+    ```bash
+    git clone https://github.com/yourusername/resumer.git
+    cd resumer
+    ```
+
+ 2. **Build and Run**
+    ```bash
+    docker-compose up --build
+    ```
+
+ 3. **Access the App**
+    Open your browser and navigate to `http://localhost:8501`.
+
+ ## 💻 Local Installation
+
+ To run locally, you'll need Python 3.12+ and a LaTeX distribution installed on your system.
+
+ 1. **Install System Dependencies (LaTeX)**
+    - **macOS**:
+      ```bash
+      brew install --cask mactex-no-gui
+      ```
+    - **Ubuntu/Debian**:
+      ```bash
+      sudo apt-get update
+      sudo apt-get install -y texlive-latex-base texlive-fonts-recommended texlive-fonts-extra texlive-latex-extra
+      ```
+
+ 2. **Set up Python Environment**
+    ```bash
+    python -m venv venv
+    source venv/bin/activate  # On Windows: venv\Scripts\activate
+    pip install -r requirements.txt
+    ```
+
+ 3. **Run the Application**
+    ```bash
+    streamlit run app.py
+    ```
+
+ ## 📖 Usage Guide
+
+ 1. **Select Provider**: Choose your AI provider (Gemini, Claude, or OpenAI) from the sidebar and select a specific model (e.g., `gemini-2.5-pro`, `claude-sonnet-4-5`).
+ 2. **Enter Credentials**: Paste your API key.
+ 3. **Upload Resume**: Upload your current resume in PDF format.
+ 4. **Job Details**:
+    - Paste a URL to a job posting (the app will scrape it).
+    - OR paste the raw job description text directly.
+ 5. **Generate**: Click **"Tailor Resume"**.
+ 6. **Download**: Once complete, download your new tailored PDF or the LaTeX source file.
+
+ ## 📂 Project Structure
+
+ ```
+ resumer/
+ ├── app.py                 # Main Streamlit application entry point
+ ├── Dockerfile             # Docker configuration
+ ├── docker-compose.yml     # Docker Compose services
+ ├── requirements.txt       # Python dependencies
+ ├── resumer/               # Core package
+ │   ├── __init__.py        # Main pipeline logic (ResumeTailorPipeline)
+ │   ├── structures.py      # Pydantic models for structured data
+ │   ├── prompts/           # LLM system prompts
+ │   ├── schemas/           # JSON schemas for extraction
+ │   ├── templates/         # Jinja2 LaTeX templates
+ │   └── utils/             # Helper functions (PDF parsing, LaTeX ops)
+ └── notebooks/             # Jupyter notebooks for testing components
+ ```
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## 📄 License
+
+ MIT License
+
+ ## 🙏 Acknowledgements
+
+ This project is inspired by [ResumeFlow](https://github.com/Ztrimus/ResumeFlow) by Ztrimus.
app.py ADDED
@@ -0,0 +1,627 @@
+ import streamlit as st
+ import os
+ import tempfile
+ import json
+ from typing import Optional
+ from pathlib import Path
+ import asyncio
+
+ # API and instructor imports
+ import instructor
+ from google import genai
+ import anthropic
+ from openai import AsyncOpenAI
+
+ # Project imports
+ from resumer import ResumeTailorPipeline
+ from resumer.utils.latex_ops import json_to_latex_pdf
+
+ # ============================================
+ # PAGE CONFIGURATION
+ # ============================================
+
+ st.set_page_config(
+     page_title="Resume Tailor AI",
+     page_icon="📄",
+     layout="wide",
+     initial_sidebar_state="expanded"
+ )
+
+ st.markdown("""
+ <style>
+     .main { padding-top: 1rem; }
+     .stTabs [data-baseweb="tab-list"] button { font-size: 1.1em; }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # ============================================
+ # MODEL CONFIGURATIONS
+ # ============================================
+
+ MODELS = {
+     "Gemini": [
+         "gemini-3-flash-preview",
+         "gemini-3-pro-image-preview",
+         "gemini-2.5-pro",
+         "gemini-2.5-flash",
+         "gemini-2.5-flash-lite"
+     ],
+     "Claude": [
+         "claude-sonnet-4-5",
+         "claude-haiku-4-5",
+         "claude-opus-4-5",
+     ],
+     "OpenAI": [
+         "gpt-5-mini",
+         "gpt-5-nano",
+         "gpt-4o-mini",
+         "gpt-4o",
+     ]
+ }
+
+ # ============================================
+ # SESSION STATE INITIALIZATION
+ # ============================================
+
+ def init_session_state():
+     defaults = {
+         "authenticated": False,
+         "api_provider": None,
+         "selected_model": None,
+         "api_key": None,
+         "resume_file": None,
+         "resume_path": None,
+         "resume_bytes": None,
+         "job_url": None,
+         "job_text": None,
+         "pipeline": None,
+         "tailored_resume_path": None,
+         "tailored_resume_pdf": None,
+         "tailored_resume_tex": None,
+         "tailored_resume_json": None,
+         "processing_log": [],
+     }
+     for key, value in defaults.items():
+         if key not in st.session_state:
+             st.session_state[key] = value
+
+ init_session_state()
+
+ # ============================================
+ # API CLIENT INITIALIZATION
+ # ============================================
+
+ def get_gemini_instructor_client(api_key: str):
+     """Initialize Instructor-patched Gemini client."""
+     native_client = genai.Client(api_key=api_key)
+     aclient = instructor.from_genai(
+         native_client,
+         mode=instructor.Mode.GENAI_TOOLS,
+         use_async=True
+     )
+     return aclient
+
+ def get_claude_instructor_client(api_key: str):
+     """Initialize Instructor-patched Claude client."""
+     native_client = anthropic.Anthropic(api_key=api_key)
+     aclient = instructor.from_anthropic(
+         native_client,
+         mode=instructor.Mode.TOOLS,
+     )
+     return aclient
+
+ def get_openai_instructor_client(api_key: str):
+     """Initialize Instructor-patched OpenAI client."""
+     native_client = AsyncOpenAI(api_key=api_key)
+     aclient = instructor.from_openai(
+         native_client,
+         mode=instructor.Mode.TOOLS,
+     )
+     return aclient
+
+ # ============================================
+ # UTILITY FUNCTIONS
+ # ============================================
+
+ def log_message(message: str):
+     """Add message to processing log."""
+     st.session_state.processing_log.append(message)
+
+ def save_uploaded_file(uploaded_file) -> str:
+     """Save uploaded file to a temporary location and store its bytes."""
+     # Read the file bytes first
+     file_bytes = uploaded_file.getvalue()
+     st.session_state.resume_bytes = file_bytes
+
+     # Save to temporary location
+     with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
+         tmp.write(file_bytes)
+         return tmp.name
+
+ async def run_pipeline(
+     aclient,
+     model_name: str,
+     resume_path: str,
+     job_url: Optional[str] = None,
+     job_text: Optional[str] = None,
+     progress_callback=None
+ ) -> Optional[tuple]:
+     """Run the ResumeTailorPipeline asynchronously."""
+     try:
+         if progress_callback:
+             progress_callback("📖 Initializing pipeline...")
+
+         with tempfile.TemporaryDirectory() as tmpdir:
+             pipeline = ResumeTailorPipeline(
+                 aclient=aclient,
+                 model_name=model_name,
+                 resume_path=resume_path,
+                 output_dir=tmpdir,
+                 log_callback=progress_callback
+             )
+
+             # Store pipeline in session state
+             st.session_state.pipeline = pipeline
+
+             # Generate tailored resume asynchronously
+             result = await pipeline.generate_tailored_resume(
+                 job_url=job_url,
+                 job_site_content=job_text
+             )
+
+             # Result is a tuple: (pdf_path, tex_path)
+             if isinstance(result, tuple):
+                 tailored_pdf_path, tailored_tex_path = result
+             else:
+                 tailored_pdf_path = result
+                 tailored_tex_path = None
+
+             if progress_callback:
+                 progress_callback("💾 Reading generated files...")
+
+             # Read the PDF and store in session state
+             if tailored_pdf_path and os.path.exists(tailored_pdf_path):
+                 with open(tailored_pdf_path, "rb") as f:
+                     st.session_state.tailored_resume_pdf = f.read()
+
+             # Read the TEX file and store in session state
+             if tailored_tex_path and os.path.exists(tailored_tex_path):
+                 with open(tailored_tex_path, "r", encoding="utf-8") as f:
+                     st.session_state.tailored_resume_tex = f.read()
+
+             # Also store the JSON details
+             st.session_state.tailored_resume_json = pipeline.resume_details
+
+             if progress_callback:
+                 progress_callback("✅ Cleanup and finalization...")
+
+             pipeline.close_cache()
+             return (tailored_pdf_path, tailored_tex_path)
+
+     except Exception as e:
+         if progress_callback:
+             progress_callback(f"❌ Error: {str(e)}")
+         st.error(f"Pipeline Error: {str(e)}")
+         import traceback
+         st.error(traceback.format_exc())
+         return None
+
+ # ============================================
+ # MAIN APP UI
+ # ============================================
+
+ def main():
+     # Header
+     col1, col2 = st.columns([0.7, 0.3])
+     with col1:
+         st.title("📄 ResFit: Resume Tailor AI")
+         st.markdown("*Tailor your resume for any job using AI - **Preserving your Links!***")
+         st.info("💡 **Why ResFit?** Unlike other tools, this app preserves all hyperlinks in your resume (Portfolio, LinkedIn, GitHub, etc.) while tailoring the content.")
+
+     # ========== SIDEBAR: AUTHENTICATION ==========
+     with st.sidebar:
+         st.header("🔐 Authentication")
+
+         # Step 1: Select Provider
+         api_provider = st.radio(
+             "Step 1: Select API Provider",
+             ["Gemini", "Claude", "OpenAI"],
+             key="provider_select"
+         )
+         st.session_state.api_provider = api_provider
+
+         # Step 2: Select Model based on provider
+         available_models = MODELS[api_provider]
+         selected_model = st.selectbox(
+             "Step 2: Select Model",
+             available_models,
+             key=f"model_select_{api_provider}",
+             index=0
+         )
+         st.session_state.selected_model = selected_model
+
+         # Display model info
+         model_info = {
+             "Gemini": {
+                 "gemini-3-flash-preview": "⚡ Fastest, latest (recommended)",
+                 "gemini-3-pro-image-preview": "🖼️ Vision capabilities, advanced",
+                 "gemini-2.5-pro": "💪 Most capable but slower",
+                 "gemini-2.5-flash": "⚡ Fast & capable",
+                 "gemini-2.5-flash-lite": "💨 Fastest, most affordable",
+             },
+             "Claude": {
+                 "claude-sonnet-4-5": "⚡ Latest Sonnet (recommended)",
+                 "claude-haiku-4-5": "💨 Fastest, most affordable",
+                 "claude-opus-4-5": "💪 Most capable but slower",
+             },
+             "OpenAI": {
+                 "gpt-5-mini": "⚡ Latest & fastest (recommended)",
+                 "gpt-5-nano": "💨 Most affordable",
+                 "gpt-4o-mini": "💪 Good balance",
+                 "gpt-4o": "🦾 Most capable",
+             }
+         }
+
+         if selected_model in model_info.get(api_provider, {}):
+             st.caption(f"ℹ️ {model_info[api_provider][selected_model]}")
+
+         st.divider()
+
+         # Step 3: Enter API Key
+         api_key = st.text_input(
+             "Step 3: Enter API Key",
+             type="password",
+             key="api_key_input",
+             help=f"Your {api_provider} API key will not be stored"
+         )
+
+         st.divider()
+
+         # Authenticate button
+         if st.button("🔓 Authenticate", use_container_width=True, type="primary"):
+             if api_key:
+                 try:
+                     if api_provider == "Gemini":
+                         aclient = get_gemini_instructor_client(api_key)
+                     elif api_provider == "Claude":
+                         aclient = get_claude_instructor_client(api_key)
+                     else:  # OpenAI
+                         aclient = get_openai_instructor_client(api_key)
+
+                     st.session_state.authenticated = True
+                     st.session_state.api_key = api_key
+                     st.session_state.aclient = aclient
+                     st.success(f"✅ Authenticated!\n\n**Provider:** {api_provider}\n**Model:** {selected_model}")
+                 except Exception as e:
+                     st.error(f"❌ Authentication failed: {str(e)}")
+             else:
+                 st.error("Please enter an API key")
+
+         st.divider()
+
+         # Display current auth status
+         if st.session_state.authenticated:
+             st.info(f"""
+             ✅ **Authenticated**
+
+             **Provider:** {st.session_state.api_provider}
+             **Model:** {st.session_state.selected_model}
+             """)
+
+             if st.button("🚪 Logout", use_container_width=True):
+                 st.session_state.authenticated = False
+                 st.session_state.api_key = None
+                 st.session_state.api_provider = None
+                 st.session_state.selected_model = None
+                 st.session_state.aclient = None
+                 st.rerun()
+
+     # ========== MAIN CONTENT ==========
+     if not st.session_state.authenticated:
+         st.warning("⚠️ Please authenticate with an API provider in the sidebar to continue")
+         st.info("""
+         **How to get an API key:**
+
+         🔵 **Gemini**: Free API key at [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
+
+         🔴 **Claude**: API key at [https://console.anthropic.com/](https://console.anthropic.com/)
+
+         🟢 **OpenAI**: API key at [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
+         """)
+         return
+
+     # Main tabs
+     tab1, tab2, tab3 = st.tabs(["📤 Upload", "⚙️ Process", "📊 Results"])
+
+     # ========== TAB 1: UPLOAD ==========
+     with tab1:
+         st.header("Upload Your Materials")
+
+         col1, col2 = st.columns(2)
+
+         with col1:
+             st.subheader("📄 Resume PDF")
+             resume_file = st.file_uploader(
+                 "Select your resume (PDF only)",
+                 type=["pdf"],
+                 key="resume_uploader"
+             )
+
+             if resume_file:
+                 # Save to temporary location
+                 resume_path = save_uploaded_file(resume_file)
+                 st.session_state.resume_file = resume_file
+                 st.session_state.resume_path = resume_path
+                 st.success(f"✅ Uploaded: {resume_file.name}")
+                 st.info(f"📊 Size: {resume_file.size / 1024:.1f} KB")
+
+         with col2:
+             st.subheader("🎯 Job Description")
+
+             job_source = st.radio(
+                 "Provide job description via:",
+                 ["📎 URL", "📝 Text"],
+                 horizontal=False,
+                 key="job_source_select"
+             )
+
+             if job_source == "📎 URL":
+                 job_url = st.text_input(
+                     "Paste job posting URL:",
+                     placeholder="https://careers.example.com/job/123",
+                     key="job_url_input"
+                 )
+                 if job_url:
+                     st.session_state.job_url = job_url
+                     st.session_state.job_text = None
+                     st.success("✅ URL saved")
+
+             else:  # Text
+                 job_text = st.text_area(
+                     "Paste job description text:",
+                     placeholder="Paste the complete job description here...",
+                     height=200,
+                     key="job_text_input"
+                 )
+                 if job_text:
+                     st.session_state.job_text = job_text
+                     st.session_state.job_url = None
+                     st.success("✅ Job description saved")
+
+         st.divider()
+
+         # Summary
+         st.subheader("📋 Upload Summary")
+         summary_col1, summary_col2 = st.columns(2)
+
+         with summary_col1:
+             if st.session_state.resume_path:
+                 st.metric("Resume", "✅ Ready")
+             else:
+                 st.metric("Resume", "⏳ Waiting")
+
+         with summary_col2:
+             if st.session_state.job_url or st.session_state.job_text:
+                 st.metric("Job Description", "✅ Ready")
+             else:
+                 st.metric("Job Description", "⏳ Waiting")
+
+     # ========== TAB 2: PROCESS ==========
+     with tab2:
+         st.header("Process Your Resume")
+
+         # Validation
+         if not st.session_state.resume_path:
+             st.error("❌ Please upload a resume in the Upload tab")
+             return
+
+         if not st.session_state.job_url and not st.session_state.job_text:
+             st.error("❌ Please provide a job description in the Upload tab")
+             return
+
+         st.info(f"""
+         **Processing Configuration:**
+         - **Provider:** {st.session_state.api_provider}
+         - **Model:** {st.session_state.selected_model}
+
+         **This process will:**
+         1. Extract your resume structure asynchronously
+         2. Extract job requirements asynchronously
+         3. Tailor your resume to match the job
+         4. Generate a PDF with the tailored version
+         """)
+
+         st.divider()
+
+         # Start processing button
+         if st.button("🚀 Generate Tailored Resume", use_container_width=True, type="primary", key="btn_start"):
+             # Clear processing log
+             st.session_state.processing_log = []
+
+             # Create a single placeholder for live log display
+             log_placeholder = st.empty()
+
+             def update_progress(message: str):
+                 """Callback to update progress."""
+                 # Add message to log
+                 st.session_state.processing_log.append(message)
+
+                 # Keep only the latest few logs
+                 max_logs = 5
+                 if len(st.session_state.processing_log) > max_logs:
+                     latest_logs = st.session_state.processing_log[-max_logs:]
+                 else:
+                     latest_logs = st.session_state.processing_log
+
+                 # Update the placeholder with the latest logs (no duplicates)
+                 with log_placeholder.container():
+                     st.subheader(f"📝 Live Processing Log (Latest {max_logs})")
+                     for log in latest_logs:
+                         st.write(log)
+
+             try:
+                 update_progress("🔐 Initializing async event loop...")
+
+                 # Create and run async pipeline
+                 loop = asyncio.new_event_loop()
+                 asyncio.set_event_loop(loop)
+
+                 update_progress("⏳ Starting resume processing...")
+
+                 result = loop.run_until_complete(
+                     run_pipeline(
+                         aclient=st.session_state.aclient,
+                         model_name=st.session_state.selected_model,
+                         resume_path=st.session_state.resume_path,
+                         job_url=st.session_state.job_url,
+                         job_text=st.session_state.job_text,
+                         progress_callback=update_progress
+                     )
+                 )
+
+                 loop.close()
+
+                 if result:
+                     st.session_state.tailored_resume_path = result
+                     st.divider()
+                     st.success("✅ Resume tailored successfully!")
+                     st.balloons()
+                 else:
+                     st.divider()
+                     st.error("❌ Failed to generate tailored resume")
+
+             except Exception as e:
+                 st.divider()
+                 st.error(f"❌ Error: {str(e)}")
+
+         # Display full processing log history (after processing)
+         if st.session_state.processing_log:
+             st.divider()
+             st.subheader("📋 Full Processing Log")
+             with st.expander("View all logs", expanded=False):
+                 for log in st.session_state.processing_log:
+                     st.write(log)
+
+     # ========== TAB 3: RESULTS ==========
+     with tab3:
+         st.header("Results")
+
+         if not st.session_state.tailored_resume_path:
+             st.info("👈 Complete the processing in the Process tab to see results here")
+             return
+
+         st.success("✅ Your tailored resume is ready!")
+
+         # Download options
+         st.subheader("📥 Download Your Resumes")
+
+         col1, col2, col3 = st.columns(3)
+
+         with col1:
+             st.markdown("#### Original Resume")
+             if st.session_state.resume_bytes:
+                 st.download_button(
+                     label="📥 Download Original PDF",
+                     data=st.session_state.resume_bytes,
+                     file_name="original_resume.pdf",
+                     mime="application/pdf",
+                     use_container_width=True
+                 )
+
+         with col2:
+             st.markdown("#### Tailored Resume (PDF)")
+             if st.session_state.tailored_resume_pdf:
+                 st.download_button(
+                     label="📥 Download Tailored PDF",
+                     data=st.session_state.tailored_resume_pdf,
+                     file_name="tailored_resume.pdf",
+                     mime="application/pdf",
+                     use_container_width=True,
+                     type="primary"
+                 )
+
+         with col3:
+             st.markdown("#### Tailored Resume (LaTeX)")
+             if st.session_state.tailored_resume_tex:
+                 st.download_button(
+                     label="📥 Download LaTeX (.tex)",
+                     data=st.session_state.tailored_resume_tex.encode('utf-8'),
+                     file_name="tailored_resume.tex",
+                     mime="text/plain",
+                     use_container_width=True
+                 )
+             else:
+                 st.info("LaTeX file not available")
+
+         st.divider()
+
+         # PDF Preview Section using iframe
+         st.subheader("📄 PDF Preview")
+
+         preview_col1, preview_col2 = st.columns(2)
+
+         with preview_col1:
+             with st.expander("👁️ View Original Resume PDF", expanded=True):
+                 if st.session_state.resume_bytes:
+                     import base64
+                     pdf_b64 = base64.b64encode(st.session_state.resume_bytes).decode()
+                     pdf_display = f'<iframe src="data:application/pdf;base64,{pdf_b64}" width="100%" height="600" type="application/pdf"></iframe>'
+                     st.markdown(pdf_display, unsafe_allow_html=True)
+                 else:
+                     st.info("No original resume available")
+
+         with preview_col2:
+             with st.expander("✨ View Tailored Resume PDF", expanded=True):
+                 if st.session_state.tailored_resume_pdf:
+                     import base64
+                     pdf_b64 = base64.b64encode(st.session_state.tailored_resume_pdf).decode()
+                     pdf_display = f'<iframe src="data:application/pdf;base64,{pdf_b64}" width="100%" height="600" type="application/pdf"></iframe>'
+                     st.markdown(pdf_display, unsafe_allow_html=True)
+                 else:
+                     st.info("No tailored resume available")
+
+         st.divider()
+
+         # LaTeX Source Code Viewer
+         st.subheader("📝 LaTeX Source Code")
+         if st.session_state.tailored_resume_tex:
+             with st.expander("👁️ View LaTeX Source Code", expanded=False):
+                 st.code(st.session_state.tailored_resume_tex, language="latex")
+         else:
+             st.info("No LaTeX source available")
+
+         st.divider()
+
+         # Data comparison
+         st.subheader("📊 Resume Data Comparison")
+
+         if st.session_state.pipeline:
+             result_col1, result_col2 = st.columns(2)
+
+             with result_col1:
+                 with st.expander("📖 Original Resume Data", expanded=False):
+                     if st.session_state.pipeline.resume_info:
+                         st.json(st.session_state.pipeline.resume_info.model_dump())
+                     else:
+                         st.info("No data available")
+
+             with result_col2:
+                 with st.expander("✨ Tailored Resume Data", expanded=False):
+                     if st.session_state.tailored_resume_json:
+                         st.json(st.session_state.tailored_resume_json)
+                     else:
+                         st.info("No data available")
+
+         st.divider()
+
+         # Job info display
+         st.subheader("🎯 Job Requirements (Extracted)")
+         if st.session_state.pipeline and st.session_state.pipeline.job_info:
+             with st.expander("View job info", expanded=False):
+                 if hasattr(st.session_state.pipeline.job_info, 'model_dump'):
+                     st.json(st.session_state.pipeline.job_info.model_dump())
+                 else:
+                     st.json(st.session_state.pipeline.job_info)
+
+ if __name__ == "__main__":
+     main()
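The button handler in `app.py` drives the async pipeline from Streamlit's synchronous script by creating and managing an event loop by hand. The same pattern in isolation, with a hypothetical `fake_pipeline` coroutine standing in for `run_pipeline`:

```python
import asyncio

async def fake_pipeline(progress_callback=None):
    # Stand-in for run_pipeline(): report progress, then return artifact paths.
    if progress_callback:
        progress_callback("starting")
    await asyncio.sleep(0)
    return ("resume.pdf", "resume.tex")

log = []
loop = asyncio.new_event_loop()      # same pattern as the button handler above
asyncio.set_event_loop(loop)
result = loop.run_until_complete(fake_pipeline(progress_callback=log.append))
loop.close()
print(result, log)
```

`asyncio.run()` would be the more idiomatic one-liner here, but the explicit loop makes it clear that each button click gets a fresh loop, which avoids "event loop is closed" errors on Streamlit reruns.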
docker-compose.yml ADDED
@@ -0,0 +1,19 @@
+ version: '3.8'
+
+ services:
+   resume-tailor:
+     build: .
+     container_name: resume-tailor-ai
+     ports:
+       - "8501:8501"
+     environment:
+       - PYTHONUNBUFFERED=1
+       - STREAMLIT_SERVER_PORT=8501
+       - STREAMLIT_SERVER_ADDRESS=0.0.0.0
+     volumes:
+       - ./output:/app/output
+       - ./.env:/app/.env:ro
+     env_file:
+       - .env
+     stdin_open: true
+     tty: true
main.py ADDED
File without changes
notebooks/1_test_pdf_reader.ipynb ADDED
@@ -0,0 +1,380 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%reload_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "# **Sajil Awale**\n",
+ "\n",
+ "Address: Huntsville, Alabama [Portfolio: www.sajilawale.com.np](https://www.sajilawale.com.np)\n",
+ "[Email :sajilawale@gmail.com](mailto:sajilawale@gmail.com) [Github: https://github.com/AwaleSajil](https://github.com/AwaleSajil)\n",
+ "Mobile : +1-256-417-3690 [Linkedin: https://www.linkedin.com/in/sajilawale/](https://www.linkedin.com/in/sajilawale/)\n",
+ "\n",
+ "\n",
+ "Summary\n",
+ "\n",
+ "\n",
+ "Machine Learning Engineer with 4+ years of experience specializing in NLP, Large Language Models, and Agentic AI\n",
+ "evaluation. Currently a CS Master’s student (4.0 GPA) at NASA-IMPACT, developing foundational scientific embedding\n",
+ "models and benchmarking autonomous research agents. Expert in deploying production-grade ML pipelines for healthcare\n",
+ "analytics and document automation.\n",
+ "\n",
+ "\n",
+ "Education\n",
+ "\n",
+ "\n",
+ "- **[[University of Alabama in Huntsville]](https://www.uah.edu/)** Huntsville, Alabama,USA\n",
+ "_[Master’s in Computer Science - Data Science (Concentration)](https://www.uah.edu/science/departments/computer-science/cs-graduate-programs)_ _2024 - 2026_\n",
+ "_Current GPA: 4.0/4.0_\n",
+ "\n",
+ "\n",
+ "- **[[Institute of Engineering, Pulchowk Campus, Tribhuvan University]](https://pcampus.edu.np/)** Lalitpur, Nepal\n",
+ "_[B.E. Electronics & Communication;](https://doece.pcampus.edu.np/index.php/bex-becie/)_ _2016 - 2021_\n",
+ "_[Full Scholarship; Aggregate: 79.45%; Rank: 8](https://photos.app.goo.gl/C4QsvJgsfx9jguQn7)_ _[th]_ _Position (Top 1% in University)_\n",
+ "\n",
+ "\n",
+ "Skills\n",
+ "\n",
+ "\n",
+ "- **Languages** : Python, C++, C, C#, MATLAB, SQL\n",
+ "\n",
+ "- **Machine Learning** : Pytorch, Transformers, Scikit-Learn, W&B, Spacy, Keras, OpenCV, Imbalanced-Learn, Hyperopt\n",
+ "\n",
+ "- **Data Analysis Packages** : Pandas, Dask, Numpy, Scipy, Matplotlib, Seaborn, Plotly, NetworkX\n",
+ "\n",
+ "- **Big Data Framework** : Pyspark, Hadoop\n",
+ "\n",
+ "- **Frontend** : HTML, CSS, Bootstrap, JavaScript, Angular, jQuery\n",
+ "\n",
+ "- **Backend** : FastAPI, Flask, Rest framework\n",
+ "\n",
+ "- **Cloud Computing** : Amazon EC2, EMR Hadoop, EMR Serverless, Redshift, S3\n",
+ "\n",
+ "\n",
+ "Experience\n",
+ "\n",
+ "\n",
+ "- **[Graduate Research Assistant for LLM team]**\n",
+ "\n",
+ "_[NASA-IMPACT @ UAH](https://www.earthdata.nasa.gov/about/impact)_ _August 2024 - Present_\n",
+ "\n",
+ "_◦_ Science Keyword Recommender: Built an extreme multi-label classifier for NASA CMR, scaling from 430 to 3,240\n",
+ "science keywords. Used Focal Loss and custom stratified sampling to improve F1 to 0.55, enhancing metadata\n",
+ "accuracy and dataset discoverability.\n",
+ "\n",
+ "\n",
+ "_◦_ Pre-training Science Embedding Model (Indus-SDE): Pretrained a RoBERTa-based model on 520K NASA\n",
+ "documents with extended 1024-token input and Weighted Keyword Based Dynamic Masking. Achieved 78.1% top-1\n",
+ "MLM accuracy, outperforming baselines on keyword tagging, astrophysics, and EJ tasks.\n",
+ "\n",
+ "\n",
+ "_◦_ Downstream Task Unification Framework: Developed a modular multi-task fine-tuning pipeline using Hugging Face\n",
+ "and W&B. Enabled plug-and-play config-based training/evaluation with automatic Excel reporting, streamlining\n",
+ "model comparison and boosting team productivity.\n",
+ "\n",
+ "\n",
+ "_◦_ Training Sentence Transformer (Indus-SDE-ST): Implemented DDP multi-GPU multi-stage training of a scientific\n",
+ "sentence transformer using text/code pairs. Early results show superior performance on science-domain information\n",
+ "retrieval benchmarks, pushing forward scientific search and discovery tools.\n",
+ "\n",
+ "\n",
+ "_◦_ Agentic AI Evaluation & Benchmarking – Conducted comparative evaluations of NASA’s Deep Literature Search\n",
+ "Agent against Gemini and OpenAI systems using LLM-as-judge metrics (contextual precision, recall, relevance,\n",
+ "faithfulness). Explored agent reliability, metric stability, and variance reduction strategies to improve reproducibility\n",
+ "and trust in autonomous scientific research agents.\n",
+ "\n",
+ "\n",
+ "- **[Machine Learning Engineer]**\n",
+ "\n",
+ "_[Cedar Gate Technologies](https://www.cedargate.com/)_ _July 2022 - July 2024_\n",
+ "\n",
+ "_◦_ Automated ETL field mapping by fine-tuning DistilBERT for multilabel classification to suggest\n",
+ "source-to-destination field mappings and achieved 0.95 recall and 0.7 IoU. Initiated full ETL automation by\n",
+ "fine-tuning Mistral-7B to autogenerate internal data transformation scripts.\n",
+ "\n",
+ "\n",
+ "_◦_ Analyzed local model explainability tools (permutation SHAP, Deep Explainer, LIME), identifying FastSHAP as the\n",
+ "optimal solution for a production diabetes model with extensive features based on speed and performance (87.2%\n",
+ "Inclusion AUC).\n",
+ "\n",
+ "\n",
+ "_◦_ Performed network analysis on healthcare providers to correlate patient-sharing patterns among physicians with\n",
+ "medical costs for patients with chronic conditions like Chronic Heart Failure and Diabetes, revealing key cost\n",
+ "drivers.\n",
+ "\n",
+ "\n",
+ "_◦_ Optimized the segmentation of frequent ER visitors by systematically evaluating various scaling, feature extraction,\n",
+ "and clustering methods. Utilized logistic regression coefficients for rapid cluster discrimination, ultimately\n",
+ "identifying K-Means (6 clusters) with an auto-encoder as the most effective model, based on cluster metrics for\n",
+ "overlap, quality, and cardinality.\n",
+ "\n",
+ "\n",
+ "_◦_ Developed a LightGBM model to predict healthcare cost-risk (MARA scores), and the likelihood of it increasing or\n",
+ "decreasing; achieving an R2 of 0.74 and MCC of 0.45, enabling proactive care management.\n",
+ "\n",
+ "\n",
+ "_◦_ Built a Gradient Boosted model to predict patient compliance with preventive care visits next year, achieving an\n",
+ "MCC score over 0.75 to support targeted outreach initiatives.\n",
+ "\n",
+ "\n",
+ "- **[[Machine Learning Engineer]](https://photos.app.goo.gl/vVbF4bHvcjuexqnL6)**\n",
+ "\n",
+ "_[Docsumo](https://www.docsumo.com/)_ _March 2022 - June 2022_\n",
+ "\n",
+ "_◦_ Benchmarked spaCy v2 vs. v3 Named Entity Recognition (NER) pipelines for information extraction from\n",
+ "OCR-scanned documents based on their performance, speed, and size, providing key data for a strategic upgrade\n",
+ "decision.\n",
+ "\n",
+ "\n",
+ "_◦_ Evaluated multiple document reading order detection techniques (e.g., DBSCAN, recursive XY-cut, layout reader,\n",
+ "line-based block separation, docstrum) to enhance NER performance on complex layouts, measuring success with\n",
+ "ROUGE-L and BLEU scores.\n",
+ "\n",
+ "- **[[Associate Data Engineer]](https://photos.app.goo.gl/zjsUJiMr6ZmVhfqz9)**\n",
+ "\n",
+ "_[Deerwalk](https://www.cedargate.com/)_ _May 2021 - Feb 2022_\n",
+ "\n",
+ "_◦_ Onboard new vendors (ETL processes on US healthcare data), ensure data integrity, analyse bugs which was\n",
+ "triggered during data processing or client request and promptly resolve critical production issues\n",
+ "\n",
+ "\n",
+ "Academic and Personal Projects\n",
+ "\n",
+ "\n",
+ "- **[[Funny Project]](https://github.com/AwaleSajil/FunnyProject)**\n",
+ "\n",
+ "_BigData Project_ _2024_\n",
+ "\n",
+ "_◦_ Engineered a two-stage NLP pipeline to classify 570,000+ jokes by humor, offensiveness, and sentiment, achieving a\n",
+ "0.86 weighted F1-score by fine-tuning a BERT model on a 55k-sample dataset labeled by local LLMs (Mistral,\n",
+ "Gemma3). Dockerized Inference Pipeline\n",
+ "\n",
+ "\n",
+ "- **[[Image Auto Alignment]](https://docs.google.com/presentation/d/1n_Hv8l0_MGLNv62PnMt76vsGcCsRyS7TZuXABBb1jmc/edit?usp=sharing)**\n",
+ "\n",
+ "_Weekend Project_ _2024_\n",
+ "\n",
+ "_◦_ Built two solutions to auto-correct rotated images: Rule-based Flask API for documents (e.g., invoices) using line\n",
+ "detection and text-weight heuristics. ML-based model with MobileNetV2 for general images; framed as a regression\n",
+ "task and achieved 2.6° MAE on self-supervised Flickr dataset.\n",
+ "\n",
+ "\n",
+ "- **[[Real Time Visual Localisation and Mapping of Mobile Robot in Dynamic Environment]](https://docs.google.com/presentation/d/1wIIM4iFpwIpxakO2Fubz-7BHouQZl0Tf2z-sj45jA7Y/edit?usp=sharing)**\n",
+ "\n",
+ "_College Major Project_ _2019 - 2020_\n",
+ "\n",
+ "_◦_ A mobile robot capable of real-time Visual SLAM (Simultaneous Localization And Mapping) in a dynamic\n",
+ "environment by reconstructing the entire 3D scene from 2D images captured by its camera. To address dynamic\n",
+ "element, visual landmarks in dynamic areas are masked using ICNet, a semantic segmentation model fine-tuned to\n",
+ "identify humans, the most prevalent dynamic objects.\n",
+ "\n",
+ "\n",
+ "- **[[Precision Livestock Farming — Improving Productivity of Broiler Chicken farm with technology]](https://docs.google.com/presentation/d/1fiFOfY1UPjH535EpBikLpd_-gw37Te88lby8h5pYT5U/edit?usp=sharing)**\n",
+ "\n",
+ "_[LOCUS 2019 Project](https://locus.pcampus.edu.np/)_ _2019_\n",
+ "\n",
+ "_◦_ Project designed to monitor broiler chickens, utilizing YOLO for chicken detection and SORT (Simple Online\n",
+ "Real-time Tracker) for mobility tracking. Eating behavior was estimated using a feeder microphone, while\n",
+ "maintaining optimal environmental conditions, including temperature and humidity.\n",
+ "\n",
+ "\n",
+ "- **[[Vehicle Traffic Analysis and Management]](https://drive.google.com/file/d/1N8H8YIwp3FTOcbLwzW4fNXuKU48Yo1D2/view?usp=sharing)**\n",
+ "\n",
+ "_College Minor Project_ _2019_\n",
+ "\n",
+ "_◦_ Traffic flow at various road junctions was assessed by vehicle counting with the help of YOLO and SORT from\n",
+ "diverse originating sources. The Webster algorithm was used to determine the optimal timing for traffic signals.\n",
+ "\n",
+ "\n",
+ "- **[[Sajilomart]](https://drive.google.com/file/d/1aE20vZpEihrmu-ZgcHFz5tMHzga30_T-/view?usp=sharing)**\n",
+ "\n",
+ "_[Everest Hackathon](https://www.facebook.com/hackateverest/)_ _2019_\n",
+ "\n",
+ "_◦_ Designed a prototype for effortless shopping: a seamless ”grab and go” experience eliminating lines and checkouts,\n",
+ "with automatic transaction handling.\n",
+ "\n",
+ "\n",
+ "- **[[Blind Eye — Assistive Technology for Blind People]](https://photos.app.goo.gl/a4NrQTM9jWsezDLq7)**\n",
+ "\n",
+ "_[Assistive Technology Hackathon](https://ictframe.com/announcing-at-hackathon-winners-and-conclusion/)_ _2018_\n",
+ "\n",
+ "_◦_ Designed a headset as a solution to enhance mobility for the visually impaired, aiding navigation and obstacle\n",
+ "avoidance.\n",
+ "\n",
+ "\n",
+ "Exchange Program and Fellowship\n",
+ "\n",
+ "\n",
+ "- **[[Sakura Science Exchange Program]](https://photos.app.goo.gl/P8gFatguLP5F1kmM9)**\n",
+ "\n",
+ "_[Japan Science and Technology Agency](https://www.jst.go.jp/EN/)_ _16_ _[th]_ _- 23_ _[th]_ _Dec, 2019_\n",
+ "\n",
+ "_◦_ Selected as one of the top 3 students for a program at Japan’s National Institute of Technology, Kisarazu. We\n",
+ "presented our poster, visited industries, and exchanged ideas and solutions with international peers.\n",
+ "\n",
+ "\n",
+ "_◦_ Participated in sessions covering Japan’s cutting-edge technologies, including Artificial Intelligence and the Internet\n",
+ "of Things (IoT).\n",
+ "\n",
+ "\n",
+ "- **[[First Nepal Winter School in AI]](https://photos.app.goo.gl/kBatEMnLzQqRJKU37)**\n",
+ "\n",
+ "_[Nepal Applied Mathematics and Informatics Institute for Research (NAAMII)](https://www.naamii.org.np/)_ _20_ _[th]_ _- 30_ _[th]_ _Dec, 2018_\n",
+ "\n",
+ "_◦_ Learnt about probability and statistics, linear algebra, AI ethics, and Deep Learning through esteemed professors\n",
+ "and guest speakers.\n",
+ "\n",
+ "\n",
+ "_◦_ Finished hands-on lab assignments in computer vision and natural language processing (NLP).\n",
+ "\n",
+ "\n",
+ "Honors and Awards\n",
+ "\n",
+ "\n",
+ "- **[[Fonepay Student Ambassador]](https://photos.app.goo.gl/3NpBXhK3KbEYw87j6)**\n",
+ "\n",
+ "_[Fonepay](https://fonepay.com/)_ _2020_\n",
+ "\n",
+ "_◦_ Selected as one of the top 10 out of 100 competitive teams responsible for driving initiatives to promote and\n",
+ "facilitate the growth of mobile payments.\n",
+ "\n",
+ "\n",
+ "- **[[Best Thematic Hardware Project]](https://photos.app.goo.gl/KDFNt1KtSUU9xkkXA)**\n",
+ "\n",
+ "_[LOCUS](https://locus.pcampus.edu.np/)_ _2019_\n",
+ "\n",
+ "\n",
+ "_◦_ We were honored to receive the award for ’Precision Livestock Farming’ during the 16 [th] edition of the National\n",
+ "Technological Festival held by LOCUS, Pulchowk Campus\n",
+ "\n",
+ "\n",
+ "- **[[Institute of Engineering Scholarship for BE]](https://media.edusanjal.com/redactor/Download%20TU%20IOE%20Entrance%20Examination%20Result.pdf)**\n",
+ "\n",
+ "_[Tribhuvan University, IOE](https://tu.edu.np/pages/institute-of-engineering-4)_ _2017_\n",
+ "\n",
+ "\n",
+ "_◦_ Received full scholarship to study engineering in the most reputed engineering college of Nepal for securing 58 [th]\n",
+ "\n",
+ "rank in competitive entrance examination given by more than ten thousand students.\n",
+ "\n",
+ "\n",
+ "Volunteering and Teaching experience\n",
+ "\n",
+ "\n",
+ "- **[Training on ML Applications]**\n",
+ "_Mentors Club, Cedargate_ _2023_\n",
+ "\n",
+ "_◦_ Conducted comprehensive session for all Cedargate employees in Nepal covering a variety of ML algorithms,\n",
+ "providing insights into our operational procedures and discussed both ongoing and completed projects that are\n",
+ "currently contributing to our production\n",
+ "\n",
+ "\n",
+ "- **[[RoboPOP and Dronacharya Competitions]](https://youtube.com/playlist?list=PLPFSwgon02wfx5U6bA2TUqVNZUvXrFQBv&si=amDXuMVfZ4LstZgu)**\n",
+ "\n",
+ "_[LOCUS](https://www.facebook.com/locus.ioe)_ _2020_\n",
+ "\n",
+ "_◦_ Voluntered to develop 3D animations that would explain the regulations introduced for RoboPOP, an exciting new\n",
+ "robotic balloon-popping event incorporated into LOCUS 2020. Additionally, I extended my commitment to creating\n",
+ "animations for Dronacharya, a cherished and popular drone racing competition\n",
+ "\n",
+ "\n",
+ "- **[[Hardware Fellowship]](https://photos.app.goo.gl/pM3E4DLs12xjgP7FA)**\n",
+ "\n",
+ "_[LOCUS](https://www.facebook.com/locus.ioe/posts/hardware-fellowshiphope-you-guys-are-learninglocusnepal2019/825855697584931/)_ _2020_\n",
+ "\n",
+ "_◦_ Instructed nearly 100 students, ranging from freshmen to sophomores, in the domains of Arduino programming and\n",
+ "electronic hardware design\n",
+ "\n",
+ "\n",
+ "_◦_ Mentored a team of junior-year students as they embarked on their project, creating a 2D CNC plotter designed for\n",
+ "writing and drawing which was showcased at the 17th Technological Festival.\n",
+ "\n",
+ "\n",
+ "References\n",
+ "\n",
+ "\n",
+ "_•_ Tathagata Mukharjee, Professor at University of Alabama in Huntsville: tm0130@uh.edu\n",
+ "\n",
+ "\n",
+ "_•_ Stacey Finn, Director of Data Science and Analytics at CedarGate: safinn5@gmail.com\n",
+ "\n",
+ "\n",
+ "Online Certifications\n",
+ "\n",
+ "\n",
+ "_•_ [Deep Learning Specialization by DeepLearning.AI on Coursera.](https://www.coursera.org/account/accomplishments/specialization/CMV425VZYK92?utm_source=link&utm_medium=certificate&utm_content=cert_image&utm_campaign=sharing_cta&utm_product=s12n)\n",
+ "\n",
+ "\n",
+ "_•_ [Applied Deep Learning Capstone Project by ibm on edx.](https://courses.edx.org/certificates/6154999d04c34c329bd68f3fcbd7e0a2)\n",
+ "\n",
+ "\n",
+ "_•_ [Specialized Models: Time Series and Survival Analysis on Coursera](https://www.coursera.org/account/accomplishments/certificate/5U3ZQ9767CRW)\n",
+ "\n",
+ "\n",
+ "_•_ [Python Classes and Inheritance by University of Michigan on Coursera.](https://www.coursera.org/account/accomplishments/verify/8KPF3UZYT7VC)\n",
+ "\n",
+ "\n",
+ "_•_ [Python (Basic) by Hackerrank](https://www.hackerrank.com/certificates/d41a0ed647da)\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pymupdf4llm\n",
+ "\n",
+ "md_text = pymupdf4llm.to_markdown(\"/Users/sawale/Documents/learning/resumer/resumer/demo/Sajil_Awale_CV_2025.pdf\")\n",
+ "\n",
+ "\n",
+ "print(md_text)\n",
+ "# # now work with the markdown text, e.g. store as a UTF8-encoded file\n",
+ "# import pathlib\n",
+ "# pathlib.Path(\"output.md\").write_bytes(md_text.encode())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
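
The notebook above leaves the markdown-saving step commented out. A stand-alone sketch of that step, using a placeholder string in place of the `pymupdf4llm.to_markdown(...)` output so it runs without a PDF:

```python
import pathlib

# Placeholder standing in for pymupdf4llm.to_markdown(<pdf path>) output.
md_text = "# **Sajil Awale**\n\nSummary\n"

# Store as a UTF-8 encoded file, as the commented-out notebook lines suggest.
out_path = pathlib.Path("output.md")
out_path.write_bytes(md_text.encode("utf-8"))

print(out_path.read_text(encoding="utf-8") == md_text)  # True
```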
notebooks/2_test_instructor.ipynb ADDED
@@ -0,0 +1,177 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%reload_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "name='Alice' age=25\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import instructor\n",
+ "from pydantic import BaseModel\n",
+ "from google import genai\n",
+ "from dotenv import load_dotenv\n",
+ "\n",
+ "# Load the variables from .env into the environment\n",
+ "load_dotenv()\n",
+ "\n",
+ "# Define your structure\n",
+ "class UserInfo(BaseModel):\n",
+ " name: str\n",
+ " age: int\n",
+ "\n",
+ "# Initialize client\n",
+ "# GenAI client will now find the key automatically from the environment\n",
+ "native_client = genai.Client(api_key=os.environ.get(\"GOOGLE_API_KEY\"))\n",
+ "client = instructor.from_genai(native_client)\n",
+ "\n",
+ "# Execute extraction\n",
+ "user = client.chat.completions.create(\n",
+ " model=\"gemini-2.5-flash\",\n",
+ " response_model=UserInfo,\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": \"Extract: Alice is 25.\"}\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "print(user)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "name='Daily Grind' location='Seattle' revenue_estimate=50000\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import instructor\n",
+ "from pydantic import BaseModel\n",
+ "from google import genai\n",
+ "from dotenv import load_dotenv\n",
+ "\n",
+ "load_dotenv()\n",
+ "\n",
+ "# 1. Define your Pydantic model\n",
+ "class BusinessInfo(BaseModel):\n",
+ " name: str\n",
+ " location: str\n",
+ " revenue_estimate: int\n",
+ "\n",
+ "# 2. Initialize the GenAI Client for Vertex AI\n",
+ "native_client = genai.Client(\n",
+ " vertexai=True,\n",
+ " project=os.environ.get(\"GOOGLE_CLOUD_PROJECT\"),\n",
+ " location=os.environ.get(\"GOOGLE_CLOUD_LOCATION\")\n",
+ ")\n",
+ "\n",
+ "# 3. Patch the client with Instructor\n",
+ "client = instructor.from_genai(native_client)\n",
+ "\n",
+ "# 4. Create structured output\n",
+ "business = client.chat.completions.create(\n",
+ " model=\"gemini-2.0-flash\", \n",
+ " response_model=BusinessInfo,\n",
+ " messages=[\n",
+ " {\"role\": \"user\", \"content\": \"The coffee shop 'Daily Grind' in Seattle\"}\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "print(business)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "instructor.core.client.Instructor"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(client)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "instructor.core.client.AsyncInstructor"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "instructor.AsyncInstructor"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
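
Under the hood, the instructor-patched client constrains the model to emit JSON matching the Pydantic schema and validates it into a typed object. A dependency-free sketch of that final coercion step, using a stdlib dataclass as a stand-in for the notebook's `UserInfo` Pydantic model and a hypothetical raw payload:

```python
import json
from dataclasses import dataclass

# Stand-in for the notebook's Pydantic UserInfo response_model.
@dataclass
class UserInfo:
    name: str
    age: int

# Hypothetical JSON a structured-output call might return.
raw = '{"name": "Alice", "age": 25}'
user = UserInfo(**json.loads(raw))
print(user)  # UserInfo(name='Alice', age=25)
```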
notebooks/3_test_resume_extractor.ipynb ADDED
@@ -0,0 +1,1201 @@
+ {
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%reload_ext autoreload\n",
+ "%autoreload 2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Added to path: /Users/sawale/Documents/learning/resumer\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "import os\n",
+ "from pathlib import Path\n",
+ "\n",
+ "# Use Path.cwd() instead of __file__ in Notebooks\n",
+ "parent_dir = str(Path.cwd().parent)\n",
+ "\n",
+ "if parent_dir not in sys.path:\n",
+ " sys.path.append(parent_dir)\n",
+ "\n",
+ "print(f\"Added to path: {parent_dir}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import instructor\n",
+ "from pydantic import BaseModel\n",
+ "from google import genai\n",
+ "from dotenv import load_dotenv\n",
+ "\n",
+ "load_dotenv()\n",
+ "\n",
+ "\n",
+ "# 2. Initialize the GenAI Client for Vertex AI\n",
+ "native_client = genai.Client(\n",
+ " vertexai=True,\n",
+ " project=os.environ.get(\"GOOGLE_CLOUD_PROJECT\"),\n",
+ " location=os.environ.get(\"GOOGLE_CLOUD_LOCATION\")\n",
+ ")\n",
+ "\n",
+ "# 3. Patch the client with Instructor\n",
+ "aclient = instructor.from_genai(native_client, \n",
+ "mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS, \n",
+ "use_async=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "<instructor.core.client.AsyncInstructor at 0x12deebdd0>"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "aclient"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Consider using the pymupdf_layout package for a greatly improved page layout analysis.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from resumer import ResumeTailorPipeline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pp = ResumeTailorPipeline(\n",
+ " aclient = aclient, \n",
+ " model_name = os.environ.get(\"GOOGLE_GEMINI_MODEL_NAME\"),\n",
+ " resume_path = \"/Users/sawale/Documents/learning/resumer/resumer/demo/Sajil_Awale_CV_2025.pdf\", \n",
+ " output_dir= \"./output/\"\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--- Scraping job details from: https://lifeattiktok.com/search/7527589557336869138 ---\n",
+ "--- Extracting job info via LLM ---\n",
+ "--- Loading resume info from disk cache ---\n",
+ "--- Successfully extracted both Resume and Job data ---\n",
+ "--- Adding section: summary ---\n",
+ "--- Adding section: work_experience ---\n",
+ "--- Adding section: education ---\n",
+ "--- Adding section: skill_sections ---\n",
+ "--- Adding section: projects ---\n",
+ "--- Adding section: certifications ---\n",
+ "--- Adding section: achievements ---\n",
+ "--- Adding section: research_works ---\n",
+ "--- Adding section: Exchange Program and Fellowship ---\n",
+ "--- Adding section: Volunteering and Teaching experience ---\n",
+ "--- Adding section: References ---\n",
+ "Error in json_to_latex_pdf: 'NoneType' object is not iterable\n"
+ ]
+ }
+ ],
+ "source": [
+ "await pp.generate_tailored_resume(job_url=\"https://lifeattiktok.com/search/7527589557336869138\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Error in json_to_latex_pdf: 'NoneType' object is not iterable\n"
+ ]
+ }
+ ],
+ "source": [
+ "from resumer.utils.latex_ops import json_to_latex_pdf\n",
+ "x = json_to_latex_pdf(pp.resume_details, os.path.join(pp.output_dir, \"tailored_resume.pdf\"))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'personal_info': {'name': {'segments': [{'type': 'text',\n",
+ " 'content': 'Sajil Awale'}]},\n",
+ " 'location': {'segments': [{'type': 'text',\n",
+ " 'content': 'Huntsville, Alabama'}]},\n",
+ " 'phone': {'segments': [{'type': 'text', 'content': '+1-256-417-3690'}]},\n",
+ " 'email': {'segments': [{'type': 'link',\n",
+ " 'content': 'sajilawale@gmail.com',\n",
+ " 'url': 'mailto:sajilawale@gmail.com'}]},\n",
+ " 'media': {'portfolio': 'https://www.sajilawale.com.np',\n",
+ " 'linkedin': 'https://www.linkedin.com/in/sajilawale/',\n",
+ " 'github': 'https://github.com/AwaleSajil',\n",
+ " 'medium': None,\n",
+ " 'devpost': None}},\n",
+ " 'summary': {'segments': [{'type': 'text',\n",
+ " 'content': \"Machine Learning Engineer and Master's candidate with 4+ years of experience specializing in NLP, Large Language Models, and multimodal AI for content understanding and risk identification. Proficient in end-to-end algorithm development, including fine-tuning large models, distributed training, computer vision, and agentic AI evaluation. Eager to contribute to building robust moderation models and risk ranking systems for content safety.\"}]},\n",
+ " 'work_experience': [{'role': {'segments': [{'type': 'text',\n",
+ " 'content': 'Graduate Research Assistant for LLM team'}]},\n",
+ " 'company': {'segments': [{'type': 'link',\n",
+ " 'content': 'NASA-IMPACT @ UAH',\n",
+ " 'url': 'https://www.earthdata.nasa.gov/about/impact'}]},\n",
+ " 'location': {'segments': []},\n",
+ " 'date_description': {'segments': [{'type': 'text',\n",
+ " 'content': 'August 2024 - Present'}]},\n",
+ " 'description': [{'segments': [{'type': 'text',\n",
+ " 'content': 'Conducted comparative evaluations of NASA’s Deep Literature Search Agent against Gemini and OpenAI systems using LLM-as-judge metrics (contextual precision, recall, relevance, faithfulness), enhancing reproducibility and trust in autonomous scientific research agents.'}]},\n",
+ " {'segments': [{'type': 'text',\n",
+ " 'content': 'Implemented DDP multi-GPU multi-stage training of a scientific sentence transformer using text/code pairs, achieving superior performance on science-domain information retrieval benchmarks and advancing scientific search and discovery tools.'}]},\n",
+ " {'segments': [{'type': 'text',\n",
+ " 'content': 'Pretrained a RoBERTa-based science embedding model on 520K NASA documents with extended 1024-token input and Weighted Keyword Based Dynamic Masking, achieving 78.1% top-1 MLM accuracy and outperforming baselines on keyword tagging and astrophysics tasks.'}]},\n",
+ " {'segments': [{'type': 'text',\n",
+ " 'content': 'Developed a modular multi-task fine-tuning pipeline using Hugging Face and W&B with plug-and-play config-based training/evaluation and automatic Excel reporting, streamlining model comparison and boosting team productivity.'}]},\n",
+ " {'segments': [{'type': 'text',\n",
+ " 'content': 'Built an extreme multi-label classifier for NASA CMR, scaling from 430 to 3,240 science keywords using Focal Loss and custom stratified sampling, improving F1 to 0.55 and enhancing metadata accuracy and dataset discoverability.'}]}]},\n",
+ " {'role': {'segments': [{'type': 'text',\n",
+ " 'content': 'Machine Learning Engineer'}]},\n",
+ " 'company': {'segments': [{'type': 'link',\n",
+ " 'content': 'Cedar Gate Technologies',\n",
+ " 'url': 'https://www.cedargate.com/'}]},\n",
+ " 'location': {'segments': []},\n",
+ " 'date_description': {'segments': [{'type': 'text',\n",
+ " 'content': 'July 2022 - July 2024'}]},\n",
+ " 'description': [{'segments': [{'type': 'text',\n",
+ " 'content': 'Automated ETL field mapping by fine-tuning DistilBERT for multilabel classification, achieving 0.95 recall and 0.7 IoU, and initiated full ETL automation by fine-tuning Mistral-7B to autogenerate internal data transformation scripts.'}]},\n",
+ " {'segments': [{'type': 'text',\n",
228
+ " 'content': 'Performed network analysis on healthcare providers to correlate patient-sharing patterns with medical costs for chronic conditions, revealing key cost drivers through data mining.'}]},\n",
229
+ " {'segments': [{'type': 'text',\n",
230
+ " 'content': 'Developed a LightGBM model to predict healthcare cost-risk (MARA scores) with an R2 of 0.74 and MCC of 0.45, enabling proactive care management.'}]},\n",
231
+ " {'segments': [{'type': 'text',\n",
232
+ " 'content': 'Optimized segmentation of frequent ER visitors by evaluating scaling, feature extraction, and clustering methods, identifying K-Means (6 clusters) with an auto-encoder as the most effective model for clear cluster discrimination.'}]},\n",
233
+ " {'segments': [{'type': 'text',\n",
234
+ " 'content': 'Analyzed local model explainability tools (permutation SHAP, Deep Explainer, LIME), identifying FastSHAP as the optimal solution for a production diabetes model based on speed and performance (87.2% Inclusion AUC).'}]}]},\n",
235
+ " {'role': {'segments': [{'type': 'link',\n",
236
+ " 'content': 'Machine Learning Engineer',\n",
237
+ " 'url': 'https://photos.app.goo.gl/vVbF4bHvcjuexqnL6'}]},\n",
238
+ " 'company': {'segments': [{'type': 'link',\n",
239
+ " 'content': 'Docsumo',\n",
240
+ " 'url': 'https://www.docsumo.com/'}]},\n",
241
+ " 'location': {'segments': []},\n",
242
+ " 'date_description': {'segments': [{'type': 'text',\n",
243
+ " 'content': 'March 2022 - June 2022'}]},\n",
244
+ " 'description': [{'segments': [{'type': 'text',\n",
245
+ " 'content': 'Evaluated multiple document reading order detection techniques (e.g., DBSCAN, recursive XY-cut, layout reader) to enhance Named Entity Recognition (NER) performance on complex layouts, measuring success with ROUGE-L and BLEU scores for improved information extraction.'}]},\n",
246
+ " {'segments': [{'type': 'text',\n",
247
+ " 'content': 'Benchmarked spaCy v2 vs. v3 Named Entity Recognition (NER) pipelines for information extraction from OCR-scanned documents based on performance, speed, and size, providing key data for a strategic upgrade decision.'}]}]},\n",
248
+ " {'role': {'segments': [{'type': 'link',\n",
249
+ " 'content': 'Associate Data Engineer',\n",
250
+ " 'url': 'https://photos.app.goo.gl/zjsUJiMr6ZmVhfqz9'}]},\n",
251
+ " 'company': {'segments': [{'type': 'link',\n",
252
+ " 'content': 'Deerwalk',\n",
253
+ " 'url': 'https://www.cedargate.com/'}]},\n",
254
+ " 'location': {'segments': []},\n",
255
+ " 'date_description': {'segments': [{'type': 'text',\n",
256
+ " 'content': 'May 2021 - Feb 2022'}]},\n",
257
+ " 'description': [{'segments': [{'type': 'text',\n",
258
+ " 'content': 'Managed ETL processes for new vendor onboarding, ensuring data integrity for US healthcare data, and resolved critical production issues related to data processing and client requests.'}]}]}],\n",
259
+ " 'education': [{'degree': {'segments': [{'type': 'link',\n",
260
+ " 'content': 'Master’s in Computer Science - Data Science (Concentration)',\n",
261
+ " 'url': 'https://www.uah.edu/science/departments/computer-science/cs-graduate-programs'}]},\n",
262
+ " 'university': {'segments': [{'type': 'link',\n",
263
+ " 'content': 'University of Alabama in Huntsville',\n",
264
+ " 'url': 'https://www.uah.edu/'}]},\n",
265
+ " 'location': {'segments': [{'type': 'text',\n",
266
+ " 'content': 'Huntsville, Alabama, USA'}]},\n",
267
+ " 'date_description': {'segments': [{'type': 'text',\n",
268
+ " 'content': '2024 - 2026'}]},\n",
269
+ " 'grade': {'segments': [{'type': 'text',\n",
270
+ " 'content': 'Current GPA: 4.0/4.0'}]},\n",
271
+ " 'courses': None},\n",
272
+ " {'degree': {'segments': [{'type': 'link',\n",
273
+ " 'content': 'B.E. Electronics & Communication',\n",
274
+ " 'url': 'https://doece.pcampus.edu.np/index.php/bex-becie/'}]},\n",
275
+ " 'university': {'segments': [{'type': 'link',\n",
276
+ " 'content': 'Institute of Engineering, Pulchowk Campus, Tribhuvan University',\n",
277
+ " 'url': 'https://pcampus.edu.np/'}]},\n",
278
+ " 'location': {'segments': [{'type': 'text', 'content': 'Lalitpur, Nepal'}]},\n",
279
+ " 'date_description': {'segments': [{'type': 'text',\n",
280
+ " 'content': '2016 - 2021'}]},\n",
281
+ " 'grade': {'segments': [{'type': 'text',\n",
282
+ " 'content': 'Full Scholarship; Aggregate: 79.45%; Rank: '},\n",
283
+ " {'type': 'link',\n",
284
+ " 'content': '8th',\n",
285
+ " 'url': 'https://photos.app.goo.gl/C4QsvJgsfx9jguQn7'},\n",
286
+ " {'type': 'text', 'content': ' Position (Top 1% in University)'}]},\n",
287
+ " 'courses': None}],\n",
288
+ " 'skill_sections': [{'name': {'segments': [{'type': 'text',\n",
289
+ " 'content': 'Machine Learning & Deep Learning Frameworks'}]},\n",
290
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'PyTorch'}]},\n",
291
+ " {'segments': [{'type': 'text', 'content': 'Transformers'}]},\n",
292
+ " {'segments': [{'type': 'text', 'content': 'OpenCV'}]},\n",
293
+ " {'segments': [{'type': 'text', 'content': 'Scikit-Learn'}]},\n",
294
+ " {'segments': [{'type': 'text', 'content': 'spaCy'}]},\n",
295
+ " {'segments': [{'type': 'text', 'content': 'Keras'}]},\n",
296
+ " {'segments': [{'type': 'text', 'content': 'W&B'}]},\n",
297
+ " {'segments': [{'type': 'text', 'content': 'Imbalanced-Learn'}]},\n",
298
+ " {'segments': [{'type': 'text', 'content': 'Hyperopt'}]}]},\n",
299
+ " {'name': {'segments': [{'type': 'text',\n",
300
+ " 'content': 'Programming Languages'}]},\n",
301
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Python'}]},\n",
302
+ " {'segments': [{'type': 'text', 'content': 'SQL'}]},\n",
303
+ " {'segments': [{'type': 'text', 'content': 'C++'}]}]},\n",
304
+ " {'name': {'segments': [{'type': 'text',\n",
305
+ " 'content': 'Data Processing & Big Data'}]},\n",
306
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'PySpark'}]},\n",
307
+ " {'segments': [{'type': 'text', 'content': 'Hadoop'}]},\n",
308
+ " {'segments': [{'type': 'text', 'content': 'Pandas'}]},\n",
309
+ " {'segments': [{'type': 'text', 'content': 'NumPy'}]},\n",
310
+ " {'segments': [{'type': 'text', 'content': 'Dask'}]},\n",
311
+ " {'segments': [{'type': 'text', 'content': 'SciPy'}]}]},\n",
312
+ " {'name': {'segments': [{'type': 'text', 'content': 'Cloud & MLOps'}]},\n",
313
+ " 'skills': [{'segments': [{'type': 'text',\n",
314
+ " 'content': 'Amazon EMR (Hadoop/Serverless)'}]},\n",
315
+ " {'segments': [{'type': 'text', 'content': 'Amazon S3'}]},\n",
316
+ " {'segments': [{'type': 'text', 'content': 'Amazon Redshift'}]},\n",
317
+ " {'segments': [{'type': 'text', 'content': 'Amazon EC2'}]},\n",
318
+ " {'segments': [{'type': 'text', 'content': 'FastAPI'}]},\n",
319
+ " {'segments': [{'type': 'text', 'content': 'Flask'}]},\n",
320
+ " {'segments': [{'type': 'text', 'content': 'REST framework'}]}]},\n",
321
+ " {'name': {'segments': [{'type': 'text',\n",
322
+ " 'content': 'Data Analysis & Visualization'}]},\n",
323
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Matplotlib'}]},\n",
324
+ " {'segments': [{'type': 'text', 'content': 'Seaborn'}]},\n",
325
+ " {'segments': [{'type': 'text', 'content': 'Plotly'}]}]}],\n",
326
+ " 'projects': [{'name': {'segments': [{'type': 'link',\n",
327
+ " 'content': 'Funny Project',\n",
328
+ " 'url': 'https://github.com/AwaleSajil/FunnyProject'}]},\n",
329
+ " 'type': {'segments': [{'type': 'text', 'content': 'BigData Project'}]},\n",
330
+ " 'link': {'type': 'link',\n",
331
+ " 'content': 'GitHub',\n",
332
+ " 'url': 'https://github.com/AwaleSajil/FunnyProject'},\n",
333
+ " 'resources': [],\n",
334
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2024'}]},\n",
335
+ " 'description': [{'segments': [{'type': 'text',\n",
336
+ " 'content': 'Engineered a two-stage NLP pipeline to classify 570,000+ jokes by humor, offensiveness, and sentiment, achieving a 0.86 weighted F1-score by fine-tuning a BERT model on a 55k-sample dataset labeled by local LLMs (Mistral, Gemma3); subsequently Dockerized the inference pipeline for scalable deployment in content understanding scenarios.'}]}]},\n",
337
+ " {'name': {'segments': [{'type': 'link',\n",
338
+ " 'content': 'Real Time Visual Localisation and Mapping of Mobile Robot in Dynamic Environment',\n",
339
+ " 'url': 'https://docs.google.com/presentation/d/1wIIM4iFpwIpxakO2Fubz-7BHouQZl0Tq2z-sj45jA7Y/edit?usp=sharing'}]},\n",
340
+ " 'type': {'segments': [{'type': 'text',\n",
341
+ " 'content': 'College Major Project'}]},\n",
342
+ " 'link': {'type': 'link',\n",
343
+ " 'content': 'Presentation',\n",
344
+ " 'url': 'https://docs.google.com/presentation/d/1wIIM4iFpwIpxakO02Fubz-7BHouQZl0Tq2z-sj45jA7Y/edit?usp=sharing'},\n",
345
+ " 'resources': [],\n",
346
+ " 'date_description': {'segments': [{'type': 'text',\n",
347
+ " 'content': '2019 - 2020'}]},\n",
348
+ " 'description': [{'segments': [{'type': 'text',\n",
349
+ " 'content': 'Developed a real-time Visual SLAM system for mobile robots, reconstructing 3D scenes from 2D images and enhancing robustness in dynamic environments by fine-tuning and applying ICNet, a semantic segmentation model, to accurately mask and disregard prevalent dynamic objects like humans from visual landmarks.'}]}]},\n",
350
+ " {'name': {'segments': [{'type': 'link',\n",
351
+ " 'content': 'Image Auto Alignment',\n",
352
+ " 'url': 'https://docs.google.com/presentation/d/1n_Hv8l0_MGLNv62PnMt76vsGcCsRyS7TZuXABBb1jmc/edit?usp=sharing'}]},\n",
353
+ " 'type': {'segments': [{'type': 'text', 'content': 'Weekend Project'}]},\n",
354
+ " 'link': {'type': 'link',\n",
355
+ " 'content': 'Presentation',\n",
356
+ " 'url': 'https://docs.google.com/presentation/d/1n_Hv8l0_MGLNv62PnMt76vsGcCsRyS7TZuXABBb1jmc/edit?usp=sharing'},\n",
357
+ " 'resources': [],\n",
358
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2024'}]},\n",
359
+ " 'description': [{'segments': [{'type': 'text',\n",
360
+ " 'content': 'Engineered two computer vision solutions for automated image rotation correction: a rule-based Flask API for document orientation (leveraging line detection and text-weight heuristics) and an ML-based model utilizing MobileNetV2 for general images, framed as a regression task that achieved 2.6° MAE on a self-supervised Flickr dataset.'}]}]},\n",
361
+ " {'name': {'segments': [{'type': 'link',\n",
362
+ " 'content': 'Precision Livestock Farming — Improving Productivity of Broiler Chicken farm with technology',\n",
363
+ " 'url': 'https://docs.google.com/presentation/d/1fiFOfY1UPjH535EpBikLpd_-gw37Te88by8h5pYT5U/edit?usp=sharing'}]},\n",
364
+ " 'type': {'segments': [{'type': 'text', 'content': 'LOCUS 2019 Project'}]},\n",
365
+ " 'link': {'type': 'link',\n",
366
+ " 'content': 'Presentation',\n",
367
+ " 'url': 'https://docs.google.com/presentation/d/1fiFOfY1UPjH535EpBikLpd_-gw37Te88by8h5pYT5U/edit?usp=sharing'},\n",
368
+ " 'resources': [],\n",
369
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
370
+ " 'description': [{'segments': [{'type': 'text',\n",
371
+ " 'content': 'Developed a Precision Livestock Farming system integrating computer vision (YOLO for chicken detection, SORT for mobility tracking) and audio analysis (feeder microphone for eating behavior estimation) to effectively monitor broiler chickens and optimize environmental conditions.'}]}]},\n",
372
+ " {'name': {'segments': [{'type': 'link',\n",
373
+ " 'content': 'Vehicle Traffic Analysis and Management',\n",
374
+ " 'url': 'https://drive.google.com/file/d/1N8H8YIwp3FTOcbLwzW4fNXuKU48Yo1D2/view?usp=sharing'}]},\n",
375
+ " 'type': {'segments': [{'type': 'text',\n",
376
+ " 'content': 'College Minor Project'}]},\n",
377
+ " 'link': {'type': 'link',\n",
378
+ " 'content': 'Document',\n",
379
+ " 'url': 'https://drive.google.com/file/d/1N8H8YIwp3FTOcbLwzW4fNXuKU48Yo1D2/view?usp=sharing'},\n",
380
+ " 'resources': [],\n",
381
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
382
+ " 'description': [{'segments': [{'type': 'text',\n",
383
+ " 'content': 'Assessed and managed traffic flow at road junctions by implementing vehicle counting using YOLO and SORT from diverse video sources, and applied the Webster algorithm to determine optimal traffic signal timings, improving traffic efficiency.'}]}]}],\n",
384
+ " 'certifications': [{'certificate_info': {'segments': [{'type': 'link',\n",
385
+ " 'content': 'Deep Learning Specialization by DeepLearning.AI on Coursera.',\n",
386
+ " 'url': 'https://www.coursera.org/account/accomplishments/specialization/CMV425VZYK92?utm_source=link&utm_medium=certificate&utm_content=cert_image&utm_campaign=sharing_cta&utm_product=s12n'}]},\n",
387
+ " 'date': None},\n",
388
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
389
+ " 'content': 'Applied Deep Learning Capstone Project by IBM on edX.',\n",
390
+ " 'url': 'https://courses.edx.org/certificates/6154999d04c34c329bd68f3fcbd7e0a2'}]},\n",
391
+ " 'date': None},\n",
392
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
393
+ " 'content': 'Specialized Models: Time Series and Survival Analysis on Coursera',\n",
394
+ " 'url': 'https://www.coursera.org/account/accomplishments/certificate/5U3ZQ9767CRW'}]},\n",
395
+ " 'date': None}],\n",
396
+ " 'achievements': [{'name': {'segments': [{'type': 'link',\n",
397
+ " 'content': 'Institute of Engineering Scholarship for BE',\n",
398
+ " 'url': 'https://media.edusanjal.com/redactor/Download%20TU%20IOE%20Entrance%20Examination%20Result.pdf'}]},\n",
399
+ " 'issued_by': {'segments': [{'type': 'link',\n",
400
+ " 'content': 'Tribhuvan University, IOE',\n",
401
+ " 'url': 'https://tu.edu.np/pages/institute-of-engineering-4'}]},\n",
402
+ " 'date': {'segments': [{'type': 'text', 'content': '2017'}]},\n",
403
+ " 'description': [{'segments': [{'type': 'text',\n",
404
+ " 'content': 'Received a full scholarship to study engineering at the most reputed engineering college in Nepal for securing 58th rank in a competitive entrance examination taken by more than ten thousand students.'}]}]},\n",
405
+ " {'name': {'segments': [{'type': 'link',\n",
406
+ " 'content': 'Best Thematic Hardware Project',\n",
407
+ " 'url': 'https://photos.app.goo.gl/KDFNt1KtSUU9xkkXA'}]},\n",
408
+ " 'issued_by': {'segments': [{'type': 'link',\n",
409
+ " 'content': 'LOCUS',\n",
410
+ " 'url': 'https://locus.pcampus.edu.np/'}]},\n",
411
+ " 'date': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
412
+ " 'description': [{'segments': [{'type': 'text',\n",
413
+ " 'content': \"Awarded for 'Precision Livestock Farming' during the 16th National Technological Festival held by LOCUS, Pulchowk Campus.\"}]}]}],\n",
414
+ " 'custom_sections': {'Exchange Program and Fellowship': [{'title': {'segments': [{'type': 'link',\n",
415
+ " 'content': 'First Nepal Winter School in AI',\n",
416
+ " 'url': 'https://photos.app.goo.gl/kBatEMLzQqRJKU37'}]},\n",
417
+ " 'subtitle': {'segments': [{'type': 'link',\n",
418
+ " 'content': 'Nepal Applied Mathematics and Informatics Institute for Research (NAAMII)',\n",
419
+ " 'url': 'https://www.naamii.org.np/'}]},\n",
420
+ " 'date_description': {'segments': [{'type': 'text',\n",
421
+ " 'content': '20th - 30th Dec, 2018'}]},\n",
422
+ " 'description': [{'segments': [{'type': 'text',\n",
423
+ " 'content': 'Gained foundational knowledge in Deep Learning, probability, statistics, and linear algebra from esteemed professors, crucial for advanced algorithm development.'}]},\n",
424
+ " {'segments': [{'type': 'text',\n",
425
+ " 'content': 'Completed hands-on lab assignments directly applying concepts in computer vision and natural language processing (NLP), aligning with key model development areas for content safety.'}]}]},\n",
426
+ " {'title': {'segments': [{'type': 'link',\n",
427
+ " 'content': 'Sakura Science Exchange Program',\n",
428
+ " 'url': 'https://photos.app.goo.gl/P8gFatguLP5F1kmM9'}]},\n",
429
+ " 'subtitle': {'segments': [{'type': 'link',\n",
430
+ " 'content': 'Japan Science and Technology Agency',\n",
431
+ " 'url': 'https://www.jst.go.jp/EN/'}]},\n",
432
+ " 'date_description': {'segments': [{'type': 'text',\n",
433
+ " 'content': '16th - 23rd Dec, 2019'}]},\n",
434
+ " 'description': [{'segments': [{'type': 'text',\n",
435
+ " 'content': 'Selected as one of the top 3 students, demonstrating strong academic capability and a proactive approach to learning cutting-edge technologies.'}]},\n",
436
+ " {'segments': [{'type': 'text',\n",
437
+ " 'content': 'Engaged in technical exchange with international peers, presenting a poster and discussing solutions, enhancing communication and teamwork skills.'}]},\n",
438
+ " {'segments': [{'type': 'text',\n",
439
+ " 'content': 'Participated in sessions on advanced Artificial Intelligence and IoT, fostering a keen interest in innovative technological solutions relevant to content understanding.'}]}]}],\n",
440
+ " 'Volunteering and Teaching experience': [{'title': {'segments': [{'type': 'text',\n",
441
+ " 'content': 'Training on ML Applications'}]},\n",
442
+ " 'subtitle': {'segments': [{'type': 'text',\n",
443
+ " 'content': 'Mentors Club, Cedargate'}]},\n",
444
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2023'}]},\n",
445
+ " 'description': [{'segments': [{'type': 'text',\n",
446
+ " 'content': 'Conducted comprehensive sessions for all Cedargate employees in Nepal, covering a variety of ML algorithms and providing insights into operational procedures and production-contributing projects.'}]}]},\n",
447
+ " {'title': {'segments': [{'type': 'link',\n",
448
+ " 'content': 'Hardware Fellowship',\n",
449
+ " 'url': 'https://photos.app.goo.gl/pM3E4DLs12xjgP7FA'}]},\n",
450
+ " 'subtitle': {'segments': [{'type': 'link',\n",
451
+ " 'content': 'LOCUS',\n",
452
+ " 'url': 'https://www.facebook.com/locus.ioe/posts/hardware-fellowshiphope-you-guys-are-learninglocusnepal2019/825855697584931/'}]},\n",
453
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2020'}]},\n",
454
+ " 'description': [{'segments': [{'type': 'text',\n",
455
+ " 'content': 'Instructed nearly 100 students in Arduino programming and electronic hardware design, demonstrating strong communication skills and foundational technical knowledge relevant to algorithm development.'}]},\n",
456
+ " {'segments': [{'type': 'text',\n",
457
+ " 'content': 'Mentored a team of junior-year students through a project to create a 2D CNC plotter, showcasing leadership, teamwork, and an ability to guide technical projects.'}]}]}],\n",
458
+ " 'References': [{'title': {'segments': [{'type': 'text',\n",
459
+ " 'content': 'Tathagata Mukharjee, Professor at University of Alabama in Huntsville'}]},\n",
460
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'tm0130@uh.edu'}]},\n",
461
+ " 'date_description': None,\n",
462
+ " 'description': None},\n",
463
+ " {'title': {'segments': [{'type': 'text',\n",
464
+ " 'content': 'Stacey Finn, Director of Data Science and Analytics at CedarGate'}]},\n",
465
+ " 'subtitle': {'segments': [{'type': 'text',\n",
466
+ " 'content': 'safinn5@gmail.com'}]},\n",
467
+ " 'date_description': None,\n",
468
+ " 'description': None}]}}"
469
+ ]
470
+ },
471
+ "execution_count": 24,
472
+ "metadata": {},
473
+ "output_type": "execute_result"
474
+ }
475
+ ],
476
+ "source": [
477
+ "pp.resume_details"
478
+ ]
479
+ },
480
+ {
481
+ "cell_type": "code",
482
+ "execution_count": 22,
483
+ "metadata": {},
484
+ "outputs": [
485
+ {
486
+ "data": {
487
+ "text/plain": [
488
+ "dict_keys(['Exchange Program and Fellowship', 'Volunteering and Teaching experience', 'References'])"
489
+ ]
490
+ },
491
+ "execution_count": 22,
492
+ "metadata": {},
493
+ "output_type": "execute_result"
494
+ }
495
+ ],
496
+ "source": [
497
+ "pp.resume_details[\"custom_sections\"].keys()"
498
+ ]
499
+ },
500
+ {
501
+ "cell_type": "code",
502
+ "execution_count": 23,
503
+ "metadata": {},
504
+ "outputs": [
505
+ {
506
+ "data": {
507
+ "text/plain": [
508
+ "[{'title': {'segments': [{'type': 'text',\n",
509
+ " 'content': 'Tathagata Mukharjee, Professor at University of Alabama in Huntsville'}]},\n",
510
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'tm0130@uh.edu'}]},\n",
511
+ " 'date_description': None,\n",
512
+ " 'description': None},\n",
513
+ " {'title': {'segments': [{'type': 'text',\n",
514
+ " 'content': 'Stacey Finn, Director of Data Science and Analytics at CedarGate'}]},\n",
515
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'safinn5@gmail.com'}]},\n",
516
+ " 'date_description': None,\n",
517
+ " 'description': None}]"
518
+ ]
519
+ },
520
+ "execution_count": 23,
521
+ "metadata": {},
522
+ "output_type": "execute_result"
523
+ }
524
+ ],
525
+ "source": [
526
+ "pp.resume_details[\"custom_sections\"][\"References\"]"
527
+ ]
528
+ },
529
+ {
530
+ "cell_type": "code",
531
+ "execution_count": 11,
532
+ "metadata": {},
533
+ "outputs": [
534
+ {
535
+ "data": {
536
+ "text/plain": [
537
+ "{'Exchange Program and Fellowship': [{'title': {'segments': [{'type': 'link',\n",
538
+ " 'content': 'First Nepal Winter School in AI',\n",
539
+ " 'url': 'https://photos.app.goo.gl/kBatEMLzQqRJKU37'}]},\n",
540
+ " 'subtitle': {'segments': [{'type': 'link',\n",
541
+ " 'content': 'Nepal Applied Mathematics and Informatics Institute for Research (NAAMII)',\n",
542
+ " 'url': 'https://www.naamii.org.np/'}]},\n",
543
+ " 'date_description': {'segments': [{'type': 'text',\n",
544
+ " 'content': '20th - 30th Dec, 2018'}]},\n",
545
+ " 'description': [{'segments': [{'type': 'text',\n",
546
+ " 'content': 'Gained foundational knowledge in Deep Learning, probability, statistics, and linear algebra from esteemed professors, crucial for advanced algorithm development.'}]},\n",
547
+ " {'segments': [{'type': 'text',\n",
548
+ " 'content': 'Completed hands-on lab assignments directly applying concepts in computer vision and natural language processing (NLP), aligning with key model development areas for content safety.'}]}]},\n",
549
+ " {'title': {'segments': [{'type': 'link',\n",
550
+ " 'content': 'Sakura Science Exchange Program',\n",
551
+ " 'url': 'https://photos.app.goo.gl/P8gFatguLP5F1kmM9'}]},\n",
552
+ " 'subtitle': {'segments': [{'type': 'link',\n",
553
+ " 'content': 'Japan Science and Technology Agency',\n",
554
+ " 'url': 'https://www.jst.go.jp/EN/'}]},\n",
555
+ " 'date_description': {'segments': [{'type': 'text',\n",
556
+ " 'content': '16th - 23rd Dec, 2019'}]},\n",
557
+ " 'description': [{'segments': [{'type': 'text',\n",
558
+ " 'content': 'Selected as one of the top 3 students, demonstrating strong academic capability and a proactive approach to learning cutting-edge technologies.'}]},\n",
559
+ " {'segments': [{'type': 'text',\n",
560
+ " 'content': 'Engaged in technical exchange with international peers, presenting a poster and discussing solutions, enhancing communication and teamwork skills.'}]},\n",
561
+ " {'segments': [{'type': 'text',\n",
562
+ " 'content': 'Participated in sessions on advanced Artificial Intelligence and IoT, fostering a keen interest in innovative technological solutions relevant to content understanding.'}]}]}],\n",
563
+ " 'Volunteering and Teaching experience': [{'title': {'segments': [{'type': 'text',\n",
564
+ " 'content': 'Training on ML Applications'}]},\n",
565
+ " 'subtitle': {'segments': [{'type': 'text',\n",
566
+ " 'content': 'Mentors Club, Cedargate'}]},\n",
567
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2023'}]},\n",
568
+ " 'description': [{'segments': [{'type': 'text',\n",
569
+ " 'content': 'Conducted comprehensive sessions for all Cedargate employees in Nepal, covering a variety of ML algorithms and providing insights into operational procedures and production-contributing projects.'}]}]},\n",
570
+ " {'title': {'segments': [{'type': 'link',\n",
571
+ " 'content': 'Hardware Fellowship',\n",
572
+ " 'url': 'https://photos.app.goo.gl/pM3E4DLs12xjgP7FA'}]},\n",
573
+ " 'subtitle': {'segments': [{'type': 'link',\n",
574
+ " 'content': 'LOCUS',\n",
575
+ " 'url': 'https://www.facebook.com/locus.ioe/posts/hardware-fellowshiphope-you-guys-are-learninglocusnepal2019/825855697584931/'}]},\n",
576
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2020'}]},\n",
577
+ " 'description': [{'segments': [{'type': 'text',\n",
578
+ " 'content': 'Instructed nearly 100 students in Arduino programming and electronic hardware design, demonstrating strong communication skills and foundational technical knowledge relevant to algorithm development.'}]},\n",
579
+ " {'segments': [{'type': 'text',\n",
580
+ " 'content': 'Mentored a team of junior-year students through a project to create a 2D CNC plotter, showcasing leadership, teamwork, and an ability to guide technical projects.'}]}]}],\n",
581
+ " 'References': [{'title': {'segments': [{'type': 'text',\n",
582
+ " 'content': 'Tathagata Mukharjee, Professor at University of Alabama in Huntsville'}]},\n",
583
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'tm0130@uh.edu'}]},\n",
584
+ " 'date_description': None,\n",
585
+ " 'description': None},\n",
586
+ " {'title': {'segments': [{'type': 'text',\n",
587
+ " 'content': 'Stacey Finn, Director of Data Science and Analytics at CedarGate'}]},\n",
588
+ " 'subtitle': {'segments': [{'type': 'text',\n",
589
+ " 'content': 'safinn5@gmail.com'}]},\n",
590
+ " 'date_description': None,\n",
591
+ " 'description': None}]}"
592
+ ]
593
+ },
594
+ "execution_count": 11,
595
+ "metadata": {},
596
+ "output_type": "execute_result"
597
+ }
598
+ ],
599
+ "source": [
600
+ "pp.resume_details[\"custom_sections\"]"
601
+ ]
602
+ },
603
+ {
604
+ "cell_type": "code",
605
+ "execution_count": 12,
606
+ "metadata": {},
607
+ "outputs": [
608
+ {
609
+ "data": {
610
+ "text/plain": [
611
+ "{'personal_info': {'name': {'segments': [{'type': 'text',\n",
612
+ " 'content': 'Sajil Awale'}]},\n",
613
+ " 'location': {'segments': [{'type': 'text',\n",
614
+ " 'content': 'Huntsville, Alabama'}]},\n",
615
+ " 'phone': {'segments': [{'type': 'text', 'content': '+1-256-417-3690'}]},\n",
616
+ " 'email': {'segments': [{'type': 'link',\n",
617
+ " 'content': 'sajilawale@gmail.com',\n",
618
+ " 'url': 'mailto:sajilawale@gmail.com'}]},\n",
619
+ " 'media': {'portfolio': 'https://www.sajilawale.com.np',\n",
620
+ " 'linkedin': 'https://www.linkedin.com/in/sajilawale/',\n",
621
+ " 'github': 'https://github.com/AwaleSajil',\n",
622
+ " 'medium': None,\n",
623
+ " 'devpost': None}},\n",
624
+ " 'summary': {'segments': [{'type': 'text',\n",
625
+ " 'content': 'Machine Learning Engineer with 4+ years of experience specializing in NLP, Large Language Models, and Agentic AI evaluation. Currently a CS Master’s student (4.0 GPA) at NASA-IMPACT, developing foundational scientific embedding models and benchmarking autonomous research agents. Expert in deploying production-grade ML pipelines for healthcare analytics and document automation.'}]},\n",
626
+ " 'work_experience': [{'role': {'segments': [{'type': 'text',\n",
627
+ " 'content': 'Graduate Research Assistant for LLM team'}]},\n",
628
+ " 'company': {'segments': [{'type': 'link',\n",
629
+ " 'content': 'NASA-IMPACT @ UAH',\n",
630
+ " 'url': 'https://www.earthdata.nasa.gov/about/impact'}]},\n",
631
+ " 'location': {'segments': []},\n",
632
+ " 'date_description': {'segments': [{'type': 'text',\n",
633
+ " 'content': 'August 2024 - Present'}]},\n",
634
+ " 'description': [{'segments': [{'type': 'text',\n",
635
+ " 'content': 'Science Keyword Recommender: Built an extreme multi-label classifier for NASA CMR, scaling from 430 to 3,240 science keywords. Used Focal Loss and custom stratified sampling to improve F1 to 0.55, enhancing metadata accuracy and dataset discoverability.'}]},\n",
636
+ " {'segments': [{'type': 'text',\n",
637
+ " 'content': 'Pre-training Science Embedding Model (Indus-SDE): Pretrained a RoBERTa-based model on 520K NASA documents with extended 1024-token input and Weighted Keyword Based Dynamic Masking. Achieved 78.1% top-1 MLM accuracy, outperforming baselines on keyword tagging, astrophysics, and EJ tasks.'}]},\n",
638
+ " {'segments': [{'type': 'text',\n",
639
+ " 'content': 'Downstream Task Unification Framework: Developed a modular multi-task fine-tuning pipeline using Hugging Face and W&B. Enabled plug-and-play config-based training/evaluation with automatic Excel reporting, streamlining model comparison and boosting team productivity.'}]},\n",
640
+ " {'segments': [{'type': 'text',\n",
641
+ " 'content': 'Training Sentence Transformer (Indus-SDE-ST): Implemented DDP multi-GPU multi-stage training of a scientific sentence transformer using text/code pairs. Early results show superior performance on science-domain information retrieval benchmarks, pushing forward scientific search and discovery tools.'}]},\n",
642
+ " {'segments': [{'type': 'text',\n",
643
+ " 'content': 'Agentic AI Evaluation & Benchmarking: Conducted comparative evaluations of NASA’s Deep Literature Search Agent against Gemini and OpenAI systems using LLM-as-judge metrics (contextual precision, recall, relevance, faithfulness). Explored agent reliability, metric stability, and variance reduction strategies to improve reproducibility and trust in autonomous scientific research agents.'}]}]},\n",
644
+ " {'role': {'segments': [{'type': 'text',\n",
645
+ " 'content': 'Machine Learning Engineer'}]},\n",
646
+ " 'company': {'segments': [{'type': 'link',\n",
647
+ " 'content': 'Cedar Gate Technologies',\n",
648
+ " 'url': 'https://www.cedargate.com/'}]},\n",
649
+ " 'location': {'segments': []},\n",
650
+ " 'date_description': {'segments': [{'type': 'text',\n",
651
+ " 'content': 'July 2022 - July 2024'}]},\n",
652
+ " 'description': [{'segments': [{'type': 'text',\n",
653
+ " 'content': 'Automated ETL field mapping by fine-tuning DistilBERT for multilabel classification to suggest source-to-destination field mappings and achieved 0.95 recall and 0.7 IoU. Initiated full ETL automation by fine-tuning Mistral-7B to autogenerate internal data transformation scripts.'}]},\n",
654
+ " {'segments': [{'type': 'text',\n",
655
+ " 'content': 'Analyzed local model explainability tools (permutation SHAP, Deep Explainer, LIME), identifying FastSHAP as the optimal solution for a production diabetes model with extensive features based on speed and performance (87.2% Inclusion AUC).'}]},\n",
656
+ " {'segments': [{'type': 'text',\n",
657
+ " 'content': 'Performed network analysis on healthcare providers to correlate patient-sharing patterns among physicians with medical costs for patients with chronic conditions like Chronic Heart Failure and Diabetes, revealing key cost drivers.'}]},\n",
658
+ " {'segments': [{'type': 'text',\n",
659
+ " 'content': 'Optimized the segmentation of frequent ER visitors by systematically evaluating various scaling, feature extraction, and clustering methods. Utilized logistic regression coefficients for rapid cluster discrimination, ultimately identifying K-Means (6 clusters) with an auto-encoder as the most effective model, based on cluster metrics for overlap, quality, and cardinality.'}]},\n",
660
+ " {'segments': [{'type': 'text',\n",
661
+ " 'content': 'Developed a LightGBM model to predict healthcare cost-risk (MARA scores), and the likelihood of it increasing or decreasing; achieving an R2 of 0.74 and MCC of 0.45, enabling proactive care management.'}]},\n",
662
+ " {'segments': [{'type': 'text',\n",
663
+ " 'content': 'Built a Gradient Boosted model to predict patient compliance with preventive care visits next year, achieving an MCC score over 0.75 to support targeted outreach initiatives.'}]}]},\n",
664
+ " {'role': {'segments': [{'type': 'link',\n",
665
+ " 'content': 'Machine Learning Engineer',\n",
666
+ " 'url': 'https://photos.app.goo.gl/vVbF4bHvcjuexqnL6'}]},\n",
667
+ " 'company': {'segments': [{'type': 'link',\n",
668
+ " 'content': 'Docsumo',\n",
669
+ " 'url': 'https://www.docsumo.com/'}]},\n",
670
+ " 'location': {'segments': []},\n",
671
+ " 'date_description': {'segments': [{'type': 'text',\n",
672
+ " 'content': 'March 2022 - June 2022'}]},\n",
673
+ " 'description': [{'segments': [{'type': 'text',\n",
674
+ " 'content': 'Benchmarked spaCy v2 vs. v3 Named Entity Recognition (NER) pipelines for information extraction from OCR-scanned documents based on their performance, speed, and size, providing key data for a strategic upgrade decision.'}]},\n",
675
+ " {'segments': [{'type': 'text',\n",
676
+ " 'content': 'Evaluated multiple document reading order detection techniques (e.g., DBSCAN, recursive XY-cut, layout reader, line-based block separation, docstrum) to enhance NER performance on complex layouts, measuring success with ROUGE-L and BLEU scores.'}]}]},\n",
677
+ " {'role': {'segments': [{'type': 'link',\n",
678
+ " 'content': 'Associate Data Engineer',\n",
679
+ " 'url': 'https://photos.app.goo.gl/zjsUJiMr6ZmVhfqz9'}]},\n",
680
+ " 'company': {'segments': [{'type': 'link',\n",
681
+ " 'content': 'Deerwalk',\n",
682
+ " 'url': 'https://www.cedargate.com/'}]},\n",
683
+ " 'location': {'segments': []},\n",
684
+ " 'date_description': {'segments': [{'type': 'text',\n",
685
+ " 'content': 'May 2021 - Feb 2022'}]},\n",
686
+ " 'description': [{'segments': [{'type': 'text',\n",
687
+ " 'content': 'Onboarded new vendors (ETL processes on US healthcare data), ensured data integrity, analyzed bugs triggered during data processing or client requests, and promptly resolved critical production issues.'}]}]}],\n",
688
+ " 'education': [{'degree': {'segments': [{'type': 'link',\n",
689
+ " 'content': 'Master’s in Computer Science - Data Science (Concentration)',\n",
690
+ " 'url': 'https://www.uah.edu/science/departments/computer-science/cs-graduate-programs'}]},\n",
691
+ " 'university': {'segments': [{'type': 'link',\n",
692
+ " 'content': 'University of Alabama in Huntsville',\n",
693
+ " 'url': 'https://www.uah.edu/'}]},\n",
694
+ " 'location': {'segments': [{'type': 'text',\n",
695
+ " 'content': 'Huntsville, Alabama, USA'}]},\n",
696
+ " 'date_description': {'segments': [{'type': 'text',\n",
697
+ " 'content': '2024 - 2026'}]},\n",
698
+ " 'grade': {'segments': [{'type': 'text',\n",
699
+ " 'content': 'Current GPA: 4.0/4.0'}]},\n",
700
+ " 'courses': None},\n",
701
+ " {'degree': {'segments': [{'type': 'link',\n",
702
+ " 'content': 'B.E. Electronics & Communication;',\n",
703
+ " 'url': 'https://doece.pcampus.edu.np/index.php/bex-becie/'}]},\n",
704
+ " 'university': {'segments': [{'type': 'link',\n",
705
+ " 'content': 'Institute of Engineering, Pulchowk Campus, Tribhuvan University',\n",
706
+ " 'url': 'https://pcampus.edu.np/'}]},\n",
707
+ " 'location': {'segments': [{'type': 'text', 'content': 'Lalitpur, Nepal'}]},\n",
708
+ " 'date_description': {'segments': [{'type': 'text',\n",
709
+ " 'content': '2016 - 2021'}]},\n",
710
+ " 'grade': {'segments': [{'type': 'text',\n",
711
+ " 'content': 'Full Scholarship; Aggregate: 79.45%; Rank: '},\n",
712
+ " {'type': 'link',\n",
713
+ " 'content': '8th',\n",
714
+ " 'url': 'https://photos.app.goo.gl/C4QsvJgsfx9jguQn7'},\n",
715
+ " {'type': 'text', 'content': ' Position (Top 1% in University)'}]},\n",
716
+ " 'courses': None}],\n",
717
+ " 'skill_sections': [{'name': {'segments': [{'type': 'text',\n",
718
+ " 'content': 'Languages'}]},\n",
719
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Python'}]},\n",
720
+ " {'segments': [{'type': 'text', 'content': 'C++'}]},\n",
721
+ " {'segments': [{'type': 'text', 'content': 'C'}]},\n",
722
+ " {'segments': [{'type': 'text', 'content': 'C#'}]},\n",
723
+ " {'segments': [{'type': 'text', 'content': 'MATLAB'}]},\n",
724
+ " {'segments': [{'type': 'text', 'content': 'SQL'}]}]},\n",
725
+ " {'name': {'segments': [{'type': 'text', 'content': 'Machine Learning'}]},\n",
726
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Pytorch'}]},\n",
727
+ " {'segments': [{'type': 'text', 'content': 'Transformers'}]},\n",
728
+ " {'segments': [{'type': 'text', 'content': 'Scikit-Learn'}]},\n",
729
+ " {'segments': [{'type': 'text', 'content': 'W&B'}]},\n",
730
+ " {'segments': [{'type': 'text', 'content': 'Spacy'}]},\n",
731
+ " {'segments': [{'type': 'text', 'content': 'Keras'}]},\n",
732
+ " {'segments': [{'type': 'text', 'content': 'OpenCV'}]},\n",
733
+ " {'segments': [{'type': 'text', 'content': 'Imbalanced-Learn'}]},\n",
734
+ " {'segments': [{'type': 'text', 'content': 'Hyperopt'}]}]},\n",
735
+ " {'name': {'segments': [{'type': 'text',\n",
736
+ " 'content': 'Data Analysis Packages'}]},\n",
737
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Pandas'}]},\n",
738
+ " {'segments': [{'type': 'text', 'content': 'Dask'}]},\n",
739
+ " {'segments': [{'type': 'text', 'content': 'Numpy'}]},\n",
740
+ " {'segments': [{'type': 'text', 'content': 'Scipy'}]},\n",
741
+ " {'segments': [{'type': 'text', 'content': 'Matplotlib'}]},\n",
742
+ " {'segments': [{'type': 'text', 'content': 'Seaborn'}]},\n",
743
+ " {'segments': [{'type': 'text', 'content': 'Plotly'}]},\n",
744
+ " {'segments': [{'type': 'text', 'content': 'NetworkX'}]}]},\n",
745
+ " {'name': {'segments': [{'type': 'text', 'content': 'Big Data Framework'}]},\n",
746
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Pyspark'}]},\n",
747
+ " {'segments': [{'type': 'text', 'content': 'Hadoop'}]}]},\n",
748
+ " {'name': {'segments': [{'type': 'text', 'content': 'Frontend'}]},\n",
749
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'HTML'}]},\n",
750
+ " {'segments': [{'type': 'text', 'content': 'CSS'}]},\n",
751
+ " {'segments': [{'type': 'text', 'content': 'Bootstrap'}]},\n",
752
+ " {'segments': [{'type': 'text', 'content': 'JavaScript'}]},\n",
753
+ " {'segments': [{'type': 'text', 'content': 'Angular'}]},\n",
754
+ " {'segments': [{'type': 'text', 'content': 'jQuery'}]}]},\n",
755
+ " {'name': {'segments': [{'type': 'text', 'content': 'Backend'}]},\n",
756
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'FastAPI'}]},\n",
757
+ " {'segments': [{'type': 'text', 'content': 'Flask'}]},\n",
758
+ " {'segments': [{'type': 'text', 'content': 'Rest framework'}]}]},\n",
759
+ " {'name': {'segments': [{'type': 'text', 'content': 'Cloud Computing'}]},\n",
760
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Amazon EC2'}]},\n",
761
+ " {'segments': [{'type': 'text', 'content': 'EMR Hadoop'}]},\n",
762
+ " {'segments': [{'type': 'text', 'content': 'EMR Serverless'}]},\n",
763
+ " {'segments': [{'type': 'text', 'content': 'Redshift'}]},\n",
764
+ " {'segments': [{'type': 'text', 'content': 'S3'}]}]}],\n",
765
+ " 'projects': [{'name': {'segments': [{'type': 'link',\n",
766
+ " 'content': 'Funny Project',\n",
767
+ " 'url': 'https://github.com/AwaleSajil/FunnyProject'}]},\n",
768
+ " 'type': {'segments': [{'type': 'text', 'content': 'BigData Project'}]},\n",
769
+ " 'link': None,\n",
770
+ " 'resources': [],\n",
771
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2024'}]},\n",
772
+ " 'description': [{'segments': [{'type': 'text',\n",
773
+ " 'content': 'Engineered a two-stage NLP pipeline to classify 570,000+ jokes by humor, offensiveness, and sentiment, achieving a 0.86 weighted F1-score by fine-tuning a BERT model on a 55k-sample dataset labeled by local LLMs (Mistral, Gemma3). Dockerized Inference Pipeline'}]}]},\n",
774
+ " {'name': {'segments': [{'type': 'link',\n",
775
+ " 'content': 'Image Auto Alignment',\n",
776
+ " 'url': 'https://docs.google.com/presentation/d/1n_Hv8l0_MGLNv62PnMt76vsGcCsRyS7TZuXABBb1jmc/edit?usp=sharing'}]},\n",
777
+ " 'type': {'segments': [{'type': 'text', 'content': 'Weekend Project'}]},\n",
778
+ " 'link': None,\n",
779
+ " 'resources': [],\n",
780
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2024'}]},\n",
781
+ " 'description': [{'segments': [{'type': 'text',\n",
782
+ " 'content': 'Built two solutions to auto-correct rotated images: Rule-based Flask API for documents (e.g., invoices) using line detection and text-weight heuristics. ML-based model with MobileNetV2 for general images; framed as a regression task and achieved 2.6° MAE on self-supervised Flickr dataset.'}]}]},\n",
783
+ " {'name': {'segments': [{'type': 'link',\n",
784
+ " 'content': 'Real Time Visual Localisation and Mapping of Mobile Robot in Dynamic Environment',\n",
785
+ " 'url': 'https://docs.google.com/presentation/d/1wIIM4iFpwIpxakO2Fubz-7BHouQZl0Tq2z-sj45jA7Y/edit?usp=sharing'}]},\n",
786
+ " 'type': {'segments': [{'type': 'text',\n",
787
+ " 'content': 'College Major Project'}]},\n",
788
+ " 'link': None,\n",
789
+ " 'resources': [],\n",
790
+ " 'date_description': {'segments': [{'type': 'text',\n",
791
+ " 'content': '2019 - 2020'}]},\n",
792
+ " 'description': [{'segments': [{'type': 'text',\n",
793
+ " 'content': 'A mobile robot capable of real-time Visual SLAM (Simultaneous Localization And Mapping) in a dynamic environment by reconstructing the entire 3D scene from 2D images captured by its camera. To address dynamic element, visual landmarks in dynamic areas are masked using ICNet, a semantic segmentation model fine-tuned to identify humans, the most prevalent dynamic objects.'}]}]},\n",
794
+ " {'name': {'segments': [{'type': 'link',\n",
795
+ " 'content': 'Precision Livestock Farming — Improving Productivity of Broiler Chicken farm with technology',\n",
796
+ " 'url': 'https://docs.google.com/presentation/d/1fiFOfY1UPjH535EpBikLpd_-gw37Te88by8h5pYT5U/edit?usp=sharing'}]},\n",
797
+ " 'type': {'segments': [{'type': 'text', 'content': 'LOCUS 2019 Project'}]},\n",
798
+ " 'link': None,\n",
799
+ " 'resources': [],\n",
800
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
801
+ " 'description': [{'segments': [{'type': 'text',\n",
802
+ " 'content': 'Project designed to monitor broiler chickens, utilizing YOLO for chicken detection and SORT (Simple Online Real-time Tracker) for mobility tracking. Eating behavior was estimated using a feeder microphone, while maintaining optimal environmental conditions, including temperature and humidity.'}]}]},\n",
803
+ " {'name': {'segments': [{'type': 'link',\n",
804
+ " 'content': 'Vehicle Traffic Analysis and Management',\n",
805
+ " 'url': 'https://drive.google.com/file/d/1N8H8YIwp3FTOcbLwzW4fNXuKU48Yo1D2/view?usp=sharing'}]},\n",
806
+ " 'type': {'segments': [{'type': 'text',\n",
807
+ " 'content': 'College Minor Project'}]},\n",
808
+ " 'link': None,\n",
809
+ " 'resources': [],\n",
810
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
811
+ " 'description': [{'segments': [{'type': 'text',\n",
812
+ " 'content': 'Traffic flow at various road junctions was assessed by vehicle counting with the help of YOLO and SORT from diverse originating sources. The Webster algorithm was used to determine the optimal timing for traffic signals.'}]}]},\n",
813
+ " {'name': {'segments': [{'type': 'link',\n",
814
+ " 'content': 'Sajilomart',\n",
815
+ " 'url': 'https://drive.google.com/file/d/1aE20vZpEihrmu-ZgcHFz5tMHzga30_T-/view?usp=sharing'}]},\n",
816
+ " 'type': {'segments': [{'type': 'text', 'content': 'Everest Hackathon'}]},\n",
817
+ " 'link': None,\n",
818
+ " 'resources': [],\n",
819
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
820
+ " 'description': [{'segments': [{'type': 'text',\n",
821
+ " 'content': 'Designed a prototype for effortless shopping: a seamless “grab and go” experience eliminating lines and checkouts, with automatic transaction handling.'}]}]},\n",
822
+ " {'name': {'segments': [{'type': 'link',\n",
823
+ " 'content': 'Blind Eye — Assistive Technology for Blind People',\n",
824
+ " 'url': 'https://photos.app.goo.gl/a4NrQTM9WwsezDLq7'}]},\n",
825
+ " 'type': {'segments': [{'type': 'text',\n",
826
+ " 'content': 'Assistive Technology Hackathon'}]},\n",
827
+ " 'link': None,\n",
828
+ " 'resources': [],\n",
829
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2018'}]},\n",
830
+ " 'description': [{'segments': [{'type': 'text',\n",
831
+ " 'content': 'Designed a headset as a solution to enhance mobility for the visually impaired, aiding navigation and obstacle avoidance.'}]}]}],\n",
832
+ " 'certifications': [{'certificate_info': {'segments': [{'type': 'link',\n",
833
+ " 'content': 'Deep Learning Specialization by DeepLearning.AI on Coursera.',\n",
834
+ " 'url': 'https://www.coursera.org/account/accomplishments/specialization/CMV425VZYK92?utm_source=link&utm_medium=certificate&utm_content=cert_image&utm_campaign=sharing_cta&utm_product=s12n'}]},\n",
835
+ " 'date': None},\n",
836
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
837
+ " 'content': 'Applied Deep Learning Capstone Project by IBM on edX.',\n",
838
+ " 'url': 'https://courses.edx.org/certificates/6154999d04c34c329bd68f3fcbd7e0a2'}]},\n",
839
+ " 'date': None},\n",
840
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
841
+ " 'content': 'Specialized Models: Time Series and Survival Analysis on Coursera',\n",
842
+ " 'url': 'https://www.coursera.org/account/accomplishments/certificate/5U3ZQ9767CRW'}]},\n",
843
+ " 'date': None},\n",
844
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
845
+ " 'content': 'Python Classes and Inheritance by University of Michigan on Coursera.',\n",
846
+ " 'url': 'https://www.coursera.org/account/accomplishments/verify/8KPF3UZYT7VC'}]},\n",
847
+ " 'date': None},\n",
848
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
849
+ " 'content': 'Python (Basic) by Hackerrank',\n",
850
+ " 'url': 'https://www.hackerrank.com/certificates/d41a0ed647da'}]},\n",
851
+ " 'date': None}],\n",
852
+ " 'achievements': [{'name': {'segments': [{'type': 'link',\n",
853
+ " 'content': 'Fonepay Student Ambassador',\n",
854
+ " 'url': 'https://photos.app.goo.gl/3NpBXhK3KbEYw87j6'}]},\n",
855
+ " 'issued_by': {'segments': [{'type': 'link',\n",
856
+ " 'content': 'Fonepay',\n",
857
+ " 'url': 'https://fonepay.com/'}]},\n",
858
+ " 'date': {'segments': [{'type': 'text', 'content': '2020'}]},\n",
859
+ " 'description': [{'segments': [{'type': 'text',\n",
860
+ " 'content': 'Selected as one of the top 10 out of 100 competitive teams responsible for driving initiatives to promote and facilitate the growth of mobile payments.'}]}]},\n",
861
+ " {'name': {'segments': [{'type': 'link',\n",
862
+ " 'content': 'Best Thematic Hardware Project',\n",
863
+ " 'url': 'https://photos.app.goo.gl/KDFNt1KtSUU9xkkXA'}]},\n",
864
+ " 'issued_by': {'segments': [{'type': 'link',\n",
865
+ " 'content': 'LOCUS',\n",
866
+ " 'url': 'https://locus.pcampus.edu.np/'}]},\n",
867
+ " 'date': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
868
+ " 'description': [{'segments': [{'type': 'text',\n",
869
+ " 'content': 'We were honored to receive the award for ‘Precision Livestock Farming’ during the 16th edition of the National Technological Festival held by LOCUS, Pulchowk Campus'}]}]},\n",
870
+ " {'name': {'segments': [{'type': 'link',\n",
871
+ " 'content': 'Institute of Engineering Scholarship for BE',\n",
872
+ " 'url': 'https://media.edusanjal.com/redactor/Download%20TU%20IOE%20Entrance%20Examination%20Result.pdf'}]},\n",
873
+ " 'issued_by': {'segments': [{'type': 'link',\n",
874
+ " 'content': 'Tribhuvan University, IOE',\n",
875
+ " 'url': 'https://tu.edu.np/pages/institute-of-engineering-4'}]},\n",
876
+ " 'date': {'segments': [{'type': 'text', 'content': '2017'}]},\n",
877
+ " 'description': [{'segments': [{'type': 'text',\n",
878
+ " 'content': 'Received full scholarship to study engineering in the most reputed engineering college of Nepal for securing 58th rank in competitive entrance examination given by more than ten thousand students.'}]}]}],\n",
879
+ " 'research_works': [],\n",
880
+ " 'custom_sections': [{'section_name': {'segments': [{'type': 'text',\n",
881
+ " 'content': 'Exchange Program and Fellowship'}]},\n",
882
+ " 'section_detail': [{'title': {'segments': [{'type': 'link',\n",
883
+ " 'content': 'Sakura Science Exchange Program',\n",
884
+ " 'url': 'https://photos.app.goo.gl/P8gFatguLP5F1kmM9'}]},\n",
885
+ " 'subtitle': {'segments': [{'type': 'link',\n",
886
+ " 'content': 'Japan Science and Technology Agency',\n",
887
+ " 'url': 'https://www.jst.go.jp/EN/'}]},\n",
888
+ " 'date_description': {'segments': [{'type': 'text',\n",
889
+ " 'content': '16th - 23rd Dec, 2019'}]},\n",
890
+ " 'description': [{'segments': [{'type': 'text',\n",
891
+ " 'content': 'Selected as one of the top 3 students for a program at Japan’s National Institute of Technology, Kisarazu. We presented our poster, visited industries, and exchanged ideas and solutions with international peers.'}]},\n",
892
+ " {'segments': [{'type': 'text',\n",
893
+ " 'content': 'Participated in sessions covering Japan’s cutting-edge technologies, including Artificial Intelligence and the Internet of Things (IoT).'}]}]},\n",
894
+ " {'title': {'segments': [{'type': 'link',\n",
895
+ " 'content': 'First Nepal Winter School in AI',\n",
896
+ " 'url': 'https://photos.app.goo.gl/kBatEMLzQqRJKU37'}]},\n",
897
+ " 'subtitle': {'segments': [{'type': 'link',\n",
898
+ " 'content': 'Nepal Applied Mathematics and Informatics Institute for Research (NAAMII)',\n",
899
+ " 'url': 'https://www.naamii.org.np/'}]},\n",
900
+ " 'date_description': {'segments': [{'type': 'text',\n",
901
+ " 'content': '20th - 30th Dec, 2018'}]},\n",
902
+ " 'description': [{'segments': [{'type': 'text',\n",
903
+ " 'content': 'Learnt about probability and statistics, linear algebra, AI ethics, and Deep Learning through esteemed professors and guest speakers.'}]},\n",
904
+ " {'segments': [{'type': 'text',\n",
905
+ " 'content': 'Finished hands-on lab assignments in computer vision and natural language processing (NLP).'}]}]}]},\n",
906
+ " {'section_name': {'segments': [{'type': 'text',\n",
907
+ " 'content': 'Volunteering and Teaching experience'}]},\n",
908
+ " 'section_detail': [{'title': {'segments': [{'type': 'text',\n",
909
+ " 'content': 'Training on ML Applications'}]},\n",
910
+ " 'subtitle': {'segments': [{'type': 'text',\n",
911
+ " 'content': 'Mentors Club, Cedargate'}]},\n",
912
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2023'}]},\n",
913
+ " 'description': [{'segments': [{'type': 'text',\n",
914
+ " 'content': 'Conducted comprehensive session for all Cedargate employees in Nepal covering a variety of ML algorithms, providing insights into our operational procedures and discussed both ongoing and completed projects that are currently contributing to our production'}]}]},\n",
915
+ " {'title': {'segments': [{'type': 'link',\n",
916
+ " 'content': 'RoboPOP and Dronacharya Competitions',\n",
917
+ " 'url': 'https://youtube.com/playlist?list=PLPFSwgon02wfx5U6bA2TUqVNZUvXrFQBv&si=amDXuMVfZ4LstZgu'}]},\n",
918
+ " 'subtitle': {'segments': [{'type': 'link',\n",
919
+ " 'content': 'LOCUS',\n",
920
+ " 'url': 'https://www.facebook.com/locus.ioe'}]},\n",
921
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2020'}]},\n",
922
+ " 'description': [{'segments': [{'type': 'text',\n",
923
+ " 'content': 'Volunteered to develop 3D animations that would explain the regulations introduced for RoboPOP, an exciting new robotic balloon-popping event incorporated into LOCUS 2020. Additionally, I extended my commitment to creating animations for Dronacharya, a cherished and popular drone racing competition'}]}]},\n",
924
+ " {'title': {'segments': [{'type': 'link',\n",
925
+ " 'content': 'Hardware Fellowship',\n",
926
+ " 'url': 'https://photos.app.goo.gl/pM3E4DLs12xjgP7FA'}]},\n",
927
+ " 'subtitle': {'segments': [{'type': 'link',\n",
928
+ " 'content': 'LOCUS',\n",
929
+ " 'url': 'https://www.facebook.com/locus.ioe/posts/hardware-fellowshiphope-you-guys-are-learninglocusnepal2019/825855697584931/'}]},\n",
930
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2020'}]},\n",
931
+ " 'description': [{'segments': [{'type': 'text',\n",
932
+ " 'content': 'Instructed nearly 100 students, ranging from freshmen to sophomores, in the domains of Arduino programming and electronic hardware design'}]},\n",
933
+ " {'segments': [{'type': 'text',\n",
934
+ " 'content': 'Mentored a team of junior-year students as they embarked on their project, creating a 2D CNC plotter designed for writing and drawing which was showcased at the 17th Technological Festival.'}]}]}]},\n",
935
+ " {'section_name': {'segments': [{'type': 'text', 'content': 'References'}]},\n",
936
+ " 'section_detail': [{'title': {'segments': [{'type': 'text',\n",
937
+ " 'content': 'Tathagata Mukharjee, Professor at University of Alabama in Huntsville'}]},\n",
938
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'tm0130@uh.edu'}]},\n",
939
+ " 'date_description': None,\n",
940
+ " 'description': None},\n",
941
+ " {'title': {'segments': [{'type': 'text',\n",
942
+ " 'content': 'Stacey Finn, Director of Data Science and Analytics at CedarGate'}]},\n",
943
+ " 'subtitle': {'segments': [{'type': 'text',\n",
944
+ " 'content': 'safinn5@gmail.com'}]},\n",
945
+ " 'date_description': None,\n",
946
+ " 'description': None}]}],\n",
947
+ " 'keywords': []}"
948
+ ]
949
+ },
950
+ "execution_count": 12,
951
+ "metadata": {},
952
+ "output_type": "execute_result"
953
+ }
954
+ ],
955
+ "source": [
956
+ "pp.resume_info.model_dump()"
957
+ ]
958
+ },
959
+ {
960
+ "cell_type": "code",
961
+ "execution_count": 13,
962
+ "metadata": {},
963
+ "outputs": [
964
+ {
965
+ "data": {
966
+ "text/plain": [
967
+ "JobInfo(job_title='Algorithm Engineer Intern (Content Safety)', job_purpose='Develop state-of-the-art computer vision, NLP, and multimodality models and algorithms to protect the platform and users from content and behaviors that violate community guidelines, ultimately enhancing user experience and bringing joy to users worldwide. The role involves participating in the development of cutting-edge content understanding models and optimizing distributed model training frameworks.', keywords=['computer vision', 'NLP', 'multimodality models', 'algorithms', 'content understanding', 'distributed model training', 'multimodal large models', 'few-shot learning', 'zero-shot learning', 'content safety', 'moderation models', 'reinforcement learning', 'data mining', 'Chain-of-Thought (CoT) annotation frameworks', 'risk ranking', 'recall systems', 'algorithm development', 'data processing', 'modeling', 'evaluation', 'PyTorch', 'TensorFlow', 'machine learning', 'deep learning'], job_duties_and_responsibilities=['Leverage multimodal large models to explore few-shot and zero-shot strategies for content safety scenarios, and build moderation models with strong generalization capabilities.', 'Participate in reinforcement learning–based data mining, and help design Chain-of-Thought (CoT) annotation frameworks to improve the model’s understanding of complex risks.', 'Build risk ranking and recall systems to enhance coverage and accuracy in identifying high-risk content.', 'Collaborate with product and policy teams to drive real-world deployment and performance optimization of moderation algorithms.'], required_qualifications=['Currently pursuing a Master degree with a background in computer science, machine learning, or similar fields.', 'Solid foundation in machine learning and deep learning; familiarity with multimodal modeling.', 'Interest in content safety, with an understanding of the challenges in risk identification within moderation workflows.', 'Proficient in end-to-end algorithm development, including data processing, modeling, and evaluation; experience with PyTorch or TensorFlow is required.'], preferred_qualifications=['Hands-on experience with large model projects.', 'Strong learning ability.', 'Clear communication.', 'Strong sense of teamwork and responsibility.'], company_name='TikTok', company_details=\"TikTok is the leading destination for short-form mobile video, with a mission to inspire creativity and bring joy. Headquartered in Los Angeles and Singapore, with offices globally, TikTok fosters a culture of creativity, authentic self-expression, discovery, and connection. The company values curiosity, humility, and a desire to make an impact, encouraging continuous learning and an 'Always Day 1' mindset. TikTok is committed to creating an inclusive workplace that reflects its diverse global user base and provides a platform for employees to grow professionally and personally.\")"
968
+ ]
969
+ },
970
+ "execution_count": 13,
971
+ "metadata": {},
972
+ "output_type": "execute_result"
973
+ }
974
+ ],
975
+ "source": [
976
+ "pp.job_info"
977
+ ]
978
+ },
979
+ {
980
+ "cell_type": "code",
981
+ "execution_count": 14,
982
+ "metadata": {},
983
+ "outputs": [
984
+ {
985
+ "data": {
986
+ "text/plain": [
987
+ "dict_keys(['personal_info', 'summary', 'work_experience', 'education', 'skill_sections', 'projects', 'certifications', 'achievements', 'research_works', 'custom_sections', 'keywords'])"
988
+ ]
989
+ },
990
+ "execution_count": 14,
991
+ "metadata": {},
992
+ "output_type": "execute_result"
993
+ }
994
+ ],
995
+ "source": [
996
+ "pp.resume_info.model_dump().keys()"
997
+ ]
998
+ },
999
+ {
1000
+ "cell_type": "code",
1001
+ "execution_count": 15,
1002
+ "metadata": {},
1003
+ "outputs": [
1004
+ {
1005
+ "name": "stdout",
1006
+ "output_type": "stream",
1007
+ "text": [
1008
+ "Exchange Program and Fellowship\n",
1009
+ "Volunteering and Teaching experience\n",
1010
+ "References\n"
1011
+ ]
1012
+ }
1013
+ ],
1014
+ "source": [
1015
+ "# loop through custom sections\n",
1016
+ "for section in pp.resume_info.custom_sections:\n",
1017
+ " temp = section.section_name\n",
1018
+ " print(temp.plain_text)\n"
1019
+ ]
1020
+ },
1021
+ {
1022
+ "cell_type": "code",
1023
+ "execution_count": 16,
1024
+ "metadata": {},
1025
+ "outputs": [
1026
+ {
1027
+ "data": {
1028
+ "text/plain": [
1029
+ "{'section_name': {'segments': [{'type': 'text', 'content': 'References'}]},\n",
1030
+ " 'section_detail': [{'title': {'segments': [{'type': 'text',\n",
1031
+ " 'content': 'Tathagata Mukharjee, Professor at University of Alabama in Huntsville'}]},\n",
1032
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'tm0130@uh.edu'}]},\n",
1033
+ " 'date_description': None,\n",
1034
+ " 'description': None},\n",
1035
+ " {'title': {'segments': [{'type': 'text',\n",
1036
+ " 'content': 'Stacey Finn, Director of Data Science and Analytics at CedarGate'}]},\n",
1037
+ " 'subtitle': {'segments': [{'type': 'text',\n",
1038
+ " 'content': 'safinn5@gmail.com'}]},\n",
1039
+ " 'date_description': None,\n",
1040
+ " 'description': None}]}"
1041
+ ]
1042
+ },
1043
+ "execution_count": 16,
1044
+ "metadata": {},
1045
+ "output_type": "execute_result"
1046
+ }
1047
+ ],
1048
+ "source": [
1049
+ "pp.resume_info.custom_sections[2].model_dump()"
1050
+ ]
1051
+ },
1052
+ {
1053
+ "cell_type": "code",
1054
+ "execution_count": 17,
1055
+ "metadata": {},
1056
+ "outputs": [
1057
+ {
1058
+ "data": {
1059
+ "text/plain": [
1060
+ "[GenericSection(section_name=RichText(segments=[TextSegment(type='text', content='Exchange Program and Fellowship')]), section_detail=[GenericElement(title=RichText(segments=[LinkSegment(type='link', content='Sakura Science Exchange Program', url='https://photos.app.goo.gl/P8gFatguLP5F1kmM9')]), subtitle=RichText(segments=[LinkSegment(type='link', content='Japan Science and Technology Agency', url='https://www.jst.go.jp/EN/')]), date_description=RichText(segments=[TextSegment(type='text', content='16th - 23th Dec, 2019')]), description=[RichText(segments=[TextSegment(type='text', content='Selected as one of the top 3 students for a program at Japan’s National Institute of Technology, Kisarazu. We presented our poster, visited industries, and exchanged ideas and solutions with international peers.')]), RichText(segments=[TextSegment(type='text', content='Participated in sessions covering Japan’s cutting-edge technologies, including Artificial Intelligence and the Internet of Things (IoT).')])]), GenericElement(title=RichText(segments=[LinkSegment(type='link', content='First Nepal Winter School in AI', url='https://photos.app.goo.gl/kBatEMLzQqRJKU37')]), subtitle=RichText(segments=[LinkSegment(type='link', content='Nepal Applied Mathematics and Informatics Institute for Research (NAAMII)', url='https://www.naamii.org.np/')]), date_description=RichText(segments=[TextSegment(type='text', content='20th - 30th Dec, 2018')]), description=[RichText(segments=[TextSegment(type='text', content='Learnt about probability and statistics, linear algebra, AI ethics, and Deep Learning through esteemed professors and guest speakers.')]), RichText(segments=[TextSegment(type='text', content='Finished hands-on lab assignments in computer vision and natural language processing (NLP).')])])]),\n",
1061
+ " GenericSection(section_name=RichText(segments=[TextSegment(type='text', content='Volunteering and Teaching experience')]), section_detail=[GenericElement(title=RichText(segments=[TextSegment(type='text', content='Training on ML Applications')]), subtitle=RichText(segments=[TextSegment(type='text', content='Mentors Club, Cedargate')]), date_description=RichText(segments=[TextSegment(type='text', content='2023')]), description=[RichText(segments=[TextSegment(type='text', content='Conducted comprehensive session for all Cedargate employees in Nepal covering a variety of ML algorithms, providing insights into our operational procedures and discussed both ongoing and completed projects that are currently contributing to our production')])]), GenericElement(title=RichText(segments=[LinkSegment(type='link', content='RoboPOP and Dronacharya Competitions', url='https://youtube.com/playlist?list=PLPFSwgon02wfx5U6bA2TUqVNZUvXrFQBv&si=amDXuMVfZ4LstZgu')]), subtitle=RichText(segments=[LinkSegment(type='link', content='LOCUS', url='https://www.facebook.com/locus.ioe')]), date_description=RichText(segments=[TextSegment(type='text', content='2020')]), description=[RichText(segments=[TextSegment(type='text', content='Volunteered to develop 3D animations that would explain the regulations introduced for RoboPOP, an exciting new robotic balloon-popping event incorporated into LOCUS 2020. Additionally, I extended my commitment to creating animations for Dronacharya, a cherished and popular drone racing competition')])]), GenericElement(title=RichText(segments=[LinkSegment(type='link', content='Hardware Fellowship', url='https://photos.app.goo.gl/pM3E4DLs12xjgP7FA')]), subtitle=RichText(segments=[LinkSegment(type='link', content='LOCUS', url='https://www.facebook.com/locus.ioe/posts/hardware-fellowshiphope-you-guys-are-learninglocusnepal2019/825855697584931/')]), date_description=RichText(segments=[TextSegment(type='text', content='2020')]), description=[RichText(segments=[TextSegment(type='text', content='Instructed nearly 100 students, ranging from freshmen to sophomores, in the domains of Arduino programming and electronic hardware design')]), RichText(segments=[TextSegment(type='text', content='Mentored a team of junior-year students as they embarked on their project, creating a 2D CNC plotter designed for writing and drawing which was showcased at the 17th Technological Festival.')])])]),
1062
+ " GenericSection(section_name=RichText(segments=[TextSegment(type='text', content='References')]), section_detail=[GenericElement(title=RichText(segments=[TextSegment(type='text', content='Tathagata Mukharjee, Professor at University of Alabama in Huntsville')]), subtitle=RichText(segments=[TextSegment(type='text', content='tm0130@uh.edu')]), date_description=None, description=None), GenericElement(title=RichText(segments=[TextSegment(type='text', content='Stacey Finn, Director of Data Science and Analytics at CedarGate')]), subtitle=RichText(segments=[TextSegment(type='text', content='safinn5@gmail.com')]), date_description=None, description=None)])]"
1063
+ ]
1064
+ },
1065
+ "execution_count": 17,
1066
+ "metadata": {},
1067
+ "output_type": "execute_result"
1068
+ }
1069
+ ],
1070
+ "source": [
1071
+ "pp.resume_info.custom_sections"
1072
+ ]
1073
+ },
1074
+ {
1075
+ "cell_type": "code",
1076
+ "execution_count": 18,
1077
+ "metadata": {},
1078
+ "outputs": [
1079
+ {
1080
+ "name": "stdout",
1081
+ "output_type": "stream",
1082
+ "text": [
1083
+ "<class 'list'>\n",
1084
+ "<class 'list'>\n",
1085
+ "<class 'list'>\n"
1086
+ ]
1087
+ }
1088
+ ],
1089
+ "source": [
1090
+ "# convert the custom sections to the same structure as the other normal sections\n",
1091
+ "custom_output = {}\n",
1092
+ "\n",
1093
+ "\n",
1094
+ "# loop through the custom sections\n",
1095
+ "for csection in pp.resume_info.custom_sections:\n",
1096
+ " # use the section's plain-text name as the dict key\n",
1097
+ " key_name = csection.section_name.plain_text\n",
1098
+ " custom_output[key_name] = csection.model_dump()[\"section_detail\"]\n",
1099
+ " print(type(custom_output[key_name]))\n",
1100
+ "\n",
1101
+ "\n",
1102
+ "# custom_output"
1103
+ ]
1104
+ },
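The flattening loop in the cell above can be sketched with minimal stand-in models (a sketch, not the notebook's real classes: it assumes `RichText` exposes a `plain_text` property and `GenericSection` holds `section_name` and `section_detail`; the toy data below is made up):

```python
from pydantic import BaseModel


class RichText(BaseModel):
    """Toy stand-in for the notebook's RichText model."""
    segments: list[dict]

    @property
    def plain_text(self) -> str:
        # Join the text content of every segment into one plain string.
        return " ".join(s.get("content", "") for s in self.segments)


class GenericSection(BaseModel):
    """Toy stand-in for the notebook's GenericSection model."""
    section_name: RichText
    section_detail: list[dict]


sections = [
    GenericSection(
        section_name=RichText(segments=[{"type": "text", "content": "References"}]),
        section_detail=[{"title": "Jane Doe", "subtitle": "jane@example.com"}],
    )
]

# Same shape as the notebook loop: key each custom section by its
# plain-text name and keep the dumped detail list as the value.
custom_output = {}
for csection in sections:
    key_name = csection.section_name.plain_text
    custom_output[key_name] = csection.model_dump()["section_detail"]

print(custom_output["References"])
```

This yields one `dict[str, list]` entry per custom section, matching the shape of the notebook's normal (non-custom) sections.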
1105
+ {
1106
+ "cell_type": "code",
1107
+ "execution_count": 19,
1108
+ "metadata": {},
1109
+ "outputs": [
1110
+ {
1111
+ "data": {
1112
+ "text/plain": [
1113
+ "str"
1114
+ ]
1115
+ },
1116
+ "execution_count": 19,
1117
+ "metadata": {},
1118
+ "output_type": "execute_result"
1119
+ }
1120
+ ],
1121
+ "source": [
1122
+ "type(pp.resume_info.model_dump_json(include={\"summary\"}))"
1123
+ ]
1124
+ },
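`model_dump_json(include=...)` returns a plain JSON string restricted to the named top-level fields, which is why the `type(...)` call above reports `str`. A minimal sketch with a hypothetical two-field model (the field names below are illustrative, not the real resume schema):

```python
import json

from pydantic import BaseModel


class Resume(BaseModel):
    """Hypothetical minimal model; the real resume model has many more fields."""
    summary: str
    work_experience: list[str]


resume = Resume(summary="ML engineer", work_experience=["NASA-IMPACT @ UAH"])

# include= keeps only the named top-level fields; the return value is a str,
# not a dict, so it can go straight into a prompt or a file.
partial = resume.model_dump_json(include={"summary"})

print(partial)
```

To get a dict instead of a string, `model_dump(include={"summary"})` takes the same `include` argument.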
1125
+ {
1126
+ "cell_type": "code",
1127
+ "execution_count": 20,
1128
+ "metadata": {},
1129
+ "outputs": [
1130
+ {
1131
+ "data": {
1132
+ "text/plain": [
1133
+ "'{\"work_experience\":[{\"role\":{\"segments\":[{\"type\":\"text\",\"content\":\"Graduate Research Assistant for LLM team\"}]},\"company\":{\"segments\":[{\"type\":\"link\",\"content\":\"NASA-IMPACT @ UAH\",\"url\":\"https://www.earthdata.nasa.gov/about/impact\"}]},\"location\":{\"segments\":[]},\"date_description\":{\"segments\":[{\"type\":\"text\",\"content\":\"August 2024 - Present\"}]},\"description\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Science Keyword Recommender: Built an extreme multi-label classifier for NASA CMR, scaling from 430 to 3,240 science keywords. Used Focal Loss and custom stratified sampling to improve F1 to 0.55, enhancing metadata accuracy and dataset discoverability.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Pre-training Science Embedding Model (Indus-SDE): Pretrained a RoBERTa-based model on 520K NASA documents with extended 1024-token input and Weighted Keyword Based Dynamic Masking. Achieved 78.1% top-1 MLM accuracy, outperforming baselines on keyword tagging, astrophysics, and EJ tasks.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Downstream Task Unification Framework: Developed a modular multi-task fine-tuning pipeline using Hugging Face and W&B. Enabled plug-and-play config-based training/evaluation with automatic Excel reporting, streamlining model comparison and boosting team productivity.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Training Sentence Transformer (Indus-SDE-ST): Implemented DDP multi-GPU multi-stage training of a scientific sentence transformer using text/code pairs. 
Early results show superior performance on science-domain information retrieval benchmarks, pushing forward scientific search and discovery tools.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Agentic AI Evaluation & Benchmarking – Conducted comparative evaluations of NASA’s Deep Literature Search Agent against Gemini and OpenAI systems using LLM-as-judge metrics (contextual precision, recall, relevance, faithfulness). Explored agent reliability, metric stability, and variance reduction strategies to improve reproducibility and trust in autonomous scientific research agents.\"}]}]},{\"role\":{\"segments\":[{\"type\":\"text\",\"content\":\"Machine Learning Engineer\"}]},\"company\":{\"segments\":[{\"type\":\"link\",\"content\":\"Cedar Gate Technologies\",\"url\":\"https://www.cedargate.com/\"}]},\"location\":{\"segments\":[]},\"date_description\":{\"segments\":[{\"type\":\"text\",\"content\":\"July 2022 - July 2024\"}]},\"description\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Automated ETL field mapping by fine-tuning DistilBERT for multilabel classification to suggest source-to-destination field mappings and achieved 0.95 recall and 0.7 IoU. 
Initiated full ETL automation by fine-tuning Mistral-7B to autogenerate internal data transformation scripts.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Analyzed local model explainability tools (permutation SHAP, Deep Explainer, LIME), identifying FastSHAP as the optimal solution for a production diabetes model with extensive features based on speed and performance (87.2% Inclusion AUC).\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Performed network analysis on healthcare providers to correlate patient-sharing patterns among physicians with medical costs for patients with chronic conditions like Chronic Heart Failure and Diabetes, revealing key cost drivers.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Optimized the segmentation of frequent ER visitors by systematically evaluating various scaling, feature extraction, and clustering methods. Utilized logistic regression coefficients for rapid cluster discrimination, ultimately identifying K-Means (6 clusters) with an auto-encoder as the most effective model, based on cluster metrics for overlap, quality, and cardinality.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Developed a LightGBM model to predict healthcare cost-risk (MARA scores), and the likelihood of it increasing or decreasing; achieving an R2 of 0.74 and MCC of 0.45, enabling proactive care management.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Built a Gradient Boosted model to predict patient compliance with preventive care visits next year, achieving an MCC score over 0.75 to support targeted outreach initiatives.\"}]}]},{\"role\":{\"segments\":[{\"type\":\"link\",\"content\":\"Machine Learning Engineer\",\"url\":\"https://photos.app.goo.gl/vVbF4bHvcjuexqnL6\"}]},\"company\":{\"segments\":[{\"type\":\"link\",\"content\":\"Docsumo\",\"url\":\"https://www.docsumo.com/\"}]},\"location\":{\"segments\":[]},\"date_description\":{\"segments\":[{\"type\":\"text\",\"content\":\"March 2022 - June 
2022\"}]},\"description\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Benchmarked spaCy v2 vs. v3 Named Entity Recognition (NER) pipelines for information extraction from OCR-scanned documents based on their performance, speed, and size, providing key data for a strategic upgrade decision.\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Evaluated multiple document reading order detection techniques (e.g., DBSCAN, recursive XY-cut, layout reader, line-based block separation, docstrum) to enhance NER performance on complex layouts, measuring success with ROUGE-L and BLEU scores.\"}]}]},{\"role\":{\"segments\":[{\"type\":\"link\",\"content\":\"Associate Data Engineer\",\"url\":\"https://photos.app.goo.gl/zjsUJiMr6ZmVhfqz9\"}]},\"company\":{\"segments\":[{\"type\":\"link\",\"content\":\"Deerwalk\",\"url\":\"https://www.cedargate.com/\"}]},\"location\":{\"segments\":[]},\"date_description\":{\"segments\":[{\"type\":\"text\",\"content\":\"May 2021 - Feb 2022\"}]},\"description\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Onboard new vendors (ETL processes on US healthcare data), ensure data integrity, analyse bugs which was triggered during data processing or client request and promptly resolve critical production issues\"}]}]}]}'"
1134
+ ]
1135
+ },
1136
+ "execution_count": 20,
1137
+ "metadata": {},
1138
+ "output_type": "execute_result"
1139
+ }
1140
+ ],
1141
+ "source": [
1142
+ "pp.resume_info.model_dump_json(include={\"work_experience\"})"
1143
+ ]
1144
+ },
1145
+ {
1146
+ "cell_type": "code",
1147
+ "execution_count": 21,
1148
+ "metadata": {},
1149
+ "outputs": [
1150
+ {
1151
+ "data": {
1152
+ "text/plain": [
1153
+ "'{\"skill_sections\":[{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Languages\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Python\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"C++\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"C\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"C#\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"MATLAB\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"SQL\"}]}]},{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Machine Learning\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Pytorch\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Transformers\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Scikit-Learn\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"W&B\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Spacy\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Keras\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"OpenCV\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Imbalanced-Learn\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Hyperopt\"}]}]},{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Data Analysis Packages\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Pandas\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Dask\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Numpy\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Scipy\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Matplotlib\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Seaborn\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Plotly\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"NetworkX\"}]}]},{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Big Data 
Framework\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Pyspark\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Hadoop\"}]}]},{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Frontend\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"HTML\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"CSS\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Bootstrap\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"JavaScript\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Angular\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"jQuery\"}]}]},{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Backend\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"FastAPI\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Flask\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Rest framework\"}]}]},{\"name\":{\"segments\":[{\"type\":\"text\",\"content\":\"Cloud Computing\"}]},\"skills\":[{\"segments\":[{\"type\":\"text\",\"content\":\"Amazon EC2\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"EMR Hadoop\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"EMR Serverless\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"Redshift\"}]},{\"segments\":[{\"type\":\"text\",\"content\":\"S3\"}]}]}]}'"
1154
+ ]
1155
+ },
1156
+ "execution_count": 21,
1157
+ "metadata": {},
1158
+ "output_type": "execute_result"
1159
+ }
1160
+ ],
1161
+ "source": [
1162
+ "pp.resume_info.model_dump_json(include={\"skill_sections\"})"
1163
+ ]
1164
+ },
1165
+ {
1166
+ "cell_type": "code",
1167
+ "execution_count": null,
1168
+ "metadata": {},
1169
+ "outputs": [],
1170
+ "source": []
1171
+ },
1172
+ {
1173
+ "cell_type": "code",
1174
+ "execution_count": null,
1175
+ "metadata": {},
1176
+ "outputs": [],
1177
+ "source": []
1178
+ }
1179
+ ],
1180
+ "metadata": {
1181
+ "kernelspec": {
1182
+ "display_name": "resumer (3.12.7)",
1183
+ "language": "python",
1184
+ "name": "python3"
1185
+ },
1186
+ "language_info": {
1187
+ "codemirror_mode": {
1188
+ "name": "ipython",
1189
+ "version": 3
1190
+ },
1191
+ "file_extension": ".py",
1192
+ "mimetype": "text/x-python",
1193
+ "name": "python",
1194
+ "nbconvert_exporter": "python",
1195
+ "pygments_lexer": "ipython3",
1196
+ "version": "3.12.7"
1197
+ }
1198
+ },
1199
+ "nbformat": 4,
1200
+ "nbformat_minor": 2
1201
+ }
notebooks/4_test_scraper.ipynb ADDED
@@ -0,0 +1,225 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%reload_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": 2,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": [
19
+ "import requests\n",
20
+ "from bs4 import BeautifulSoup\n",
21
+ "\n",
22
+ "def scrape_text_from_url(url):\n",
23
+ " try:\n",
24
+ " # 1. Send a GET request to the URL\n",
25
+ " headers = {'User-Agent': 'Mozilla/5.0'} # Helps avoid being blocked\n",
26
+ " response = requests.get(url, headers=headers, timeout=10)\n",
27
+ " \n",
28
+ " # Check if the request was successful\n",
29
+ " response.raise_for_status()\n",
30
+ " \n",
31
+ " # 2. Parse the HTML content\n",
32
+ " soup = BeautifulSoup(response.text, 'html.parser')\n",
33
+ " \n",
34
+ " # 3. Remove script and style elements (unwanted text)\n",
35
+ " for script_or_style in soup([\"script\", \"style\"]):\n",
36
+ " script_or_style.decompose()\n",
37
+ "\n",
38
+ " # 4. Get text and clean up whitespace\n",
39
+ " text = soup.get_text(separator=' ')\n",
40
+ " \n",
41
+ " # Break into lines and remove leading/trailing whitespace\n",
42
+ " lines = (line.strip() for line in text.splitlines())\n",
43
+ " # Break multi-headlines into a line each and remove blank lines\n",
44
+ " chunks = (phrase.strip() for line in lines for phrase in line.split(\" \"))\n",
45
+ " clean_text = '\\n'.join(chunk for chunk in chunks if chunk)\n",
46
+ "\n",
47
+ " return clean_text\n",
48
+ "\n",
49
+ " except requests.exceptions.RequestException as e:\n",
50
+ " return f\"An error occurred: {e}\"\n",
51
+ "\n",
52
+ "# Example Usage:\n",
53
+ "# print(scrape_text_from_url(\"https://example.com\"))"
54
+ ]
55
+ },
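The whitespace-cleanup step at the end of `scrape_text_from_url` works on its own without any network access; a minimal sketch of just that step (the sample string below is made up, standing in for a `soup.get_text()` result):

```python
# Hypothetical messy extraction result, as soup.get_text(separator=' ')
# might return it: padded lines, blank lines, runs of internal spaces.
raw = "  Senior Engineer  \n\n  Qualcomm   Careers  \n"

# Same cleanup pattern as the scraper: strip each line, split multi-phrase
# lines on double spaces, then drop the blanks and rejoin.
lines = (line.strip() for line in raw.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
clean_text = "\n".join(chunk for chunk in chunks if chunk)

print(clean_text)  # Senior Engineer / Qualcomm / Careers, one per line
```

As the Qualcomm example output below shows, this cleanup removes layout whitespace but not embedded JSON/CSS payloads, so scraped job pages may still need further filtering.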
56
+ {
57
+ "cell_type": "code",
58
+ "execution_count": 15,
59
+ "metadata": {},
60
+ "outputs": [
61
+ {
62
+ "data": {
63
+ "text/plain": [
64
+ "'Senior Software Engineer, AI Software Tools | Qualcomm Careers | Engineering Jobs and More | Qualcomm\\nFAQs\\nBenefits\\nSearch for Jobs\\n{\"themeOptions\": {\"customTheme\": {\"varTheme\": {\"font-family\": \"Source Sans Pro\", \"navbar-text-hover-color\": \"#4a5a75\", \"anchor-color\": \"#000\", \"pcsx-theme-linear-gradient-start\": \"#3253dc\", \"pcsx-theme-linear-gradient-end\": \"var(--primary-color-50)\", \"pcsx-secondary-background-color\": \"var(--primary-color-10)\", \"primary-color\": \"#3253dc\", \"primary-color-100\": \"#3253DC\", \"primary-color-90\": \"#4664E0\", \"primary-color-80\": \"#5B75E3\", \"primary-color-70\": \"#7087E6\", \"primary-color-60\": \"#8498EA\", \"primary-color-50\": \"#98A9EE\", \"primary-color-40\": \"#ADBAF1\", \"primary-color-30\": \"#C2CBF4\", \"primary-color-20\": \"#D6DDF8\", \"primary-color-10\": \"#EAEEFC\", \"border-radius-xl\": \"4px\", \"pcsx-hero-image-height\": \"315px\", \"accent-color\": \"var(--primary-color)\", \"accent-color-10\": \"var(--primary-color-10)\", \"accent-color-20\": \"var(--primary-color-20)\", \"accent-color-30\": \"var(--primary-color-30)\", \"accent-color-40\": \"var(--primary-color-40)\", \"accent-color-50\": \"var(--primary-color-50)\", \"accent-color-60\": \"var(--primary-color-60)\", \"accent-color-70\": \"var(--primary-color-70)\", \"accent-color-80\": \"var(--primary-color-80)\", \"accent-color-90\": \"var(--primary-color-90)\", \"accent-color-100\": \"var(--primary-color-100)\", \"button-default-text-color\": \"var(--primary-color)\", \"button-default-background-color\": \"#ffffff\", \"button-default-border-color\": \"var(--primary-color)\", \"button-default-hover-text-color\": \"var(--primary-color)\", \"button-default-hover-background-color\": \"#ffffff\", \"button-default-hover-border-color\": \"var(--primary-color)\", \"button-default-active-text-color\": \"var(--primary-color)\", \"button-default-active-background-color\": \"#ffffff\", \"button-default-active-border-color\": 
\"var(--primary-color)\", \"button-primary-text-color\": \"#ffffff\", \"button-primary-background-color\": \"var(--primary-color)\", \"button-primary-border-color\": \"var(--primary-color)\", \"button-primary-hover-text-color\": \"#ffffff\", \"button-primary-hover-background-color\": \"var(--primary-color)\", \"button-primary-hover-border-color\": \"var(--primary-color)\", \"button-secondary-text-color\": \"var(--primary-color)\", \"button-secondary-border-color\": \"var(--primary-color)\", \"button-secondary-background-color\": \"#ffffff\", \"button-secondary-hover-background-color\": \"#ffffff\", \"button-secondary-hover-border-color\": \"var(--primary-color)\", \"tab-pill-active-background\": \"var(--primary-color)\", \"tab-pill-active-label\": \"var(--text-inverse-color)\", \"perks-and-benefits-icon-color\": \"var(--primary-color)\", \"pcsx-jobcard-flag-text-color\": \"#69717f\", \"pcsx-jobcard-title-text-color\": \"#000\", \"pcsx-main-margin-top\": \"40px\", \"text-secondary-color\": \"#364759\", \"navbar-background\": \"#ffffff\", \"navbar-nav-active-background-color\": \"#ffffff\"}}}, \"domain\": \"qualcomm.com\", \"configPath\": \"PCS>\", \"updatePath\": \"PCS>\"}\\n{\"domain\": \"qualcomm.com\", \"user\": \"Import qualcomm.com\", \"isWillingToRelocate\": false, \"isUserAuthenticated\": false, \"isUserETXCandidate\": false, \"isDomainETX\": false, \"isCareerPlannerEnabled\": false, \"isMyApplicationsEnabled\": false, \"showVeteranEmployerSignUp\": false, \"candidate\": {\"enc_id\": 0, \"fullname\": \"\", \"firstname\": \"\", \"lastname\": \"\", \"skills\": [], \"email\": \"\", \"phone\": \"\", \"location\": \"\", \"filename\": null, \"starred_positions\": [], \"resumeUrl\": \"\", \"onboardingCompleted\": false, \"isUserInPcsIjp\": false, \"linkedinUrl\": \"\"}, \"branding\": {\"enableTalentNetwork\": 1, \"redesignedNuxConfig\": {\"autoOpen\": true, \"enabled\": true, \"backgroundImage\": 
\"https://static.vscdn.net/images/careers/demo/qualcomm/1692218234::Transparent+Image+for+PCS+banner\"}, \"showJobId\": 1, \"homePageHeroBanner\": {\"opacity\": 1, \"image\": \"https://static.vscdn.net/images/careers/demo/qualcomm/1686210950::Qualcomm_PCS_Banner.jpg\", \"hideInMobileView\": true, \"useImage\": 1}, \"privacy\": {\"show_notifications_consent_text\": false, \"logged_out_notifications_text\": \"I agree to receiving job recommendations by email\", \"show_notifications_privacy_policy_checkbox\": true, \"logged_out_notifications_privacy_policy_checkbox_text\": \"I would like to receive monthly Qualcomm job recommendations by email.\", \"text\": \"By clicking the \\\\\"Submit My Resume\\\\\" button below, you acknowledge that you have read Qualcomm\\'s <a\\nstyle=\\'color: var(--anchor-color);\\' href=\\'https://www.qualcomm.com/site/job-application-privacy-notice\\' target=\\'_blank\\'>Job Application Privacy Notice</a> and understand how Qualcomm may process your job application data, including using AI-enabled functionalities.\", \"button\": \"Submit My Resume\", \"logged_out_notifications_privacy_policy_checkbox_default_state\": false}, \"navBar\": {\"color\": \"#ffffff\", \"image\": \"https://static.vscdn.net/images/careers/demo/qualcomm/1686210880::Qualcomm-Logo.png\", \"link\": \"https://qualcomm.eightfold.ai/careers\", \"opacity\": 1, \"title\": \"Qualcomm\"}, \"privacyLink\": \"https://www.qualcomm.com/site/job-application-privacy-notice\", \"uploadResumeModal\": {\"title\": \"Welcome to {company_name}\\'s Career Center\", \"subtitle\": \"Streamline your search by uploading your resume to be matched with positions that best suit your qualifications.\", \"disclaimer\": \"**Uploading a resume is not a formal application for employment**\"}, \"custom_style\": {\"css\": \"@media screen and (min-width: 1500px) {\\\\n\\\\t.hero-image {\\\\n\\\\t\\\\theight: 110vh !important;\\\\n\\\\t\\\\tmargin-left: auto;\\\\n\\\\t\\\\tmargin-right: 
auto;\\\\n\\\\t}\\\\n}\\\\n@media screen and (max-width: 1500px) {\\\\n\\\\t.hero-image {\\\\n\\\\t\\\\theight: 120vh !important;\\\\n\\\\t\\\\tmargin-left: auto;\\\\n\\\\t\\\\tmargin-right: auto;\\\\n\\\\t}\\\\n}\\\\n.hero-image {\\\\n\\nbackground-size: contain !important;\\\\n\\nfont-weight: bold;\\\\n\\npadding: 50px;\\\\n\\nposition: relative;\\\\n\\ntext-align: center;\\\\n\\nwidth: 100%;\\\\n\\nheight: 100%;\\\\n}\\\\n\\\\n.faq-custom-header {\\\\n\\nfloat: right;\\\\n\\nmargin-right: 310px;\\\\n\\nmargin-top: 30px;\\\\n\\nz-index: 9999999;\\\\n\\nposition: relative;\\\\n\\ncolor: rgb(74, 90, 117) !important;\\\\n\\nfont-size: 20px !important;\\\\n}\\\\n@media screen and (max-width: 420px)\\nand (min-width:281px){\\\\n\\n.faq-custom-header {\\\\n\\nmargin-top: 0px;\\\\n\\ntop:30px;\\\\n\\nright: 25px;\\\\n\\nmargin-right: 0px;\\\\n\\n}\\\\n\\n.faq-custom-header.benefits {\\\\n\\nmargin-right: -42px !important;\\\\n\\nmargin-top: -30px;\\\\n\\n}\\\\n\\n.faq-custom-header.search-for-jobs {\\\\n\\nmargin-right: -42px !important;\\\\n\\nmargin-top: 30px;\\\\n\\n}\\\\n}\\\\n@media screen and (max-width: 820px) and (min-width:421px) {\\\\n\\n.faq-custom-header {\\\\n\\nmargin-top: 0px;\\\\n\\ntop:58px;\\\\n\\n}\\\\n\\n.faq-custom-header.benefits {\\\\n\\nmargin-right: 15px !important;\\\\n\\n}\\\\n\\n.faq-custom-header.search-for-jobs {\\\\n\\nmargin-right: 15px !important;\\\\n\\n}\\\\n}\\\\n@media screen and (max-width: 280px) {\\\\n.faq-custom-header {\\\\n\\nmargin-top: 0px;\\\\n\\ntop:90px;\\\\n\\nmargin-right: 245px;\\\\n}\\\\n}\\\\n.success-form .checkmark .fa-check {\\\\n\\ncolor:#3253dc !important;\\\\n\\nfont-size: 100px !important;\\\\n}\\\\n\\\\n.success-form .browse-more .btn {\\\\n\\nbackground-color: #fff;\\\\n\\nborder: 1px solid #3253dc !important;\\\\n\\ncolor: #3253dc !important;\\\\n\\nfont-size: 14px;\\\\n\\nfont-weight: 600 !important;\\\\n\\nline-height: 18px;\\\\n\\nmin-width: 145px;\\\\n\\ntext-align: center;\\\\n}\\\\n.upload-resume-modal 
.dropzone-container .btn {\\\\n\\nbackground-color: #3253dc !important;\\\\n}\\\\n.upload-resume-modal .privacy-agreement .action-buttons .btn-sm {\\\\n\\nborder: 1px solid #3253dc !important;\\\\n}\\\\n.btn-primary, .btn-primary:hover, .btn-primary:focus, .btn-primary:active, .btn-primary.active, .open .dropdown-toggle.btn-primary, .btn-primary:active:focus, .btn-primary:active:hover, .btn-primary.active:hover, .btn-primary.active:focus {\\\\n\\ncolor: #fff !important;\\\\n\\nbackground-color: #3253dc !important;\\\\n}\\\\n.resume-name {\\\\n\\ncolor: #3253dc;\\\\n}\\\\n.apply-form .position-apply-cancel-button {\\\\n\\nborder: 1px solid #3253dc !important;\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.apply-form .btn-primary {\\\\n\\nbackground: #3253dc !important;\\\\n\\nborder-color: #3253dc !important;\\\\n}\\\\n.go-button {\\\\n\\nborder: 1px solid #3253dc;\\\\n\\ncolor: #3253dc;\\\\n}\\\\n.btn-secondary, .btn-secondary:hover, .btn-secondary:focus, .btn-secondary:active, .btn-secondary.active, .open .dropdown-toggle.btn-secondary, .btn-secondary:active:focus, .btn-secondary:active:hover, .btn-secondary.active:hover, .btn-secondary.active:focus {\\\\n\\ncolor: #3253dc !important;\\\\n\\nborder-color: #3253dc !important;\\\\n}\\\\n.add-to-job-cart-button {\\\\n\\nborder: 1px solid #3253dc !important;\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.position-card .position-title {\\\\n\\ncolor: #3253dc;\\\\n}\\\\n.search-results-main-container .position-cards-container .card-selected {\\\\n\\nborder-left: 8px solid #3253dc;\\\\n}\\\\n.profile-dropdown .dropdown-title {\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.advanced-options-button {\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.fa-share {\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.position-facets .pillTitle {\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.perk .perk-icon {\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.join-tn-link {\\\\n\\nposition: relative;\\\\n\\ntop: 10px;\\\\n\\ncolor: rgb(74, 90, 117) 
!important;\\\\n\\nfont-size: 20px !important;\\\\n}\\\\n.jobs-custom-header {\\\\n\\nfloat: right;\\\\n\\nmargin-right: -190px;\\\\n\\nmargin-top: 30px;\\\\n\\nz-index: 9999999;\\\\n\\nposition: relative;\\\\n\\ncolor: rgb(74, 90, 117) !important;\\\\n\\nfont-size: 20px !important;\\\\n}\\\\n.ef-dropdown.language-dropdown .ef-dropdown-title {\\\\n\\ncolor: rgb(74, 90, 117) !important;\\\\n\\nfont-size: 20px !important;\\\\n}\\\\n.upload-resume-modal .privacy-agreement .action-buttons .btn-sm.btn-secondary.pointer{\\\\n\\ncolor: #3253dc !important;\\\\n}\\\\n.faq-custom-header.benefits {\\\\n\\nmargin-right: 20px;\\\\n}\\\\n.faq-custom-header.search-for-jobs{\\\\n\\nmargin-right: 20px;\\\\n}\\\\n.col-md-12.all-positions-header.col-sm-12.col-xs-12:after {\\\\n\\ncontent: \\\\\"Prior to uploading your resume, please ensure your location is included.\\\\\";\\\\n}\"}, \"page_image\": \"\", \"max_applications_refer\": 0, \"applyButton\": {\"background\": \"#3253DC\"}, \"i18n_overrides_master\": {\"customContent\": {\"en\": {\"footer_title\": \"Unlock Your Limitless Potential with Qualcomm\", \"footer_body\": \"<p>Whether you&rsquo;re launching a new career or ready to explore what&rsquo;s next in the evolution of your talent and expertise, you&rsquo;re about to embark on a career growth journey like no other.</p> <p><strong>Bring out your best, with the best<br /></strong>Our employees make Qualcomm&rsquo;s success possible. We hire the brightest minds and foster a supportive, inclusive culture where your ideas have the power to contribute to world-changing innovations and breakthrough technologies. To make that possible, we leverage the breadth and depth of our diverse expertise from around the world to answer the unasked, conquer the complex, and solve some of the biggest challenges only we can &ndash; together.</p> <p><strong>Innovate with technology experts<br /></strong>At Qualcomm, we are passionate about the limitless potential of your career. 
Only here can you work alongside some of the most respected, leading engineering and technology experts in the industry &ndash; helping you learn and grow professionally in ways you haven&rsquo;t yet imagined.</p> <p><strong>Live well, work well<br /></strong>Additionally, you&rsquo;ll have access to programs such as our continuous learning and development programs, tuition reimbursement, and mentorships to tap into your limitless potential &ndash; plus, opportunities to enhance your quality of life through our comprehensive, best-in-class benefits offerings.</p> <p>The work we do at Qualcomm impacts lives around the globe &ndash; and you can be part of it. Apply today and unlock your full potential.</p>\"}, \"zh-cn\": {\"footer_title\": \"\\\\u548cQualcomm\\\\u4e00\\\\u8d77\\\\u91ca\\\\u653e\\\\u60a8\\\\u7684\\\\u65e0\\\\u9650\\\\u6f5c\\\\u529b\", \"footer_body\": \"<p>\\\\u65e0\\\\u8bba\\\\u60a8\\\\u662f\\\\u5728\\\\u5f00\\\\u542f\\\\u65b0\\\\u7684\\\\u804c\\\\u4e1a\\\\u751f\\\\u6daf\\\\uff0c\\\\u8fd8\\\\u662f\\\\u51c6\\\\u5907\\\\u8fdb\\\\u4e00\\\\u6b65\\\\u63a2\\\\u7d22\\\\u60a8\\\\u7684\\\\u5929\\\\u8d4b\\\\u548c\\\\u4e13\\\\u4e1a\\\\u77e5\\\\u8bc6\\\\uff0c\\\\u60a8\\\\u90fd\\\\u5c06\\\\u8e0f\\\\u4e0a\\\\u4e00\\\\u6bb5\\\\u72ec\\\\u4e00\\\\u65e0\\\\u4e8c\\\\u7684\\\\u804c\\\\u4e1a\\\\u6210\\\\u957f\\\\u4e4b\\\\u65c5\\\\u3002</p> <p><strong>\\\\u4e0e\\\\u5353\\\\u8d8a\\\\u5458\\\\u5de5\\\\u5171\\\\u4e8b\\\\uff0c\\\\u5c55\\\\u73b0\\\\u60a8\\\\u7684\\\\u6700\\\\u4f73\\\\u6c34\\\\u5e73<br 
/></strong>\\\\u5458\\\\u5de5\\\\u662fQualcomm\\\\u83b7\\\\u5f97\\\\u6210\\\\u529f\\\\u7684\\\\u5173\\\\u952e\\\\u3002\\\\u6211\\\\u4eec\\\\u8058\\\\u8bf7\\\\u4f18\\\\u79c0\\\\u7684\\\\u4eba\\\\u624d\\\\uff0c\\\\u8425\\\\u9020\\\\u652f\\\\u6301\\\\u5305\\\\u5bb9\\\\u7684\\\\u6587\\\\u5316\\\\uff0c\\\\u8ba9\\\\u60a8\\\\u7684\\\\u60f3\\\\u6cd5\\\\u52a9\\\\u529b\\\\u6539\\\\u53d8\\\\u4e16\\\\u754c\\\\u7684\\\\u521b\\\\u65b0\\\\u548c\\\\u7a81\\\\u7834\\\\u6027\\\\u6280\\\\u672f\\\\u3002\\\\u4e3a\\\\u4e86\\\\u5b9e\\\\u73b0\\\\u8fd9\\\\u4e00\\\\u76ee\\\\u6807\\\\uff0c\\\\u6211\\\\u4eec\\\\u5229\\\\u7528\\\\u6765\\\\u81ea\\\\u4e16\\\\u754c\\\\u5404\\\\u5730\\\\u7684\\\\u4e13\\\\u4e1a\\\\u4eba\\\\u5458\\\\u6765\\\\u63a2\\\\u7d22\\\\u672a\\\\u77e5\\\\u7684\\\\u95ee\\\\u9898\\\\uff0c\\\\u514b\\\\u670d\\\\u56f0\\\\u96be\\\\uff0c\\\\u5171\\\\u540c\\\\u89e3\\\\u51b3\\\\u590d\\\\u6742\\\\u7684\\\\u6311\\\\u6218\\\\u3002</p> <p><strong>\\\\u4e0e\\\\u6280\\\\u672f\\\\u4e13\\\\u5bb6\\\\u4e00\\\\u8d77\\\\u521b\\\\u65b0<br /></strong>\\\\u5728Qualcomm\\\\uff0c\\\\u6211\\\\u4eec\\\\u5bf9\\\\u60a8\\\\u804c\\\\u4e1a\\\\u751f\\\\u6daf\\\\u7684\\\\u65e0\\\\u9650\\\\u6f5c\\\\u529b\\\\u5145\\\\u6ee1\\\\u4fe1\\\\u5fc3\\\\u3002\\\\u53ea\\\\u6709\\\\u5728\\\\u8fd9\\\\u91cc\\\\uff0c\\\\u4f60\\\\u624d\\\\u80fd\\\\u4e0e\\\\u4e1a\\\\u5185\\\\u6700\\\\u53d7\\\\u5c0a\\\\u656c\\\\u3001\\\\u6700\\\\u9886\\\\u5148\\\\u7684\\\\u5de5\\\\u7a0b\\\\u548c\\\\u6280\\\\u672f\\\\u4e13\\\\u5bb6\\\\u4e00\\\\u8d77\\\\u5de5\\\\u4f5c\\\\uff0c\\\\u5e2e\\\\u52a9\\\\u4f60\\\\u5728\\\\u4e13\\\\u4e1a\\\\u4e0a\\\\u5b66\\\\u4e60\\\\u548c\\\\u6210\\\\u957f\\\\u3002</p> <p><strong>\\\\u751f\\\\u6d3b\\\\u5982\\\\u610f\\\\uff0c\\\\u5de5\\\\u4f5c\\\\u5f97\\\\u529b<br 
/></strong>\\\\u9664\\\\u6b64\\\\u4e4b\\\\u5916\\\\uff0c\\\\u60a8\\\\u8fd8\\\\u53ef\\\\u4ee5\\\\u53c2\\\\u52a0\\\\u6211\\\\u4eec\\\\u7684\\\\u6301\\\\u7eed\\\\u5b66\\\\u4e60\\\\u548c\\\\u53d1\\\\u5c55\\\\u8ba1\\\\u5212\\\\u3001\\\\u5b66\\\\u8d39\\\\u62a5\\\\u9500\\\\u8ba1\\\\u5212\\\\u548c\\\\u5bfc\\\\u5e08\\\\u8ba1\\\\u5212\\\\uff0c\\\\u4ee5\\\\u6316\\\\u6398\\\\u60a8\\\\u7684\\\\u65e0\\\\u9650\\\\u6f5c\\\\u529b\\\\u3002\\\\u60a8\\\\u8fd8\\\\u6709\\\\u673a\\\\u4f1a\\\\u4eab\\\\u53d7\\\\u6211\\\\u4eec\\\\u63d0\\\\u4f9b\\\\u7684\\\\u4e30\\\\u539a\\\\u798f\\\\u5229\\\\u5f85\\\\u9047\\\\uff0c\\\\u63d0\\\\u9ad8\\\\u751f\\\\u6d3b\\\\u8d28\\\\u91cf\\\\u3002</p> <p>\\\\u6211\\\\u4eec\\\\u5728Qualcomm\\\\u6240\\\\u505a\\\\u7684\\\\u5de5\\\\u4f5c\\\\u5177\\\\u6709\\\\u5168\\\\u7403\\\\u6027\\\\u5f71\\\\u54cd\\\\uff0c\\\\u60a8\\\\u4e5f\\\\u53ef\\\\u4ee5\\\\u53c2\\\\u4e0e\\\\u5176\\\\u4e2d\\\\u3002\\\\u7acb\\\\u5373\\\\u7533\\\\u8bf7\\\\uff0c\\\\u91ca\\\\u653e\\\\u60a8\\\\u7684\\\\u5168\\\\u90e8\\\\u6f5c\\\\u80fd\\\\u3002</p>\"}, \"de\": {\"footer_title\": \"Erschlie\\\\u00dfen Sie Ihr grenzenloses Potenzial bei Qualcomm\", \"footer_body\": \"<p>Ganz gleich, ob Sie gerade einen neue berufliche Herausforderung suchen oder Ihr Talent und Ihr Wissen weiterentwickeln m\\\\u00f6chten \\\\u2013 Sie sind dabei, sich auf eine unvergleichliche Karriereentwicklungsreise zu begeben.</p> <p><strong>Holen Sie das Beste aus sich heraus \\\\u2013 in einem erstklassigen Team<br /></strong>Unsere Mitarbeiter sind die treibende Kraft hinter dem Erfolg von Qualcomm. Wir rekrutieren kluge K\\\\u00f6pfe und setzen uns f\\\\u00fcr eine Kultur der gegenseitigen Unterst\\\\u00fctzung und Inklusion ein. So k\\\\u00f6nnen mit Ihren Ideen zu weltver\\\\u00e4ndernden Innovationen und bahnbrechenden Technologien beitragen. Damit das m\\\\u00f6glich ist, nutzen wir die ganze Breite und Tiefe unseres weltweiten Know-hows. 
Wir liefern Antworten, machen Komplexes einfach und \\\\u00fcberwinden gemeinsam anspruchsvollste Herausforderungen.</p> <p><strong>Innovationen mit Technologieexperten<br /></strong>Karriere kennt keine Grenzen \\\\u2013 davon sind wir bei Qualcomm fest \\\\u00fcberzeugt. Nutzen Sie die einzigartige M\\\\u00f6glichkeit, mit hoch angesehenen und f\\\\u00fchrenden Engineering- und Technologieexperten der Branche zusammenzuarbeiten. Bei uns erleben Sie Lernen und berufliche Weiterentwicklung in einer v\\\\u00f6llig neuen Dimension.</p> <p><strong>Gesund leben, gesund arbeiten<br /></strong>Dar\\\\u00fcber hinaus haben Sie Zugang zu Programmen wie unseren kontinuierlichen Lern- und Weiterentwicklungsprogrammen, der Erstattung von Studiengeb\\\\u00fchren und Mentoring, um Ihr grenzenloses Potenzial auszusch\\\\u00f6pfen \\\\u2013 au\\\\u00dferdem haben Sie die M\\\\u00f6glichkeit, Ihre Lebensqualit\\\\u00e4t durch unsere umfassenden, erstklassigen Leistungsangebote zu verbessern .</p> <p>Die Arbeit, die wir bei Qualcomm leisten, hat Auswirkungen auf das Leben rund um den Globus \\\\u2013 und Sie k\\\\u00f6nnen ein Teil davon sein. Bewerben Sie sich noch heute und entfalten Sie Ihr volles Potenzial..</p>\"}}}, \"customContent\": {\"positionSections\": [{\"title\": \"Unlock Your Limitless Potential with Qualcomm\", \"body\": \"<p>Whether you&rsquo;re launching a new career or ready to explore what&rsquo;s next in the evolution of your talent and expertise, you&rsquo;re about to embark on a career growth journey like no other.</p> <p><strong>Bring out your best, with the best<br /></strong>Our employees make Qualcomm&rsquo;s success possible. We hire the brightest minds and foster a supportive, inclusive culture where your ideas have the power to contribute to world-changing innovations and breakthrough technologies. 
To make that possible, we leverage the breadth and depth of our diverse expertise from around the world to answer the unasked, conquer the complex, and solve some of the biggest challenges only we can &ndash; together.</p> <p><strong>Innovate with technology experts<br /></strong>At Qualcomm, we are passionate about the limitless potential of your career. Only here can you work alongside some of the most respected, leading engineering and technology experts in the industry &ndash; helping you learn and grow professionally in ways you haven&rsquo;t yet imagined.</p> <p><strong>Live well, work well<br /></strong>Additionally, you&rsquo;ll have access to programs such as our continuous learning and development programs, tuition reimbursement, and mentorships to tap into your limitless potential &ndash; plus, opportunities to enhance your quality of life through our comprehensive, best-in-class benefits offerings.</p> <p>The work we do at Qualcomm impacts lives around the globe &ndash; and you can be part of it. 
Apply today and unlock your full potential.</p>\"}]}, \"talentNetworkHeroBanner\": {\"useImage\": 1, \"opacity\": 1, \"image\": \"https://static.vscdn.net/images/careers/demo/qualcomm/1686211087::Qualcomm_TalentNetwork_Banner.jpg\", \"title\": \"Transform your future\"}, \"defaultState\": {\"pymww\": false}, \"mapConfig\": {\"enabled\": true, \"setMapAsDefaultView\": false, \"initByIpGeolocation\": false}, \"custom_html\": {\"header\": \"<a href=\\\\\"https://www.qualcomm.com/company/careers/faqs\\\\\" target=\\\\\"_blank\\\\\" class=\\\\\"faq-custom-header\\\\\">FAQs</a>\\\\n<a href=\\\\\"https://www.qualcomm.com/company/careers/benefits\\\\\" target=\\\\\"_blank\\\\\" class=\\\\\"faq-custom-header benefits\\\\\">Benefits</a>\\\\n<a href=\\\\\"https://app.eightfold.ai/careers?domain=qualcomm.com\\\\\" target=\\\\\"_blank\\\\\" class=\\\\\"faq-custom-header search-for-jobs\\\\\">Search for Jobs</a>\"}, \"links\": {\"videos\": [\"https://www.youtube-nocookie.com/embed/Sh5HO1L4MUk\", \"https://www.youtube-nocookie.com/embed/SRh7NL2kpvo\", \"https://www.youtube-nocookie.com/embed/evU87h3tDJM\", \"https://www.youtube-nocookie.com/embed/l2LT5mi6kG0\"], \"blogs\": [\"https://www.qualcomm.com/news/onq/2022/12/why-qualcomm-is-the-true-leader-in-5g\", \"https://www.qualcomm.com/news/onq/2022/03/meet-vanitha-kumar-qualcomm-inventor-whose-work-modem-software-helped-make-5g\", \"https://www.qualcomm.com/news/onq/2022/01/snapdragon-compute-platforms-are-leading-way-mobile-education-hybrid-learning\"]}, \"hideJobCart\": false, \"perks\": [{\"title\": \"Health\", \"description\": \"Qualcomm offers a world-class health benefit option providing world-class coverage to employees and their eligible dependents.\", \"icon\": \"fa-user-md\"}, {\"title\": \"Wealth\", \"description\": \"Our programs are designed to help employees build and prepare for a financially secure future.\", \"icon\": \"fa-coins\"}, {\"title\": \"Self\", \"description\": \"Our self and family resources help you 
build emotional/mental strength and resilience, as well as define your purpose \\\\u2014 in life and at work.\", \"icon\": \"fas fa-hand-holding-heart\"}, {\"description\": \"Qualcomm\\\\u2019s wellbeing programs and resources offer support to help employees Live+Well and Work+Well, so they can unlock their full potential at home, at work, and everywhere between.\", \"title\": \"Wellbeing\", \"icon\": \"fas fa-balance-scale\"}], \"page_title\": \"Qualcomm Careers | Engineering Jobs and More | Qualcomm\", \"job_page_title\": \"Qualcomm Careers | Engineering Jobs and More | Qualcomm\", \"page_description\": \"Search open positions at Qualcomm. Learn more about how our culture of collaboration and robust benefits program allow our employees to live well and exceed their potential.\", \"recaptcha_enabled\": 0, \"companyName\": \"Qualcomm\", \"showLoggedOutNotificationsPrivacyPolicy\": true, \"hideEightfoldBranding\": false, \"customJobDescEnhancedTableGate\": false}, \"pid\": \"446715275527\", \"positions\": [{\"id\": 446715275527, \"name\": \"Senior Software Engineer, AI Software Tools\", \"location\": \"San Diego, California, United States of America\", \"locations\": [\"San Diego, California, United States of America\"], \"hot\": 0, \"department\": \"Machine Learning Engineering\", \"business_unit\": \"33227 AI SW US\", \"t_update\": 1767065733, \"t_create\": 1761523200, \"ats_job_id\": \"3081044\", \"display_job_id\": \"3081044\", \"type\": \"ATS\", \"id_locale\": \"3081044-en-US\", \"job_description\": \"\", \"locale\": \"en-US\", \"stars\": 0.0, \"medallionProgram\": null, \"location_flexibility\": null, \"work_location_option\": \"onsite\", \"canonicalPositionUrl\": \"https://careers.qualcomm.com/careers/job/446715275527\", \"isPrivate\": false, \"latlongs\": \"32.715738,-117.1610838\"}], \"isFallback\": false, \"debug\": {}, \"count\": 1, \"personal_message\": \"We thought you would be interested in this position.\", \"scheduling\": {\"minTime\": 9, \"limit\": 
10, \"maxTime\": 17, \"increments\": 60, \"minTimeslots\": 3}, \"userTitles\": [], \"enableTargetedResume\": 0, \"query\": {\"query\": \"\", \"location\": \"\", \"department\": [], \"skill\": [], \"seniority\": [], \"pid\": \"446715275527\"}, \"singleview\": true, \"see_all_jobs\": true, \"recommended_star_threshold\": 4.0, \"chatbot\": false, \"iframeImplementation\": null, \"pcsApplyFormV2Enabled\": false, \"isPcsBrandingApril2023Enabled\": false, \"allowedFileTypes\": {}, \"pcsTextConfiguration\": {}, \"hideDepartment\": null, \"pcsOctupleMigration0Enabled\": true, \"pcsOctupleMigration1Enabled\": false, \"replaceUrlOnGoBack\": true, \"pcsRedesignedNuxEnabled\": true, \"readmoreInstructionEnabled\": false, \"userActivityTimeout\": 86400000, \"userActivityTimeoutEnabled\": 1, \"isLoggedInPcsEnabled\": false, \"sortByConfig\": null, \"searchBoxConfig\": {}, \"excludePrivatePositions\": true, \"eeocFilterKeywords\": [\"veteran\", \"disability\", \"gender\", \"race\", \"citizen\", \"visa\", \"ethnicity\"], \"disableScrollLoadPositionSidebar\": false, \"locationFlexibilityFrontendEnabled\": false, \"workLocationOptionFrontendEnabled\": false, \"remoteFlexibleJobsFilterEnabled\": false, \"loggedOutNotificationsEnabled\": true, \"candidateLogin\": {}, \"prepopulateApplyFormEnabled\": true, \"prepopulateSettings\": {\"prepopulateCheckboxText\": \"Save my answers for future applications\", \"showPrepopulateCheckbox\": false}, \"candidateBuildProfile\": {}, \"enhancementsEnabled\": false, \"themeBuilderUser\": null, \"mandatoryFields\": [\"firstname\", \"lastname\", \"email\", \"phone\"], \"blindfoldWidgetPcsGate\": false, \"pcsApplyFormLocationGate\": false, \"t3sEnabled\": false, \"uploadApplicationAnswers\": false, \"candidateAuthV2Enabled\": true, \"preApplicationSubmitAuthEnabled\": false, \"applyFormV2Enabled\": false, \"loggedOutSavedSearchEnabled\": false, \"locationRadiusTypeToggleEnabled\": true, \"incompleteApplicationsEnabled\": false, 
\"incompleteApplicationConfig\": {}, \"fallbackPcsJdGate\": true, \"enableResumeCoach\": false, \"isPcsEnabled\": true, \"applicationInfoReviewEnabled\": false, \"phoneWithCountryCodeEnabled\": true, \"phoneWithCountryCodeJTNEnabled\": false, \"notificationSuggestVerificationToken\": null, \"cookiesAutoDisabled\": false, \"strictEmailValidationEnabled\": true, \"chatbotxConfig\": {\"enabled\": false, \"featureAccessFlags\": {\"resumeCoachCardFlags\": {\"showApplyWithResume\": true, \"showEditResume\": true}, \"positionCardFlags\": {\"showAddToJobCart\": true}}}, \"pcsOptionalResumeWithJobcartGate\": false, \"loggedInCandidate\": {}, \"hamburgerMenuEnabled\": false, \"sharedTalentPoolGate\": false, \"pcsAccessibilityHomeEnabled\": true, \"pcsAccessibilityApplyFormEnabled\": true, \"showLanguageDropdown\": true, \"languages\": [{\"value\": \"en\", \"title\": \"English\"}, {\"value\": \"de\", \"title\": \"Deutsch\"}, {\"value\": \"zh-CN\", \"title\": \"\\\\u4e2d\\\\u6587 (\\\\u7b80\\\\u4f53)\"}], \"displayLanguage\": \"en-US\", \"installed_app_data\": [], \"singlePageCareersNavbarGate\": false, \"customJobDescTranslationSkipList\": [], \"all_applicable_locations\": [{\"location\": \"San Diego, California, United States of America\", \"city\": \"San Diego\", \"state\": \"CA,US\", \"country\": \"US\"}]}\\n{\"display_banner\": false, \"display_text\": \"\"}\\n@media screen and (min-width: 1500px) {\\n.hero-image {\\nheight: 110vh !important;\\nmargin-left: auto;\\nmargin-right: auto;\\n}\\n}\\n@media screen and (max-width: 1500px) {\\n.hero-image {\\nheight: 120vh !important;\\nmargin-left: auto;\\nmargin-right: auto;\\n}\\n}\\n.hero-image {\\nbackground-size: contain !important;\\nfont-weight: bold;\\npadding: 50px;\\nposition: relative;\\ntext-align: center;\\nwidth: 100%;\\nheight: 100%;\\n}\\n.faq-custom-header {\\nfloat: right;\\nmargin-right: 310px;\\nmargin-top: 30px;\\nz-index: 9999999;\\nposition: relative;\\ncolor: rgb(74, 90, 117) !important;\\nfont-size: 20px 
!important;\\n}\\n@media screen and (max-width: 420px)\\nand (min-width:281px){\\n.faq-custom-header {\\nmargin-top: 0px;\\ntop:30px;\\nright: 25px;\\nmargin-right: 0px;\\n}\\n.faq-custom-header.benefits {\\nmargin-right: -42px !important;\\nmargin-top: -30px;\\n}\\n.faq-custom-header.search-for-jobs {\\nmargin-right: -42px !important;\\nmargin-top: 30px;\\n}\\n}\\n@media screen and (max-width: 820px) and (min-width:421px) {\\n.faq-custom-header {\\nmargin-top: 0px;\\ntop:58px;\\n}\\n.faq-custom-header.benefits {\\nmargin-right: 15px !important;\\n}\\n.faq-custom-header.search-for-jobs {\\nmargin-right: 15px !important;\\n}\\n}\\n@media screen and (max-width: 280px) {\\n.faq-custom-header {\\nmargin-top: 0px;\\ntop:90px;\\nmargin-right: 245px;\\n}\\n}\\n.success-form .checkmark .fa-check {\\ncolor:#3253dc !important;\\nfont-size: 100px !important;\\n}\\n.success-form .browse-more .btn {\\nbackground-color: #fff;\\nborder: 1px solid #3253dc !important;\\ncolor: #3253dc !important;\\nfont-size: 14px;\\nfont-weight: 600 !important;\\nline-height: 18px;\\nmin-width: 145px;\\ntext-align: center;\\n}\\n.upload-resume-modal .dropzone-container .btn {\\nbackground-color: #3253dc !important;\\n}\\n.upload-resume-modal .privacy-agreement .action-buttons .btn-sm {\\nborder: 1px solid #3253dc !important;\\n}\\n.btn-primary, .btn-primary:hover, .btn-primary:focus, .btn-primary:active, .btn-primary.active, .open .dropdown-toggle.btn-primary, .btn-primary:active:focus, .btn-primary:active:hover, .btn-primary.active:hover, .btn-primary.active:focus {\\ncolor: #fff !important;\\nbackground-color: #3253dc !important;\\n}\\n.resume-name {\\ncolor: #3253dc;\\n}\\n.apply-form .position-apply-cancel-button {\\nborder: 1px solid #3253dc !important;\\ncolor: #3253dc !important;\\n}\\n.apply-form .btn-primary {\\nbackground: #3253dc !important;\\nborder-color: #3253dc !important;\\n}\\n.go-button {\\nborder: 1px solid #3253dc;\\ncolor: #3253dc;\\n}\\n.btn-secondary, .btn-secondary:hover, 
.btn-secondary:focus, .btn-secondary:active, .btn-secondary.active, .open .dropdown-toggle.btn-secondary, .btn-secondary:active:focus, .btn-secondary:active:hover, .btn-secondary.active:hover, .btn-secondary.active:focus {\\ncolor: #3253dc !important;\\nborder-color: #3253dc !important;\\n}\\n.add-to-job-cart-button {\\nborder: 1px solid #3253dc !important;\\ncolor: #3253dc !important;\\n}\\n.position-card .position-title {\\ncolor: #3253dc;\\n}\\n.search-results-main-container .position-cards-container .card-selected {\\nborder-left: 8px solid #3253dc;\\n}\\n.profile-dropdown .dropdown-title {\\ncolor: #3253dc !important;\\n}\\n.advanced-options-button {\\ncolor: #3253dc !important;\\n}\\n.fa-share {\\ncolor: #3253dc !important;\\n}\\n.position-facets .pillTitle {\\ncolor: #3253dc !important;\\n}\\n.perk .perk-icon {\\ncolor: #3253dc !important;\\n}\\n.join-tn-link {\\nposition: relative;\\ntop: 10px;\\ncolor: rgb(74, 90, 117) !important;\\nfont-size: 20px !important;\\n}\\n.jobs-custom-header {\\nfloat: right;\\nmargin-right: -190px;\\nmargin-top: 30px;\\nz-index: 9999999;\\nposition: relative;\\ncolor: rgb(74, 90, 117) !important;\\nfont-size: 20px !important;\\n}\\n.ef-dropdown.language-dropdown .ef-dropdown-title {\\ncolor: rgb(74, 90, 117) !important;\\nfont-size: 20px !important;\\n}\\n.upload-resume-modal .privacy-agreement .action-buttons .btn-sm.btn-secondary.pointer{\\ncolor: #3253dc !important;\\n}\\n.faq-custom-header.benefits {\\nmargin-right: 20px;\\n}\\n.faq-custom-header.search-for-jobs{\\nmargin-right: 20px;\\n}\\n.col-md-12.all-positions-header.col-sm-12.col-xs-12:after {\\ncontent: \"Prior to uploading your resume, please ensure your location is included.\";\\n}'"
65
+ ]
66
+ },
67
+ "execution_count": 15,
68
+ "metadata": {},
69
+ "output_type": "execute_result"
70
+ }
71
+ ],
72
+ "source": [
73
+ "# scrape_text_from_url(\"https://www.linkedin.com/jobs/view/4317364994\")\n",
74
+ "scrape_text_from_url(\"https://careers.qualcomm.com/careers/job/446715275527?hl=en-US&domain=qualcomm.com&source=APPLICANT_SOURCE-6-2\")\n"
75
+ ]
76
+ },
77
+ {
78
+ "cell_type": "code",
79
+ "execution_count": 18,
80
+ "metadata": {},
81
+ "outputs": [
82
+ {
83
+ "name": "stdout",
84
+ "output_type": "stream",
85
+ "text": [
86
+ "Join us as we inspire\n",
87
+ "\n",
88
+ "creativity and bring joy to\n",
89
+ "\n",
90
+ "millions of users worldwide.\n",
91
+ "\n",
92
+ "@2025 TikTok\n",
93
+ "\n",
94
+ "Responsibilities\n",
95
+ "\n",
96
+ "The algorithm team is responsible for developing state-of-the-art computer vision, NLP and multimodality models and algorithms to protect our platform and users from the content and behaviors that violate community guidelines and related regulations. With the continuous efforts from our team, TikTok is able to provide the best user experience and bring joy to everyone in the world. In our team, you will have the opportunity to participate in the development of the cutting-edge content understanding model to help improve the recognition ability of violated content in TikTok, and will also be responsible for optimizing our distributed model training framework continuously. We are looking for talented individuals to join us for an internship in 2026. Internships at TikTok aim to offer students industry exposure and hands-on experience. Turn your ambitions into reality as your inspiration brings infinite opportunities at TikTok. Internships at TikTok aim to provide students with hands-on experience in developing fundamental skills and exploring potential career paths. A vibrant blend of social events and enriching development workshops will be available for you to explore. Here, you will utilize your knowledge in real-world scenarios while laying a strong foundation for personal and professional growth. It runs for 12 weeks. Candidates can apply to a maximum of two positions and will be considered for jobs in the order you apply. The application limit is applicable to TikTok and its affiliates' jobs globally. Applications will be reviewed on a rolling basis. We encourage you to apply as early as possible. Please state your availability clearly in your resume (Start date, End date). Summer Start Dates: - May 11th, 2026 - May 18th, 2026 - May 26th, 2026 - June 8th, 2026 - June 22nd, 2026 Responsibilities: 1. 
Leverage multimodal large models to explore few-shot and zero-shot strategies for content safety scenarios, and build moderation models with strong generalization capabilities. 2. Participate in reinforcement learning–based data mining, and help design Chain-of-Thought (CoT) annotation frameworks to improve the model’s understanding of complex risks. 3. Build risk ranking and recall systems to enhance coverage and accuracy in identifying high-risk content. 4. Collaborate with product and policy teams to drive real-world deployment and performance optimization of moderation algorithms.\n",
97
+ "\n",
98
+ "Qualifications\n",
99
+ "\n",
100
+ "Minimum Qualifications: 1. Currently pursuing a Master degree with a background in computer science, machine learning, or similar fields; 2. Solid foundation in machine learning and deep learning; familiarity with multimodal modeling; hands-on experience with large model projects is a plus. 3. Interest in content safety, with an understanding of the challenges in risk identification within moderation workflows. 4. Proficient in end-to-end algorithm development, including data processing, modeling, and evaluation; experience with PyTorch or TensorFlow is required. Preferred Qualifications: Strong learning ability, clear communication, and a strong sense of teamwork and responsibility. For TikTok By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://careers.tiktok.com/legal/privacy\n",
101
+ "\n",
102
+ "Job Information\n",
103
+ "\n",
104
+ "【For Pay Transparency】Compensation Description (Hourly) - Campus Intern\n",
105
+ "\n",
106
+ "The hourly rate range for this position in the selected city is $45- $45.\n",
107
+ "\n",
108
+ "Benefits may vary depending on the nature of employment and the country work location. Interns have day one access to health insurance, life insurance, wellbeing benefits and more. Interns also receive 10 paid holidays per year and paid sick time (56 hours if hired in first half of year, 40 if hired in second half of year). Interns who are not working 100% remote may also be eligible for housing allowance.\n",
109
+ "\n",
110
+ "The Company reserves the right to modify or change these benefits programs at any time, with or without notice.\n",
111
+ "\n",
112
+ "For Los Angeles County (unincorporated) Candidates:\n",
113
+ "\n",
114
+ "Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:\n",
115
+ "\n",
116
+ "1. Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;\n",
117
+ "\n",
118
+ "2. Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and\n",
119
+ "\n",
120
+ "3. Exercising sound judgment.\n",
121
+ "\n",
122
+ "About TikTok\n",
123
+ "\n",
124
+ "TikTok is the leading destination for short-form mobile video. At TikTok, our mission is to inspire creativity and bring joy. TikTok's global headquarters are in Los Angeles and Singapore, and we also have offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.\n",
125
+ "\n",
126
+ "Why Join Us\n",
127
+ "\n",
128
+ "Inspiring creativity is at the core of TikTok's mission. Our innovative product is built to help people authentically express themselves, discover and connect – and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity and bring joy - a mission we work towards every day.\n",
129
+ "\n",
130
+ "We strive to do great things with great people. We lead with curiosity, humility, and a desire to make impact in a rapidly growing tech company. Every challenge is an opportunity to learn and innovate as one team. We're resilient and embrace challenges as they come. By constantly iterating and fostering an \"Always Day 1\" mindset, we achieve meaningful breakthroughs for ourselves, our company, and our users. When we create and grow together, the possibilities are limitless. Join us.\n",
131
+ "\n",
132
+ "\n",
133
+ "Diversity & Inclusion\n",
134
+ "\n",
135
+ "TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.\n",
136
+ "\n",
137
+ "TikTok Accommodation\n",
138
+ "\n",
139
+ "TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://tinyurl.com/RA-request\n"
140
+ ]
141
+ }
142
+ ],
143
+ "source": [
144
+ "import requests\n",
145
+ "import trafilatura\n",
146
+ "import random\n",
147
+ "\n",
148
+ "def scrape_job_details(url):\n",
149
+ " # 1. Setup headers to look like a real browser\n",
150
+ " user_agents = [\n",
151
+ " \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\",\n",
152
+ " \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36\"\n",
153
+ " ]\n",
154
+ " \n",
155
+ " headers = {\n",
156
+ " \"User-Agent\": random.choice(user_agents),\n",
157
+ " \"Accept-Language\": \"en-US,en;q=0.9\",\n",
158
+ " }\n",
159
+ "\n",
160
+ " try:\n",
161
+ " # 2. Fetch the HTML manually using requests\n",
162
+ " response = requests.get(url, headers=headers, timeout=15)\n",
163
+ " response.raise_for_status() # Check for HTTP errors\n",
164
+ " \n",
165
+ " # 3. Pass the raw HTML to trafilatura for extraction\n",
166
+ " # We use 'extract' on the response text directly\n",
167
+ " content = trafilatura.extract(\n",
168
+ " response.text, \n",
169
+ " include_formatting=True,\n",
170
+ " include_links=False,\n",
171
+ " favor_precision=True\n",
172
+ " )\n",
173
+ "\n",
174
+ " if not content:\n",
175
+ " return \"Error: Could not identify the main content of the page.\"\n",
176
+ "\n",
177
+ " return content\n",
178
+ "\n",
179
+ " except requests.exceptions.RequestException as e:\n",
180
+ " return f\"Network error: {e}\"\n",
181
+ " except Exception as e:\n",
182
+ " return f\"An unexpected error occurred: {e}\"\n",
183
+ "\n",
184
+ "# --- Usage ---\n",
185
+ "url = \"https://lifeattiktok.com/search/7527589557336869138\"\n",
186
+ "print(scrape_job_details(url))"
187
+ ]
188
+ },
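The cell above fetches the page with `requests` and hands the raw HTML to `trafilatura.extract`, returning an error string when extraction yields nothing. When `trafilatura` gives up on an unusual page layout, a crude stdlib-only fallback can still salvage visible text. The helper below is a sketch of such a fallback, not part of the notebook; `html_to_text` and `_TextExtractor` are hypothetical names introduced here.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Crude fallback: collect visible text, skipping script/style blocks."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Very rough text extraction for when trafilatura returns None."""
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```

Unlike `trafilatura`, this keeps boilerplate (navigation, footers), so it is only a last resort before returning the "could not identify the main content" error.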
189
+ {
190
+ "cell_type": "code",
191
+ "execution_count": null,
192
+ "metadata": {},
193
+ "outputs": [],
194
+ "source": []
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": null,
199
+ "metadata": {},
200
+ "outputs": [],
201
+ "source": []
202
+ }
203
+ ],
204
+ "metadata": {
205
+ "kernelspec": {
206
+ "display_name": ".venv",
207
+ "language": "python",
208
+ "name": "python3"
209
+ },
210
+ "language_info": {
211
+ "codemirror_mode": {
212
+ "name": "ipython",
213
+ "version": 3
214
+ },
215
+ "file_extension": ".py",
216
+ "mimetype": "text/x-python",
217
+ "name": "python",
218
+ "nbconvert_exporter": "python",
219
+ "pygments_lexer": "ipython3",
220
+ "version": "3.12.7"
221
+ }
222
+ },
223
+ "nbformat": 4,
224
+ "nbformat_minor": 2
225
+ }
notebooks/5_test_prompt_template.ipynb ADDED
@@ -0,0 +1,79 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%reload_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": 4,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": [
19
+ "ACHIEVEMENTS =\"\"\"\n",
20
+ "<achievements>\n",
21
+ "{section_data}\n",
22
+ "</achievements>\n",
23
+ "\n",
24
+ "<job_description>\n",
25
+ "{job_description}\n",
26
+ "</job_description>\n",
27
+ "\"\"\""
28
+ ]
29
+ },
30
+ {
31
+ "cell_type": "code",
32
+ "execution_count": 5,
33
+ "metadata": {},
34
+ "outputs": [
35
+ {
36
+ "data": {
37
+ "text/plain": [
38
+ "'\\n<achievements>\\nsec_d\\n</achievements>\\n\\n<job_description>\\njob_d\\n</job_description>\\n'"
39
+ ]
40
+ },
41
+ "execution_count": 5,
42
+ "metadata": {},
43
+ "output_type": "execute_result"
44
+ }
45
+ ],
46
+ "source": [
47
+ "ACHIEVEMENTS.format(section_data = \"sec_d\", job_description = \"job_d\")"
48
+ ]
49
+ },
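One caveat with the `str.format` call tested above: if the prompt template ever grows to contain literal braces (a JSON example, a LaTeX snippet), `.format` will raise `KeyError` or `ValueError` on them. A brace-safe alternative, sketched here as an assumption rather than part of the notebook, is `string.Template`, which only treats `$identifier` spans as placeholders:

```python
from string import Template

# Brace-safe variant of the ACHIEVEMENTS prompt template: literal { }
# in the surrounding text pass through untouched.
ACHIEVEMENTS_T = Template("""\
<achievements>
$section_data
</achievements>

<job_description>
$job_description
</job_description>
""")

prompt = ACHIEVEMENTS_T.substitute(section_data="sec_d", job_description="job_d")
```

`substitute` raises on missing keys just like `.format`; `safe_substitute` would leave unknown `$names` in place instead.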
50
+ {
51
+ "cell_type": "code",
52
+ "execution_count": null,
53
+ "metadata": {},
54
+ "outputs": [],
55
+ "source": []
56
+ }
57
+ ],
58
+ "metadata": {
59
+ "kernelspec": {
60
+ "display_name": ".venv",
61
+ "language": "python",
62
+ "name": "python3"
63
+ },
64
+ "language_info": {
65
+ "codemirror_mode": {
66
+ "name": "ipython",
67
+ "version": 3
68
+ },
69
+ "file_extension": ".py",
70
+ "mimetype": "text/x-python",
71
+ "name": "python",
72
+ "nbconvert_exporter": "python",
73
+ "pygments_lexer": "ipython3",
74
+ "version": "3.12.7"
75
+ }
76
+ },
77
+ "nbformat": 4,
78
+ "nbformat_minor": 2
79
+ }
notebooks/6_test_jinja2.ipynb ADDED
@@ -0,0 +1,410 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 5,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%reload_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": 6,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": [
19
+ "import sys\n",
20
+ "import os\n",
21
+ "\n",
22
+ "# 1. Define the path to the directory containing your utility\n",
23
+ "util_path = \"/Users/sawale/Documents/learning/resumer/resumer/utils\"\n",
24
+ "\n",
25
+ "# 2. Add it to sys.path if it isn't already there\n",
26
+ "if util_path not in sys.path:\n",
27
+ " sys.path.append(util_path)\n",
28
+ "\n",
29
+ "from latex_ops import json_to_latex_pdf\n",
30
+ "\n"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": 11,
36
+ "metadata": {},
37
+ "outputs": [
38
+ {
39
+ "name": "stdout",
40
+ "output_type": "stream",
41
+ "text": [
42
+ "Running command: pdflatex -interaction=nonstopmode -output-directory=/Users/sawale/Documents/learning/resumer/notebooks /Users/sawale/Documents/learning/resumer/notebooks/my_resume.tex\n",
43
+ "PDF generated at: /Users/sawale/Documents/learning/resumer/notebooks/my_resume.pdf\n",
44
+ "PDF successfully generated at: /Users/sawale/Documents/learning/resumer/notebooks/my_resume.pdf\n"
45
+ ]
46
+ }
47
+ ],
48
+ "source": [
49
+ "# --- DEMO EXECUTION ---\n",
50
+ "\n",
51
+ "data = {'personal_info': {'name': {'segments': [{'type': 'text',\n",
52
+ " 'content': 'Sajil Awale'}]},\n",
53
+ " 'location': {'segments': [{'type': 'text',\n",
54
+ " 'content': 'Huntsville, Alabama'}]},\n",
55
+ " 'phone': {'segments': [{'type': 'text', 'content': '+1-256-417-3690'}]},\n",
56
+ " 'email': {'segments': [{'type': 'link',\n",
57
+ " 'content': 'sajilawale@gmail.com',\n",
58
+ " 'url': 'mailto:sajilawale@gmail.com'}]},\n",
59
+ " 'media': {'portfolio': 'https://www.sajilawale.com.np',\n",
60
+ " 'linkedin': 'https://www.linkedin.com/in/sajilawale/',\n",
61
+ " 'github': 'https://github.com/AwaleSajil',\n",
62
+ " 'medium': None,\n",
63
+ " 'devpost': None}},\n",
64
+ " 'summary': {'segments': [{'type': 'text',\n",
65
+ " 'content': \"Machine Learning Engineer and Master's candidate with 4+ years of experience specializing in NLP, Large Language Models, and multimodal AI for content understanding and risk identification. Proficient in end-to-end algorithm development, including fine-tuning large models, distributed training, computer vision, and agentic AI evaluation. Eager to contribute to building robust moderation models and risk ranking systems for content safety.\"}]},\n",
66
+ " 'work_experience': [{'role': {'segments': [{'type': 'text',\n",
67
+ " 'content': 'Graduate Research Assistant for LLM team'}]},\n",
68
+ " 'company': {'segments': [{'type': 'link',\n",
69
+ " 'content': 'NASA-IMPACT @ UAH',\n",
70
+ " 'url': 'https://www.earthdata.nasa.gov/about/impact'}]},\n",
71
+ " 'location': {'segments': []},\n",
72
+ " 'date_description': {'segments': [{'type': 'text',\n",
73
+ " 'content': 'August 2024 - Present'}]},\n",
74
+ " 'description': [{'segments': [{'type': 'text',\n",
75
+ " 'content': 'Conducted comparative evaluations of NASA’s Deep Literature Search Agent against Gemini and OpenAI systems using LLM-as-judge metrics (contextual precision, recall, relevance, faithfulness), enhancing reproducibility and trust in autonomous scientific research agents.'}]},\n",
76
+ " {'segments': [{'type': 'text',\n",
77
+ " 'content': 'Implemented DDP multi-GPU multi-stage training of a scientific sentence transformer using text/code pairs, achieving superior performance on science-domain information retrieval benchmarks and advancing scientific search and discovery tools.'}]},\n",
78
+ " {'segments': [{'type': 'text',\n",
79
+ " 'content': 'Pretrained a RoBERTa-based science embedding model on 520K NASA documents with extended 1024-token input and Weighted Keyword Based Dynamic Masking, achieving 78.1% top-1 MLM accuracy and outperforming baselines on keyword tagging and astrophysics tasks.'}]},\n",
80
+ " {'segments': [{'type': 'text',\n",
81
+ " 'content': 'Developed a modular multi-task fine-tuning pipeline using Hugging Face and W&B with plug-and-play config-based training/evaluation and automatic Excel reporting, streamlining model comparison and boosting team productivity.'}]},\n",
82
+ " {'segments': [{'type': 'text',\n",
83
+ " 'content': 'Built an extreme multi-label classifier for NASA CMR, scaling from 430 to 3,240 science keywords using Focal Loss and custom stratified sampling, improving F1 to 0.55 and enhancing metadata accuracy and dataset discoverability.'}]}]},\n",
84
+ " {'role': {'segments': [{'type': 'text',\n",
85
+ " 'content': 'Machine Learning Engineer'}]},\n",
86
+ " 'company': {'segments': [{'type': 'link',\n",
87
+ " 'content': 'Cedar Gate Technologies',\n",
88
+ " 'url': 'https://www.cedargate.com/'}]},\n",
89
+ " 'location': {'segments': []},\n",
90
+ " 'date_description': {'segments': [{'type': 'text',\n",
91
+ " 'content': 'July 2022 - July 2024'}]},\n",
92
+ " 'description': [{'segments': [{'type': 'text',\n",
93
+ " 'content': 'Automated ETL field mapping by fine-tuning DistilBERT for multilabel classification, achieving 0.95 recall and 0.7 IoU, and initiated full ETL automation by fine-tuning Mistral-7B to autogenerate internal data transformation scripts.'}]},\n",
94
+ " {'segments': [{'type': 'text',\n",
95
+ " 'content': 'Performed network analysis on healthcare providers to correlate patient-sharing patterns with medical costs for chronic conditions, revealing key cost drivers through data mining.'}]},\n",
96
+ " {'segments': [{'type': 'text',\n",
97
+ " 'content': 'Developed a LightGBM model to predict healthcare cost-risk (MARA scores) with an R2 of 0.74 and MCC of 0.45, enabling proactive care management.'}]},\n",
98
+ " {'segments': [{'type': 'text',\n",
99
+ " 'content': 'Optimized segmentation of frequent ER visitors by evaluating scaling, feature extraction, and clustering methods, identifying K-Means (6 clusters) with an auto-encoder as the most effective model for clear cluster discrimination.'}]},\n",
100
+ " {'segments': [{'type': 'text',\n",
101
+ " 'content': 'Analyzed local model explainability tools (permutation SHAP, Deep Explainer, LIME), identifying FastSHAP as the optimal solution for a production diabetes model based on speed and performance (87.2% Inclusion AUC).'}]}]},\n",
102
+ " {'role': {'segments': [{'type': 'link',\n",
103
+ " 'content': 'Machine Learning Engineer',\n",
104
+ " 'url': 'https://photos.app.goo.gl/vVbF4bHvcjuexqnL6'}]},\n",
105
+ " 'company': {'segments': [{'type': 'link',\n",
106
+ " 'content': 'Docsumo',\n",
107
+ " 'url': 'https://www.docsumo.com/'}]},\n",
108
+ " 'location': {'segments': []},\n",
109
+ " 'date_description': {'segments': [{'type': 'text',\n",
110
+ " 'content': 'March 2022 - June 2022'}]},\n",
111
+ " 'description': [{'segments': [{'type': 'text',\n",
112
+ " 'content': 'Evaluated multiple document reading order detection techniques (e.g., DBSCAN, recursive XY-cut, layout reader) to enhance Named Entity Recognition (NER) performance on complex layouts, measuring success with ROUGE-L and BLEU scores for improved information extraction.'}]},\n",
113
+ " {'segments': [{'type': 'text',\n",
114
+ " 'content': 'Benchmarked spaCy v2 vs. v3 Named Entity Recognition (NER) pipelines for information extraction from OCR-scanned documents based on performance, speed, and size, providing key data for a strategic upgrade decision.'}]}]},\n",
115
+ " {'role': {'segments': [{'type': 'link',\n",
116
+ " 'content': 'Associate Data Engineer',\n",
117
+ " 'url': 'https://photos.app.goo.gl/zjsUJiMr6ZmVhfqz9'}]},\n",
118
+ " 'company': {'segments': [{'type': 'link',\n",
119
+ " 'content': 'Deerwalk',\n",
120
+ " 'url': 'https://www.cedargate.com/'}]},\n",
121
+ " 'location': {'segments': []},\n",
122
+ " 'date_description': {'segments': [{'type': 'text',\n",
123
+ " 'content': 'May 2021 - Feb 2022'}]},\n",
124
+ " 'description': [{'segments': [{'type': 'text',\n",
125
+ " 'content': 'Managed ETL processes for new vendor onboarding, ensuring data integrity for US healthcare data, and resolved critical production issues related to data processing and client requests.'}]}]}],\n",
126
+ " 'education': [{'degree': {'segments': [{'type': 'link',\n",
127
+ " 'content': 'Master’s in Computer Science - Data Science (Concentration)',\n",
128
+ " 'url': 'https://www.uah.edu/science/departments/computer-science/cs-graduate-programs'}]},\n",
129
+ " 'university': {'segments': [{'type': 'link',\n",
130
+ " 'content': 'University of Alabama in Huntsville',\n",
131
+ " 'url': 'https://www.uah.edu/'}]},\n",
132
+ " 'location': {'segments': [{'type': 'text',\n",
133
+ " 'content': 'Huntsville, Alabama, USA'}]},\n",
134
+ " 'date_description': {'segments': [{'type': 'text',\n",
135
+ " 'content': '2024 - 2026'}]},\n",
136
+ " 'grade': {'segments': [{'type': 'text',\n",
137
+ " 'content': 'Current GPA: 4.0/4.0'}]},\n",
138
+ " 'courses': None},\n",
139
+ " {'degree': {'segments': [{'type': 'link',\n",
140
+ " 'content': 'B.E. Electronics & Communication',\n",
141
+ " 'url': 'https://doece.pcampus.edu.np/index.php/bex-becie/'}]},\n",
142
+ " 'university': {'segments': [{'type': 'link',\n",
143
+ " 'content': 'Institute of Engineering, Pulchowk Campus, Tribhuvan University',\n",
144
+ " 'url': 'https://pcampus.edu.np/'}]},\n",
145
+ " 'location': {'segments': [{'type': 'text', 'content': 'Lalitpur, Nepal'}]},\n",
146
+ " 'date_description': {'segments': [{'type': 'text',\n",
147
+ " 'content': '2016 - 2021'}]},\n",
148
+ " 'grade': {'segments': [{'type': 'text',\n",
149
+ " 'content': 'Full Scholarship; Aggregate: 79.45%; Rank: '},\n",
150
+ " {'type': 'link',\n",
151
+ " 'content': '8th',\n",
152
+ " 'url': 'https://photos.app.goo.gl/C4QsvJgsfx9jguQn7'},\n",
153
+ " {'type': 'text', 'content': ' Position (Top 1% in University)'}]},\n",
154
+ " 'courses': None}],\n",
155
+ " 'skill_sections': [{'name': {'segments': [{'type': 'text',\n",
156
+ " 'content': 'Machine Learning & Deep Learning Frameworks'}]},\n",
157
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'PyTorch'}]},\n",
158
+ " {'segments': [{'type': 'text', 'content': 'Transformers'}]},\n",
159
+ " {'segments': [{'type': 'text', 'content': 'OpenCV'}]},\n",
160
+ " {'segments': [{'type': 'text', 'content': 'Scikit-Learn'}]},\n",
161
+ " {'segments': [{'type': 'text', 'content': 'spaCy'}]},\n",
162
+ " {'segments': [{'type': 'text', 'content': 'Keras'}]},\n",
163
+ " {'segments': [{'type': 'text', 'content': 'W&B'}]},\n",
164
+ " {'segments': [{'type': 'text', 'content': 'Imbalanced-Learn'}]},\n",
165
+ " {'segments': [{'type': 'text', 'content': 'Hyperopt'}]}]},\n",
166
+ " {'name': {'segments': [{'type': 'text',\n",
167
+ " 'content': 'Programming Languages'}]},\n",
168
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Python'}]},\n",
169
+ " {'segments': [{'type': 'text', 'content': 'SQL'}]},\n",
170
+ " {'segments': [{'type': 'text', 'content': 'C++'}]}]},\n",
171
+ " {'name': {'segments': [{'type': 'text',\n",
172
+ " 'content': 'Data Processing & Big Data'}]},\n",
173
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'PySpark'}]},\n",
174
+ " {'segments': [{'type': 'text', 'content': 'Hadoop'}]},\n",
175
+ " {'segments': [{'type': 'text', 'content': 'Pandas'}]},\n",
176
+ " {'segments': [{'type': 'text', 'content': 'NumPy'}]},\n",
177
+ " {'segments': [{'type': 'text', 'content': 'Dask'}]},\n",
178
+ " {'segments': [{'type': 'text', 'content': 'SciPy'}]}]},\n",
179
+ " {'name': {'segments': [{'type': 'text', 'content': 'Cloud & MLOps'}]},\n",
180
+ " 'skills': [{'segments': [{'type': 'text',\n",
181
+ " 'content': 'Amazon EMR (Hadoop/Serverless)'}]},\n",
182
+ " {'segments': [{'type': 'text', 'content': 'Amazon S3'}]},\n",
183
+ " {'segments': [{'type': 'text', 'content': 'Amazon Redshift'}]},\n",
184
+ " {'segments': [{'type': 'text', 'content': 'Amazon EC2'}]},\n",
185
+ " {'segments': [{'type': 'text', 'content': 'FastAPI'}]},\n",
186
+ " {'segments': [{'type': 'text', 'content': 'Flask'}]},\n",
187
+ " {'segments': [{'type': 'text', 'content': 'REST framework'}]}]},\n",
188
+ " {'name': {'segments': [{'type': 'text',\n",
189
+ " 'content': 'Data Analysis & Visualization'}]},\n",
190
+ " 'skills': [{'segments': [{'type': 'text', 'content': 'Matplotlib'}]},\n",
191
+ " {'segments': [{'type': 'text', 'content': 'Seaborn'}]},\n",
192
+ " {'segments': [{'type': 'text', 'content': 'Plotly'}]}]}],\n",
193
+ " 'projects': [{'name': {'segments': [{'type': 'link',\n",
194
+ " 'content': 'Funny Project',\n",
195
+ " 'url': 'https://github.com/AwaleSajil/FunnyProject'}]},\n",
196
+ " 'type': {'segments': [{'type': 'text', 'content': 'BigData Project'}]},\n",
197
+ " 'link': {'type': 'link',\n",
198
+ " 'content': 'GitHub',\n",
199
+ " 'url': 'https://github.com/AwaleSajil/FunnyProject'},\n",
200
+ " 'resources': [],\n",
201
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2024'}]},\n",
202
+ " 'description': [{'segments': [{'type': 'text',\n",
203
+ " 'content': 'Engineered a two-stage NLP pipeline to classify 570,000+ jokes by humor, offensiveness, and sentiment, achieving a 0.86 weighted F1-score by fine-tuning a BERT model on a 55k-sample dataset labeled by local LLMs (Mistral, Gemma3); subsequently Dockerized the inference pipeline for scalable deployment in content understanding scenarios.'}]}]},\n",
204
+ " {'name': {'segments': [{'type': 'link',\n",
205
+ " 'content': 'Real Time Visual Localisation and Mapping of Mobile Robot in Dynamic Environment',\n",
206
+ " 'url': 'https://docs.google.com/presentation/d/1wIIM4iFpwIpxakO2Fubz-7BHouQZl0Tq2z-sj45jA7Y/edit?usp=sharing'}]},\n",
207
+ " 'type': {'segments': [{'type': 'text',\n",
208
+ " 'content': 'College Major Project'}]},\n",
209
+ " 'link': {'type': 'link',\n",
210
+ " 'content': 'Presentation',\n",
211
+ " 'url': 'https://docs.google.com/presentation/d/1wIIM4iFpwIpxakO02Fubz-7BHouQZl0Tq2z-sj45jA7Y/edit?usp=sharing'},\n",
212
+ " 'resources': [],\n",
213
+ " 'date_description': {'segments': [{'type': 'text',\n",
214
+ " 'content': '2019 - 2020'}]},\n",
215
+ " 'description': [{'segments': [{'type': 'text',\n",
216
+ " 'content': 'Developed a real-time Visual SLAM system for mobile robots, reconstructing 3D scenes from 2D images and enhancing robustness in dynamic environments by fine-tuning and applying ICNet, a semantic segmentation model, to accurately mask and disregard prevalent dynamic objects like humans from visual landmarks.'}]}]},\n",
217
+ " {'name': {'segments': [{'type': 'link',\n",
218
+ " 'content': 'Image Auto Alignment',\n",
219
+ " 'url': 'https://docs.google.com/presentation/d/1n_Hv8l0_MGLNv62PnMt76vsGcCsRyS7TZuXABBb1jmc/edit?usp=sharing'}]},\n",
220
+ " 'type': {'segments': [{'type': 'text', 'content': 'Weekend Project'}]},\n",
221
+ " 'link': {'type': 'link',\n",
222
+ " 'content': 'Presentation',\n",
223
+ " 'url': 'https://docs.google.com/presentation/d/1n_Hv8l0_MGLNv62PnMt76vsGcCsRyS7TZuXABBb1jmc/edit?usp=sharing'},\n",
224
+ " 'resources': [],\n",
225
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2024'}]},\n",
226
+ " 'description': [{'segments': [{'type': 'text',\n",
227
+ " 'content': 'Engineered two computer vision solutions for automated image rotation correction: a rule-based Flask API for document orientation (leveraging line detection and text-weight heuristics) and an ML-based model utilizing MobileNetV2 for general images, framed as a regression task that achieved 2.6° MAE on a self-supervised Flickr dataset.'}]}]},\n",
228
+ " {'name': {'segments': [{'type': 'link',\n",
229
+ " 'content': 'Precision Livestock Farming — Improving Productivity of Broiler Chicken farm with technology',\n",
230
+ " 'url': 'https://docs.google.com/presentation/d/1fiFOfY1UPjH535EpBikLpd_-gw37Te88by8h5pYT5U/edit?usp=sharing'}]},\n",
231
+ " 'type': {'segments': [{'type': 'text', 'content': 'LOCUS 2019 Project'}]},\n",
232
+ " 'link': {'type': 'link',\n",
233
+ " 'content': 'Presentation',\n",
234
+ " 'url': 'https://docs.google.com/presentation/d/1fiFOfY1UPjH535EpBikLpd_-gw37Te88by8h5pYT5U/edit?usp=sharing'},\n",
235
+ " 'resources': [],\n",
236
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
237
+ " 'description': [{'segments': [{'type': 'text',\n",
238
+ " 'content': 'Developed a Precision Livestock Farming system integrating computer vision (YOLO for chicken detection, SORT for mobility tracking) and audio analysis (feeder microphone for eating behavior estimation) to effectively monitor broiler chickens and optimize environmental conditions.'}]}]},\n",
239
+ " {'name': {'segments': [{'type': 'link',\n",
240
+ " 'content': 'Vehicle Traffic Analysis and Management',\n",
241
+ " 'url': 'https://drive.google.com/file/d/1N8H8YIwp3FTOcbLwzW4fNXuKU48Yo1D2/view?usp=sharing'}]},\n",
242
+ " 'type': {'segments': [{'type': 'text',\n",
243
+ " 'content': 'College Minor Project'}]},\n",
244
+ " 'link': {'type': 'link',\n",
245
+ " 'content': 'Document',\n",
246
+ " 'url': 'https://drive.google.com/file/d/1N8H8YIwp3FTOcbLwzW4fNXuKU48Yo1D2/view?usp=sharing'},\n",
247
+ " 'resources': [],\n",
248
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
249
+ " 'description': [{'segments': [{'type': 'text',\n",
250
+ " 'content': 'Assessed and managed traffic flow at road junctions by implementing vehicle counting using YOLO and SORT from diverse video sources, and applied the Webster algorithm to determine optimal traffic signal timings, improving traffic efficiency.'}]}]}],\n",
251
+ " 'certifications': [{'certificate_info': {'segments': [{'type': 'link',\n",
252
+ " 'content': 'Deep Learning Specialization by DeepLearning.AI on Coursera.',\n",
253
+ " 'url': 'https://www.coursera.org/account/accomplishments/specialization/CMV425VZYK92?utm_source=link&utm_medium=certificate&utm_content=cert_image&utm_campaign=sharing_cta&utm_product=s12n'}]},\n",
254
+ " 'date': None},\n",
255
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
256
+ " 'content': 'Applied Deep Learning Capstone Project by IBM on edX.',\n",
257
+ " 'url': 'https://courses.edx.org/certificates/6154999d04c34c329bd68f3fcbd7e0a2'}]},\n",
258
+ " 'date': None},\n",
259
+ " {'certificate_info': {'segments': [{'type': 'link',\n",
260
+ " 'content': 'Specialized Models: Time Series and Survival Analysis on Coursera',\n",
261
+ " 'url': 'https://www.coursera.org/account/accomplishments/certificate/5U3ZQ9767CRW'}]},\n",
262
+ " 'date': None}],\n",
263
+ " 'achievements': [{'name': {'segments': [{'type': 'link',\n",
264
+ " 'content': 'Institute of Engineering Scholarship for BE',\n",
265
+ " 'url': 'https://media.edusanjal.com/redactor/Download%20TU%20IOE%20Entrance%20Examination%20Result.pdf'}]},\n",
266
+ " 'issued_by': {'segments': [{'type': 'link',\n",
267
+ " 'content': 'Tribhuvan University, IOE',\n",
268
+ " 'url': 'https://tu.edu.np/pages/institute-of-engineering-4'}]},\n",
269
+ " 'date': {'segments': [{'type': 'text', 'content': '2017'}]},\n",
270
+ " 'description': [{'segments': [{'type': 'text',\n",
271
+ " 'content': 'Received a full scholarship to study engineering at the most reputed engineering college of Nepal for securing 58th rank in a competitive entrance examination taken by more than ten thousand students.'}]}]},\n",
272
+ " {'name': {'segments': [{'type': 'link',\n",
273
+ " 'content': 'Best Thematic Hardware Project',\n",
274
+ " 'url': 'https://photos.app.goo.gl/KDFNt1KtSUU9xkkXA'}]},\n",
275
+ " 'issued_by': {'segments': [{'type': 'link',\n",
276
+ " 'content': 'LOCUS',\n",
277
+ " 'url': 'https://locus.pcampus.edu.np/'}]},\n",
278
+ " 'date': {'segments': [{'type': 'text', 'content': '2019'}]},\n",
279
+ " 'description': [{'segments': [{'type': 'text',\n",
280
+ " 'content': \"Awarded for 'Precision Livestock Farming' during the 16th National Technological Festival held by LOCUS, Pulchowk Campus.\"}]}]}],\n",
281
+ " 'custom_sections': {'Exchange Program and Fellowship': [{'title': {'segments': [{'type': 'link',\n",
282
+ " 'content': 'First Nepal Winter School in AI',\n",
283
+ " 'url': 'https://photos.app.goo.gl/kBatEMLzQqRJKU37'}]},\n",
284
+ " 'subtitle': {'segments': [{'type': 'link',\n",
285
+ " 'content': 'Nepal Applied Mathematics and Informatics Institute for Research (NAAMII)',\n",
286
+ " 'url': 'https://www.naamii.org.np/'}]},\n",
287
+ " 'date_description': {'segments': [{'type': 'text',\n",
288
+ " 'content': '20th - 30th Dec, 2018'}]},\n",
289
+ " 'description': [{'segments': [{'type': 'text',\n",
290
+ " 'content': 'Gained foundational knowledge in Deep Learning, probability, statistics, and linear algebra from esteemed professors, crucial for advanced algorithm development.'}]},\n",
291
+ " {'segments': [{'type': 'text',\n",
292
+ " 'content': 'Completed hands-on lab assignments directly applying concepts in computer vision and natural language processing (NLP), aligning with key model development areas for content safety.'}]}]},\n",
293
+ " {'title': {'segments': [{'type': 'link',\n",
294
+ " 'content': 'Sakura Science Exchange Program',\n",
295
+ " 'url': 'https://photos.app.goo.gl/P8gFatguLP5F1kmM9'}]},\n",
296
+ " 'subtitle': {'segments': [{'type': 'link',\n",
297
+ " 'content': 'Japan Science and Technology Agency',\n",
298
+ " 'url': 'https://www.jst.go.jp/EN/'}]},\n",
299
+ " 'date_description': {'segments': [{'type': 'text',\n",
300
+ " 'content': '16th - 23rd Dec, 2019'}]},\n",
301
+ " 'description': [{'segments': [{'type': 'text',\n",
302
+ " 'content': 'Selected as one of the top 3 students, demonstrating strong academic capability and a proactive approach to learning cutting-edge technologies.'}]},\n",
303
+ " {'segments': [{'type': 'text',\n",
304
+ " 'content': 'Engaged in technical exchange with international peers, presenting a poster and discussing solutions, enhancing communication and teamwork skills.'}]},\n",
305
+ " {'segments': [{'type': 'text',\n",
306
+ " 'content': 'Participated in sessions on advanced Artificial Intelligence and IoT, fostering a keen interest in innovative technological solutions relevant to content understanding.'}]}]}],\n",
307
+ " 'Volunteering and Teaching experience': [{'title': {'segments': [{'type': 'text',\n",
308
+ " 'content': 'Training on ML Applications'}]},\n",
309
+ " 'subtitle': {'segments': [{'type': 'text',\n",
310
+ " 'content': 'Mentors Club, Cedargate'}]},\n",
311
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2023'}]},\n",
312
+ " 'description': [{'segments': [{'type': 'text',\n",
313
+ " 'content': 'Conducted comprehensive sessions for all Cedargate employees in Nepal, covering a variety of ML algorithms and providing insights into operational procedures and production-contributing projects.'}]}]},\n",
314
+ " {'title': {'segments': [{'type': 'link',\n",
315
+ " 'content': 'Hardware Fellowship',\n",
316
+ " 'url': 'https://photos.app.goo.gl/pM3E4DLs12xjgP7FA'}]},\n",
317
+ " 'subtitle': {'segments': [{'type': 'link',\n",
318
+ " 'content': 'LOCUS',\n",
319
+ " 'url': 'https://www.facebook.com/locus.ioe/posts/hardware-fellowshiphope-you-guys-are-learninglocusnepal2019/825855697584931/'}]},\n",
320
+ " 'date_description': {'segments': [{'type': 'text', 'content': '2020'}]},\n",
321
+ " 'description': [{'segments': [{'type': 'text',\n",
322
+ " 'content': 'Instructed nearly 100 students in Arduino programming and electronic hardware design, demonstrating strong communication skills and foundational technical knowledge relevant to algorithm development.'}]},\n",
323
+ " {'segments': [{'type': 'text',\n",
324
+ " 'content': 'Mentored a team of junior-year students through a project to create a 2D CNC plotter, showcasing leadership, teamwork, and an ability to guide technical projects.'}]}]}],\n",
325
+ " 'References': [{'title': {'segments': [{'type': 'text',\n",
326
+ " 'content': 'Tathagata Mukharjee, Professor at University of Alabama in Huntsville'}]},\n",
327
+ " 'subtitle': {'segments': [{'type': 'text', 'content': 'tm0130@uh.edu'}]},\n",
328
+ " 'date_description': None,\n",
329
+ " 'description': None},\n",
330
+ " {'title': {'segments': [{'type': 'text',\n",
331
+ " 'content': 'Stacey Finn, Director of Data Science and Analytics at CedarGate'}]},\n",
332
+ " 'subtitle': {'segments': [{'type': 'text',\n",
333
+ " 'content': 'safinn5@gmail.com'}]},\n",
334
+ " 'date_description': None,\n",
335
+ " 'description': None}]}}\n",
336
+ "# The path where you want the PDF to be saved\n",
337
+ "output_pdf_path = os.path.join(os.getcwd(), \"my_resume.pdf\")\n",
338
+ "\n",
339
+ "# Generate\n",
340
+ "result = json_to_latex_pdf(data, output_pdf_path, template_name=\"resume.tex.jinja\")\n",
341
+ "\n",
342
+ "if result:\n",
343
+ " print(f\"PDF successfully generated at: {output_pdf_path}\")\n",
344
+ "\n"
345
+ ]
346
+ },
347
+ {
348
+ "cell_type": "code",
349
+ "execution_count": 8,
350
+ "metadata": {},
351
+ "outputs": [
352
+ {
353
+ "data": {
354
+ "text/plain": [
355
+ "'/Users/sawale/Documents/learning/resumer/notebooks'"
356
+ ]
357
+ },
358
+ "execution_count": 8,
359
+ "metadata": {},
360
+ "output_type": "execute_result"
361
+ }
362
+ ],
363
+ "source": [
364
+ "os.getcwd()"
365
+ ]
366
+ },
367
+ {
368
+ "cell_type": "code",
369
+ "execution_count": null,
370
+ "metadata": {},
371
+ "outputs": [],
372
+ "source": []
373
+ },
374
+ {
375
+ "cell_type": "code",
376
+ "execution_count": null,
377
+ "metadata": {},
378
+ "outputs": [],
379
+ "source": []
380
+ },
381
+ {
382
+ "cell_type": "code",
383
+ "execution_count": null,
384
+ "metadata": {},
385
+ "outputs": [],
386
+ "source": []
387
+ }
388
+ ],
389
+ "metadata": {
390
+ "kernelspec": {
391
+ "display_name": ".venv",
392
+ "language": "python",
393
+ "name": "python3"
394
+ },
395
+ "language_info": {
396
+ "codemirror_mode": {
397
+ "name": "ipython",
398
+ "version": 3
399
+ },
400
+ "file_extension": ".py",
401
+ "mimetype": "text/x-python",
402
+ "name": "python",
403
+ "nbconvert_exporter": "python",
404
+ "pygments_lexer": "ipython3",
405
+ "version": "3.12.7"
406
+ }
407
+ },
408
+ "nbformat": 4,
409
+ "nbformat_minor": 2
410
+ }
notebooks/7_test_with_openapi.ipynb ADDED
@@ -0,0 +1,337 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%reload_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": 2,
16
+ "metadata": {},
17
+ "outputs": [
18
+ {
19
+ "name": "stdout",
20
+ "output_type": "stream",
21
+ "text": [
22
+ "Added to path: /Users/sawale/Documents/learning/resumer\n"
23
+ ]
24
+ }
25
+ ],
26
+ "source": [
27
+ "import sys\n",
28
+ "import os\n",
29
+ "from pathlib import Path\n",
30
+ "\n",
31
+ "# Use Path.cwd() instead of __file__ in Notebooks\n",
32
+ "parent_dir = str(Path.cwd().parent)\n",
33
+ "\n",
34
+ "if parent_dir not in sys.path:\n",
35
+ " sys.path.append(parent_dir)\n",
36
+ "\n",
37
+ "print(f\"Added to path: {parent_dir}\")"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "execution_count": 3,
43
+ "metadata": {},
44
+ "outputs": [],
45
+ "source": [
46
+ "import os\n",
47
+ "import instructor\n",
48
+ "from openai import OpenAI, AsyncOpenAI # Use AsyncOpenAI for your 'aclient'\n",
49
+ "from pydantic import BaseModel\n",
50
+ "from dotenv import load_dotenv\n",
51
+ "\n",
52
+ "load_dotenv()\n",
53
+ "\n",
54
+ "# 1. Initialize the standard OpenAI Client\n",
55
+ "# This automatically looks for OPENAI_API_KEY in your .env\n",
56
+ "native_client = AsyncOpenAI(\n",
57
+ " api_key=os.environ.get(\"OPENAI_API_KEY\")\n",
58
+ ")\n",
59
+ "\n",
60
+ "# 2. Patch the client with Instructor\n",
61
+ "# For OpenAI, the recommended mode is TOOLS\n",
62
+ "aclient = instructor.from_openai(\n",
63
+ " native_client, \n",
64
+ " mode=instructor.Mode.TOOLS\n",
65
+ ")"
66
+ ]
67
+ },
68
+ {
69
+ "cell_type": "code",
70
+ "execution_count": 4,
71
+ "metadata": {},
72
+ "outputs": [
73
+ {
74
+ "data": {
75
+ "text/plain": [
76
+ "<instructor.core.client.AsyncInstructor at 0x12c89ddc0>"
77
+ ]
78
+ },
79
+ "execution_count": 4,
80
+ "metadata": {},
81
+ "output_type": "execute_result"
82
+ }
83
+ ],
84
+ "source": [
85
+ "aclient"
86
+ ]
87
+ },
88
+ {
89
+ "cell_type": "code",
90
+ "execution_count": 5,
91
+ "metadata": {},
92
+ "outputs": [
93
+ {
94
+ "name": "stdout",
95
+ "output_type": "stream",
96
+ "text": [
97
+ "Consider using the pymupdf_layout package for a greatly improved page layout analysis.\n"
98
+ ]
99
+ }
100
+ ],
101
+ "source": [
102
+ "from resumer import ResumeTailorPipeline"
103
+ ]
104
+ },
105
+ {
106
+ "cell_type": "code",
107
+ "execution_count": 6,
108
+ "metadata": {},
109
+ "outputs": [],
110
+ "source": [
111
+ "pp = ResumeTailorPipeline(\n",
112
+ " aclient = aclient, \n",
113
+ " model_name = \"gpt-4o\",\n",
114
+ " resume_path = \"/Users/sawale/Documents/learning/resumer/resumer/demo/Sajil_Awale_CV_2025.pdf\", \n",
115
+ " output_dir= \"./output/\"\n",
116
+ ")\n"
117
+ ]
118
+ },
119
+ {
120
+ "cell_type": "code",
121
+ "execution_count": 7,
122
+ "metadata": {},
123
+ "outputs": [
124
+ {
125
+ "name": "stdout",
126
+ "output_type": "stream",
127
+ "text": [
128
+ "--- Scraping job details from: https://lifeattiktok.com/search/7527589557336869138 ---\n",
129
+ "--- Extracting job info via LLM ---\n",
130
+ "--- Cache miss: Extracting resume info via LLM ---\n",
131
+ "--- Successfully extracted both Resume and Job data ---\n",
132
+ "--- Adding section: summary ---\n",
133
+ "--- Adding section: work_experience ---\n",
134
+ "--- Adding section: education ---\n",
135
+ "--- Adding section: skill_sections ---\n",
136
+ "--- Adding section: projects ---\n",
137
+ "--- Adding section: certifications ---\n",
138
+ "--- Adding section: achievements ---\n",
139
+ "--- Adding section: Exchange Program and Fellowship ---\n",
140
+ "--- Adding section: Volunteering and Teaching experience ---\n",
141
+ "Running command: pdflatex -interaction=nonstopmode -output-directory=./output ./output/tailored_resume.tex\n",
142
+ "PDF generated at: ./output/tailored_resume.pdf\n"
143
+ ]
144
+ }
145
+ ],
146
+ "source": [
147
+ "await pp.generate_tailored_resume(job_url=\"https://lifeattiktok.com/search/7527589557336869138\")"
148
+ ]
149
+ },
150
+ {
151
+ "cell_type": "code",
152
+ "execution_count": null,
153
+ "metadata": {},
154
+ "outputs": [],
155
+ "source": [
156
+ "from resumer.utils.latex_ops import json_to_latex_pdf\n",
157
+ "x = json_to_latex_pdf(pp.resume_details, os.path.join(pp.output_dir, \"tailored_resume.pdf\"))"
158
+ ]
159
+ },
160
+ {
161
+ "cell_type": "code",
162
+ "execution_count": null,
163
+ "metadata": {},
164
+ "outputs": [],
165
+ "source": [
166
+ "pp.resume_details"
167
+ ]
168
+ },
169
+ {
170
+ "cell_type": "code",
171
+ "execution_count": null,
172
+ "metadata": {},
173
+ "outputs": [],
174
+ "source": [
175
+ "pp.resume_details[\"custom_sections\"].keys()"
176
+ ]
177
+ },
178
+ {
179
+ "cell_type": "code",
180
+ "execution_count": null,
181
+ "metadata": {},
182
+ "outputs": [],
183
+ "source": [
184
+ "pp.resume_details[\"custom_sections\"][\"References\"]"
185
+ ]
186
+ },
187
+ {
188
+ "cell_type": "code",
189
+ "execution_count": null,
190
+ "metadata": {},
191
+ "outputs": [],
192
+ "source": [
193
+ "pp.resume_details[\"custom_sections\"]"
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": null,
199
+ "metadata": {},
200
+ "outputs": [],
201
+ "source": [
202
+ "pp.resume_info.model_dump()"
203
+ ]
204
+ },
205
+ {
206
+ "cell_type": "code",
207
+ "execution_count": null,
208
+ "metadata": {},
209
+ "outputs": [],
210
+ "source": [
211
+ "pp.job_info"
212
+ ]
213
+ },
214
+ {
215
+ "cell_type": "code",
216
+ "execution_count": null,
217
+ "metadata": {},
218
+ "outputs": [],
219
+ "source": [
220
+ "pp.resume_info.model_dump().keys()"
221
+ ]
222
+ },
223
+ {
224
+ "cell_type": "code",
225
+ "execution_count": null,
226
+ "metadata": {},
227
+ "outputs": [],
228
+ "source": [
229
+ "# loop through custom sections\n",
230
+ "for section in getattr(pp.resume_info, \"custom_sections\"):\n",
231
+ " temp = section.section_name\n",
232
+ " print(temp.plain_text)\n"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": null,
238
+ "metadata": {},
239
+ "outputs": [],
240
+ "source": [
241
+ "pp.resume_info.custom_sections[2].model_dump()"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "code",
246
+ "execution_count": null,
247
+ "metadata": {},
248
+ "outputs": [],
249
+ "source": [
250
+ "pp.resume_info.custom_sections"
251
+ ]
252
+ },
253
+ {
254
+ "cell_type": "code",
255
+ "execution_count": null,
256
+ "metadata": {},
257
+ "outputs": [],
258
+ "source": [
259
+ "# convert the custom sections to the same structure as the other normal sections\n",
260
+ "custom_output = {}\n",
261
+ "\n",
262
+ "\n",
263
+ "# loop through custom sections\n",
264
+ "for csection in pp.resume_info.custom_sections:\n",
265
+ " # setting the key\n",
266
+ " key_name = csection.section_name.plain_text\n",
267
+ " custom_output[key_name] = csection.model_dump()[\"section_detail\"]\n",
268
+ " print(type(custom_output[key_name]))\n",
269
+ "\n",
270
+ "\n",
271
+ "# custom_output"
272
+ ]
273
+ },
274
+ {
275
+ "cell_type": "code",
276
+ "execution_count": null,
277
+ "metadata": {},
278
+ "outputs": [],
279
+ "source": [
280
+ "type(pp.resume_info.model_dump_json(include={\"summary\"}))"
281
+ ]
282
+ },
283
+ {
284
+ "cell_type": "code",
285
+ "execution_count": null,
286
+ "metadata": {},
287
+ "outputs": [],
288
+ "source": [
289
+ "pp.resume_info.model_dump_json(include={\"work_experience\"})"
290
+ ]
291
+ },
292
+ {
293
+ "cell_type": "code",
294
+ "execution_count": null,
295
+ "metadata": {},
296
+ "outputs": [],
297
+ "source": [
298
+ "pp.resume_info.model_dump_json(include={\"skill_sections\"})"
299
+ ]
300
+ },
301
+ {
302
+ "cell_type": "code",
303
+ "execution_count": null,
304
+ "metadata": {},
305
+ "outputs": [],
306
+ "source": []
307
+ },
308
+ {
309
+ "cell_type": "code",
310
+ "execution_count": null,
311
+ "metadata": {},
312
+ "outputs": [],
313
+ "source": []
314
+ }
315
+ ],
316
+ "metadata": {
317
+ "kernelspec": {
318
+ "display_name": ".venv",
319
+ "language": "python",
320
+ "name": "python3"
321
+ },
322
+ "language_info": {
323
+ "codemirror_mode": {
324
+ "name": "ipython",
325
+ "version": 3
326
+ },
327
+ "file_extension": ".py",
328
+ "mimetype": "text/x-python",
329
+ "name": "python",
330
+ "nbconvert_exporter": "python",
331
+ "pygments_lexer": "ipython3",
332
+ "version": "3.12.7"
333
+ }
334
+ },
335
+ "nbformat": 4,
336
+ "nbformat_minor": 2
337
+ }
notebooks/8_test_with_gemini_api.ipynb ADDED
@@ -0,0 +1,375 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 2,
6
+ "metadata": {},
7
+ "outputs": [],
8
+ "source": [
9
+ "%reload_ext autoreload\n",
10
+ "%autoreload 2"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": 3,
16
+ "metadata": {},
17
+ "outputs": [
18
+ {
19
+ "name": "stdout",
20
+ "output_type": "stream",
21
+ "text": [
22
+ "Added to path: /Users/sawale/Documents/learning/resumer\n"
23
+ ]
24
+ }
25
+ ],
26
+ "source": [
27
+ "import sys\n",
28
+ "import os\n",
29
+ "from pathlib import Path\n",
30
+ "\n",
31
+ "# Use Path.cwd() instead of __file__ in Notebooks\n",
32
+ "parent_dir = str(Path.cwd().parent)\n",
33
+ "\n",
34
+ "if parent_dir not in sys.path:\n",
35
+ " sys.path.append(parent_dir)\n",
36
+ "\n",
37
+ "print(f\"Added to path: {parent_dir}\")"
38
+ ]
39
+ },
40
+ {
41
+ "cell_type": "code",
42
+ "execution_count": 3,
43
+ "metadata": {},
44
+ "outputs": [],
45
+ "source": [
46
+ "import os\n",
47
+ "import instructor\n",
48
+ "from google import genai\n",
49
+ "from dotenv import load_dotenv\n",
50
+ "\n",
51
+ "load_dotenv()\n",
52
+ "\n",
53
+ "# 1. Initialize the GenAI Client for Google AI Studio (API Key)\n",
54
+ "# Make sure GEMINI_API_KEY is set in your .env file\n",
55
+ "native_client = genai.Client(\n",
56
+ " api_key=os.environ.get(\"GEMINI_API_KEY\")\n",
57
+ ")\n",
58
+ "\n",
59
+ "# 2. Patch the client with Instructor\n",
60
+ "# Using GENAI_TOOLS mode; GENAI_STRUCTURED_OUTPUTS is left commented out for reference\n",
61
+ "aclient = instructor.from_genai(\n",
62
+ " native_client, \n",
63
+ " # mode=instructor.Mode.GENAI_STRUCTURED_OUTPUTS, \n",
64
+ " mode=instructor.Mode.GENAI_TOOLS,\n",
65
+ " use_async=True\n",
66
+ ")"
67
+ ]
68
+ },
69
+ {
70
+ "cell_type": "code",
71
+ "execution_count": 4,
72
+ "metadata": {},
73
+ "outputs": [
74
+ {
75
+ "data": {
76
+ "text/plain": [
77
+ "<instructor.core.client.AsyncInstructor at 0x1357b5190>"
78
+ ]
79
+ },
80
+ "execution_count": 4,
81
+ "metadata": {},
82
+ "output_type": "execute_result"
83
+ }
84
+ ],
85
+ "source": [
86
+ "aclient"
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "code",
91
+ "execution_count": 5,
92
+ "metadata": {},
93
+ "outputs": [
94
+ {
95
+ "name": "stdout",
96
+ "output_type": "stream",
97
+ "text": [
98
+ "Consider using the pymupdf_layout package for a greatly improved page layout analysis.\n"
99
+ ]
100
+ }
101
+ ],
102
+ "source": [
103
+ "from resumer import ResumeTailorPipeline"
104
+ ]
105
+ },
106
+ {
107
+ "cell_type": "code",
108
+ "execution_count": 8,
109
+ "metadata": {},
110
+ "outputs": [],
111
+ "source": [
112
+ "pp = ResumeTailorPipeline(\n",
113
+ " aclient = aclient, \n",
114
+ " model_name = \"gemini-3-pro-preview\",\n",
115
+ " resume_path = \"/Users/sawale/Documents/learning/resumer/resumer/demo/Sajil_Awale_CV_2025.pdf\", \n",
116
+ " output_dir= \"./output/\"\n",
117
+ ")\n"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "code",
122
+ "execution_count": null,
123
+ "metadata": {},
124
+ "outputs": [
125
+ {
126
+ "name": "stdout",
127
+ "output_type": "stream",
128
+ "text": [
129
+ "--- Scraping job details from: https://lifeattiktok.com/search/7527589557336869138 ---\n",
130
+ "--- Extracting job info via LLM ---\n",
131
+ "--- Cache miss: Extracting resume info via LLM ---\n",
132
+ "--- Successfully extracted both Resume and Job data ---\n",
133
+ "--- Adding section: summary ---\n",
134
+ "--- Adding section: work_experience ---\n",
135
+ "--- Adding section: education ---\n",
136
+ "--- Adding section: skill_sections ---\n",
137
+ "--- Adding section: projects ---\n",
138
+ "--- Adding section: certifications ---\n",
139
+ "--- Adding section: achievements ---\n",
140
+ "--- Adding section: research_works ---\n",
141
+ "## LLM decided this section is not relevant ##\n",
142
+ "--- Adding section: Exchange Program and Fellowship ---\n",
143
+ "--- Adding section: Volunteering and Teaching experience ---\n",
144
+ "--- Adding section: References ---\n",
145
+ "Running command: pdflatex -interaction=nonstopmode -output-directory=./output ./output/tailored_resume.tex\n",
146
+ "PDF generated at: ./output/tailored_resume.pdf\n"
147
+ ]
148
+ }
149
+ ],
150
+ "source": [
151
+ "await pp.generate_tailored_resume(job_url=\"https://lifeattiktok.com/search/7527589557336869138\")"
152
+ ]
153
+ },
154
+ {
155
+ "cell_type": "code",
156
+ "execution_count": null,
157
+ "metadata": {},
158
+ "outputs": [],
159
+ "source": [
160
+ "from resumer.utils.latex_ops import json_to_latex_pdf\n",
161
+ "x = json_to_latex_pdf(pp.resume_details, os.path.join(pp.output_dir, \"tailored_resume.pdf\"))"
162
+ ]
163
+ },
164
+ {
165
+ "cell_type": "code",
166
+ "execution_count": null,
167
+ "metadata": {},
168
+ "outputs": [],
169
+ "source": [
170
+ "pp.resume_details"
171
+ ]
172
+ },
173
+ {
174
+ "cell_type": "code",
175
+ "execution_count": null,
176
+ "metadata": {},
177
+ "outputs": [],
178
+ "source": [
179
+ "pp.resume_details[\"custom_sections\"].keys()"
180
+ ]
181
+ },
182
+ {
183
+ "cell_type": "code",
184
+ "execution_count": null,
185
+ "metadata": {},
186
+ "outputs": [],
187
+ "source": [
188
+ "pp.resume_details[\"custom_sections\"][\"References\"]"
189
+ ]
190
+ },
191
+ {
192
+ "cell_type": "code",
193
+ "execution_count": null,
194
+ "metadata": {},
195
+ "outputs": [],
196
+ "source": [
197
+ "pp.resume_details[\"custom_sections\"]"
198
+ ]
199
+ },
200
+ {
201
+ "cell_type": "code",
202
+ "execution_count": null,
203
+ "metadata": {},
204
+ "outputs": [],
205
+ "source": [
206
+ "pp.resume_info.model_dump()"
207
+ ]
208
+ },
209
+ {
210
+ "cell_type": "code",
211
+ "execution_count": null,
212
+ "metadata": {},
213
+ "outputs": [],
214
+ "source": [
215
+ "pp.job_info"
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": null,
221
+ "metadata": {},
222
+ "outputs": [],
223
+ "source": [
224
+ "pp.resume_info.model_dump().keys()"
225
+ ]
226
+ },
227
+ {
228
+ "cell_type": "code",
229
+ "execution_count": null,
230
+ "metadata": {},
231
+ "outputs": [],
232
+ "source": [
233
+ "# loop through custom sections\n",
234
+ "for section in getattr(pp.resume_info, \"custom_sections\"):\n",
235
+ " temp = section.section_name\n",
236
+ " print(temp.plain_text)\n"
237
+ ]
238
+ },
239
+ {
240
+ "cell_type": "code",
241
+ "execution_count": null,
242
+ "metadata": {},
243
+ "outputs": [],
244
+ "source": [
245
+ "pp.resume_info.custom_sections[2].model_dump()"
246
+ ]
247
+ },
248
+ {
249
+ "cell_type": "code",
250
+ "execution_count": null,
251
+ "metadata": {},
252
+ "outputs": [],
253
+ "source": [
254
+ "pp.resume_info.custom_sections"
255
+ ]
256
+ },
257
+ {
258
+ "cell_type": "code",
259
+ "execution_count": null,
260
+ "metadata": {},
261
+ "outputs": [],
262
+ "source": [
263
+ "# convert the custom sections to a structure like the other normal sections\n",
264
+ "custom_output = {}\n",
265
+ "\n",
266
+ "\n",
267
+ "# loop through custom sections\n",
268
+ "for csection in pp.resume_info.custom_sections:\n",
269
+ " # setting the key\n",
270
+ " key_name = csection.section_name.plain_text\n",
271
+ " custom_output[key_name] = csection.model_dump()[\"section_detail\"]\n",
272
+ " print(type(custom_output[key_name]))\n",
273
+ "\n",
274
+ "\n",
275
+ "# custom_output"
276
+ ]
277
+ },
278
+ {
279
+ "cell_type": "code",
280
+ "execution_count": null,
281
+ "metadata": {},
282
+ "outputs": [],
283
+ "source": [
284
+ "type(pp.resume_info.model_dump_json(include={\"summary\"}))"
285
+ ]
286
+ },
287
+ {
288
+ "cell_type": "code",
289
+ "execution_count": null,
290
+ "metadata": {},
291
+ "outputs": [],
292
+ "source": [
293
+ "pp.resume_info.model_dump_json(include={\"work_experience\"})"
294
+ ]
295
+ },
296
+ {
297
+ "cell_type": "code",
298
+ "execution_count": null,
299
+ "metadata": {},
300
+ "outputs": [],
301
+ "source": [
302
+ "pp.resume_info.model_dump_json(include={\"skill_sections\"})"
303
+ ]
304
+ },
305
+ {
306
+ "cell_type": "code",
307
+ "execution_count": 4,
308
+ "metadata": {},
309
+ "outputs": [
310
+ {
311
+ "ename": "AttributeError",
312
+ "evalue": "module 'google.genai' has no attribute 'configure'",
313
+ "output_type": "error",
314
+ "traceback": [
315
+ "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
316
+ "\u001b[31mAttributeError\u001b[39m Traceback (most recent call last)",
317
+ "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[4]\u001b[39m\u001b[32m, line 10\u001b[39m\n\u001b[32m 4\u001b[39m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34;01mdotenv\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m load_dotenv\n\u001b[32m 6\u001b[39m load_dotenv()\n\u001b[32m---> \u001b[39m\u001b[32m10\u001b[39m \u001b[43mgenai\u001b[49m\u001b[43m.\u001b[49m\u001b[43mconfigure\u001b[49m(api_key=os.environ.get(\u001b[33m\"\u001b[39m\u001b[33mGEMINI_API_KEY\u001b[39m\u001b[33m\"\u001b[39m))\n\u001b[32m 12\u001b[39m \u001b[38;5;28;01mfor\u001b[39;00m m \u001b[38;5;129;01min\u001b[39;00m genai.list_models():\n\u001b[32m 13\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[33m'\u001b[39m\u001b[33mgenerateContent\u001b[39m\u001b[33m'\u001b[39m \u001b[38;5;129;01min\u001b[39;00m m.supported_generation_methods:\n",
318
+ "\u001b[31mAttributeError\u001b[39m: module 'google.genai' has no attribute 'configure'"
319
+ ]
320
+ }
321
+ ],
322
+ "source": [
323
+ "import os\n",
324
+ "import instructor\n",
325
+ "from google import genai\n",
326
+ "from dotenv import load_dotenv\n",
327
+ "\n",
328
+ "load_dotenv()\n",
329
+ "\n",
330
+ "\n",
331
+ "\n",
332
+ "genai.configure(api_key=os.environ.get(\"GEMINI_API_KEY\"))\n",
333
+ "\n",
334
+ "for m in genai.list_models():\n",
335
+ " if 'generateContent' in m.supported_generation_methods:\n",
336
+ " print(f\"Model Name: {m.name}\")"
337
+ ]
338
+ },
339
+ {
340
+ "cell_type": "code",
341
+ "execution_count": null,
342
+ "metadata": {},
343
+ "outputs": [],
344
+ "source": []
345
+ },
346
+ {
347
+ "cell_type": "code",
348
+ "execution_count": null,
349
+ "metadata": {},
350
+ "outputs": [],
351
+ "source": []
352
+ }
353
+ ],
354
+ "metadata": {
355
+ "kernelspec": {
356
+ "display_name": "resumer",
357
+ "language": "python",
358
+ "name": "python3"
359
+ },
360
+ "language_info": {
361
+ "codemirror_mode": {
362
+ "name": "ipython",
363
+ "version": 3
364
+ },
365
+ "file_extension": ".py",
366
+ "mimetype": "text/x-python",
367
+ "name": "python",
368
+ "nbconvert_exporter": "python",
369
+ "pygments_lexer": "ipython3",
370
+ "version": "3.12.7"
371
+ }
372
+ },
373
+ "nbformat": 4,
374
+ "nbformat_minor": 2
375
+ }
requirements.txt ADDED
@@ -0,0 +1,15 @@
1
+ google-genai
2
+ anthropic
3
+ openai
4
+ streamlit>=1.28.0
5
+ instructor[google-genai]>=1.0.0
7
+ google-cloud-aiplatform
8
+ pymupdf4llm
9
+ diskcache
10
+ beautifulsoup4
11
+ requests
12
+ trafilatura
13
+ undetected-chromedriver
14
+ jinja2
15
+ python-dotenv
resumer/__init__.py ADDED
@@ -0,0 +1,359 @@
1
+ import os
2
+ import instructor
3
+ import pymupdf4llm
4
+ from diskcache import Cache
5
+ from resumer.schemas.sections_schemas import ResumeSchema
6
+ from resumer.schemas.job_details_schema import JobDetails
7
+ from resumer.prompts.resume_prompt import RESUME_DETAILS_EXTRACTOR, JOB_DETAILS_EXTRACTOR
8
+ from resumer.utils.scraper import scrape_job_details
9
+ from resumer.utils.latex_ops import json_to_latex_pdf
10
+ from resumer.variables import section_mapping
11
+ from typing import Callable, Optional
12
+
13
+ class ResumeTailorPipeline:
14
+ """
15
+ Args:
16
+ aclient: async Instructor client used to make LLM calls
+ model_name: name of the LLM model used for extraction and tailoring
17
+ resume_path: path to the original resume (PDF file location)
18
+ output_dir: folder where the final tailored resume is stored
19
+
20
+ """
21
+ def __init__(
22
+ self,
23
+ aclient: instructor.AsyncInstructor,
24
+ model_name: str,
25
+ resume_path: str,
26
+ output_dir: str,
27
+ log_callback: Optional[Callable[[str], None]] = None,
28
+ max_concurrent_sections: int = 3
29
+ ):
30
+ self.aclient = aclient
31
+ self.model_name = model_name
32
+ self.resume_path = resume_path
33
+ self.output_dir = output_dir
34
+ self.log_callback = log_callback
35
+ self.max_concurrent_sections = max_concurrent_sections
36
+ self.resume_md = None
37
+ self.resume_info = None
+ self.job_info = None
38
+ self.resume_details = None
39
+ self.tailored_resume_path = None
40
+ self.tailored_resume_tex_path = None
41
+
42
+ # make the output_dir
43
+ os.makedirs(self.output_dir, exist_ok=True)
44
+
45
+ # Initialize Cache in a subfolder of your output_dir
46
+ cache_dir = os.path.join(self.output_dir, ".resume_cache")
47
+ self.cache = Cache(cache_dir)
48
+
49
+ def _log(self, message: str):
50
+ """Internal logging method that uses callback if available"""
51
+ print(message)
52
+ if self.log_callback:
53
+ self.log_callback(message)
54
+
55
+ def _read_resume_pdf(self):
56
+ self._log("📄 Reading resume PDF...")
57
+ if not self.resume_md:
58
+ self._log("🔄 Converting PDF to markdown...")
59
+ self.resume_md = pymupdf4llm.to_markdown(self.resume_path)
60
+ self._log(f"✅ Successfully converted resume ({len(self.resume_md)} characters)")
61
+ return self.resume_md
62
+
63
+ async def _extract_resume_json(self):
64
+ self._log("🔍 Starting resume extraction...")
65
+ resume_md = self._read_resume_pdf()
66
+
67
+ # Use the resume markdown string as the key
68
+ cached_json = self.cache.get(resume_md)
69
+
70
+ if cached_json:
71
+ self._log("⚡ Loading resume info from disk cache...")
72
+ self.resume_info = ResumeSchema.model_validate(cached_json)
73
+ self._log("✅ Resume loaded from cache")
74
+ return self.resume_info
75
+
76
+ self._log("🤖 Cache miss: Extracting resume via LLM... (this may take a while)")
77
+ self.resume_info = await self.aclient.chat.completions.create(
78
+ model=self.model_name,
79
+ response_model=ResumeSchema,
80
+ messages=[
81
+ {"role": "system", "content": RESUME_DETAILS_EXTRACTOR},
82
+ {"role": "user", "content": resume_md},
83
+ ],
84
+ )
85
+ self._log("✅ Resume structure extracted successfully")
86
+
87
+ # Store as a Dictionary/JSON instead of a Class Object
88
+ self.cache.set(resume_md, self.resume_info.model_dump())
89
+ return self.resume_info
90
+
91
+ async def _extract_job_json(self, url: str = None, job_site_content: str = None):
92
+ """Scrapes and structures job details."""
93
+ if not url and not job_site_content:
94
+ raise ValueError("You must provide either a URL or raw job content.")
95
+
96
+ if not job_site_content:
97
+ self._log(f"🌐 Scraping job details from: {url}")
98
+ job_site_content = scrape_job_details(url)
99
+ self._log(f"✅ Job page scraped ({len(job_site_content)} characters)")
100
+
101
+ self._log("🤖 Extracting job info via LLM...")
102
+ self.job_info = await self.aclient.chat.completions.create(
103
+ model=self.model_name,
104
+ response_model=JobDetails,
105
+ messages=[
106
+ {"role": "system", "content": JOB_DETAILS_EXTRACTOR},
107
+ {"role": "user", "content": job_site_content},
108
+ ],
109
+ )
110
+ self._log("✅ Job structure extracted successfully")
111
+
112
+ # Logic check for valid content
113
+ if getattr(self.job_info, "is_noise_only", False):
114
+ self._log("⚠️ Warning: Content identified as noise")
115
+ raise ValueError("LLM identified the content as noise (ads/login walls) rather than a job post.")
116
+
117
+ # Return the 'data' field if it exists, otherwise the whole object
118
+ self.job_info = getattr(self.job_info, "data", self.job_info)
119
+ return self.job_info
120
+
121
+ def _get_all_sections(self):
122
+ """Get all resume sections"""
123
+ sections = list(self.resume_info.model_dump().keys())
124
+
125
+ if "custom_sections" in sections:
126
+ sections.remove("custom_sections")
127
+
128
+ custom_sections = []
129
+
130
+ if getattr(self.resume_info, "custom_sections"):
131
+ for section in getattr(self.resume_info, "custom_sections"):
132
+ sec_name = section.section_name.plain_text
133
+ custom_sections.append(sec_name)
134
+
135
+ return sections, custom_sections
136
+
137
+
138
+ async def _process_section(self, section_title: str, section_data: str, mapping_key: str):
139
+ """
140
+ Helper method to tailor a single section using the LLM.
141
+
142
+ Args:
143
+ section_title: The name of the section (used for XML tags).
144
+ section_data: The content of the section.
145
+ mapping_key: The key to look up prompt and schema in section_mapping.
146
+ """
147
+ self._log(f"📝 Processing section: {section_title}")
148
+
149
+ section_system_prompt = section_mapping.get(mapping_key).get("prompt")
150
+ section_schema = section_mapping.get(mapping_key).get("schema")
151
+
152
+ section_user_prompt = f"""
153
+ <{section_title.upper()}>
154
+ {section_data}
155
+ </{section_title.upper()}>
156
+
157
+ <JOB_DESCRIPTION>
158
+ {self.job_info.model_dump_json()}
159
+ </JOB_DESCRIPTION>
160
+ """
161
+
162
+ # make an LLM call to tailor this section
163
+ section_info = await self.aclient.chat.completions.create(
164
+ model=self.model_name,
165
+ response_model=section_schema,
166
+ messages=[
167
+ {"role": "system", "content": section_system_prompt},
168
+ {"role": "user", "content": section_user_prompt},
169
+ ],
170
+ )
171
+
172
+ section_info = section_info.model_dump()
173
+
174
+ # first check if this section is relevant
175
+ if section_info.get("is_relevant", False):
176
+ self._log(f"✅ {section_title}: Tailored and included")
177
+ return section_info.get("data", None)
178
+ else:
179
+ self._log(f"⏭️ {section_title}: Not relevant to job, skipping")
180
+ return None
181
+
182
+ # async def resume_builder(self):
183
+ # """Build the tailored resume from all sections"""
184
+ # self._log("🏗️ Starting resume builder...")
185
+ # section_names, custom_section_names = self._get_all_sections()
186
+
187
+ # # remove keywords from section_names
188
+ # if "keywords" in section_names:
189
+ # section_names.remove("keywords")
190
+
191
+ # resume_details = dict()
192
+
193
+ # # add personal info
194
+ # self._log("👤 Adding personal information...")
195
+ # resume_details["personal_info"] = getattr(self.resume_info, "personal_info").model_dump()
196
+
197
+ # if "personal_info" in section_names:
198
+ # section_names.remove("personal_info")
199
+
200
+ # # Process other sections
201
+ # self._log(f"📋 Processing {len(section_names)} standard sections...")
202
+ # for section_name in section_names:
203
+ # if getattr(self.resume_info, section_name) is None:
204
+ # continue
205
+
206
+ # if section_name == "summary":
207
+ # _section_data = self.resume_info.model_dump_json()
208
+ # else:
209
+ # _section_data = self.resume_info.model_dump_json(include={section_name})
210
+
211
+ # result = await self._process_section(section_name, _section_data, section_name)
212
+ # if result:
213
+ # resume_details[section_name] = result
214
+
215
+ # # Process custom sections
216
+ # resume_details["custom_sections"] = {}
217
+ # if getattr(self.resume_info, "custom_sections") is not None:
218
+ # self._log(f"📋 Processing {len(custom_section_names)} custom sections...")
219
+ # for csection in getattr(self.resume_info, "custom_sections"):
220
+ # section_name = csection.section_name.plain_text
221
+ # _section_data = str(csection.model_dump()["section_detail"])
222
+ # result = await self._process_section(section_name, _section_data, "custom_sections")
223
+ # if result:
224
+ # resume_details["custom_sections"][section_name] = result
225
+
226
+ # self.resume_details = resume_details
227
+ # self._log("✅ Resume building complete")
228
+ # return self.resume_details
229
+
230
+ async def resume_builder(self):
231
+ """Build the tailored resume from all sections with parallel processing"""
232
+ import asyncio
233
+
234
+ self._log("🏗️ Starting resume builder...")
235
+ section_names, custom_section_names = self._get_all_sections()
236
+
237
+ # remove keywords from section_names
238
+ if "keywords" in section_names:
239
+ section_names.remove("keywords")
240
+
241
+ resume_details = dict()
242
+
243
+ # add personal info
244
+ self._log("👤 Adding personal information...")
245
+ resume_details["personal_info"] = getattr(self.resume_info, "personal_info").model_dump()
246
+
247
+ if "personal_info" in section_names:
248
+ section_names.remove("personal_info")
249
+
250
+ # Create a semaphore to limit concurrent LLM calls
251
+ semaphore = asyncio.Semaphore(self.max_concurrent_sections)
252
+
253
+ async def process_section_with_semaphore(section_name, section_data, mapping_key):
254
+ """Wrapper to limit concurrent calls"""
255
+ async with semaphore:
256
+ return await self._process_section(section_name, section_data, mapping_key)
257
+
258
+ # Process standard sections in parallel
259
+ self._log(f"📋 Processing {len(section_names)} standard sections (max {self.max_concurrent_sections} concurrent)...")
260
+
261
+ # Create tasks for all sections
262
+ tasks = []
263
+ for section_name in section_names:
264
+ if getattr(self.resume_info, section_name) is None:
265
+ continue
266
+
267
+ if section_name == "summary":
268
+ _section_data = self.resume_info.model_dump_json()
269
+ else:
270
+ _section_data = self.resume_info.model_dump_json(include={section_name})
271
+
272
+ # Create a task for this section
273
+ task = process_section_with_semaphore(section_name, _section_data, section_name)
274
+ tasks.append((section_name, task))
275
+
276
+ # Run all standard section tasks concurrently
277
+ if tasks:
278
+ results = await asyncio.gather(*[task for _, task in tasks], return_exceptions=True)
279
+ for (section_name, _), result in zip(tasks, results):
280
+ if isinstance(result, Exception):
281
+ self._log(f"❌ Error processing {section_name}: {str(result)}")
282
+ elif result:
283
+ resume_details[section_name] = result
284
+
285
+ # Process custom sections in parallel
286
+ resume_details["custom_sections"] = {}
287
+ if getattr(self.resume_info, "custom_sections") is not None:
288
+ self._log(f"📋 Processing {len(custom_section_names)} custom sections (max {self.max_concurrent_sections} concurrent)...")
289
+
290
+ custom_tasks = []
291
+ for csection in getattr(self.resume_info, "custom_sections"):
292
+ section_name = csection.section_name.plain_text
293
+ _section_data = str(csection.model_dump()["section_detail"])
294
+
295
+ # Create a task for this custom section
296
+ task = process_section_with_semaphore(section_name, _section_data, "custom_sections")
297
+ custom_tasks.append((section_name, task))
298
+
299
+ # Run all custom section tasks concurrently
300
+ if custom_tasks:
301
+ custom_results = await asyncio.gather(*[task for _, task in custom_tasks], return_exceptions=True)
302
+ for (section_name, _), result in zip(custom_tasks, custom_results):
303
+ if isinstance(result, Exception):
304
+ self._log(f"❌ Error processing {section_name}: {str(result)}")
305
+ elif result:
306
+ resume_details["custom_sections"][section_name] = result
307
+
308
+ self.resume_details = resume_details
309
+ self._log("✅ Resume building complete")
310
+ return self.resume_details
311
+
312
+
313
+ async def generate_tailored_resume(self, job_url: str = None, job_site_content: str = None):
314
+ """Generate the tailored resume"""
315
+ self._log("=" * 50)
316
+ self._log("🚀 Starting Resume Tailoring Pipeline")
317
+ self._log("=" * 50)
318
+
319
+ try:
320
+ # Step 1: Extract job details
321
+ self._log("\n📌 STEP 1: Extract Job Details")
322
+ await self._extract_job_json(job_url, job_site_content)
323
+
324
+ # Step 2: Extract resume details
325
+ self._log("\n📌 STEP 2: Extract Resume Details")
326
+ await self._extract_resume_json()
327
+
328
+ self._log("\n✅ Successfully extracted both Resume and Job data")
329
+
330
+ # Step 3: Build tailored resume
331
+ self._log("\n📌 STEP 3: Build Tailored Resume")
332
+ await self.resume_builder()
333
+
334
+ # Step 4: Generate PDF
335
+ self._log("\n📌 STEP 4: Generate PDF")
336
+ self._log("🔄 Converting to LaTeX and generating PDF...")
337
+ self.tailored_resume_path, self.tailored_resume_tex_path = json_to_latex_pdf(
338
+ self.resume_details,
339
+ os.path.join(self.output_dir, "tailored_resume.pdf")
340
+ )
341
+ self._log(f"✅ PDF generated at: {self.tailored_resume_path}")
342
+
343
+ self._log("\n" + "=" * 50)
344
+ self._log("🎉 Resume Tailoring Complete!")
345
+ self._log("=" * 50)
346
+
347
+ return self.tailored_resume_path, self.tailored_resume_tex_path
348
+
349
+ except Exception as e:
350
+ self._log(f"\n❌ Error during pipeline execution: {str(e)}")
351
+ raise
352
+
353
+ def close_cache(self):
354
+ """Cleanly close the cache connection."""
355
+ self.cache.close()
356
+
357
+
358
+
359
+
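The `resume_builder` method above throttles concurrent LLM calls with an `asyncio.Semaphore` and collects section results via `asyncio.gather(..., return_exceptions=True)`. A minimal, self-contained sketch of that pattern (the section names and the sleep stand in for real LLM calls):

```python
import asyncio

async def run_limited(coros, max_concurrent=3):
    # Cap the number of in-flight coroutines, mirroring
    # max_concurrent_sections in ResumeTailorPipeline.resume_builder.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(coro):
        async with semaphore:
            return await coro

    # return_exceptions=True keeps one failed section from
    # discarding the results of the others.
    return await asyncio.gather(*(guarded(c) for c in coros), return_exceptions=True)

async def tailor_section(name):
    await asyncio.sleep(0.01)  # stand-in for an LLM call
    return f"{name}: tailored"

sections = ["summary", "work_experience", "skill_sections"]
results = asyncio.run(run_limited([tailor_section(s) for s in sections]))
print(results)
```

`gather` preserves input order, so results line up with the section names even though execution is interleaved.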
resumer/demo/Sajil_Awale_CV_2025.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:32f59fa3776e9f59658cfb4d1e98dbbfd77cfd86d59d2e0da163f6eebbae3b7e
3
+ size 188408
resumer/prompts/resume_prompt.py ADDED
@@ -0,0 +1,43 @@
1
+ RESUME_DETAILS_EXTRACTOR = """<objective>
2
+ Parse a resume into a structured JSON format following a specific RichText schema that preserves hyperlinks and their associated anchor text.
3
+ </objective>
4
+
5
+ <instructions>
6
+ Follow these steps to extract and structure the resume information:
7
+
8
+ 1. Analyze Structure:
9
+ - Examine the text-formatted resume to identify key sections (e.g., personal information, education, experience, skills, certifications).
10
+ - Note any unique formatting or organization within the resume.
11
+
12
+ 2. Extract Information:
13
+ - Systematically parse each section, extracting relevant details.
14
+ - Pay attention to dates, titles, organizations, and descriptions.
15
+
16
+ 3. Handle Variations:
17
+ - Account for different resume styles, formats, and section orders.
18
+ - Adapt the extraction process to accurately capture data from various layouts.
19
+
20
+ 4. Optimize Output:
21
+ - Handle missing or incomplete information appropriately (use null values or empty arrays/objects as needed).
22
+ - Standardize date formats, if applicable.
23
+
24
+ 5. Validate:
25
+ - Review the extracted data for consistency and completeness.
26
+ - Ensure all required fields are populated if the information is available in the resume.
27
+ </instructions>
28
+ """
29
+
30
+ JOB_DETAILS_EXTRACTOR = """
31
+ <task>
32
+ Analyze the provided text and determine if it contains a legitimate job description or if it is "noise" (e.g., bot protection screens, login walls, "Access Denied" messages, or cookie consent pages).
33
+
34
+ You must output a structured JSON response following these rules:
35
+
36
+ 1. **Noise Detection**: First, evaluate if the text is noise. Set `is_noise_only` to `true` if the text is a system error, a login prompt, or lacks any actual job-related information.
37
+ 2. **Conditional Extraction**:
38
+ - If `is_noise_only` is `true`, set the `data` field to `null`.
39
+ - If `is_noise_only` is `false`, extract the job details into the `data` object.
40
+ 3. **Handling Missing Info**: Only populate specific fields within the `data` object if the information is explicitly stated or clearly implied. If a specific detail (like 'preferred_qualifications') is missing from the text, set that field to `null` or an empty list as appropriate. Do not hallucinate details.
41
+ 4. **Focus**: Prioritize "keywords", "job_duties_and_responsibilities", and "required_qualifications" for resume tailoring. Ensure these are concise and accurate.
42
+ </task>
43
+ """
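Downstream, the pipeline consumes the structured output this prompt requests by rejecting noise-only pages and unwrapping the `data` field (see `_extract_job_json` in `resumer/__init__.py`). A dependency-free sketch of that gating, with plain dicts standing in for the validated LLM response:

```python
def unwrap_job_details(response: dict):
    # Mirrors _extract_job_json: refuse noise-only content,
    # otherwise return the extracted job data.
    if response.get("is_noise_only", False):
        raise ValueError(
            "LLM identified the content as noise (ads/login walls) "
            "rather than a job post."
        )
    return response.get("data")

job = unwrap_job_details(
    {"is_noise_only": False, "data": {"keywords": ["python", "llm"]}}
)
print(job["keywords"])  # ['python', 'llm']
```

With `is_noise_only` set to true, the same call raises instead of returning partial data, which is what lets the pipeline fail fast on login walls.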
resumer/prompts/sections_prompt.py ADDED
@@ -0,0 +1,165 @@
1
+ ACHIEVEMENTS = """You are going to write a JSON resume section of "Achievements" for an applicant applying for job posts.
2
+
3
+ Steps to follow:
4
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
5
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
6
+ 3. Analyze my achievements details to match job requirements.
7
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
8
+ 5. Optimize JSON section for clarity and relevance to the job description.
9
+
10
+ Instructions:
11
+ 1. Focus: Craft relevant achievements aligned with the job description.
12
+ 2. Honesty: Prioritize truthfulness and objective language.
13
+ 3. Specificity: Prioritize relevance to the specific job over general achievements.
14
+ 4. Style:
15
+ 4.1. Voice: Use active voice whenever possible.
16
+ 4.2. Proofreading: Ensure impeccable spelling and grammar.
17
+ """
18
+
19
+ CERTIFICATIONS = """You are going to write a JSON resume section of "Certifications" for an applicant applying for job posts.
20
+
21
+ Steps to follow:
22
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
23
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
24
+ 3. Analyze my certification details to match job requirements.
25
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
26
+ 5. Optimize JSON section for clarity and relevance to the job description.
27
+
28
+ Instructions:
29
+ 1. Focus: Include relevant certifications aligned with the job description.
30
+ 2. Proofreading: Ensure impeccable spelling and grammar.
31
+ """
32
+
33
+ EDUCATIONS = """You are going to write a JSON resume section of "Education" for an applicant applying for job posts.
34
+
35
+ Steps to follow:
36
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
37
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
38
+ 3. Analyze my education details to match job requirements.
39
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
40
+ 5. Optimize JSON section for clarity and relevance to the job description.
41
+
42
+ Instructions:
43
+ - Maintain truthfulness and objectivity in listing experience.
44
+ - Prioritize specificity - with respect to job - over generality.
45
+ - Proofread and Correct spelling and grammar errors.
46
+ - Aim for clear expression over impressiveness.
47
+ - Prefer active voice over passive voice.
48
+ """
49
+
50
+
51
+ PROJECTS = """You are going to write a JSON resume section of "Project Experience" for an applicant applying for job posts.
52
+
53
+ Steps to follow:
54
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
55
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
56
+ 3. Analyze my project details to match job requirements.
57
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
58
+ 5. Optimize JSON section for clarity and relevance to the job description.
59
+
60
+ Instructions:
61
+ 1. Focus: Craft highly relevant project experiences aligned with the job description.
62
+ 2. Content:
63
+ 2.1. Bullet points: It should be concise but can consist of multiple sentences if necessary to capture the full impact.
64
+ 2.2. Impact: Quantify the bullet point for measurable results.
65
+ 2.3. Storytelling: Utilize STAR methodology (Situation, Task, Action, Result) implicitly within the single bullet point.
66
+ 2.4. Action Verbs: Showcase soft skills with strong, active verbs.
67
+ 2.5. Honesty: Prioritize truthfulness and objective language.
68
+ 2.6. Structure: The bullet point should follow the "Did X by doing Y, resulting in Z" format.
69
+ 2.7. Specificity: Prioritize the single most relevant achievement for the specific job.
70
+ 3. Style:
71
+ 3.1. Clarity: Be brief and concise. Avoid fluff or filler words.
72
+ 3.2. Voice: Use active voice whenever possible.
73
+ 3.3. Proofreading: Ensure impeccable spelling and grammar.
74
+ """
75
+
76
+ SKILLS="""You are going to write a JSON resume section of "Skills" for an applicant applying for job posts.
77
+
78
+ Step to follow:
79
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
80
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
81
+ 3. Analyze my Skills details to match job requirements.
82
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
83
+ 5. Optimize JSON section for clarity and relevance to the job description.
84
+
85
+ Instructions:
86
+ - Specificity: Prioritize relevance to the specific job over general skillset.
87
+ - Proofreading: Ensure impeccable spelling and grammar.
88
+ """
89
+
90
+
91
+ EXPERIENCE = """You are going to write a JSON resume section of "Work Experience" for an applicant applying for job posts.
92
+
93
+ Step to follow:
94
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
95
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
96
+ 3. Analyze my Work details to match job requirements.
97
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
98
+ 5. Optimize JSON section for clarity and relevance to the job description.
99
+
100
+ Instructions:
101
+ 1. Focus: Craft highly relevant work experiences aligned with the job description.
102
+ 2. Content:
103
+ 2.1. Bullet points: It should be concise but can consist of multiple sentences if necessary to capture the full impact.
104
+ 2.2. Impact: Quantify the bullet point for measurable results.
105
+ 2.3. Storytelling: Utilize STAR methodology (Situation, Task, Action, Result) implicitly within the single bullet point.
106
+ 2.4. Action Verbs: Showcase soft skills with strong, active verbs.
107
+ 2.5. Honesty: Prioritize truthfulness and objective language.
108
+ 2.6. Structure: The single bullet point must follow the "Did X by doing Y, resulting in Z" format.
109
+ 2.7. Specificity: Prioritize the single most relevant achievement for the specific job.
110
+ 3. Style:
111
+ 3.1. Clarity: Be brief and concise. Avoid fluff or filler words. Clear expression trumps impressiveness.
112
+ 3.2. Voice: Use active voice whenever possible.
113
+ 3.3. Proofreading: Ensure impeccable spelling and grammar.
114
+ """
115
+
116
+
117
+
118
+ SUMMARY = """You are going to write a JSON resume section of "Summary" for an applicant applying for job posts.
119
+
120
+ Step to follow:
121
+ 1. Relevance Analysis: Summary section is essential in resume, so set 'is_relevant' to true.
122
+ 2. Analyze the full resume to identify the most impactful skills, experiences, and achievements.
123
+ 3. Cross-reference these highlights with the <job_description> to find the strongest professional matches.
124
+ 4. Synthesize this data into a concise, high-level summary.
125
+
126
+ Instructions:
127
+ 1. Length: STRICTLY 2 to 3 sentences only.
128
+ 2. Scope: Look at the entire resume context (experience, projects, skills, education, certifications, achievements, and keywords) to form the summary, not just a previous summary section.
129
+ 3. Relevance: Tailor every word to the specific requirements and keywords found in the job description.
130
+ 4. Content: Focus on your unique value proposition—what you’ve done, your top skill, and the measurable impact you can bring to this specific role.
131
+ 5. Style:
132
+ 5.1. Clarity: Be extremely concise; avoid fluff, generic objectives (e.g., "seeking a role"), or filler phrases.
133
+ 5.2. Voice: Use strong active voice and professional tone.
134
+ 5.3. Proofreading: Ensure impeccable spelling and grammar.
135
+ """
136
+
137
+ RESEARCH_WORK = """You are going to write a JSON resume section of "Research Work" for an applicant applying for job posts.
138
+
139
+ Step to follow:
140
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
141
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
142
+ 3. Analyze my Research Work details to match job requirements.
143
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it..
144
+ 5. Optimize JSON section for clarity and relevance to the job description.
145
+
146
+ Instructions:
147
+ - Specificity: Prioritize relevance to the specific job over general skillset.
148
+ - Proofreading: Ensure impeccable spelling and grammar.
149
+ """
150
+
151
+ CUSTOM_SECTIONS = """You are going to write a JSON resume section of "Custom Sections" for an applicant applying for job posts.
152
+
153
+ Step to follow:
154
+ 1. Relevance Analysis: First, decide if this whole section adds value for this specific job. Set 'is_relevant' to true/false.
155
+ 2. Data Generation: If relevant, generate the data. If not, set data to null.
156
+ 3. Analyze my Custom Sections details to match job requirements.
157
+ 4. Create a JSON resume section that highlights strongest matches, order the points by impact, and only remove a point if it makes no sense to include it.
158
+ 5. Optimize JSON section for clarity and relevance to the job description.
159
+
160
+ Instructions:
161
+ - Specificity: Prioritize relevance to the specific job over general skillset.
162
+ - Proofreading: Ensure impeccable spelling and grammar.
163
+ """
164
+
165
+
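These prompt constants are plain strings, so they still need to be combined with the applicant's section data and the scraped job description before being sent to a model. A minimal sketch of that composition step (the helper name and the `<section_data>` tag are illustrative assumptions, not part of the source):

```python
def build_section_prompt(system_prompt: str, section_data: str, job_description: str) -> str:
    # Compose the section prompt with the applicant's raw data and the
    # job description. The tag names here are illustrative only; the
    # prompts above already reference a <job_description> tag.
    return (
        f"{system_prompt}\n\n"
        f"<section_data>\n{section_data}\n</section_data>\n\n"
        f"<job_description>\n{job_description}\n</job_description>"
    )

prompt = build_section_prompt(
    'You are going to write a JSON resume section of "Skills" for an applicant applying for job posts.',
    "Python, Docker, AWS",
    "Looking for a backend engineer with Python and AWS experience.",
)
```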
resumer/schemas/job_details_schema.py ADDED
@@ -0,0 +1,22 @@
+ from typing import List, Optional
+ from pydantic import BaseModel, Field
+
+ class JobInfo(BaseModel):
+     job_title: str = Field(description="The specific role, its level, and scope within the organization.")
+     job_purpose: str = Field(description="A high-level overview of the role and why it exists in the organization.")
+     keywords: List[str] = Field(description="Key expertise, skills, and requirements the job demands.")
+     job_duties_and_responsibilities: List[str] = Field(description="Focus on essential functions, their frequency and importance, level of decision-making, areas of accountability, and any supervisory responsibilities.")
+     required_qualifications: List[str] = Field(description="Education, minimum experience, specific knowledge, skills, abilities, and any required licenses or certifications.")
+     preferred_qualifications: List[str] = Field(description="Additional \"nice-to-have\" qualifications that could set a candidate apart.")
+     company_name: str = Field(description="The name of the hiring organization.")
+     company_details: str = Field(description="Overview, mission, values, or way of working that could be relevant for tailoring a resume or cover letter.")
+
+
+ class JobDetails(BaseModel):
+     is_noise_only: bool = Field(
+         description="True if the text is junk (bot protection, login page, or non-job text)."
+     )
+     data: Optional[JobInfo] = Field(
+         default=None,
+         description="Populated only if is_noise_only is False. Contains whatever job info could be extracted."
+     )
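Parsed LLM output can be validated against this schema before use. A minimal standalone sketch, assuming Pydantic v2 (`model_validate`) and using trimmed-down copies of the models above for illustration:

```python
from typing import List, Optional
from pydantic import BaseModel, Field

# Trimmed-down standalone copies of the schema above, for illustration only.
class JobInfo(BaseModel):
    job_title: str
    keywords: List[str] = Field(default_factory=list)

class JobDetails(BaseModel):
    is_noise_only: bool
    data: Optional[JobInfo] = None

# Pydantic v2: validate a raw dict (e.g. parsed LLM JSON) against the schema.
details = JobDetails.model_validate({
    "is_noise_only": False,
    "data": {"job_title": "Backend Engineer", "keywords": ["Python", "AWS"]},
})
print(details.data.job_title)  # Backend Engineer
```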
resumer/schemas/sections_schemas.py ADDED
@@ -0,0 +1,159 @@
+ from typing import List, Optional, Union, TypeVar, Generic
+ from pydantic import BaseModel, Field
+
+ # --- Core Building Blocks ---
+
+ class TextSegment(BaseModel):
+     # Removed the Literal default to satisfy Gemini's strict schema rules
+     type: str = Field(description="Must be 'text'")
+     content: str
+
+ class LinkSegment(BaseModel):
+     # Removed the Literal default
+     type: str = Field(description="Must be 'link'")
+     content: str = Field(description="The display text that is clickable.")
+     url: str = Field(description="The destination URL.")
+
+ class RichText(BaseModel):
+     # Gemini handles the Union better when the objects are distinct
+     # but the Literal constants are removed.
+     segments: List[Union[TextSegment, LinkSegment]] = Field(
+         description="An ordered list of segments. Use 'text' for plain strings and 'link' for URLs."
+     )
+
+     @property
+     def plain_text(self) -> str:
+         return "".join(s.content for s in self.segments)
+
+ # --- Reusable Components ---
+
+ class DatePeriod(BaseModel):
+     """Consolidated date handling to remove redundancy across models"""
+     date_description: RichText = Field(
+         description="The timeframe. Use a single date 'Oct 2023' or a range 'Aug 2023 - Present'."
+     )
+
+ # --- Resume Sections ---
+
+ class Certification(BaseModel):
+     certificate_info: RichText = Field(description="Certificate name, issuer, and/or other information.")
+     date: Optional[RichText] = Field(description="The date of issuance or expiration.")
+
+ class Education(BaseModel):
+     degree: RichText = Field(description="The degree and major. e.g., 'B.S. in Computer Science'.")
+     university: RichText = Field(description="Institution name. e.g., 'Arizona State University'.")
+     location: Optional[RichText] = Field(description="Location of the institution.")
+     date_description: RichText = Field(description="The period of study. e.g., 'Aug 2021 - May 2025'.")
+     grade: Optional[RichText] = Field(description="GPA, honors, scholarships, or class standing. e.g., '3.9/4.0 GPA', 'Dean's List', 'Scholarship Recipient'")
+     courses: Optional[List[RichText]] = Field(description="Relevant coursework. e.g., ['Operating Systems', 'Calculus'].")
+
+ class Project(BaseModel):
+     name: RichText = Field(description="The name or title of the project.")
+     type: Optional[RichText] = Field(description="Category, e.g., 'Open Source', 'Class Project', or 'Hackathon'.")
+     link: Optional[LinkSegment] = Field(description="The primary project URL (GitHub, Demo, etc.).")
+     resources: Optional[List[LinkSegment]] = Field(description="Supplementary links like slides, docs, or video demos.")
+     date_description: RichText = Field(description="Timeframe of the project. A specific date point (e.g., 'Oct 2023' or '2021') or a duration (e.g., 'Aug 2023 - Nov 2023' or 'Aug 2023 - Present').")
+     description: List[RichText] = Field(
+         description="Bullet points using the STAR/XYZ format: 'Did X by doing Y, achieved Z'."
+     )
+
+ class SkillSection(BaseModel):
+     name: RichText = Field(description="Category name of the skill group. e.g., 'Languages' or 'Cloud Infrastructure'.")
+     skills: List[RichText] = Field(description="Specific skills or competencies within the skill group. e.g., ['Python', 'Docker', 'AWS'].")
+
+ class Experience(BaseModel):
+     role: RichText = Field(description="Job title or position held. e.g., 'Senior Frontend Developer'.")
+     company: RichText = Field(description="Name of the employer.")
+     location: RichText = Field(description="The location of the company or organization. e.g., San Francisco, USA.")
+     date_description: RichText = Field(description="Employment duration. e.g., 'Jan 2020 - Present'.")
+     description: List[RichText] = Field(description="High-impact bullet points quantifying your professional achievements.")
+
+
+ class Media(BaseModel):
+     portfolio: Optional[str] = Field(description="Personal profile website URL")
+     linkedin: Optional[str] = Field(description="LinkedIn profile URL")
+     github: Optional[str] = Field(description="GitHub profile URL")
+     medium: Optional[str] = Field(description="Medium profile URL")
+     devpost: Optional[str] = Field(description="Devpost profile URL")
+
+
+ class Achievement(BaseModel):
+     name: RichText = Field(description="Title of the award or recognition.")
+     issued_by: RichText = Field(description="The awarding body.")
+     date: RichText = Field(description="Date of receipt.")
+     description: List[RichText] = Field(description="Details of the award's significance or selectivity.")
+
+ class ResearchWork(BaseModel):
+     title: RichText = Field(description="Research role or project title.")
+     publication: Optional[RichText] = Field(description="Venue of publication (journal/conference).")
+     date_description: RichText = Field(description="Duration of research or publication date.")
+     link: Optional[LinkSegment] = Field(description="Link to the paper (DOI) or lab project page.")
+     description: List[RichText] = Field(description="Bullet points describing methodology and findings.")
+
+
+ class GenericElement(BaseModel):
+     title: RichText = Field(description="The primary heading for the entry (e.g., 'Volunteer Lead') or the main text content.")
+     subtitle: Optional[RichText] = Field(description="The organization, location, or secondary context associated with the title.")
+     date_description: Optional[RichText] = Field(description="The timeframe for the activity (single date or range).")
+     description: Optional[List[RichText]] = Field(description="A list of bullet points detailing responsibilities, impact, and key accomplishments using the STAR methodology.")
+
+ class GenericSection(BaseModel):
+     section_name: RichText = Field(description="Title for the section.")
+     section_detail: List[GenericElement] = Field(description="The specific entries belonging to this section.")
+
+ # --- Master Schema ---
+
+ T = TypeVar("T")
+
+ class SectionBase(BaseModel, Generic[T]):
+     is_relevant: bool = Field(description="Is this section relevant to the job description?")
+     data: Optional[T] = Field(description="The content of the section if relevant.", default=None)
+
+ class PersonalData(BaseModel):
+     name: RichText = Field(description="Full legal name.")
+     location: RichText = Field(description="Location of the candidate.")
+     phone: RichText = Field(description="Contact phone number.")
+     email: RichText = Field(description="Professional email address.")
+     media: Media = Field(description="Professional social and web presence.")
+
+ class Summary(SectionBase[RichText]):
+     data: Optional[RichText] = Field(description="A brief summary or objective statement highlighting key skills, experience, and career goals.", alias="summary")
+
+ class Experiences(SectionBase[List[Experience]]):
+     data: Optional[List[Experience]] = Field(default_factory=list, description="Professional work history.", alias="work_experience")
+
+ class Projects(SectionBase[List[Project]]):
+     data: Optional[List[Project]] = Field(default_factory=list, description="Technical or academic projects.", alias="projects")
+
+ class SkillSections(SectionBase[List[SkillSection]]):
+     data: Optional[List[SkillSection]] = Field(default_factory=list, description="Categorized technical and soft skills.", alias="skill_sections")
+
+ class Educations(SectionBase[List[Education]]):
+     data: Optional[List[Education]] = Field(default_factory=list, description="Academic background.", alias="education")
+
+ class Certifications(SectionBase[List[Certification]]):
+     data: Optional[List[Certification]] = Field(default_factory=list, description="Earned certifications and licenses.", alias="certifications")
+
+ class Achievements(SectionBase[List[Achievement]]):
+     data: Optional[List[Achievement]] = Field(default_factory=list, description="Awards and recognitions.", alias="achievements")
+
+ class ResearchWorks(SectionBase[List[ResearchWork]]):
+     data: Optional[List[ResearchWork]] = Field(default_factory=list, description="Scientific or academic research contributions.", alias="research_works")
+
+ class CustomSections(SectionBase[List[GenericElement]]):
+     data: Optional[List[GenericElement]] = Field(default_factory=list, description="An additional section like volunteer work or interests.", alias="custom_sections")
+
+ class ResumeSchema(BaseModel):
+     personal_info: PersonalData = Field(description="Primary candidate information.")
+     summary: Optional[RichText] = Field(description="A brief summary or objective statement highlighting key skills, experience, and career goals.")
+     work_experience: List[Experience] = Field(default_factory=list, description="Professional work history.")
+     education: List[Education] = Field(default_factory=list, description="Academic background.")
+     skill_sections: List[SkillSection] = Field(default_factory=list, description="Categorized technical and soft skills.")
+     projects: List[Project] = Field(default_factory=list, description="Technical or academic projects.")
+     certifications: Optional[List[Certification]] = Field(default_factory=list, description="Earned certifications and licenses.")
+     achievements: Optional[List[Achievement]] = Field(default_factory=list, description="Awards and recognitions.")
+     research_works: Optional[List[ResearchWork]] = Field(default_factory=list, description="Scientific or academic research contributions.")
+     custom_sections: Optional[List[GenericSection]] = Field(default_factory=list, description="Additional sections like volunteer work, interests, or references.")
+     keywords: List[str] = Field(
+         default_factory=list, description="Strategic industry terms and technical concepts for ATS optimization. Includes methodologies (Agile, SDLC), domains (NLP, Cloud), and core competencies not explicitly listed as skills."
+     )
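The `RichText` building block is what lets the LLM interleave plain text and hyperlinks inside a single field, with `plain_text` flattening the segments back to a string. A minimal standalone sketch mirroring the classes above (assuming Pydantic v2):

```python
from typing import List, Union
from pydantic import BaseModel, Field

# Minimal standalone mirror of the RichText building blocks above.
class TextSegment(BaseModel):
    type: str = Field(description="Must be 'text'")
    content: str

class LinkSegment(BaseModel):
    type: str = Field(description="Must be 'link'")
    content: str
    url: str

class RichText(BaseModel):
    segments: List[Union[TextSegment, LinkSegment]]

    @property
    def plain_text(self) -> str:
        # Concatenate segment contents in order, ignoring link URLs.
        return "".join(s.content for s in self.segments)

rt = RichText(segments=[
    TextSegment(type="text", content="See my "),
    LinkSegment(type="link", content="portfolio", url="https://example.com"),
])
print(rt.plain_text)  # See my portfolio
```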
resumer/structures.py ADDED
File without changes
resumer/templates/_resume.tex.jinja ADDED
@@ -0,0 +1,13 @@
+ \documentclass[letterpaper,10.8pt]{article}
+
+
+ %%%%%% CV STARTS HERE %%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+ \begin{document}
+ \textbf{{\LARGE \VAR{richtext_to_latex(personal_info.name)}}}
+
+ % -------------------------------------------
+ \end{document}
resumer/templates/resume.tex.jinja ADDED
@@ -0,0 +1,266 @@
+ % -------------------------
+ % Resume in LaTeX
+ % License : MIT
+ % ------------------------
+
+ \documentclass[letterpaper,10.8pt]{article}
+
+ \usepackage{latexsym}
+ \usepackage[empty]{fullpage}
+ \usepackage{titlesec}
+ \usepackage{marvosym}
+ \usepackage[usenames,dvipsnames]{color}
+ \usepackage{verbatim}
+ \usepackage{enumitem}
+ \usepackage[pdftex]{hyperref}
+ \usepackage{fancyhdr}
+ \usepackage{xcolor}
+ \definecolor{dblue}{HTML}{003900}
+ \hypersetup{
+     colorlinks=true,
+     linkcolor=dblue,
+     filecolor=dblue,
+     urlcolor=dblue,
+ }
+
+ \definecolor{defaulttextcolor}{HTML}{030303}
+ \color{defaulttextcolor}
+
+ \pagestyle{fancy}
+ \fancyhf{} % clear all header and footer fields
+ \fancyfoot{}
+ \renewcommand{\headrulewidth}{0pt}
+ \renewcommand{\footrulewidth}{0pt}
+
+ % Adjust margins
+ \addtolength{\oddsidemargin}{-0.375in}
+ \addtolength{\evensidemargin}{-0.375in}
+ \addtolength{\textwidth}{1in}
+ \addtolength{\topmargin}{-.5in}
+ \addtolength{\textheight}{1in}
+
+ \urlstyle{rm}
+
+ \raggedbottom
+ \raggedright
+ \setlength{\tabcolsep}{0in}
+
+ % Section formatting
+ \titleformat{\section}{
+     \vspace{-3pt}\scshape\raggedright\large
+ }{}{0em}{}[\color{black}\titlerule \vspace{-5pt}]
+
+ % -------------------------
+ % Custom commands
+ \newcommand{\resumeItem}[2]{
+     \item\small{
+         \textbf{#1}{: #2 \vspace{-2pt}}
+     }
+ }
+
+ \newcommand{\resumeItemWithoutTitle}[1]{
+     \item\small{
+         {#1\vspace{-2pt}}
+     }
+ }
+
+ \iffalse
+ \newcommand{\resumeSubheading}[5]{
+     \vspace{-1pt}\item
+     \begin{tabular*}{0.97\textwidth}{l@{\extracolsep{\fill}}r}
+         \textbf{#1} & #2 \\
+         \textit{\small#3} & \textit{\small #4} \\
+         \ifx&#5&%
+             % If the 5th argument is empty, do nothing
+         \else
+             \textit{\scriptsize #5} & \\ % Add the row only if the argument exists
+         \fi
+     \end{tabular*}\vspace{-5pt}
+ }
+ \fi
+
+ \newcommand{\resumeSubheading}[5]{
+     \vspace{-1pt}\item
+     \begin{tabular*}{0.97\textwidth}{l@{\extracolsep{\fill}}r}
+         \textbf{#1} & #2 \\
+         % Only render this row if #3 or #4 has content
+         \ifx&#3&\ifx&#4&\else
+             \textit{\small#3} & \textit{\small #4} \\
+         \fi\else
+             \textit{\small#3} & \textit{\small #4} \\
+         \fi
+         % Only render this row if #5 has content
+         \ifx&#5&\else
+             \textit{\scriptsize #5} & \\
+         \fi
+     \end{tabular*}\vspace{-6pt} % Slightly increased negative space here
+ }
+
+ \newcommand{\resumeSubItem}[2]{\resumeItem{#1}{#2}\vspace{-4pt}}
+
+ \renewcommand{\labelitemii}{$\circ$}
+
+ \newcommand{\resumeSubHeadingListStart}{\begin{itemize}[leftmargin=*]}
+ \newcommand{\resumeSubHeadingListEnd}{\end{itemize}}
+
+ \newcommand{\resumeItemListStart}{\begin{itemize}}
+ \newcommand{\resumeItemListEnd}{\end{itemize}\vspace{-5pt}}
+
+ % -------------------------------------------
+ %%%%%% CV STARTS HERE %%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+ \begin{document}
+ \textbf{{\LARGE \VAR{richtext_to_latex(personal_info.name)}}}
+ % ----------HEADING-----------------
+ \begin{tabular*}{\textwidth}{l@{\extracolsep{\fill}}r}
+     {Address: \VAR{richtext_to_latex(personal_info.location)}} &
+     {Portfolio: \href{\VAR{personal_info.media.portfolio}}{\VAR{personal_info.media.portfolio}}} \\
+     {Email: \VAR{richtext_to_latex(personal_info.email)}} &
+     {GitHub: \href{\VAR{personal_info.media.github}}{\VAR{personal_info.media.github}}} \\
+     {Mobile: \VAR{richtext_to_latex(personal_info.phone)}} &
+     {LinkedIn: \href{\VAR{personal_info.media.linkedin}}{\VAR{personal_info.media.linkedin}}} \\
+ \end{tabular*}
+
+ % -----------SUMMARY-----------------
+ \BLOCK{ if summary }
+ \section{Summary}
+ \VAR{richtext_to_latex(summary)}
+ \BLOCK{ endif }
+
+ % -----------EDUCATION-----------------
+ \BLOCK{ if education and len(education) > 0 }
+ \section{Education}
+ \resumeSubHeadingListStart
+ \BLOCK{ for edu in education }
+     \resumeSubheading
+     {\VAR{richtext_to_latex(edu.university)}}{\BLOCK{ if edu.location }\VAR{richtext_to_latex(edu.location)}\BLOCK{ endif }}
+     {\VAR{richtext_to_latex(edu.degree)}}{\VAR{richtext_to_latex(edu.date_description)}}
+     {\BLOCK{ if edu.grade }\VAR{richtext_to_latex(edu.grade)}\BLOCK{ endif }} % Passed as the 5th argument
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endif }
+
+ % -----------SKILLS-----------
+ \BLOCK{ if skill_sections and len(skill_sections) > 0 }
+ \section{Skills}
+ \resumeSubHeadingListStart
+ \BLOCK{ for skill_group in skill_sections }
+     \resumeSubItem{\VAR{richtext_to_latex(skill_group.name)}}{\BLOCK{ for sk in skill_group.skills }\VAR{richtext_to_latex(sk)}\BLOCK{ if not loop.last }, \BLOCK{ endif }\BLOCK{ endfor }}
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endif }
+
+ % -----------EXPERIENCE-----------
+ \BLOCK{ if work_experience and len(work_experience) > 0 }
+ \section{Experience}
+ \resumeSubHeadingListStart
+ \BLOCK{ for exp in work_experience }
+     \resumeSubheading
+     {\VAR{richtext_to_latex(exp.role)}}{\BLOCK{ if exp.location }\VAR{richtext_to_latex(exp.location)}\BLOCK{ endif }}
+     {\VAR{richtext_to_latex(exp.company)}}{\VAR{richtext_to_latex(exp.date_description)}}
+     {}
+     \resumeItemListStart
+     \BLOCK{ for item in exp.description }
+         \item{\VAR{richtext_to_latex(item)}}
+     \BLOCK{ endfor }
+     \resumeItemListEnd
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endif }
+
+ % -----------PROJECTS-----------
+ \BLOCK{ if projects and len(projects) > 0 }
+ \section{Projects}
+ \resumeSubHeadingListStart
+ \BLOCK{ for proj in projects }
+     \resumeSubheading
+     {\VAR{richtext_to_latex(proj.name)}}{}
+     {\BLOCK{ if proj.type }\VAR{richtext_to_latex(proj.type)}\BLOCK{ endif }}{\VAR{richtext_to_latex(proj.date_description)}}
+     {}
+     \resumeItemListStart
+     \BLOCK{ for desc in proj.description }
+         \item{\VAR{richtext_to_latex(desc)}}
+     \BLOCK{ endfor }
+     \resumeItemListEnd
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endif }
+
+ % -----------ACHIEVEMENTS-----------
+ \BLOCK{ if achievements and len(achievements) > 0 }
+ \section{Honors and Awards}
+ \resumeSubHeadingListStart
+ \BLOCK{ for ach in achievements }
+     \resumeSubheading
+     {\VAR{richtext_to_latex(ach.name)}}{}
+     {\VAR{richtext_to_latex(ach.issued_by)}}{\VAR{richtext_to_latex(ach.date)}}
+     {}
+     \resumeItemListStart
+     \BLOCK{ for desc in ach.description }
+         \item{\VAR{richtext_to_latex(desc)}}
+     \BLOCK{ endfor }
+     \resumeItemListEnd
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endif }
+
+ % -----------CERTIFICATIONS-----------
+ \BLOCK{ if certifications and len(certifications) > 0 }
+ \section{Certifications}
+ \begin{description}[font=$\bullet$]
+ \BLOCK{ for cert in certifications }
+     \item{\VAR{richtext_to_latex(cert.certificate_info)}} \BLOCK{ if cert.date }\hfill \VAR{richtext_to_latex(cert.date)}\BLOCK{ endif }
+ \BLOCK{ endfor }
+ \end{description}
+ \BLOCK{ endif }
+
+ % -----------RESEARCH WORKS-----------
+ % Field names follow the ResearchWork schema: title, publication, link, date_description.
+ \BLOCK{ if research_works and len(research_works) > 0 }
+ \section{Research Works}
+ \resumeSubHeadingListStart
+ \BLOCK{ for research_work in research_works }
+     \resumeSubheading
+     {\VAR{richtext_to_latex(research_work.title)}\BLOCK{ if research_work.publication }, \VAR{richtext_to_latex(research_work.publication)}\BLOCK{ endif }}{}
+     {\BLOCK{ if research_work.link }\href{\VAR{research_work.link.url}}{\VAR{research_work.link.content}}\BLOCK{ endif }}{\VAR{richtext_to_latex(research_work.date_description)}}
+     {}
+     \resumeItemListStart
+     \BLOCK{ for desc in research_work.description }
+         \item{\VAR{richtext_to_latex(desc)}}
+     \BLOCK{ endfor }
+     \resumeItemListEnd
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endif }
+
+ % -----------CUSTOM SECTIONS-----------
+ % custom_sections is a list of GenericSection objects (section_name, section_detail).
+ \BLOCK{ if custom_sections and len(custom_sections) > 0 }
+ \BLOCK{ for csection in custom_sections }
+ \section{\VAR{richtext_to_latex(csection.section_name)}}
+ \resumeSubHeadingListStart
+ \BLOCK{ for csection_item in csection.section_detail }
+     \resumeSubheading
+     {\VAR{richtext_to_latex(csection_item.title)}}{}
+     {\BLOCK{ if csection_item.subtitle }\VAR{richtext_to_latex(csection_item.subtitle)}\BLOCK{ endif }}{\BLOCK{ if csection_item.date_description }\VAR{richtext_to_latex(csection_item.date_description)}\BLOCK{ endif }}
+     {}
+     \BLOCK{ if csection_item.description }
+     \resumeItemListStart
+     \BLOCK{ for desc in csection_item.description }
+         \item{\VAR{richtext_to_latex(desc)}}
+     \BLOCK{ endfor }
+     \resumeItemListEnd
+     \BLOCK{ endif }
+ \BLOCK{ endfor }
+ \resumeSubHeadingListEnd
+ \BLOCK{ endfor }
+ \BLOCK{ endif }
+
+ % -------------------------------------------
+ \end{document}
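The `\VAR{...}` and `\BLOCK{ ... }` markers are custom Jinja2 delimiters chosen so the template remains valid-looking LaTeX. A minimal sketch of how such an environment renders (the template string here is a toy, not part of the repo):

```python
import jinja2

# LaTeX-friendly delimiters, mirroring the ones this template relies on.
env = jinja2.Environment(
    block_start_string="\\BLOCK{",
    block_end_string="}",
    variable_start_string="\\VAR{",
    variable_end_string="}",
    trim_blocks=True,
    autoescape=False,
)

# Ordinary braces (e.g. in \textbf{...}) pass through untouched because
# only the \VAR{ and \BLOCK{ prefixes are treated as Jinja syntax.
template = env.from_string(
    r"\textbf{\VAR{name}}\BLOCK{ if email } -- \VAR{email}\BLOCK{ endif }"
)
rendered = template.render(name="Jane Doe", email="jane@example.com")
print(rendered)  # \textbf{Jane Doe} -- jane@example.com
```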
resumer/utils/latex_ops.py ADDED
@@ -0,0 +1,115 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import jinja2
3
+ import subprocess
4
+
5
+ def write_file(file_path, content):
6
+ with open(file_path, "w") as f:
7
+ f.write(content)
8
+
9
+ def save_latex_as_pdf(tex_path, dst_path):
10
+ output_dir = os.path.dirname(dst_path)
11
+ # Run pdflatex. Using nonstopmode to prevent hanging on errors.
12
+ # Output directory must exist.
13
+ cmd = ["pdflatex", "-interaction=nonstopmode", f"-output-directory={output_dir}", tex_path]
14
+
15
+ print(f"Running command: {' '.join(cmd)}")
16
+ try:
17
+ # Run twice to resolve references/page numbers if needed
18
+ subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
19
+ except subprocess.CalledProcessError as e:
20
+ print(f"Error compiling LaTeX: {e}")
21
+ print(f"Stdout: {e.stdout.decode('utf-8')}")
22
+ print(f"Stderr: {e.stderr.decode('utf-8')}")
23
+
24
+ def escape_for_latex(data):
25
+ if isinstance(data, dict):
26
+ new_data = {}
27
+ for key in data.keys():
28
+ new_data[key] = escape_for_latex(data[key])
29
+ return new_data
30
+ elif isinstance(data, list):
31
+ return [escape_for_latex(item) for item in data]
32
+ elif isinstance(data, str):
33
+ latex_special_chars = {
34
+ "&": r"\&",
35
+ "%": r"\%",
36
+ "$": r"\$",
37
+ "#": r"\#",
38
+ "_": r"\_",
39
+ "{": r"\{",
40
+ "}": r"\}",
41
+ "~": r"\textasciitilde{}",
42
+ "^": r"\^{}",
43
+ "\\": r"\textbackslash{}",
44
+ "\n": "\\newline%\n",
45
+ "-": r"{-}",
46
+ "\xA0": "~",
47
+ "[": r"{[}",
48
+ "]": r"{]}",
49
+ }
50
+ return "".join([latex_special_chars.get(c, c) for c in data])
51
+
52
+ return data
53
+
54
+
+ # define helper functions that are called from inside the template
+ def richtext_to_latex(richtext_dict: dict) -> str:
+     if isinstance(richtext_dict, str):
+         return richtext_dict
+     if not richtext_dict or not isinstance(richtext_dict, dict):
+         return ""
+     response = []
+     segments = richtext_dict.get("segments", [])
+     for segment in segments:
+         content = segment.get("content", "")
+         if segment.get("type") == "text":
+             response.append(content)
+         elif segment.get("type") == "link":
+             url = segment.get("url", "")
+             response.append(rf"\href{{{url}}}{{{content}}}")
+     return " ".join(response)
+
+ def json_to_latex_pdf(json_resume, dst_path, template_name="resume.tex.jinja"):
+     try:
+         module_dir = os.path.dirname(__file__)
+         templates_path = os.path.join(os.path.dirname(module_dir), 'templates')
+
+         latex_jinja_env = jinja2.Environment(
+             block_start_string="\\BLOCK{",
+             block_end_string="}",
+             variable_start_string="\\VAR{",
+             variable_end_string="}",
+             comment_start_string="\\#{",
+             comment_end_string="}",
+             line_statement_prefix="%-",
+             line_comment_prefix="%#",
+             trim_blocks=True,
+             autoescape=False,
+             loader=jinja2.FileSystemLoader(templates_path),
+         )
+
+         # expose the helper functions to the template
+         latex_jinja_env.globals.update(richtext_to_latex=richtext_to_latex)
+         latex_jinja_env.globals.update(len=len)
+
+         escaped_json_resume = escape_for_latex(json_resume)
+
+         try:
+             resume_template = latex_jinja_env.get_template(template_name)
+         except jinja2.exceptions.TemplateNotFound:
+             print(f"Template {template_name} not found in {templates_path}")
+             return None, None
+
+         resume_latex = resume_template.render(escaped_json_resume)
+
+         tex_path = dst_path.replace(".pdf", ".tex")
+
+         write_file(tex_path, resume_latex)
+         save_latex_as_pdf(tex_path, dst_path)
+
+         print(f"PDF generated at: {dst_path}")
+         return dst_path, tex_path
+     except Exception as e:
+         print(f"Error in json_to_latex_pdf: {e}")
+         return None, None
+
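The `escape_for_latex` table above maps one character at a time, which is what prevents double-escaping: a backslash produced by the table is never fed back through it. A minimal stdlib sketch of that behavior, with a trimmed table (`escape` and `LATEX_SPECIAL` are illustrative names, not part of this repo):

```python
# Sketch of per-character LaTeX escaping (table trimmed to a few entries;
# the real function above also recurses through dicts and lists).
LATEX_SPECIAL = {"&": r"\&", "%": r"\%", "#": r"\#", "_": r"\_"}

def escape(text: str) -> str:
    # Each character is looked up exactly once, so the backslashes the
    # table introduces are never re-escaped.
    return "".join(LATEX_SPECIAL.get(c, c) for c in text)

print(escape("R&D, 100% #1"))  # R\&D, 100\% \#1
```

The same one-pass property is why the custom `\VAR{...}` delimiters in the Jinja environment are safe: escaping happens on the data before rendering, never on the template itself.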
resumer/utils/scraper.py ADDED
@@ -0,0 +1,43 @@
+ import requests
+ import trafilatura
+ import random
+
+ def scrape_job_details(url):
+     # 1. Set up headers to look like a real browser
+     user_agents = [
+         "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
+         "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
+     ]
+
+     headers = {
+         "User-Agent": random.choice(user_agents),
+         "Accept-Language": "en-US,en;q=0.9",
+     }
+
+     try:
+         # 2. Fetch the HTML manually using requests
+         response = requests.get(url, headers=headers, timeout=15)
+         response.raise_for_status()  # check for HTTP errors
+
+         # 3. Pass the raw HTML to trafilatura for extraction,
+         #    calling 'extract' on the response text directly
+         content = trafilatura.extract(
+             response.text,
+             include_formatting=True,
+             include_links=False,
+             favor_precision=True,
+         )
+
+         if not content:
+             return "Error: Could not identify the main content of the page."
+
+         return content
+
+     except requests.exceptions.RequestException as e:
+         return f"Network error: {e}"
+     except Exception as e:
+         return f"An unexpected error occurred: {e}"
+
+ # --- Usage ---
+ # url = "https://careers.qualcomm.com/careers/job/446715275527?hl=en-US&domain=qualcomm.com&source=APPLICANT_SOURCE-6-2"
+ # print(scrape_job_details(url))
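Job boards like the one in the commented example often rate-limit or intermittently reject scrapers, so a single `requests.get` can fail transiently. A minimal retry-with-backoff sketch that could wrap the fetch step (`fetch_with_retry` and `flaky` are hypothetical names, not part of this repo; the fetcher is injected so the sketch needs no network):

```python
import random
import time

def fetch_with_retry(url, fetch, attempts=3, base_delay=1.0):
    """Retry a flaky fetch with jittered exponential backoff.
    `fetch` is any callable(url) that raises on failure."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # backoff grows 1x, 2x, 4x ... scaled down here so the demo is fast
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0, 0.01))

# Demo with a fake fetcher that fails twice, then succeeds
calls = {"n": 0}
def flaky(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "job description text"

print(fetch_with_retry("https://example.com/job", flaky))  # job description text
```

In real use the injected `fetch` would be the `requests.get(...).text` call from `scrape_job_details`, with the jitter keeping retries from hammering the server in lockstep.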
resumer/variables.py ADDED
@@ -0,0 +1,15 @@
+ from resumer.prompts.sections_prompt import SUMMARY, EXPERIENCE, SKILLS, PROJECTS, EDUCATIONS, CERTIFICATIONS, ACHIEVEMENTS, RESEARCH_WORK, CUSTOM_SECTIONS
+ from resumer.schemas.sections_schemas import Summary, Experiences, Projects, SkillSections, Educations, Certifications, Achievements, ResearchWorks, CustomSections
+
+
+ section_mapping = {
+     "summary": {"prompt": SUMMARY, "schema": Summary},
+     "work_experience": {"prompt": EXPERIENCE, "schema": Experiences},
+     "projects": {"prompt": PROJECTS, "schema": Projects},
+     "skill_sections": {"prompt": SKILLS, "schema": SkillSections},
+     "education": {"prompt": EDUCATIONS, "schema": Educations},
+     "certifications": {"prompt": CERTIFICATIONS, "schema": Certifications},
+     "achievements": {"prompt": ACHIEVEMENTS, "schema": Achievements},
+     "research_works": {"prompt": RESEARCH_WORK, "schema": ResearchWorks},
+     "custom_sections": {"prompt": CUSTOM_SECTIONS, "schema": CustomSections},
+ }
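A table like `section_mapping` lets one generic function drive every resume section: look up the section's prompt, send it to the model, and validate the reply into the section's schema. A minimal self-contained sketch of that dispatch pattern (`build_section`, the `SUMMARY` stand-in, and the dataclass `Summary` are illustrative; the real prompts and schemas live in `resumer.prompts` and `resumer.schemas`):

```python
from dataclasses import dataclass

# Stand-ins for the imported prompt constant and schema class
SUMMARY = "Write a tailored professional summary."

@dataclass
class Summary:
    text: str

section_mapping = {"summary": {"prompt": SUMMARY, "schema": Summary}}

def build_section(name, llm_call):
    entry = section_mapping[name]
    raw = llm_call(entry["prompt"])   # the model sees the section-specific prompt
    return entry["schema"](text=raw)  # the reply is shaped into the schema type

# Usage with a stubbed model call
section = build_section("summary", lambda prompt: "Experienced ML engineer.")
print(section.text)  # Experienced ML engineer.
```

Adding a new section then means adding one prompt, one schema, and one entry to the mapping, with no changes to the generation loop.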