---

title: Multi-Agent Job Application Assistant
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---


## Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)

A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).

---

### What you get
- **Two UIs**: Streamlit (`app.py`) and Gradio/HF (`hf_app.py`)
- **LinkedIn OAuth 2.0** (optional; CSRF‑safe state validation)
- **Job aggregation**: Adzuna (5k/month) plus resilient fallbacks
- **ATS‑optimized drafting**: resumes + cover letters (Gemini)
- **Office exports**:
  - Word resumes and cover letters (5 templates)
  - PowerPoint CV (4 templates)
  - Excel application tracker (5 analytical sheets)
- **Advanced agents**: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
- **LangExtract integration**: structured extraction with Gemini key; robust regex fallback in constrained environments
- **New**: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch
- **New (Aug 2025)**: UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets)

---

## Quickstart

### 1) Environment (.env)
Create a UTF‑8 `.env` (values optional if you want mock mode). See `.env.example` for the full list of variables:
```ini
# Behavior
MOCK_MODE=true
PORT=7860

# LLM / Research
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GEMINI_API_KEY=
# Optional per-agent Gemini keys
GEMINI_API_KEY_CV=
GEMINI_API_KEY_COVER=
GEMINI_API_KEY_CHAT=
GEMINI_API_KEY_PARSER=
GEMINI_API_KEY_MATCH=
GEMINI_API_KEY_TAILOR=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

TAVILY_API_KEY=

# Job APIs
ADZUNA_APP_ID=
ADZUNA_APP_KEY=

# Office MCP (optional)
POWERPOINT_MCP_URL=http://localhost:3000
WORD_MCP_URL=http://localhost:3001
EXCEL_MCP_URL=http://localhost:3002

# LangExtract uses GEMINI key by default
LANGEXTRACT_API_KEY=
```

Hardcoded keys have been removed from utility scripts. Use `switch_api_key.py` to safely set keys into `.env` without embedding them in code.

### 2) Install
- Windows PowerShell
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```
- Linux/macOS
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 3) Run the apps
- Streamlit (PATH‑safe)
```powershell
python -m streamlit run app.py --server.port 8501
```
- Gradio / Hugging Face (avoid port conflicts)
```powershell
$env:PORT=7861; python hf_app.py
```
```bash
PORT=7861 python hf_app.py
```
The HF app binds on 0.0.0.0:$PORT.

---

## 📊 System Architecture Overview

The system is a **production-ready, multi-agent job application assistant**. Its main components are outlined below.

### 🏗️ Core Architecture

#### **Dual Interface Design**
- **Streamlit Interface** (`app.py`) - Traditional web application for desktop use
- **Gradio/HF Interface** (`hf_app.py`) - Modern, mobile-friendly, deployable to Hugging Face Spaces

#### **Multi-Agent System** (15 Specialized Agents)

**Core Processing Agents:**
- **`OrchestratorAgent`** - Central coordinator managing workflow and job orchestration
- **`CVOwnerAgent`** - ATS-optimized resume generation with UK-specific formatting rules
- **`CoverLetterAgent`** - Personalized cover letter generation with keyword optimization
- **`ProfileAgent`** - Intelligent CV parsing and structured profile extraction
- **`JobAgent`** - Job posting analysis and requirement extraction
- **`RouterAgent`** - Dynamic routing based on payload state and workflow stage

**Advanced AI Agents:**
- **`ParallelExecutor`** - Concurrent processing for 3-5x faster multi-job handling
- **`TemporalTracker`** - Time-stamped application history and pattern analysis
- **`ObservabilityAgent`** - Real-time tracing, metrics collection, and monitoring
- **`ContextEngineer`** - Flywheel learning and context optimization
- **`ContextScaler`** - L1/L2/L3 memory management for scalable context handling
- **`LinkedInManager`** - OAuth 2.0 integration and profile synchronization
- **`MetaAgent`** - Combines outputs from multiple specialized analysis agents
- **`TriageAgent`** - Intelligent task prioritization and routing

#### **Guidelines Enforcement System** (`agents/guidelines.py`)
Comprehensive rule engine ensuring document quality:
- **UK Compliance**: British English, UK date formats (MMM YYYY), £ currency normalization
- **ATS Optimization**: Plain text formatting, keyword density, section structure
- **Content Quality**: Anti-buzzword filtering, action verb strengthening, first-person removal
- **Layout Rules**: Exact length enforcement, heading validation, bullet point formatting
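
To illustrate the content-quality rules above, here is a minimal sketch of an anti-buzzword scrub combined with first-person removal. It is not the code in `agents/guidelines.py`: the `BUZZWORDS` list, the regular expressions, and the `clean_bullet` name are assumptions made for the example.

```python
import re

# Hypothetical word list; the project's real list is not shown in this README.
BUZZWORDS = ("synergy", "go-getter", "team player", "results-driven")

# Match any buzzword as a whole phrase, plus a trailing comma and spaces.
_BUZZ_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BUZZWORDS) + r")\b,?\s*",
    re.IGNORECASE,
)
# Match a leading first-person pronoun ("I led ...", "We built ...").
_FIRST_PERSON_RE = re.compile(r"^\s*(?:I|We)\s+", re.IGNORECASE)


def clean_bullet(bullet: str) -> str:
    """Drop buzzwords and a leading first-person pronoun from one bullet."""
    text = _BUZZ_RE.sub("", bullet)
    text = _FIRST_PERSON_RE.sub("", text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Re-capitalize, because removing the pronoun exposes a lowercase verb.
    return text[:1].upper() + text[1:] if text else text
```

For example, `clean_bullet("I led a results-driven team of 5 engineers")` returns `"Led a team of 5 engineers"`.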

### 🔌 Integration Ecosystem

#### **LLM Integration** (`services/llm.py`)
- **Multi-Provider Support**: OpenAI, Anthropic Claude, Google Gemini
- **Per-Agent API Keys**: Cost optimization through agent-specific key allocation
- **Intelligent Fallbacks**: Graceful degradation when providers unavailable
- **Configurable Models**: Per-agent model selection for optimal performance/cost

#### **Job Aggregation** (`services/job_aggregator.py`, `services/jobspy_client.py`)
- **Primary Sources**: Adzuna API (5,000 jobs/month free tier)
- **JobSpy Integration**: Indeed, LinkedIn, Glassdoor aggregation
- **Additional APIs**: Remotive, The Muse, GitHub Jobs
- **Smart Deduplication**: Title + company matching with fuzzy logic
- **SSL Bypass**: Automatic retry for corporate environments
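
The title + company deduplication described above could look roughly like the sketch below. It uses `difflib.SequenceMatcher` from the standard library as the fuzzy matcher. The actual matcher in `services/job_aggregator.py` is not shown here, so the `dedupe_jobs` name, the `title`/`company` dictionary keys, and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher


def _norm(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(value.lower().split())


def _similar(a: str, b: str, threshold: float) -> bool:
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio() >= threshold


def dedupe_jobs(jobs: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep the first posting of each (title, company) pair, compared fuzzily."""
    unique: list[dict] = []
    for job in jobs:
        is_duplicate = any(
            _similar(job["title"], kept["title"], threshold)
            and _similar(job["company"], kept["company"], threshold)
            for kept in unique
        )
        if not is_duplicate:
            unique.append(job)
    return unique
```

The pairwise comparison is quadratic in the number of postings, which is acceptable for a few hundred results per search but would need blocking or hashing for larger batches.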

#### **Document Generation** (`services/`)
- **Word Documents** (`word_cv.py`): 5 professional templates, MCP server integration
- **PowerPoint CVs** (`powerpoint_cv.py`): 4 visual templates for presentations
- **Excel Trackers** (`excel_tracker.py`): 5 analytical sheets with metrics
- **PDF Export**: Cross-platform compatibility with formatting preservation

### 📈 Advanced Features

#### **Pipeline Architecture** (`agents/pipeline.py`)
```
User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage
                ↓           ↓                ↓              ↓                    ↓            ↓
           Event Log   Profile Cache    Job Cache    Document Cache      Metrics Log   Temporal KG
```

#### **Memory & Persistence**
- **File-backed Storage** (`memory/store.py`): Atomic writes, thread-safe operations
- **Temporal Knowledge Graph**: Application tracking with time-stamped relationships
- **Event Sourcing** (`events.jsonl`): Complete audit trail of all agent actions
- **Caching System** (`utils/cache.py`): TTL-based caching with automatic eviction
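
A TTL cache with eviction on read, as named in the list above, can be sketched in a few lines. This is an illustration only and not the contents of `utils/cache.py`. The `TTLCache` class and its `clock` parameter are hypothetical, and the injectable clock exists purely to make expiry testable.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Store the absolute expiry time next to the value.
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # evict the stale entry on read
            return default
        return value
```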

#### **LangExtract Integration** (`services/langextract_service.py`)
- **Structured Extraction**: Job requirements, skills, company culture
- **ATS Optimization**: Keyword extraction and scoring
- **Fallback Mechanisms**: Regex-based extraction when API unavailable
- **Result Caching**: Performance optimization for repeated analyses

### 🛡️ Security & Configuration

#### **Authentication & Security**
- **OAuth 2.0**: LinkedIn integration with CSRF protection
- **Input Sanitization**: Path traversal and injection prevention
- **Environment Isolation**: Secrets management via `.env`
- **Rate Limiting**: API throttling and abuse prevention

#### **Configuration Management**
- **Environment Variables**: All sensitive data in `.env`
- **Agent Configuration** (`utils/config.py`): Centralized settings
- **Template System**: Customizable document templates
- **Feature Flags**: Progressive enhancement based on available services

### 📁 Project Structure

```
2096955/
├── agents/               # Multi-agent system components
│   ├── orchestrator.py   # Main orchestration logic
│   ├── cv_owner.py       # Resume generation with guidelines
│   ├── guidelines.py     # UK rules and ATS optimization
│   ├── pipeline.py       # Application pipeline flow
│   └── ...              # Additional specialized agents
├── services/            # External integrations and services
│   ├── llm.py           # Multi-provider LLM client
│   ├── job_aggregator.py # Job source aggregation
│   ├── word_cv.py       # Word document generation
│   └── ...              # Document and API services
├── utils/               # Utility functions and helpers
│   ├── ats.py           # ATS scoring and optimization
│   ├── cache.py         # TTL caching system
│   ├── consistency.py   # Contradiction detection
│   └── ...              # Text processing and helpers
├── models/              # Data models and schemas
│   └── schemas.py       # Pydantic models for type safety
├── mcp/                 # Model Context Protocol servers
│   ├── cv_owner_server.py
│   ├── cover_letter_server.py
│   └── orchestrator_server.py
├── memory/              # Persistent storage
│   ├── store.py         # File-backed memory store
│   └── data/            # Application state and history
├── app.py               # Streamlit interface
├── hf_app.py            # Gradio/HF interface
└── api_llm_integration.py # REST API endpoints
```

### 🚀 Performance Optimizations

- **Parallel Processing**: Async job handling with `asyncio` and `nest_asyncio`
- **Lazy Loading**: Dependencies loaded only when needed
- **Smart Caching**: Multi-level caching (memory, file, API responses)
- **Batch Operations**: Efficient multi-job processing
- **Event-Driven**: Asynchronous event handling for responsiveness

### 🧪 Testing & Quality

- **Test Suites**: Comprehensive tests in `tests/` directory
- **Integration Tests**: API and service integration validation
- **Mock Mode**: Development without API keys
- **Smoke Tests**: Quick validation scripts for deployment
- **Observability**: Built-in tracing and metrics collection

---

## Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)
- Implemented in `agents/pipeline.py` and exposed via API in `api_llm_integration.py` (`/api/llm/pipeline_run`).
- Agents:
  - `RouterAgent`: routes based on payload state
  - `ProfileAgent`: parses the CV into a structured profile (LLM with fallback)
  - `JobAgent`: analyzes the job posting (LLM with fallback)
  - `CVOwnerAgent` and `CoverLetterAgent`: draft documents (Gemini, per-agent keys)
  - Review: contradiction checks and memory persistence
- Temporal tracking: on review, a `drafted` status is recorded in the temporal KG with issues metadata.

**Flow diagram**

```mermaid
flowchart TD
  U["User"] --> R["RouterAgent"]
  R -->|cv_text present| P["ProfileAgent (LLM)"]
  R -->|job_posting present| J["JobAgent (LLM)"]
  P --> RESUME["CVOwnerAgent"]
  J --> RESUME
  RESUME --> COVER["CoverLetterAgent"]
  COVER --> REVIEW["Orchestrator Review"]
  REVIEW --> M["MemoryStore (file-backed)"]
  REVIEW --> TKG["Temporal KG (triplets)"]
  subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
    P
    J
    RESUME
    COVER
  end
  subgraph UI["Gradio (HF)"]
    U
  end
  subgraph API["Flask API"]
    PR["/api/llm/pipeline_run"]
  end
  U -. optional .-> PR
```
```
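
Routing "based on payload state" can be read as: inspect which fields the payload already holds and pick the next stage. The sketch below shows that idea under stated assumptions. Only `cv_text` and `job_posting` come from the diagram; the other payload keys (`profile`, `job`, `resume`, `cover_letter`, `review`), the stage names, and the `next_stage` function are hypothetical and not taken from `agents/pipeline.py`.

```python
def next_stage(payload: dict) -> str:
    """Return the name of the next pipeline stage for this payload."""
    # Raw inputs are parsed first, each only once.
    if payload.get("cv_text") and "profile" not in payload:
        return "profile"
    if payload.get("job_posting") and "job" not in payload:
        return "job"
    # Drafting needs both the parsed profile and the parsed job.
    if "profile" in payload and "job" in payload:
        if "resume" not in payload:
            return "resume"
        if "cover_letter" not in payload:
            return "cover"
        if "review" not in payload:
            return "review"
        return "done"
    # Nothing usable yet: the caller must supply a CV and a job posting.
    return "need_input"
```

A driver loop would call `next_stage`, run the matching agent, store its output under the corresponding key, and repeat until `"done"` or `"need_input"` is returned.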

---

## Hugging Face / Gradio (interactive controls)
- In the CV Analysis tab, you can now set:
  - **Refinement cycles** (1–5)
  - **Exact target length** (characters) to enforce resume and cover length deterministically
  - **Layout preset**: `classic`, `modern`, `minimalist`, `executive`
    - classic: Summary → Skills → Experience → Education (above the fold for Summary/Skills)
    - modern: Summary → Experience → Skills → Projects/Certifications → Education
    - minimalist: concise Summary → Skills → Experience → Education
    - executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications
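
The layout presets amount to an ordered list of section names per preset. A minimal sketch of that mapping follows. The section orders are copied from the list above, while the `LAYOUT_PRESETS` and `arrange_sections` names are assumptions, as is the choice to skip empty sections.

```python
LAYOUT_PRESETS = {
    "classic": ["Summary", "Skills", "Experience", "Education"],
    "modern": ["Summary", "Experience", "Skills", "Projects/Certifications", "Education"],
    "minimalist": ["Summary", "Skills", "Experience", "Education"],
    "executive": [
        "Summary", "Selected Achievements", "Experience",
        "Skills", "Education", "Certifications",
    ],
}


def arrange_sections(sections: dict[str, str], preset: str = "classic") -> list[tuple[str, str]]:
    """Order the available sections according to the chosen preset."""
    order = LAYOUT_PRESETS[preset]  # raises KeyError for an unknown preset
    # Sections that are missing or empty are left out of the output.
    return [(name, sections[name]) for name in order if sections.get(name)]
```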

---

## UK resume/cover rules (built-in)
- UK English and dates (MMM YYYY)
- Current role in present tense; previous roles in past tense
- Digits for numbers; £ and % normalization
- Remove first‑person pronouns in resume bullets; maintain active voice
- Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets
- Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates

These rules are applied by `agents/cv_owner.py` and validated by checklists.
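
Two of these rules, the `MMM YYYY` date format and stripping DOB lines, are easy to show in isolation. The sketch below is an assumption-laden illustration rather than the code in `agents/cv_owner.py`: it only rewrites numeric `MM/YYYY` dates and only drops lines that start with "DOB" or "Date of birth". The `apply_uk_rules` name is hypothetical.

```python
import re

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# MM/YYYY not preceded by a digit or slash, so DD/MM/YYYY is left alone.
_MM_YYYY = re.compile(r"(?<![\d/])(0?[1-9]|1[0-2])/((?:19|20)\d{2})\b")
_DOB_LINE = re.compile(r"^\s*(?:DOB|Date of birth)\b", re.IGNORECASE)


def apply_uk_rules(text: str) -> str:
    """Drop date-of-birth lines and rewrite MM/YYYY as 'MMM YYYY'."""
    kept = [line for line in text.splitlines() if not _DOB_LINE.match(line)]
    body = "\n".join(kept)
    return _MM_YYYY.sub(
        lambda m: f"{_MONTHS[int(m.group(1)) - 1]} {m.group(2)}", body
    )
```

For example, `"Analyst, Acme (03/2019 - 11/2021)"` becomes `"Analyst, Acme (Mar 2019 - Nov 2021)"`.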

---

## Checklists and observability
- Checklists integrate guidance from:
  - Reed: CV layout and mistakes
  - The Muse: action verbs and layout basics
  - Novorésumé: one‑page bias, clean sections, links
  - StandOut CV: quantification, bullet density, recent‑role focus
- Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in `memory/data/events.jsonl`.
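
An append-only JSON Lines log like `memory/data/events.jsonl` can be written and aggregated per agent as in the sketch below. The record fields (`ts`, `agent`, `event`, `details`) and both function names are assumptions, since the real event schema is not documented in this README.

```python
import json
import time
from collections import Counter
from pathlib import Path


def log_event(path, agent: str, event: str, **details) -> None:
    """Append one event as a single JSON line."""
    record = {"ts": time.time(), "agent": agent, "event": event, "details": details}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def events_per_agent(path) -> Counter:
    """Count logged events per agent, skipping blank lines."""
    counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line)["agent"]] += 1
    return counts
```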

---

## Scripts (headless runs)
- Capco (Anthony Lui → Capco):
```powershell
python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
```
- Anthropic (Anthony Lui → Anthropic):
```powershell
python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
```
- Pipeline (Router + Agents + Review + Events):
```powershell
python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py
```

These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set `.env` with `LLM_PROVIDER=gemini`, `LLM_MODEL=gemini-2.5-flash`, and `GEMINI_API_KEY`.

---

## Temporal knowledge graph (micro‑memory)
- `agents/temporal_tracker.py` stores time‑stamped triplets with non‑destructive invalidation.
- Integrated in pipeline review to track job application states and history.
- Utilities for timelines, active applications, and pattern analysis included.
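
"Non‑destructive invalidation" means an outdated fact is closed with an end timestamp instead of being deleted, so the full history stays queryable. The sketch below illustrates the idea with hypothetical names (`Triplet`, `TemporalStore`, `assert_fact`). It does not reflect the actual API of `agents/temporal_tracker.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Triplet:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still valid


class TemporalStore:
    def __init__(self) -> None:
        self.triplets: list[Triplet] = []

    def assert_fact(self, subject: str, predicate: str, obj: str,
                    at: Optional[datetime] = None) -> None:
        """Record a new fact and close, but keep, the one it supersedes."""
        at = at or datetime.now(timezone.utc)
        for t in self.triplets:
            if (t.subject, t.predicate) == (subject, predicate) and t.valid_to is None:
                t.valid_to = at  # invalidate without deleting
        self.triplets.append(Triplet(subject, predicate, obj, at))

    def current(self, subject: str, predicate: str) -> Optional[str]:
        for t in reversed(self.triplets):
            if (t.subject, t.predicate) == (subject, predicate) and t.valid_to is None:
                return t.obj
        return None

    def history(self, subject: str, predicate: str) -> list[Triplet]:
        return [t for t in self.triplets
                if (t.subject, t.predicate) == (subject, predicate)]
```

Recording `drafted` and later a follow-up status for the same application leaves two triplets: the first with `valid_to` set, the second still open.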

---

## Parallel agents + meta‑agent demo
- Notebook: `notebooks/agents_parallel_demo.ipynb`
- Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot.
- Uses the central LLM client (`services/llm.py`) with `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-flash`.

Run (Jupyter/VSCode):
```python
%pip install nest_asyncio matplotlib
# Ensure GEMINI_API_KEY is set in your environment
```
Open and run the notebook cells.
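
The pattern the notebook demonstrates, several analysis agents run concurrently and a meta‑agent combining their outputs, can be sketched with `asyncio.gather`. The agent names, the `run_agent`/`meta_agent`/`analyse` functions, and the use of `asyncio.sleep` in place of a real LLM call are all assumptions for illustration.

```python
import asyncio


async def run_agent(name: str, text: str, delay: float = 0.01) -> dict:
    """Stand-in for one analysis agent; the sleep replaces an LLM call."""
    await asyncio.sleep(delay)
    return {"agent": name, "finding": f"{name} reviewed {len(text)} chars"}


async def meta_agent(results: list[dict]) -> str:
    """Combine the individual findings into one summary string."""
    return "; ".join(r["finding"] for r in results)


async def analyse(text: str) -> str:
    names = ["skills", "experience", "keywords", "tone"]  # hypothetical agents
    # gather runs the coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(run_agent(n, text) for n in names))
    return await meta_agent(list(results))
```

From a script, call `asyncio.run(analyse(cv_text))`. Inside a notebook, where an event loop is already running, `await analyse(cv_text)` works directly.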

---

## LinkedIn OAuth (optional)
1) Create a LinkedIn Developer App, then add redirect URLs:
```
http://localhost:8501
http://localhost:8501/callback
```
2) Products: enable “Sign In with LinkedIn using OpenID Connect”.
3) Update `.env` and set `MOCK_MODE=false`.
4) In the UI, use the “LinkedIn Authentication” section to kick off the flow.

Notes:
- LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data.
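
The CSRF‑safe state validation used by the OAuth flow follows a standard pattern: generate a random `state`, store it in the user's session, and reject any callback whose `state` does not match. The sketch below shows that pattern with hypothetical helper names and a plain dict standing in for the session. It is not the project's `LinkedInManager` code.

```python
import hmac
import secrets
from urllib.parse import urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"


def build_auth_url(client_id: str, redirect_uri: str, session: dict) -> str:
    """Create the authorization URL and remember the state in the session."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": "openid profile email",
    }
    return f"{AUTH_URL}?{urlencode(params)}"


def validate_callback(params: dict, session: dict) -> str:
    """Return the authorization code, or raise if the state does not match."""
    expected = session.pop("oauth_state", None)  # single use: replay fails
    received = params.get("state", "")
    if not expected or not hmac.compare_digest(expected.encode(), received.encode()):
        raise ValueError("OAuth state mismatch; possible CSRF")
    return params["code"]
```

Popping the stored state makes each value single‑use, and `hmac.compare_digest` avoids leaking information through comparison timing.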

---

## Job sources
- **Adzuna**: global coverage, 5,000 free jobs/month
- **Resilient aggregator** and optional **JobSpy MCP** for broader search
- **Custom jobs**: add your own postings in the UI
- Corporate SSL environments: Adzuna calls auto‑retry with a `verify=False` fallback

---

## LLMs and configuration
- Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (`services/llm.py`).
- Recommended defaults for this project:
  - `LLM_PROVIDER=gemini`
  - `LLM_MODEL=gemini-2.5-flash`
- Agents pass `agent="cv|cover|parser|match|tailor|chat"` to use per‑agent keys when provided.
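
One plausible way to resolve per‑agent keys is sketched below. The environment variable names come from the `.env` listing in the Quickstart. The `resolve_gemini_key` function, the mapping, and the fallback to the shared `GEMINI_API_KEY` when no per‑agent key is set are assumptions about how `services/llm.py` behaves.

```python
import os
from typing import Mapping, Optional

AGENT_KEY_VARS = {
    "cv": "GEMINI_API_KEY_CV",
    "cover": "GEMINI_API_KEY_COVER",
    "chat": "GEMINI_API_KEY_CHAT",
    "parser": "GEMINI_API_KEY_PARSER",
    "match": "GEMINI_API_KEY_MATCH",
    "tailor": "GEMINI_API_KEY_TAILOR",
}


def resolve_gemini_key(agent: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer the agent-specific key, then the shared key, else None."""
    env = os.environ if env is None else env
    var = AGENT_KEY_VARS.get(agent or "")
    per_agent = env.get(var) if var else None
    # Empty strings (as in a blank .env entry) count as "not provided".
    return per_agent or env.get("GEMINI_API_KEY") or None
```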

---

## Advanced agents (built‑in)
- **Parallel processing**: 3–5× faster multi‑job drafting
- **Temporal tracking**: time‑stamped history and pattern analysis
- **Observability**: tracing, metrics, timeline visualization
- **Context engineering**: flywheel learning, L1/L2/L3 memory, scalable context

Toggle these in the HF app under “🚀 Advanced AI Features”.

---

## LangExtract + Gemini
- Uses the same `GEMINI_API_KEY` (auto‑applied to `LANGEXTRACT_API_KEY` when empty)
- Official `langextract.extract(...)` requires examples; the UI also exposes a robust regex‑based fallback (`services/langextract_service.py`) so features work even when cloud extraction is constrained
- In HF app (“🔍 Enhanced Job Analysis”), you can:
  - Analyze job postings (structured fields + skills)
  - Optimize resume for ATS (score + missing keywords)
  - Bulk analyze multiple jobs
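
The "score + missing keywords" output can be approximated by a simple coverage check: which job keywords appear in the resume, and what fraction that represents. The sketch below is an assumption about the general approach, not the scoring used by the app. The `ats_keyword_report` name and the percentage formula are hypothetical.

```python
import re


def ats_keyword_report(resume: str, keywords: list[str]) -> dict:
    """Report keyword coverage as a 0-100 score plus found/missing lists."""
    text = resume.lower()
    found, missing = [], []
    for kw in keywords:
        # Whole-term match, so "SQL" does not match inside "NoSQL".
        pattern = r"(?<!\w)" + re.escape(kw.lower()) + r"(?!\w)"
        (found if re.search(pattern, text) else missing).append(kw)
    score = round(100 * len(found) / len(keywords)) if keywords else 0
    return {"score": score, "found": found, "missing": missing}
```

For a resume mentioning Python, SQL, and AWS checked against `["Python", "SQL", "Kubernetes", "AWS"]`, the report lists `Kubernetes` as missing with a score of 75.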

---

## Office exports
- **Word** (`services/word_cv.py`): resumes + cover letters (5 templates; `python‑docx` fallback)
- **PowerPoint** (`services/powerpoint_cv.py`): visual CV (4 templates; `python‑pptx` fallback)
- **Excel** (`services/excel_tracker.py`): tracker with 5 analytical sheets (`openpyxl` fallback)
- MCP servers supported when available; local libraries are used otherwise

In HF app, after generation, expand:
- “📊 Export to PowerPoint CV”
- “📝 Export to Word Documents”
- “📈 Export Excel Tracker”

---

## Hugging Face minimal Space branch
- Clean branch containing only `app.py` and `requirements.txt` for Spaces.
- Branch name: `hf-space-min` (push from a clean worktree).
- `.gitignore` includes `.env` and `.env.*` to avoid leaking secrets.

---

## Tests & scripts
- Run test suites in `tests/`
- Useful scripts: `test_*` files in project root (integration checks)

---

## Security
- OAuth state validation, input/path/url sanitization
- Sensitive data via environment variables; avoid committing secrets
- Atomic writes in memory store
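
Atomic writes are commonly implemented by writing to a temporary file in the same directory and then renaming it over the target, so readers never see a half‑written file. The sketch below shows that common technique. The `atomic_write_json` name is hypothetical and the code is not taken from `memory/store.py`.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data) -> None:
    """Write JSON so that the target is either the old or the new version."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```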

---

## Run summary
- Streamlit: `python -m streamlit run app.py --server.port 8501`
- Gradio/HF: `PORT=7861 python hf_app.py`

This README documents the whole system in one place; it is ready for local or Hugging Face deployment.