---
title: Multi-Agent Job Application Assistant
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
## Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)
A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).
---
### What you get
- **Two UIs**: Streamlit (`app.py`) and Gradio/HF (`hf_app.py`)
- **LinkedIn OAuth 2.0** (optional; CSRF‑safe state validation)
- **Job aggregation**: Adzuna (5k/month) plus resilient fallbacks
- **ATS‑optimized drafting**: resumes + cover letters (Gemini)
- **Office exports**:
- Word resumes and cover letters (5 templates)
- PowerPoint CV (4 templates)
- Excel application tracker (5 analytical sheets)
- **Advanced agents**: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
- **LangExtract integration**: structured extraction with Gemini key; robust regex fallback in constrained environments
- **New**: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch
- **New (Aug 2025)**: UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets)
---
## Quickstart
### 1) Environment (.env)
Create a UTF‑8 `.env` file (all values are optional if you run in mock mode). See `.env.example` for the full list of variables:
```ini
# Behavior
MOCK_MODE=true
PORT=7860
# LLM / Research
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GEMINI_API_KEY=
# Optional per-agent Gemini keys
GEMINI_API_KEY_CV=
GEMINI_API_KEY_COVER=
GEMINI_API_KEY_CHAT=
GEMINI_API_KEY_PARSER=
GEMINI_API_KEY_MATCH=
GEMINI_API_KEY_TAILOR=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
TAVILY_API_KEY=
# Job APIs
ADZUNA_APP_ID=
ADZUNA_APP_KEY=
# Office MCP (optional)
POWERPOINT_MCP_URL=http://localhost:3000
WORD_MCP_URL=http://localhost:3001
EXCEL_MCP_URL=http://localhost:3002
# LangExtract uses GEMINI key by default
LANGEXTRACT_API_KEY=
```
Hardcoded keys have been removed from utility scripts. Use `switch_api_key.py` to safely set keys into `.env` without embedding them in code.
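As a minimal sketch of how these variables might be consumed at startup (the function name, defaults, and fallback logic here are illustrative, not the project's actual loader):

```python
import os

def load_llm_config(env: dict = os.environ) -> dict:
    """Resolve LLM settings from the environment, falling back to mock mode."""
    mock = env.get("MOCK_MODE", "true").lower() == "true"
    provider = env.get("LLM_PROVIDER", "gemini")
    key = env.get("GEMINI_API_KEY", "")
    if provider == "gemini" and not key:
        mock = True  # no key configured -> stay in mock mode rather than failing
    return {
        "mock_mode": mock,
        "provider": provider,
        "model": env.get("LLM_MODEL", "gemini-2.5-flash"),
    }

# MOCK_MODE=false but no key -> the loader forces mock mode back on
cfg = load_llm_config({"MOCK_MODE": "false", "LLM_PROVIDER": "gemini"})
```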
### 2) Install
- Windows PowerShell
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```
- Linux/macOS
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### 3) Run the apps
- Streamlit (PATH‑safe)
```powershell
python -m streamlit run app.py --server.port 8501
```
- Gradio / Hugging Face (avoid port conflicts)
```powershell
$env:PORT=7861; python hf_app.py
```
```bash
PORT=7861 python hf_app.py
```
The HF app binds on `0.0.0.0:$PORT`.
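The bind logic can be sketched as follows (the helper name is hypothetical; in `hf_app.py` the resulting host/port would feed Gradio's `demo.launch(server_name=..., server_port=...)`):

```python
import os

def resolve_bind(default_port: int = 7860) -> tuple:
    """Host/port the HF app binds to: all interfaces, $PORT if set, else a default."""
    return "0.0.0.0", int(os.getenv("PORT", str(default_port)))

host, port = resolve_bind()
# e.g. demo.launch(server_name=host, server_port=port)
```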
---
## 📊 System Architecture Overview
This is a **production-ready, multi-agent job application system** with sophisticated AI capabilities and enterprise-grade features:
### 🏗️ Core Architecture
#### **Dual Interface Design**
- **Streamlit Interface** (`app.py`) - Traditional web application for desktop use
- **Gradio/HF Interface** (`hf_app.py`) - Modern, mobile-friendly, deployable to Hugging Face Spaces
#### **Multi-Agent System** (15 Specialized Agents)
**Core Processing Agents:**
- **`OrchestratorAgent`** - Central coordinator managing workflow and job orchestration
- **`CVOwnerAgent`** - ATS-optimized resume generation with UK-specific formatting rules
- **`CoverLetterAgent`** - Personalized cover letter generation with keyword optimization
- **`ProfileAgent`** - Intelligent CV parsing and structured profile extraction
- **`JobAgent`** - Job posting analysis and requirement extraction
- **`RouterAgent`** - Dynamic routing based on payload state and workflow stage
**Advanced AI Agents:**
- **`ParallelExecutor`** - Concurrent processing for 3-5x faster multi-job handling
- **`TemporalTracker`** - Time-stamped application history and pattern analysis
- **`ObservabilityAgent`** - Real-time tracing, metrics collection, and monitoring
- **`ContextEngineer`** - Flywheel learning and context optimization
- **`ContextScaler`** - L1/L2/L3 memory management for scalable context handling
- **`LinkedInManager`** - OAuth 2.0 integration and profile synchronization
- **`MetaAgent`** - Combines outputs from multiple specialized analysis agents
- **`TriageAgent`** - Intelligent task prioritization and routing
#### **Guidelines Enforcement System** (`agents/guidelines.py`)
Comprehensive rule engine ensuring document quality:
- **UK Compliance**: British English, UK date formats (MMM YYYY), £ currency normalization
- **ATS Optimization**: Plain text formatting, keyword density, section structure
- **Content Quality**: Anti-buzzword filtering, action verb strengthening, first-person removal
- **Layout Rules**: Exact length enforcement, heading validation, bullet point formatting
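A toy version of one guidelines pass, assuming an illustrative buzzword list (the real engine in `agents/guidelines.py` is more comprehensive):

```python
import re

# Illustrative buzzword list; the real rule engine maintains its own.
BUZZWORDS = {"synergy", "go-getter", "results-driven", "dynamic"}

def scrub_bullet(text: str) -> str:
    """Drop first-person pronouns and buzzwords from a resume bullet."""
    text = re.sub(r"\b(I|my|me)\b\s*", "", text, flags=re.IGNORECASE)
    words = [w for w in text.split() if w.strip(",.").lower() not in BUZZWORDS]
    return " ".join(words)

print(scrub_bullet("I delivered results-driven synergy across 4 teams"))
```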
### 🔌 Integration Ecosystem
#### **LLM Integration** (`services/llm.py`)
- **Multi-Provider Support**: OpenAI, Anthropic Claude, Google Gemini
- **Per-Agent API Keys**: Cost optimization through agent-specific key allocation
- **Intelligent Fallbacks**: Graceful degradation when providers unavailable
- **Configurable Models**: Per-agent model selection for optimal performance/cost
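Per-agent key resolution might look like this (a sketch; the env names mirror the `.env` variables listed earlier, but the lookup function itself is hypothetical):

```python
import os

def gemini_key_for(agent: str) -> str:
    """Per-agent key wins (e.g. GEMINI_API_KEY_CV); otherwise fall back to the shared key."""
    return os.getenv(f"GEMINI_API_KEY_{agent.upper()}") or os.getenv("GEMINI_API_KEY", "")
```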
#### **Job Aggregation** (`services/job_aggregator.py`, `services/jobspy_client.py`)
- **Primary Sources**: Adzuna API (5,000 jobs/month free tier)
- **JobSpy Integration**: Indeed, LinkedIn, Glassdoor aggregation
- **Additional APIs**: Remotive, The Muse, GitHub Jobs
- **Smart Deduplication**: Title + company matching with fuzzy logic
- **SSL Bypass**: Automatic retry for corporate environments
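The title + company fuzzy deduplication could be sketched with the standard library's `difflib` (threshold and field names are assumptions, not the aggregator's actual implementation):

```python
from difflib import SequenceMatcher

def dedupe_jobs(jobs: list, threshold: float = 0.9) -> list:
    """Drop postings whose title+company closely matches one already kept."""
    kept = []
    for job in jobs:
        sig = f"{job['title']} {job['company']}".lower()
        if not any(
            SequenceMatcher(None, sig, f"{k['title']} {k['company']}".lower()).ratio() >= threshold
            for k in kept
        ):
            kept.append(job)
    return kept

jobs = [
    {"title": "Data Engineer", "company": "Acme"},
    {"title": "Data  Engineer", "company": "ACME"},  # near-duplicate, dropped
    {"title": "ML Engineer", "company": "Acme"},
]
```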
#### **Document Generation** (`services/`)
- **Word Documents** (`word_cv.py`): 5 professional templates, MCP server integration
- **PowerPoint CVs** (`powerpoint_cv.py`): 4 visual templates for presentations
- **Excel Trackers** (`excel_tracker.py`): 5 analytical sheets with metrics
- **PDF Export**: Cross-platform compatibility with formatting preservation
### 📈 Advanced Features
#### **Pipeline Architecture** (`agents/pipeline.py`)
```
User Input → Router → Profile Analysis → Job Analysis
           → Resume Generation → Cover Letter → Review → Memory Storage

Per-stage side channels: Event Log · Profile Cache · Job Cache ·
                         Document Cache · Metrics Log · Temporal KG
```
#### **Memory & Persistence**
- **File-backed Storage** (`memory/store.py`): Atomic writes, thread-safe operations
- **Temporal Knowledge Graph**: Application tracking with time-stamped relationships
- **Event Sourcing** (`events.jsonl`): Complete audit trail of all agent actions
- **Caching System** (`utils/cache.py`): TTL-based caching with automatic eviction
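The write-temp-then-rename pattern behind the atomic writes can be sketched as (a standard-library illustration, not the exact code in `memory/store.py`):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file, then rename: readers never see a partial file."""
    dir_ = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```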
#### **LangExtract Integration** (`services/langextract_service.py`)
- **Structured Extraction**: Job requirements, skills, company culture
- **ATS Optimization**: Keyword extraction and scoring
- **Fallback Mechanisms**: Regex-based extraction when API unavailable
- **Result Caching**: Performance optimization for repeated analyses
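The regex fallback might work along these lines, matching a known vocabulary as whole words (the vocabulary and function name here are illustrative; the real fallback lives in `services/langextract_service.py`):

```python
import re

# Illustrative skill vocabulary; the real service carries a much larger list.
KNOWN_SKILLS = ["python", "sql", "aws", "docker", "kubernetes"]

def extract_skills_fallback(posting: str) -> list:
    """Whole-word, case-insensitive matching; results in vocabulary order."""
    found = []
    for skill in KNOWN_SKILLS:
        if re.search(rf"\b{re.escape(skill)}\b", posting, re.IGNORECASE):
            found.append(skill)
    return found

print(extract_skills_fallback("Looking for Python + SQL; Docker a plus"))
```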
### 🛡️ Security & Configuration
#### **Authentication & Security**
- **OAuth 2.0**: LinkedIn integration with CSRF protection
- **Input Sanitization**: Path traversal and injection prevention
- **Environment Isolation**: Secrets management via `.env`
- **Rate Limiting**: API throttling and abuse prevention
#### **Configuration Management**
- **Environment Variables**: All sensitive data in `.env`
- **Agent Configuration** (`utils/config.py`): Centralized settings
- **Template System**: Customizable document templates
- **Feature Flags**: Progressive enhancement based on available services
### 📁 Project Structure
```
2096955/
├── agents/ # Multi-agent system components
│ ├── orchestrator.py # Main orchestration logic
│ ├── cv_owner.py # Resume generation with guidelines
│ ├── guidelines.py # UK rules and ATS optimization
│ ├── pipeline.py # Application pipeline flow
│ └── ... # Additional specialized agents
├── services/ # External integrations and services
│ ├── llm.py # Multi-provider LLM client
│ ├── job_aggregator.py # Job source aggregation
│ ├── word_cv.py # Word document generation
│ └── ... # Document and API services
├── utils/ # Utility functions and helpers
│ ├── ats.py # ATS scoring and optimization
│ ├── cache.py # TTL caching system
│ ├── consistency.py # Contradiction detection
│ └── ... # Text processing and helpers
├── models/ # Data models and schemas
│ └── schemas.py # Pydantic models for type safety
├── mcp/ # Model Context Protocol servers
│ ├── cv_owner_server.py
│ ├── cover_letter_server.py
│ └── orchestrator_server.py
├── memory/ # Persistent storage
│ ├── store.py # File-backed memory store
│ └── data/ # Application state and history
├── app.py # Streamlit interface
├── hf_app.py # Gradio/HF interface
└── api_llm_integration.py # REST API endpoints
```
### 🚀 Performance Optimizations
- **Parallel Processing**: Async job handling with `asyncio` and `nest_asyncio`
- **Lazy Loading**: Dependencies loaded only when needed
- **Smart Caching**: Multi-level caching (memory, file, API responses)
- **Batch Operations**: Efficient multi-job processing
- **Event-Driven**: Asynchronous event handling for responsiveness
### 🧪 Testing & Quality
- **Test Suites**: Comprehensive tests in `tests/` directory
- **Integration Tests**: API and service integration validation
- **Mock Mode**: Development without API keys
- **Smoke Tests**: Quick validation scripts for deployment
- **Observability**: Built-in tracing and metrics collection
---
## Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)
- Implemented in `agents/pipeline.py` and exposed via API in `api_llm_integration.py` (`/api/llm/pipeline_run`).
- Agents:
- `RouterAgent`: routes based on payload state
- `ProfileAgent`: parses CV to structured profile (LLM with fallback)
- `JobAgent`: analyzes job posting (LLM with fallback)
- `CVOwnerAgent` and `CoverLetterAgent`: draft documents (Gemini, per-agent keys)
- Review: contradiction checks and memory persist
- Temporal tracking: on review, a `drafted` status is recorded in the temporal KG with issues metadata.
**Flow diagram**
```mermaid
flowchart TD
U["User"] --> R["RouterAgent"]
R -->|cv_text present| P["ProfileAgent (LLM)"]
R -->|job_posting present| J["JobAgent (LLM)"]
P --> RESUME["CVOwnerAgent"]
J --> RESUME
RESUME --> COVER["CoverLetterAgent"]
COVER --> REVIEW["Orchestrator Review"]
REVIEW --> M["MemoryStore (file-backed)"]
REVIEW --> TKG["Temporal KG (triplets)"]
subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
P
J
RESUME
COVER
end
subgraph UI["Gradio (HF)"]
U
end
subgraph API["Flask API"]
PR["/api/llm/pipeline_run"]
end
U -. optional .-> PR
```
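The routing decision in the diagram above can be reduced to a small sketch (payload keys match the agent descriptions; the stage names are illustrative):

```python
def route(payload: dict) -> list:
    """Toy RouterAgent decision: which pipeline stages run for this payload."""
    stages = []
    if payload.get("cv_text"):
        stages.append("profile")
    if payload.get("job_posting"):
        stages.append("job")
    # Drafting only makes sense once both analyses are available
    if {"profile", "job"} <= set(stages):
        stages += ["resume", "cover", "review"]
    return stages
```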
---
## Hugging Face / Gradio (interactive controls)
- In the CV Analysis tab, you can now set:
- **Refinement cycles** (1–5)
- **Exact target length** (characters) to enforce resume and cover length deterministically
- **Layout preset**: `classic`, `modern`, `minimalist`, `executive`
- classic: Summary → Skills → Experience → Education (above the fold for Summary/Skills)
- modern: Summary → Experience → Skills → Projects/Certifications → Education
- minimalist: concise Summary → Skills → Experience → Education
- executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications
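The preset orderings above reduce to a simple mapping (section names are taken from the bullets; the app's actual data structure may differ):

```python
# Section order per layout preset, as described above.
LAYOUT_PRESETS = {
    "classic": ["Summary", "Skills", "Experience", "Education"],
    "modern": ["Summary", "Experience", "Skills", "Projects/Certifications", "Education"],
    "minimalist": ["Summary", "Skills", "Experience", "Education"],
    "executive": ["Summary", "Selected Achievements", "Experience",
                  "Skills", "Education", "Certifications"],
}
```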
---
## UK resume/cover rules (built-in)
- UK English and dates (MMM YYYY)
- Current role in present tense; previous roles in past tense
- Digits for numbers; £ and % normalization
- Remove first‑person pronouns in resume bullets; maintain active voice
- Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets
- Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates
These rules are applied by `agents/cv_owner.py` and validated by checklists.
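Two of these normalizations, sketched in isolation (helper names are hypothetical; `agents/cv_owner.py` applies the full rule set):

```python
import re

MONTHS = {"01": "Jan", "02": "Feb", "03": "Mar", "04": "Apr", "05": "May", "06": "Jun",
          "07": "Jul", "08": "Aug", "09": "Sep", "10": "Oct", "11": "Nov", "12": "Dec"}

def uk_date(iso: str) -> str:
    """'2023-04' -> 'Apr 2023' (the MMM YYYY format the rules require)."""
    year, month = iso.split("-")[:2]
    return f"{MONTHS[month]} {year}"

def normalise_currency(text: str) -> str:
    """'GBP 45,000' -> '£45,000' (illustrative £ normalisation)."""
    return re.sub(r"\bGBP\s*", "£", text)

print(uk_date("2023-04"), normalise_currency("GBP 45,000"))
```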
---
## Checklists and observability
- Checklists integrate guidance from:
- Reed: CV layout and mistakes
- The Muse: action verbs and layout basics
- Novorésumé: one‑page bias, clean sections, links
- StandOut CV: quantification, bullet density, recent‑role focus
- Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in `memory/data/events.jsonl`.
---
## Scripts (headless runs)
- Capco (Anthony Lui → Capco):
```powershell
python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
```
- Anthropic (Anthony Lui → Anthropic):
```powershell
python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
```
- Pipeline (Router + Agents + Review + Events):
```powershell
python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py
```
These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set `.env` with `LLM_PROVIDER=gemini`, `LLM_MODEL=gemini-2.5-flash`, and `GEMINI_API_KEY`.
---
## Temporal knowledge graph (micro‑memory)
- `agents/temporal_tracker.py` stores time‑stamped triplets with non‑destructive invalidation.
- Integrated in pipeline review to track job application states and history.
- Utilities for timelines, active applications, and pattern analysis included.
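Non-destructive invalidation can be illustrated with a minimal triplet store (a sketch of the idea, not the implementation in `agents/temporal_tracker.py`):

```python
import time

class TemporalKG:
    """Minimal time-stamped triplet store: old facts are invalidated, never deleted."""

    def __init__(self):
        # (subject, predicate, object, valid_from, valid_to); valid_to None = current
        self.triplets = []

    def assert_fact(self, s, p, o):
        now = time.time()
        for i, (ts, tp, to_, vf, vt) in enumerate(self.triplets):
            if ts == s and tp == p and vt is None:
                self.triplets[i] = (ts, tp, to_, vf, now)  # close the old fact
        self.triplets.append((s, p, o, now, None))

    def current(self, s, p):
        return [t[2] for t in self.triplets if t[0] == s and t[1] == p and t[4] is None]
```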
---
## Parallel agents + meta‑agent demo
- Notebook: `notebooks/agents_parallel_demo.ipynb`
- Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot.
- Uses the central LLM client (`services/llm.py`) with `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-flash`.
Run (Jupyter/VSCode):
```python
%pip install nest_asyncio matplotlib
# Ensure GEMINI_API_KEY is set in your environment
```
Open and run the notebook cells.
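The fan-out/fan-in shape the notebook demonstrates can be sketched with `asyncio.gather` (agent names and the combine step are illustrative stand-ins for the LLM-backed agents; in a notebook, apply `nest_asyncio` before `asyncio.run`):

```python
import asyncio

async def analysis_agent(name: str, text: str) -> str:
    """Stand-in for one LLM-backed analysis agent."""
    await asyncio.sleep(0.01)  # simulate an API call
    return f"{name}: analysed {len(text)} chars"

async def run_parallel(text: str) -> str:
    agents = ["skills", "culture", "requirements", "salary"]
    # Fan out: all four agents run concurrently
    results = await asyncio.gather(*(analysis_agent(a, text) for a in agents))
    # Fan in: the meta-agent step combines the partial analyses
    return "\n".join(results)

summary = asyncio.run(run_parallel("Example job posting"))
```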
---
## LinkedIn OAuth (optional)
1) Create a LinkedIn Developer App, then add redirect URLs:
```
http://localhost:8501
http://localhost:8501/callback
```
2) Products: enable “Sign In with LinkedIn using OpenID Connect”.
3) Update `.env` and set `MOCK_MODE=false`.
4) In the UI, use the “LinkedIn Authentication” section to kick off the flow.
Notes:
- LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data.
---
## Job sources
- **Adzuna**: global coverage, 5,000 free jobs/month
- **Resilient aggregator** and optional **JobSpy MCP** for broader search
- **Custom jobs**: add your own postings in the UI
- Corporate SSL environments: Adzuna calls auto‑retries with `verify=False` fallback
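The auto-retry might be shaped like this (a hedged sketch using `requests`; the function name is hypothetical, and disabling verification is a last resort for trusted hosts behind TLS-intercepting proxies):

```python
import requests

def fetch_with_ssl_fallback(url: str, **kwargs):
    """GET a URL; on an SSL error, retry once with certificate checks disabled."""
    try:
        return requests.get(url, timeout=15, **kwargs)
    except requests.exceptions.SSLError:
        # Corporate proxy broke TLS verification; retry without it (last resort)
        return requests.get(url, timeout=15, verify=False, **kwargs)
```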
---
## LLMs and configuration
- Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (`services/llm.py`).
- Recommended defaults for this project:
- `LLM_PROVIDER=gemini`
- `LLM_MODEL=gemini-2.5-flash`
- Agents pass `agent="cv|cover|parser|match|tailor|chat"` to use per‑agent keys when provided.
---
## Advanced agents (built‑in)
- **Parallel processing**: 3–5× faster multi‑job drafting
- **Temporal tracking**: time‑stamped history and pattern analysis
- **Observability**: tracing, metrics, timeline visualization
- **Context engineering**: flywheel learning, L1/L2/L3 memory, scalable context
Toggle these in the HF app under “🚀 Advanced AI Features”.
---
## LangExtract + Gemini
- Uses the same `GEMINI_API_KEY` (auto‑applied to `LANGEXTRACT_API_KEY` when empty)
- The official `langextract.extract(...)` API requires few-shot examples; the UI therefore also exposes a robust regex-based fallback (`services/langextract_service.py`), so features keep working even when cloud extraction is constrained
- In HF app (“🔍 Enhanced Job Analysis”), you can:
- Analyze job postings (structured fields + skills)
- Optimize resume for ATS (score + missing keywords)
- Bulk analyze multiple jobs
---
## Office exports
- **Word** (`services/word_cv.py`): resumes + cover letters (5 templates; `python‑docx` fallback)
- **PowerPoint** (`services/powerpoint_cv.py`): visual CV (4 templates; `python‑pptx` fallback)
- **Excel** (`services/excel_tracker.py`): tracker with 5 analytical sheets (`openpyxl` fallback)
- MCP servers supported when available; local libraries are used otherwise
In HF app, after generation, expand:
- “📊 Export to PowerPoint CV”
- “📝 Export to Word Documents”
- “📈 Export Excel Tracker”
---
## Hugging Face minimal Space branch
- Clean branch containing only `app.py` and `requirements.txt` for Spaces.
- Branch name: `hf-space-min` (push from a clean worktree).
- `.gitignore` includes `.env` and `.env.*` to avoid leaking secrets.
---
## Tests & scripts
- Run test suites in `tests/`
- Useful scripts: `test_*` files in project root (integration checks)
---
## Security
- OAuth state validation, input/path/url sanitization
- Sensitive data via environment variables; avoid committing secrets
- Atomic writes in memory store
---
## Run summary
- Streamlit: `python -m streamlit run app.py --server.port 8501`
- Gradio/HF: `PORT=7861 python hf_app.py`
Your system is fully documented here in one place and ready for local or HF deployment.