---

title: Multi-Agent Job Application Assistant
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---


## Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)

A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).

---

### What you get
- **Two UIs**: Streamlit (`app.py`) and Gradio/HF (`hf_app.py`)
- **LinkedIn OAuth 2.0** (optional; CSRF‑safe state validation)
- **Job aggregation**: Adzuna (5,000 jobs/month on the free tier) plus resilient fallbacks
- **ATS‑optimized drafting**: resumes + cover letters (Gemini)
- **Office exports**:
  - Word resumes and cover letters (5 templates)
  - PowerPoint CV (4 templates)
  - Excel application tracker (5 analytical sheets)
- **Advanced agents**: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
- **LangExtract integration**: structured extraction with Gemini key; robust regex fallback in constrained environments
- **New**: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch
- **New (Aug 2025)**: UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets)

---

## Quickstart

### 1) Environment (.env)
Create a UTF‑8 `.env` (values optional if you want mock mode). See `.env.example` for the full list of variables:
```ini
# Behavior
MOCK_MODE=true
PORT=7860

# LLM / Research
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GEMINI_API_KEY=
# Optional per-agent Gemini keys
GEMINI_API_KEY_CV=
GEMINI_API_KEY_COVER=
GEMINI_API_KEY_CHAT=
GEMINI_API_KEY_PARSER=
GEMINI_API_KEY_MATCH=
GEMINI_API_KEY_TAILOR=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

TAVILY_API_KEY=

# Job APIs
ADZUNA_APP_ID=
ADZUNA_APP_KEY=

# Office MCP (optional)
POWERPOINT_MCP_URL=http://localhost:3000
WORD_MCP_URL=http://localhost:3001
EXCEL_MCP_URL=http://localhost:3002

# LangExtract uses GEMINI key by default
LANGEXTRACT_API_KEY=
```

Hardcoded keys have been removed from utility scripts. Use `switch_api_key.py` to safely set keys into `.env` without embedding them in code.

### 2) Install
- Windows PowerShell
```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```
- Linux/macOS
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 3) Run the apps
- Streamlit (PATH‑safe)
```powershell
python -m streamlit run app.py --server.port 8501
```
- Gradio / Hugging Face (avoid port conflicts)
```powershell
$env:PORT=7861; python hf_app.py
```
```bash
PORT=7861 python hf_app.py
```
The HF app binds to `0.0.0.0:$PORT`.
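
As a rough illustration of that binding, the sketch below resolves the host and port from the environment. The helper name `server_address` and the fallback to 7860 when `PORT` is unset or empty are assumptions for illustration, not code taken from `hf_app.py`.

```python
import os
from typing import Mapping, Tuple


def server_address(env: Mapping[str, str] = os.environ,
                   default_port: int = 7860) -> Tuple[str, int]:
    """Return (host, port); hypothetical helper, not part of hf_app.py."""
    raw = env.get("PORT") or default_port  # treat an empty PORT as unset
    return "0.0.0.0", int(raw)


host, port = server_address({"PORT": "7861"})
print(host, port)
```

With Gradio, the pair would typically be passed as `demo.launch(server_name=host, server_port=port)`.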

---

## 📊 System Architecture Overview

This is a **production-ready, multi-agent job application system** made up of two UIs, a set of specialized agents, external service integrations, and file-backed memory:

### 🏗️ Core Architecture

#### **Dual Interface Design**
- **Streamlit Interface** (`app.py`) - Traditional web application for desktop use
- **Gradio/HF Interface** (`hf_app.py`) - Modern, mobile-friendly, deployable to Hugging Face Spaces

#### **Multi-Agent System** (15 Specialized Agents)

**Core Processing Agents:**
- **`OrchestratorAgent`** - Central coordinator managing workflow and job orchestration
- **`CVOwnerAgent`** - ATS-optimized resume generation with UK-specific formatting rules
- **`CoverLetterAgent`** - Personalized cover letter generation with keyword optimization
- **`ProfileAgent`** - Intelligent CV parsing and structured profile extraction
- **`JobAgent`** - Job posting analysis and requirement extraction
- **`RouterAgent`** - Dynamic routing based on payload state and workflow stage

**Advanced AI Agents:**
- **`ParallelExecutor`** - Concurrent processing for 3-5x faster multi-job handling
- **`TemporalTracker`** - Time-stamped application history and pattern analysis
- **`ObservabilityAgent`** - Real-time tracing, metrics collection, and monitoring
- **`ContextEngineer`** - Flywheel learning and context optimization
- **`ContextScaler`** - L1/L2/L3 memory management for scalable context handling
- **`LinkedInManager`** - OAuth 2.0 integration and profile synchronization
- **`MetaAgent`** - Combines outputs from multiple specialized analysis agents
- **`TriageAgent`** - Intelligent task prioritization and routing

#### **Guidelines Enforcement System** (`agents/guidelines.py`)
Comprehensive rule engine ensuring document quality:
- **UK Compliance**: British English, UK date formats (MMM YYYY), £ currency normalization
- **ATS Optimization**: Plain text formatting, keyword density, section structure
- **Content Quality**: Anti-buzzword filtering, action verb strengthening, first-person removal
- **Layout Rules**: Exact length enforcement, heading validation, bullet point formatting

### 🔌 Integration Ecosystem

#### **LLM Integration** (`services/llm.py`)
- **Multi-Provider Support**: OpenAI, Anthropic Claude, Google Gemini
- **Per-Agent API Keys**: Cost optimization through agent-specific key allocation
- **Intelligent Fallbacks**: Graceful degradation when providers unavailable
- **Configurable Models**: Per-agent model selection for optimal performance/cost

#### **Job Aggregation** (`services/job_aggregator.py`, `services/jobspy_client.py`)
- **Primary Sources**: Adzuna API (5,000 jobs/month free tier)
- **JobSpy Integration**: Indeed, LinkedIn, Glassdoor aggregation
- **Additional APIs**: Remotive, The Muse, GitHub Jobs
- **Smart Deduplication**: Title + company matching with fuzzy logic
- **SSL Bypass**: Automatic retry for corporate environments
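
The title + company deduplication could look roughly like the sketch below. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever fuzzy matcher `services/job_aggregator.py` actually uses; the function names, the dict keys `title` and `company`, and the 0.9 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def is_duplicate(a: Dict[str, str], b: Dict[str, str], threshold: float = 0.9) -> bool:
    """Two postings match when both title and company are close enough."""
    title = SequenceMatcher(None, _norm(a["title"]), _norm(b["title"])).ratio()
    company = SequenceMatcher(None, _norm(a["company"]), _norm(b["company"])).ratio()
    return title >= threshold and company >= threshold


def dedupe(jobs: List[Dict[str, str]], threshold: float = 0.9) -> List[Dict[str, str]]:
    """Keep the first occurrence of each posting (quadratic, fine for small batches)."""
    unique: List[Dict[str, str]] = []
    for job in jobs:
        if not any(is_duplicate(job, seen, threshold) for seen in unique):
            unique.append(job)
    return unique


jobs = [
    {"title": "Senior Data Engineer", "company": "Acme Ltd"},
    {"title": "senior  data engineer", "company": "ACME Ltd"},
    {"title": "Product Manager", "company": "Acme Ltd"},
]
print(len(dedupe(jobs)))
```

Requiring both fields to match keeps identical titles at different companies as separate postings.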

#### **Document Generation** (`services/`)
- **Word Documents** (`word_cv.py`): 5 professional templates, MCP server integration
- **PowerPoint CVs** (`powerpoint_cv.py`): 4 visual templates for presentations
- **Excel Trackers** (`excel_tracker.py`): 5 analytical sheets with metrics
- **PDF Export**: Cross-platform compatibility with formatting preservation

### 📈 Advanced Features

#### **Pipeline Architecture** (`agents/pipeline.py`)
```
User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage
                ↓           ↓                ↓              ↓                    ↓            ↓
           Event Log   Profile Cache    Job Cache    Document Cache      Metrics Log   Temporal KG
```

#### **Memory & Persistence**
- **File-backed Storage** (`memory/store.py`): Atomic writes, thread-safe operations
- **Temporal Knowledge Graph**: Application tracking with time-stamped relationships
- **Event Sourcing** (`events.jsonl`): Complete audit trail of all agent actions
- **Caching System** (`utils/cache.py`): TTL-based caching with automatic eviction
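
A TTL cache with eviction on read can be sketched in a few lines. This is a minimal illustration, assuming a simple in-memory dict; the class name `TTLCache` and its methods are hypothetical and may not match the interface of `utils/cache.py`.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # evict the stale entry on access
            return default
        return value


cache = TTLCache(ttl_seconds=300)
cache.set("job:123", {"title": "Data Engineer"})
print(cache.get("job:123"))
```

Evicting lazily on `get` keeps the sketch free of background threads; a real cache might also sweep expired keys periodically.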

#### **LangExtract Integration** (`services/langextract_service.py`)
- **Structured Extraction**: Job requirements, skills, company culture
- **ATS Optimization**: Keyword extraction and scoring
- **Fallback Mechanisms**: Regex-based extraction when API unavailable
- **Result Caching**: Performance optimization for repeated analyses

### 🛡️ Security & Configuration

#### **Authentication & Security**
- **OAuth 2.0**: LinkedIn integration with CSRF protection
- **Input Sanitization**: Path traversal and injection prevention
- **Environment Isolation**: Secrets management via `.env`
- **Rate Limiting**: API throttling and abuse prevention

#### **Configuration Management**
- **Environment Variables**: All sensitive data in `.env`
- **Agent Configuration** (`utils/config.py`): Centralized settings
- **Template System**: Customizable document templates
- **Feature Flags**: Progressive enhancement based on available services

### 📁 Project Structure



```
2096955/
├── agents/               # Multi-agent system components
│   ├── orchestrator.py   # Main orchestration logic
│   ├── cv_owner.py       # Resume generation with guidelines
│   ├── guidelines.py     # UK rules and ATS optimization
│   ├── pipeline.py       # Application pipeline flow
│   └── ...              # Additional specialized agents
├── services/            # External integrations and services
│   ├── llm.py           # Multi-provider LLM client
│   ├── job_aggregator.py # Job source aggregation
│   ├── word_cv.py       # Word document generation
│   └── ...              # Document and API services
├── utils/               # Utility functions and helpers
│   ├── ats.py           # ATS scoring and optimization
│   ├── cache.py         # TTL caching system
│   ├── consistency.py   # Contradiction detection
│   └── ...              # Text processing and helpers
├── models/              # Data models and schemas
│   └── schemas.py       # Pydantic models for type safety
├── mcp/                 # Model Context Protocol servers
│   ├── cv_owner_server.py
│   ├── cover_letter_server.py
│   └── orchestrator_server.py
├── memory/              # Persistent storage
│   ├── store.py         # File-backed memory store
│   └── data/            # Application state and history
├── app.py               # Streamlit interface
├── hf_app.py            # Gradio/HF interface
└── api_llm_integration.py # REST API endpoints
```



### 🚀 Performance Optimizations



- **Parallel Processing**: Async job handling with `asyncio` and `nest_asyncio`
- **Lazy Loading**: Dependencies loaded only when needed
- **Smart Caching**: Multi-level caching (memory, file, API responses)
- **Batch Operations**: Efficient multi-job processing
- **Event-Driven**: Asynchronous event handling for responsiveness

### 🧪 Testing & Quality

- **Test Suites**: Comprehensive tests in `tests/` directory
- **Integration Tests**: API and service integration validation
- **Mock Mode**: Development without API keys
- **Smoke Tests**: Quick validation scripts for deployment
- **Observability**: Built-in tracing and metrics collection



---



## Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)

- Implemented in `agents/pipeline.py` and exposed via API in `api_llm_integration.py` (`/api/llm/pipeline_run`).
- Agents:
  - `RouterAgent`: routes based on payload state
  - `ProfileAgent`: parses CV to structured profile (LLM with fallback)
  - `JobAgent`: analyzes job posting (LLM with fallback)
  - `CVOwnerAgent` and `CoverLetterAgent`: draft documents (Gemini, per-agent keys)
  - Review: contradiction checks and memory persist
- Temporal tracking: on review, a `drafted` status is recorded in the temporal KG with issues metadata.
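
Routing on payload state can be pictured as a small function that inspects which keys are already filled in and returns the next stage. The sketch below is a simplified, sequential reading of the flow; only `cv_text` and `job_posting` come from this README, while the other payload keys, the stage names, and both function names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Payload = Dict[str, object]


def next_stage(payload: Payload) -> str:
    """Pick the next stage from what the payload already contains (hypothetical keys)."""
    if payload.get("cv_text") and "profile" not in payload:
        return "profile"
    if payload.get("job_posting") and "job" not in payload:
        return "job"
    if "resume" not in payload:
        return "resume"
    if "cover_letter" not in payload:
        return "cover"
    if not payload.get("reviewed"):
        return "review"
    return "done"


def run_pipeline(payload: Payload,
                 handlers: Dict[str, Callable[[Payload], Payload]],
                 max_steps: int = 10) -> Tuple[Payload, List[str]]:
    """Call handlers until the router reports 'done'; max_steps guards against loops."""
    trace: List[str] = []
    for _ in range(max_steps):
        stage = next_stage(payload)
        if stage == "done":
            break
        payload = handlers[stage](payload)
        trace.append(stage)
    return payload, trace


# Stub handlers standing in for the real agents.
handlers = {
    "profile": lambda p: {**p, "profile": {"skills": ["python"]}},
    "job": lambda p: {**p, "job": {"title": "Engineer"}},
    "resume": lambda p: {**p, "resume": "resume draft"},
    "cover": lambda p: {**p, "cover_letter": "cover draft"},
    "review": lambda p: {**p, "reviewed": True},
}
final, trace = run_pipeline({"cv_text": "cv", "job_posting": "jd"}, handlers)
print(trace)
```

Keeping the router a pure function of the payload makes each stage easy to test with stub handlers, as done here.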



**Flow diagram**

```mermaid
flowchart TD
  U["User"] --> R["RouterAgent"]
  R -->|cv_text present| P["ProfileAgent (LLM)"]
  R -->|job_posting present| J["JobAgent (LLM)"]
  P --> RESUME["CVOwnerAgent"]
  J --> RESUME
  RESUME --> COVER["CoverLetterAgent"]
  COVER --> REVIEW["Orchestrator Review"]
  REVIEW --> M["MemoryStore (file-backed)"]
  REVIEW --> TKG["Temporal KG (triplets)"]
  subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
    P
    J
    RESUME
    COVER
  end
  subgraph UI["Gradio (HF)"]
    U
  end
  subgraph API["Flask API"]
    PR["/api/llm/pipeline_run"]
  end
  U -. optional .-> PR
```

---

## Hugging Face / Gradio (interactive controls)
- In the CV Analysis tab, you can now set:
  - **Refinement cycles** (1–5)
  - **Exact target length** (characters) to enforce resume and cover letter length deterministically
  - **Layout preset**: `classic`, `modern`, `minimalist`, `executive`
    - classic: Summary → Skills → Experience → Education (above the fold for Summary/Skills)
    - modern: Summary → Experience → Skills → Projects/Certifications → Education
    - minimalist: concise Summary → Skills → Experience → Education
    - executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications
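
One way to picture the layout presets is as an ordered list of section names per preset. The mapping below copies the orders listed above; the names `LAYOUT_PRESETS` and `order_sections` are hypothetical, and how the app handles missing sections is an assumption.

```python
from typing import Dict, List, Tuple

# Section orders taken from the preset descriptions in this README.
LAYOUT_PRESETS: Dict[str, List[str]] = {
    "classic": ["Summary", "Skills", "Experience", "Education"],
    "modern": ["Summary", "Experience", "Skills", "Projects/Certifications", "Education"],
    "minimalist": ["Summary", "Skills", "Experience", "Education"],
    "executive": ["Summary", "Selected Achievements", "Experience", "Skills",
                  "Education", "Certifications"],
}


def order_sections(sections: Dict[str, str], preset: str) -> List[Tuple[str, str]]:
    """Return (name, body) pairs in preset order, skipping sections that are absent."""
    return [(name, sections[name]) for name in LAYOUT_PRESETS[preset] if name in sections]


sections = {"Education": "BSc", "Summary": "Engineer", "Skills": "Python", "Experience": "Acme"}
print([name for name, _ in order_sections(sections, "modern")])
```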

---

## UK resume/cover rules (built-in)
- UK English and dates (MMM YYYY)
- Current role in present tense; previous roles in past tense
- Digits for numbers; £ and % normalization
- Remove first‑person pronouns in resume bullets; maintain active voice
- Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets
- Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates

These rules are applied by `agents/cv_owner.py` and validated by checklists.
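
A few of these rules lend themselves to simple regex passes. The sketch below covers only date abbreviation, percent normalization, DOB/photo stripping, and leading first-person pronouns in bullets; the patterns and the function name `apply_uk_rules` are illustrative assumptions rather than the logic in `agents/cv_owner.py`.

```python
import re

_MONTHS = ("January February March April May June July August "
           "September October November December").split()
_MONTH_RE = re.compile(r"\b(" + "|".join(_MONTHS) + r")\s+(\d{4})\b")
_PERSONAL_RE = re.compile(r"^\s*(?:dob|date of birth|photo)\b", re.IGNORECASE)
_PRONOUN_RE = re.compile(r"^(\s*[-•*]\s*)(?:I|We)\s+(\w)")
_PERCENT_RE = re.compile(r"(\d)\s*(?:percent|per cent)\b", re.IGNORECASE)


def apply_uk_rules(text: str) -> str:
    """Apply a small, illustrative subset of the UK resume rules line by line."""
    out = []
    for line in text.splitlines():
        if _PERSONAL_RE.match(line):
            continue  # strip DOB/photo lines entirely
        # "January 2021" -> "Jan 2021" (MMM YYYY)
        line = _MONTH_RE.sub(lambda m: f"{m.group(1)[:3]} {m.group(2)}", line)
        # "20 percent" -> "20%"
        line = _PERCENT_RE.sub(r"\1%", line)
        # "- I led ..." -> "- Led ..."
        line = _PRONOUN_RE.sub(lambda m: m.group(1) + m.group(2).upper(), line)
        out.append(line)
    return "\n".join(out)


sample = "DOB: 01/02/1990\n- I led a team of 5 from January 2021\n- Cut costs by 20 percent"
print(apply_uk_rules(sample))
```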

---

## Checklists and observability
- Checklists integrate guidance from:
  - Reed: CV layout and mistakes
  - The Muse: action verbs and layout basics
  - Novorésumé: one‑page bias, clean sections, links
  - StandOut CV: quantification, bullet density, recent‑role focus
- Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in `memory/data/events.jsonl`.
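
An append-only JSON Lines log is enough to support this kind of per-agent aggregation. The following sketch assumes one JSON object per line with `ts`, `agent`, and `event` fields; that schema and both helper names are assumptions, not the actual format of `memory/data/events.jsonl`.

```python
import json
import time
from pathlib import Path
from typing import Dict, List


def append_event(path: Path, agent: str, event: str, **fields) -> None:
    """Append one event as a single JSON line (hypothetical record shape)."""
    record = {"ts": time.time(), "agent": agent, "event": event, **fields}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def events_by_agent(path: Path) -> Dict[str, List[dict]]:
    """Group logged events by agent name, preserving file order."""
    grouped: Dict[str, List[dict]] = {}
    if not path.exists():
        return grouped
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = json.loads(line)
                grouped.setdefault(record["agent"], []).append(record)
    return grouped
```

Because each line is a complete record, the log can be tailed or filtered with ordinary text tools as well as loaded in the UI.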

---

## Scripts (headless runs)
- Capco (Anthony Lui → Capco):
```powershell
python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
```
- Anthropic (Anthony Lui → Anthropic):
```powershell
python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
```
- Pipeline (Router + Agents + Review + Events):
```powershell
python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py
```

These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set `.env` with `LLM_PROVIDER=gemini`, `LLM_MODEL=gemini-2.5-flash`, and `GEMINI_API_KEY`.

---

## Temporal knowledge graph (micro‑memory)
- `agents/temporal_tracker.py` stores time‑stamped triplets with non‑destructive invalidation.
- Integrated in pipeline review to track job application states and history.
- Utilities for timelines, active applications, and pattern analysis included.
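
Non-destructive invalidation means a superseded fact is closed with an end timestamp instead of being deleted, so the history stays queryable. The sketch below illustrates the idea; the `Triplet` and `TemporalStore` names, their fields, and the `submitted` status are assumptions and do not mirror `agents/temporal_tracker.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Triplet:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still current


class TemporalStore:
    """Illustrative in-memory store of time-stamped triplets."""

    def __init__(self) -> None:
        self.triplets: List[Triplet] = []

    def assert_fact(self, subject: str, predicate: str, obj: str,
                    at: Optional[datetime] = None) -> None:
        at = at or datetime.now(timezone.utc)
        for t in self.triplets:
            if t.subject == subject and t.predicate == predicate and t.valid_to is None:
                t.valid_to = at  # close the old fact instead of deleting it
        self.triplets.append(Triplet(subject, predicate, obj, at))

    def current(self, subject: str, predicate: str) -> Optional[str]:
        for t in self.triplets:
            if t.subject == subject and t.predicate == predicate and t.valid_to is None:
                return t.obj
        return None

    def timeline(self, subject: str) -> List[Triplet]:
        return sorted((t for t in self.triplets if t.subject == subject),
                      key=lambda t: t.valid_from)


store = TemporalStore()
t1 = datetime(2025, 8, 1, tzinfo=timezone.utc)
t2 = datetime(2025, 8, 3, tzinfo=timezone.utc)
store.assert_fact("application:capco", "status", "drafted", at=t1)
store.assert_fact("application:capco", "status", "submitted", at=t2)
print(store.current("application:capco", "status"))
```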

---

## Parallel agents + meta‑agent demo
- Notebook: `notebooks/agents_parallel_demo.ipynb`
- Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot.
- Uses the central LLM client (`services/llm.py`) with `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-flash`.

Run (Jupyter/VSCode):
```python
%pip install nest_asyncio matplotlib
# Ensure GEMINI_API_KEY is set in your environment
```
Open and run the notebook cells.
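
For readers who want the shape of the demo without opening the notebook, here is a minimal sketch of the fan-out/fan-in pattern using `asyncio.gather`. The agent names, the `asyncio.sleep` stand-in for an LLM call, and the `meta_agent` merge logic are all assumptions; the notebook itself calls the central LLM client.

```python
import asyncio
import time
from typing import Dict, List


async def analysis_agent(name: str, text: str, delay: float = 0.05) -> Dict[str, object]:
    """Stand-in for an LLM-backed agent; the sleep simulates network latency."""
    started = time.perf_counter()
    await asyncio.sleep(delay)
    return {
        "agent": name,
        "summary": f"{name}: analysed {len(text.split())} words",
        "seconds": time.perf_counter() - started,
    }


def meta_agent(results: List[Dict[str, object]]) -> Dict[str, object]:
    """Combine the individual outputs into one report."""
    return {
        "agents": [r["agent"] for r in results],
        "report": "\n".join(str(r["summary"]) for r in results),
    }


async def run_parallel(text: str) -> Dict[str, object]:
    names = ["skills", "culture", "seniority", "keywords"]  # hypothetical agent names
    # gather runs the coroutines concurrently and returns results in input order
    results = await asyncio.gather(*(analysis_agent(n, text) for n in names))
    return meta_agent(list(results))
```

In a script, call `asyncio.run(run_parallel(job_text))`. Inside Jupyter, where an event loop is already running, use `await run_parallel(job_text)` directly or apply `nest_asyncio` first.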

---

## LinkedIn OAuth (optional)
1) Create a LinkedIn Developer App, then add redirect URLs:
```
http://localhost:8501
http://localhost:8501/callback
```
2) Products: enable “Sign In with LinkedIn using OpenID Connect”.
3) Add your LinkedIn app credentials to `.env` (see `.env.example` for the variable names) and set `MOCK_MODE=false`.
4) In the UI, use the “LinkedIn Authentication” section to kick off the flow.

Notes:
- LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data.
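
The CSRF-safe part of the flow comes down to generating an unguessable `state` value, storing it in the user's session, and comparing it in constant time when the callback arrives. The sketch below shows that pattern with the standard library only; the function names are hypothetical, and the authorization URL and scopes follow LinkedIn's public OpenID Connect documentation as best understood here, so verify them against your app settings.

```python
import hmac
import secrets
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; confirm against LinkedIn's current documentation.
AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"


def new_state() -> str:
    """Random, URL-safe token to store in the session before redirecting."""
    return secrets.token_urlsafe(32)


def build_authorize_url(client_id: str, redirect_uri: str, state: str,
                        scope: str = "openid profile email") -> str:
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
        "scope": scope,
    })
    return f"{AUTH_URL}?{query}"


def state_is_valid(expected: Optional[str], received: Optional[str]) -> bool:
    """Constant-time comparison; reject when either value is missing."""
    if not expected or not received:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


state = new_state()
url = build_authorize_url("my-client-id", "http://localhost:8501/callback", state)
print(state_is_valid(state, state))
```

Encoding both values to bytes before `hmac.compare_digest` avoids a `TypeError` if the callback carries non-ASCII input.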

---

## Job sources
- **Adzuna**: global coverage, 5,000 free jobs/month
- **Resilient aggregator** and optional **JobSpy MCP** for broader search
- **Custom jobs**: add your own postings in the UI
- Corporate SSL environments: Adzuna calls automatically retry with a `verify=False` fallback

---

## LLMs and configuration
- Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (`services/llm.py`).
- Recommended defaults for this project:
  - `LLM_PROVIDER=gemini`
  - `LLM_MODEL=gemini-2.5-flash`
- Agents pass `agent="cv|cover|parser|match|tailor|chat"` to use per‑agent keys when provided.
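
Per-agent key lookup can be sketched as "try `GEMINI_API_KEY_<AGENT>`, then fall back to `GEMINI_API_KEY`". The variable names match the `.env` template above; the function name `resolve_gemini_key` and the exact fallback order are assumptions about how `services/llm.py` behaves.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_key(agent: Optional[str] = None,
                       env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the agent-specific key if set, else the shared key, else None."""
    if agent:
        key = env.get(f"GEMINI_API_KEY_{agent.upper()}", "").strip()
        if key:
            return key
    return env.get("GEMINI_API_KEY", "").strip() or None


env = {"GEMINI_API_KEY": "shared", "GEMINI_API_KEY_CV": "cv-only", "GEMINI_API_KEY_COVER": ""}
print(resolve_gemini_key("cv", env), resolve_gemini_key("cover", env))
```

Treating an empty per-agent value as unset matches the template, where unused keys are left blank.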

---

## Advanced agents (built‑in)
- **Parallel processing**: 3–5× faster multi‑job drafting
- **Temporal tracking**: time‑stamped history and pattern analysis
- **Observability**: tracing, metrics, timeline visualization
- **Context engineering**: flywheel learning, L1/L2/L3 memory, scalable context

Toggle these in the HF app under “🚀 Advanced AI Features”.

---

## LangExtract + Gemini
- Uses the same `GEMINI_API_KEY` (auto‑applied to `LANGEXTRACT_API_KEY` when empty)
- Official `langextract.extract(...)` requires examples; the UI also exposes a robust regex‑based fallback (`services/langextract_service.py`) so features work even when cloud extraction is constrained
- In HF app (“🔍 Enhanced Job Analysis”), you can:
  - Analyze job postings (structured fields + skills)
  - Optimize resume for ATS (score + missing keywords)
  - Bulk analyze multiple jobs
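
A regex-based fallback for skill extraction and ATS scoring might look like the sketch below. The skill vocabulary, the scoring formula (share of job-posting skills found in the resume), and both function names are assumptions for illustration and are not the rules in `services/langextract_service.py`.

```python
import re
from typing import Dict, List, Sequence

# Hypothetical vocabulary; a real list would be far larger.
SKILL_VOCAB = ["python", "sql", "aws", "docker", "kubernetes", "excel", "machine learning"]


def extract_skills(text: str, vocab: Sequence[str] = SKILL_VOCAB) -> List[str]:
    """Return vocabulary terms that appear as whole words or phrases, in vocab order."""
    lowered = text.lower()
    return [s for s in vocab
            if re.search(r"(?<!\w)" + re.escape(s) + r"(?!\w)", lowered)]


def ats_score(resume: str, job_posting: str,
              vocab: Sequence[str] = SKILL_VOCAB) -> Dict[str, object]:
    """Score = percentage of the posting's skills that the resume also mentions."""
    required = extract_skills(job_posting, vocab)
    present = set(extract_skills(resume, vocab))
    missing = [s for s in required if s not in present]
    score = 100 if not required else round(100 * (len(required) - len(missing)) / len(required))
    return {"score": score, "missing": missing}


result = ats_score(
    resume="Built ETL pipelines in Python and SQL.",
    job_posting="We need Python, SQL and AWS experience.",
)
print(result)
```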

---

## Office exports
- **Word** (`services/word_cv.py`): resumes + cover letters (5 templates; `python‑docx` fallback)
- **PowerPoint** (`services/powerpoint_cv.py`): visual CV (4 templates; `python‑pptx` fallback)
- **Excel** (`services/excel_tracker.py`): tracker with 5 analytical sheets (`openpyxl` fallback)
- MCP servers supported when available; local libraries are used otherwise

In HF app, after generation, expand:
- “📊 Export to PowerPoint CV”
- “📝 Export to Word Documents”
- “📈 Export Excel Tracker”

---

## Hugging Face minimal Space branch
- Clean branch containing only `app.py` and `requirements.txt` for Spaces.
- Branch name: `hf-space-min` (push from a clean worktree).
- `.gitignore` includes `.env` and `.env.*` to avoid leaking secrets.

---

## Tests & scripts
- Run test suites in `tests/`
- Useful scripts: `test_*` files in project root (integration checks)

---

## Security
- OAuth state validation, input/path/url sanitization
- Sensitive data via environment variables; avoid committing secrets
- Atomic writes in memory store
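
Atomic writes are commonly done by writing to a temporary file in the same directory and then renaming it over the target. The sketch below shows that pattern with a process-level lock; the function name `atomic_write_json` is hypothetical and this is not the code in `memory/store.py`.

```python
import json
import os
import tempfile
import threading
from pathlib import Path
from typing import Any

_WRITE_LOCK = threading.Lock()  # serialises writers within this process


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with _WRITE_LOCK:
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(data, fh, ensure_ascii=False)
                fh.flush()
                os.fsync(fh.fileno())  # push bytes to disk before the rename
            os.replace(tmp_name, path)  # atomic when source and target share a filesystem
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
```

Creating the temporary file next to the target matters: `os.replace` is only atomic when both paths are on the same filesystem.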

---

## Run summary
- Streamlit: `python -m streamlit run app.py --server.port 8501`
- Gradio/HF: `PORT=7861 python hf_app.py`

This README documents the full system, which can run locally or be deployed to Hugging Face Spaces.