---
title: Multi-Agent Job Application Assistant
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

## Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)

A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).

---

### What you get

- **Two UIs**: Streamlit (`app.py`) and Gradio/HF (`hf_app.py`)
- **LinkedIn OAuth 2.0** (optional; CSRF‑safe state validation)
- **Job aggregation**: Adzuna (5k/month) plus resilient fallbacks
- **ATS‑optimized drafting**: resumes + cover letters (Gemini)
- **Office exports**:
  - Word resumes and cover letters (5 templates)
  - PowerPoint CV (4 templates)
  - Excel application tracker (5 analytical sheets)
- **Advanced agents**: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
- **LangExtract integration**: structured extraction with a Gemini key; robust regex fallback in constrained environments
- **New**: router pipeline, temporal KG integration, parallel‑agents demo, HF minimal Space branch
- **New (Aug 2025)**: UK resume rules, action‑verb upgrades, anti‑buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets)

---

## Quickstart

### 1) Environment (.env)

Create a UTF‑8 `.env` (values are optional if you want mock mode).
See `.env.example` for the full list of variables:

```ini
# Behavior
MOCK_MODE=true
PORT=7860

# LLM / Research
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GEMINI_API_KEY=

# Optional per-agent Gemini keys
GEMINI_API_KEY_CV=
GEMINI_API_KEY_COVER=
GEMINI_API_KEY_CHAT=
GEMINI_API_KEY_PARSER=
GEMINI_API_KEY_MATCH=
GEMINI_API_KEY_TAILOR=

OPENAI_API_KEY=
ANTHROPIC_API_KEY=
TAVILY_API_KEY=

# Job APIs
ADZUNA_APP_ID=
ADZUNA_APP_KEY=

# Office MCP (optional)
POWERPOINT_MCP_URL=http://localhost:3000
WORD_MCP_URL=http://localhost:3001
EXCEL_MCP_URL=http://localhost:3002

# LangExtract uses the GEMINI key by default
LANGEXTRACT_API_KEY=
```

Hardcoded keys have been removed from utility scripts. Use `switch_api_key.py` to safely set keys into `.env` without embedding them in code.

### 2) Install

- Windows PowerShell

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

- Linux/macOS

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 3) Run the apps

- Streamlit (PATH‑safe)

```powershell
python -m streamlit run app.py --server.port 8501
```

- Gradio / Hugging Face (avoid port conflicts)

```powershell
$env:PORT=7861; python hf_app.py
```

```bash
PORT=7861 python hf_app.py
```

The HF app binds to `0.0.0.0:$PORT`.
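As a rough illustration of how these variables drive behavior, a minimal settings loader might look like the following. This is a hypothetical sketch, not the project's actual code; the real centralized settings live in `utils/config.py`.

```python
# Hypothetical sketch of reading .env-driven settings; the project's
# actual configuration logic lives in utils/config.py.
import os


def load_settings() -> dict:
    """Read behavior flags from the environment, with safe defaults."""
    return {
        # Mock mode lets the UIs run without any API keys
        "mock_mode": os.getenv("MOCK_MODE", "true").lower() == "true",
        # The HF app binds to 0.0.0.0 on this port
        "port": int(os.getenv("PORT", "7860")),
        "llm_provider": os.getenv("LLM_PROVIDER", "gemini"),
        "llm_model": os.getenv("LLM_MODEL", "gemini-2.5-flash"),
        # Per-agent keys fall back to the shared Gemini key when unset
        "gemini_key_cv": os.getenv("GEMINI_API_KEY_CV")
        or os.getenv("GEMINI_API_KEY", ""),
    }


settings = load_settings()
print(settings["mock_mode"], settings["port"])
```

With an empty environment this defaults to mock mode on port 7860, which matches the "values are optional if you want mock mode" behavior above.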
---

## 📊 System Architecture Overview

This is a **production-ready, multi-agent job application system** with sophisticated AI capabilities and enterprise-grade features:

### 🏗️ Core Architecture

#### **Dual Interface Design**

- **Streamlit Interface** (`app.py`) - Traditional web application for desktop use
- **Gradio/HF Interface** (`hf_app.py`) - Modern, mobile-friendly, deployable to Hugging Face Spaces

#### **Multi-Agent System** (15 Specialized Agents)

**Core Processing Agents:**

- **`OrchestratorAgent`** - Central coordinator managing workflow and job orchestration
- **`CVOwnerAgent`** - ATS-optimized resume generation with UK-specific formatting rules
- **`CoverLetterAgent`** - Personalized cover letter generation with keyword optimization
- **`ProfileAgent`** - Intelligent CV parsing and structured profile extraction
- **`JobAgent`** - Job posting analysis and requirement extraction
- **`RouterAgent`** - Dynamic routing based on payload state and workflow stage

**Advanced AI Agents:**

- **`ParallelExecutor`** - Concurrent processing for 3-5x faster multi-job handling
- **`TemporalTracker`** - Time-stamped application history and pattern analysis
- **`ObservabilityAgent`** - Real-time tracing, metrics collection, and monitoring
- **`ContextEngineer`** - Flywheel learning and context optimization
- **`ContextScaler`** - L1/L2/L3 memory management for scalable context handling
- **`LinkedInManager`** - OAuth 2.0 integration and profile synchronization
- **`MetaAgent`** - Combines outputs from multiple specialized analysis agents
- **`TriageAgent`** - Intelligent task prioritization and routing

#### **Guidelines Enforcement System** (`agents/guidelines.py`)

Comprehensive rule engine ensuring document quality:

- **UK Compliance**: British English, UK date formats (MMM YYYY), £ currency normalization
- **ATS Optimization**: Plain text formatting, keyword density, section structure
- **Content Quality**: Anti-buzzword filtering, action verb strengthening, first-person removal
- **Layout Rules**: Exact length enforcement, heading validation, bullet point formatting

### 🔌 Integration Ecosystem

#### **LLM Integration** (`services/llm.py`)

- **Multi-Provider Support**: OpenAI, Anthropic Claude, Google Gemini
- **Per-Agent API Keys**: Cost optimization through agent-specific key allocation
- **Intelligent Fallbacks**: Graceful degradation when providers are unavailable
- **Configurable Models**: Per-agent model selection for optimal performance/cost

#### **Job Aggregation** (`services/job_aggregator.py`, `services/jobspy_client.py`)

- **Primary Sources**: Adzuna API (5,000 jobs/month free tier)
- **JobSpy Integration**: Indeed, LinkedIn, Glassdoor aggregation
- **Additional APIs**: Remotive, The Muse, GitHub Jobs
- **Smart Deduplication**: Title + company matching with fuzzy logic
- **SSL Bypass**: Automatic retry for corporate environments

#### **Document Generation** (`services/`)

- **Word Documents** (`word_cv.py`): 5 professional templates, MCP server integration
- **PowerPoint CVs** (`powerpoint_cv.py`): 4 visual templates for presentations
- **Excel Trackers** (`excel_tracker.py`): 5 analytical sheets with metrics
- **PDF Export**: Cross-platform compatibility with formatting preservation

### 📈 Advanced Features

#### **Pipeline Architecture** (`agents/pipeline.py`)

```
User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage
                ↓            ↓               ↓                 ↓                 ↓           ↓
            Event Log   Profile Cache    Job Cache      Document Cache     Metrics Log  Temporal KG
```

#### **Memory & Persistence**

- **File-backed Storage** (`memory/store.py`): Atomic writes, thread-safe operations
- **Temporal Knowledge Graph**: Application tracking with time-stamped relationships
- **Event Sourcing** (`events.jsonl`): Complete audit trail of all agent actions
- **Caching System** (`utils/cache.py`): TTL-based caching with automatic eviction

#### **LangExtract Integration** (`services/langextract_service.py`)

- **Structured Extraction**: Job requirements, skills, company culture
- **ATS Optimization**: Keyword extraction and scoring
- **Fallback Mechanisms**: Regex-based extraction when the API is unavailable
- **Result Caching**: Performance optimization for repeated analyses

### 🛡️ Security & Configuration

#### **Authentication & Security**

- **OAuth 2.0**: LinkedIn integration with CSRF protection
- **Input Sanitization**: Path traversal and injection prevention
- **Environment Isolation**: Secrets management via `.env`
- **Rate Limiting**: API throttling and abuse prevention

#### **Configuration Management**

- **Environment Variables**: All sensitive data in `.env`
- **Agent Configuration** (`utils/config.py`): Centralized settings
- **Template System**: Customizable document templates
- **Feature Flags**: Progressive enhancement based on available services

### 📁 Project Structure

```
2096955/
├── agents/                  # Multi-agent system components
│   ├── orchestrator.py      # Main orchestration logic
│   ├── cv_owner.py          # Resume generation with guidelines
│   ├── guidelines.py        # UK rules and ATS optimization
│   ├── pipeline.py          # Application pipeline flow
│   └── ...                  # Additional specialized agents
├── services/                # External integrations and services
│   ├── llm.py               # Multi-provider LLM client
│   ├── job_aggregator.py    # Job source aggregation
│   ├── word_cv.py           # Word document generation
│   └── ...                  # Document and API services
├── utils/                   # Utility functions and helpers
│   ├── ats.py               # ATS scoring and optimization
│   ├── cache.py             # TTL caching system
│   ├── consistency.py       # Contradiction detection
│   └── ...                  # Text processing and helpers
├── models/                  # Data models and schemas
│   └── schemas.py           # Pydantic models for type safety
├── mcp/                     # Model Context Protocol servers
│   ├── cv_owner_server.py
│   ├── cover_letter_server.py
│   └── orchestrator_server.py
├── memory/                  # Persistent storage
│   ├── store.py             # File-backed memory store
│   └── data/                # Application state and history
├── app.py                   # Streamlit interface
├── hf_app.py                # Gradio/HF interface
└── api_llm_integration.py   # REST API endpoints
```

### 🚀 Performance Optimizations

- **Parallel Processing**: Async job handling with `asyncio` and `nest_asyncio`
- **Lazy Loading**: Dependencies loaded only when needed
- **Smart Caching**: Multi-level caching (memory, file, API responses)
- **Batch Operations**: Efficient multi-job processing
- **Event-Driven**: Asynchronous event handling for responsiveness

### 🧪 Testing & Quality

- **Test Suites**: Comprehensive tests in the `tests/` directory
- **Integration Tests**: API and service integration validation
- **Mock Mode**: Development without API keys
- **Smoke Tests**: Quick validation scripts for deployment
- **Observability**: Built-in tracing and metrics collection

---

## Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)

- Implemented in `agents/pipeline.py` and exposed via API in `api_llm_integration.py` (`/api/llm/pipeline_run`).
- Agents:
  - `RouterAgent`: routes based on payload state
  - `ProfileAgent`: parses the CV into a structured profile (LLM with fallback)
  - `JobAgent`: analyzes the job posting (LLM with fallback)
  - `CVOwnerAgent` and `CoverLetterAgent`: draft documents (Gemini, per-agent keys)
  - Review: contradiction checks and memory persistence
- Temporal tracking: on review, a `drafted` status is recorded in the temporal KG with issues metadata.
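The payload-based routing above can be sketched in a few lines. This is illustrative only; the real `RouterAgent` in `agents/pipeline.py` adds LLM fallbacks, caching, and event logging.

```python
# Illustrative sketch of payload-based routing; not the actual
# RouterAgent implementation from agents/pipeline.py.
def route(payload: dict) -> list[str]:
    """Decide which agents to run based on what the payload contains."""
    steps = []
    if payload.get("cv_text"):
        steps.append("ProfileAgent")      # parse CV into a structured profile
    if payload.get("job_posting"):
        steps.append("JobAgent")          # extract job requirements
    if steps:                             # drafting needs at least one analysis
        steps += ["CVOwnerAgent", "CoverLetterAgent", "Review"]
    return steps


print(route({"cv_text": "cv", "job_posting": "jd"}))
# → ['ProfileAgent', 'JobAgent', 'CVOwnerAgent', 'CoverLetterAgent', 'Review']
```
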
**Flow diagram**

```mermaid
flowchart TD
    U["User"] --> R["RouterAgent"]
    R -->|cv_text present| P["ProfileAgent (LLM)"]
    R -->|job_posting present| J["JobAgent (LLM)"]
    P --> RESUME["CVOwnerAgent"]
    J --> RESUME
    RESUME --> COVER["CoverLetterAgent"]
    COVER --> REVIEW["Orchestrator Review"]
    REVIEW --> M["MemoryStore (file-backed)"]
    REVIEW --> TKG["Temporal KG (triplets)"]
    subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
        P
        J
        RESUME
        COVER
    end
    subgraph UI["Gradio (HF)"]
        U
    end
    subgraph API["Flask API"]
        PR["/api/llm/pipeline_run"]
    end
    U -. optional .-> PR
```

---

## Hugging Face / Gradio (interactive controls)

- In the CV Analysis tab, you can now set:
  - **Refinement cycles** (1–5)
  - **Exact target length** (characters) to enforce resume and cover letter length deterministically
  - **Layout preset**: `classic`, `modern`, `minimalist`, `executive`
    - classic: Summary → Skills → Experience → Education (Summary/Skills above the fold)
    - modern: Summary → Experience → Skills → Projects/Certifications → Education
    - minimalist: concise Summary → Skills → Experience → Education
    - executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications

---

## UK resume/cover rules (built-in)

- UK English and dates (MMM YYYY)
- Current role in present tense; previous roles in past tense
- Digits for numbers; £ and % normalization
- Remove first‑person pronouns from resume bullets; maintain active voice
- Hard skills first (max ~10), then soft skills; critical JD keywords verbatim in bullets
- Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates

These rules are applied by `agents/cv_owner.py` and validated by checklists.
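Two of the rules above can be sketched as simple text transforms. This is a hypothetical simplification of what `agents/guidelines.py` enforces, shown only to make the rules concrete.

```python
# Hypothetical sketch of two UK formatting rules; the real rule engine
# lives in agents/guidelines.py and is applied by agents/cv_owner.py.
import re

# Full month names mapped to MMM ("May" is already three letters)
MONTHS = {"January": "Jan", "February": "Feb", "March": "Mar", "April": "Apr",
          "June": "Jun", "July": "Jul", "August": "Aug", "September": "Sep",
          "October": "Oct", "November": "Nov", "December": "Dec"}


def uk_dates(text: str) -> str:
    """Compress full month names followed by a year to MMM YYYY."""
    for full, short in MONTHS.items():
        text = re.sub(rf"\b{full}\b(?=\s+\d{{4}})", short, text)
    return text


def strip_first_person(bullet: str) -> str:
    """Drop a leading first-person pronoun from a resume bullet."""
    return re.sub(r"^(I|We)\s+", "", bullet, flags=re.IGNORECASE)


print(uk_dates("January 2023 – March 2024"))    # → Jan 2023 – Mar 2024
print(strip_first_person("I led a team of 5"))  # → led a team of 5
```
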
---

## Checklists and observability

- Checklists integrate guidance from:
  - Reed: CV layout and mistakes
  - The Muse: action verbs and layout basics
  - Novorésumé: one‑page bias, clean sections, links
  - StandOut CV: quantification, bullet density, recent‑role focus
- The Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in `memory/data/events.jsonl`.

---

## Scripts (headless runs)

- Capco (Anthony Lui → Capco):

```powershell
python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
```

- Anthropic (Anthony Lui → Anthropic):

```powershell
python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
```

- Pipeline (Router + Agents + Review + Events):

```powershell
python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py
```

These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set `.env` with `LLM_PROVIDER=gemini`, `LLM_MODEL=gemini-2.5-flash`, and `GEMINI_API_KEY`.

---

## Temporal knowledge graph (micro‑memory)

- `agents/temporal_tracker.py` stores time‑stamped triplets with non‑destructive invalidation.
- Integrated into the pipeline review to track job application states and history.
- Utilities for timelines, active applications, and pattern analysis are included.

---

## Parallel agents + meta‑agent demo

- Notebook: `notebooks/agents_parallel_demo.ipynb`
- Runs 4 analysis agents in parallel and combines their outputs via a meta‑agent, with a timeline plot.
- Uses the central LLM client (`services/llm.py`) with `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-flash`.

Run (Jupyter/VS Code):

```python
%pip install nest_asyncio matplotlib
# Ensure GEMINI_API_KEY is set in your environment
```

Open and run the notebook cells.

---

## LinkedIn OAuth (optional)

1) Create a LinkedIn Developer App, then add redirect URLs:

```
http://localhost:8501
http://localhost:8501/callback
```

2) Products: enable “Sign In with LinkedIn using OpenID Connect”.
3) Update `.env` and set `MOCK_MODE=false`.
4) In the UI, use the “LinkedIn Authentication” section to kick off the flow.

Notes:

- The LinkedIn Jobs API is enterprise‑only. The system uses Adzuna and other sources for job data.

---

## Job sources

- **Adzuna**: global coverage, 5,000 free jobs/month
- **Resilient aggregator** and optional **JobSpy MCP** for broader search
- **Custom jobs**: add your own postings in the UI
- Corporate SSL environments: Adzuna calls auto‑retry with a `verify=False` fallback

---

## LLMs and configuration

- The central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (`services/llm.py`).
- Recommended defaults for this project:
  - `LLM_PROVIDER=gemini`
  - `LLM_MODEL=gemini-2.5-flash`
- Agents pass `agent="cv|cover|parser|match|tailor|chat"` to use per‑agent keys when provided.

---

## Advanced agents (built‑in)

- **Parallel processing**: 3–5× faster multi‑job drafting
- **Temporal tracking**: time‑stamped history and pattern analysis
- **Observability**: tracing, metrics, timeline visualization
- **Context engineering**: flywheel learning, L1/L2/L3 memory, scalable context

Toggle these in the HF app under “🚀 Advanced AI Features”.
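The parallel-processing pattern behind the 3–5× speedup can be sketched with plain `asyncio`: fan out analysis tasks concurrently, then merge results with a meta-agent. The agent names and merge step below are toy illustrations, not the project's API; the real demo lives in `notebooks/agents_parallel_demo.ipynb`.

```python
# Toy sketch of the parallel-agents + meta-agent pattern; agent names and
# the merge step are illustrative, not the project's actual API.
import asyncio


async def run_agent(name: str, job: str) -> str:
    await asyncio.sleep(0.01)              # stand-in for an LLM call
    return f"{name}: analysis of {job}"


async def meta_agent(job: str) -> str:
    # Fan out four analysis agents concurrently, then combine their outputs
    agents = ["skills", "culture", "salary", "requirements"]
    results = await asyncio.gather(*(run_agent(a, job) for a in agents))
    return " | ".join(results)


summary = asyncio.run(meta_agent("Data Engineer"))
print(summary)
```

Because the simulated LLM calls overlap, total latency approaches that of the slowest single agent rather than the sum of all four, which is the source of the multi-job speedup.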
---

## LangExtract + Gemini

- Uses the same `GEMINI_API_KEY` (auto‑applied to `LANGEXTRACT_API_KEY` when empty)
- The official `langextract.extract(...)` requires examples; the UI also exposes a robust regex‑based fallback (`services/langextract_service.py`) so features work even when cloud extraction is constrained
- In the HF app (“🔍 Enhanced Job Analysis”), you can:
  - Analyze job postings (structured fields + skills)
  - Optimize a resume for ATS (score + missing keywords)
  - Bulk analyze multiple jobs

---

## Office exports

- **Word** (`services/word_cv.py`): resumes + cover letters (5 templates; `python‑docx` fallback)
- **PowerPoint** (`services/powerpoint_cv.py`): visual CV (4 templates; `python‑pptx` fallback)
- **Excel** (`services/excel_tracker.py`): tracker with 5 analytical sheets (`openpyxl` fallback)
- MCP servers are used when available; local libraries otherwise

In the HF app, after generation, expand:

- “📊 Export to PowerPoint CV”
- “📝 Export to Word Documents”
- “📈 Export Excel Tracker”

---

## Hugging Face minimal Space branch

- Clean branch containing only `app.py` and `requirements.txt` for Spaces.
- Branch name: `hf-space-min` (push from a clean worktree).
- `.gitignore` includes `.env` and `.env.*` to avoid leaking secrets.

---

## Tests & scripts

- Run the test suites in `tests/`
- Useful scripts: `test_*` files in the project root (integration checks)

---

## Security

- OAuth state validation; input/path/URL sanitization
- Sensitive data via environment variables; avoid committing secrets
- Atomic writes in the memory store

---

## Run summary

- Streamlit: `python -m streamlit run app.py --server.port 8501`
- Gradio/HF: `PORT=7861 python hf_app.py`

The system is fully documented here in one place and ready for local or HF deployment.