
---
title: Multi-Agent Job Application Assistant
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---

Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)

A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).


What you get

  • Two UIs: Streamlit (app.py) and Gradio/HF (hf_app.py)
  • LinkedIn OAuth 2.0 (optional; CSRF‑safe state validation)
  • Job aggregation: Adzuna (5k/month) plus resilient fallbacks
  • ATS‑optimized drafting: resumes + cover letters (Gemini)
  • Office exports:
    • Word resumes and cover letters (5 templates)
    • PowerPoint CV (4 templates)
    • Excel application tracker (5 analytical sheets)
  • Advanced agents: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
  • LangExtract integration: structured extraction with Gemini key; robust regex fallback in constrained environments
  • New: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch
  • New (Aug 2025): UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets)

Quickstart

1) Environment (.env)

Create a UTF‑8 .env file (all values are optional when running in mock mode). See .env.example for the full list of variables:

# Behavior
MOCK_MODE=true
PORT=7860

# LLM / Research
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GEMINI_API_KEY=
# Optional per-agent Gemini keys
GEMINI_API_KEY_CV=
GEMINI_API_KEY_COVER=
GEMINI_API_KEY_CHAT=
GEMINI_API_KEY_PARSER=
GEMINI_API_KEY_MATCH=
GEMINI_API_KEY_TAILOR=
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

TAVILY_API_KEY=

# Job APIs
ADZUNA_APP_ID=
ADZUNA_APP_KEY=

# Office MCP (optional)
POWERPOINT_MCP_URL=http://localhost:3000
WORD_MCP_URL=http://localhost:3001
EXCEL_MCP_URL=http://localhost:3002

# LangExtract uses GEMINI key by default
LANGEXTRACT_API_KEY=

Hardcoded keys have been removed from utility scripts. Use switch_api_key.py to safely set keys into .env without embedding them in code.
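
For illustration, the startup flow can be sketched with a minimal stdlib loader (the real apps may rely on python-dotenv; load_env is a hypothetical helper, but the variable names match the .env above):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env parser: KEY=value lines, '#' comments, existing vars win."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env()
MOCK_MODE = os.getenv("MOCK_MODE", "true").lower() == "true"
PORT = int(os.getenv("PORT", "7860"))
# LangExtract reuses the shared Gemini key when its own is left empty
os.environ.setdefault("LANGEXTRACT_API_KEY", os.getenv("GEMINI_API_KEY", ""))
```

With no .env present, the defaults leave the app in mock mode on port 7860.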

2) Install

  • Windows PowerShell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
  • Linux/macOS
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

3) Run the apps

  • Streamlit (PATH‑safe)
python -m streamlit run app.py --server.port 8501
  • Gradio / Hugging Face (avoid port conflicts)
    • Windows PowerShell
$env:PORT=7861; python hf_app.py
    • Linux/macOS
PORT=7861 python hf_app.py

The HF app binds to 0.0.0.0:$PORT.


📊 System Architecture Overview

This is a production-ready, multi-agent job application system with sophisticated AI capabilities and enterprise-grade features:

🏗️ Core Architecture

Dual Interface Design

  • Streamlit Interface (app.py) - Traditional web application for desktop use
  • Gradio/HF Interface (hf_app.py) - Modern, mobile-friendly, deployable to Hugging Face Spaces

Multi-Agent System (14 Specialized Agents)

Core Processing Agents:

  • OrchestratorAgent - Central coordinator managing workflow and job orchestration
  • CVOwnerAgent - ATS-optimized resume generation with UK-specific formatting rules
  • CoverLetterAgent - Personalized cover letter generation with keyword optimization
  • ProfileAgent - Intelligent CV parsing and structured profile extraction
  • JobAgent - Job posting analysis and requirement extraction
  • RouterAgent - Dynamic routing based on payload state and workflow stage
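
The routing step can be sketched as a pure function over the payload; field and agent names below are illustrative, the real logic lives in agents/pipeline.py:

```python
def route(payload: dict) -> str:
    """Pick the next stage from which fields the payload already carries."""
    if "cv_text" in payload and "profile" not in payload:
        return "profile_agent"          # parse the CV first
    if "job_posting" in payload and "job_analysis" not in payload:
        return "job_agent"              # analyze the posting
    if "profile" in payload and "job_analysis" in payload and "resume" not in payload:
        return "cv_owner_agent"         # draft the resume
    if "resume" in payload and "cover_letter" not in payload:
        return "cover_letter_agent"     # draft the cover letter
    return "review"                     # everything drafted: review + persist
```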

Advanced AI Agents:

  • ParallelExecutor - Concurrent processing for 3-5x faster multi-job handling
  • TemporalTracker - Time-stamped application history and pattern analysis
  • ObservabilityAgent - Real-time tracing, metrics collection, and monitoring
  • ContextEngineer - Flywheel learning and context optimization
  • ContextScaler - L1/L2/L3 memory management for scalable context handling
  • LinkedInManager - OAuth 2.0 integration and profile synchronization
  • MetaAgent - Combines outputs from multiple specialized analysis agents
  • TriageAgent - Intelligent task prioritization and routing
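
The fan-out behind ParallelExecutor can be sketched with asyncio (agent and field names are illustrative):

```python
import asyncio

async def run_agent(name: str, job: dict) -> dict:
    """Stand-in for one agent call (in production, an LLM request per job)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"agent": name, "job": job["title"], "status": "drafted"}

async def run_parallel(jobs: list[dict]) -> list[dict]:
    # Launch one drafting task per job and await them concurrently;
    # overlapping the I/O waits is where the 3-5x speedup comes from.
    return await asyncio.gather(*(run_agent("cv_drafter", j) for j in jobs))

results = asyncio.run(run_parallel([{"title": "Data Engineer"},
                                    {"title": "ML Engineer"}]))
```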

Guidelines Enforcement System (agents/guidelines.py)

Comprehensive rule engine ensuring document quality:

  • UK Compliance: British English, UK date formats (MMM YYYY), £ currency normalization
  • ATS Optimization: Plain text formatting, keyword density, section structure
  • Content Quality: Anti-buzzword filtering, action verb strengthening, first-person removal
  • Layout Rules: Exact length enforcement, heading validation, bullet point formatting
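
A few of these passes can be sketched with regular expressions (the scrub list and function name are illustrative; the real engine is agents/guidelines.py):

```python
import re

BUZZWORDS = {"synergy", "go-getter", "thought leader"}  # illustrative, not the real list

def apply_uk_rules(text: str) -> str:
    # Normalise currency: "GBP 50,000" -> "£50,000"
    text = re.sub(r"\bGBP\s*", "£", text)
    # Drop a leading first-person pronoun from bullet lines
    text = re.sub(r"(?m)^([-•]\s*)I\s+", r"\1", text)
    # Scrub buzzwords
    for word in BUZZWORDS:
        text = re.sub(re.escape(word), "", text, flags=re.IGNORECASE)
    # Collapse the double spaces the scrub can leave behind
    return re.sub(r" {2,}", " ", text)

cleaned = apply_uk_rules("- I delivered synergy savings of GBP 2m")
```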

🔌 Integration Ecosystem

LLM Integration (services/llm.py)

  • Multi-Provider Support: OpenAI, Anthropic Claude, Google Gemini
  • Per-Agent API Keys: Cost optimization through agent-specific key allocation
  • Intelligent Fallbacks: Graceful degradation when providers unavailable
  • Configurable Models: Per-agent model selection for optimal performance/cost
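
Per-agent key resolution can be sketched as an env lookup with fallback (gemini_key_for is a hypothetical helper; the variable names match the .env above):

```python
import os

def gemini_key_for(agent: str) -> str:
    """Use GEMINI_API_KEY_<AGENT> when set, else the shared GEMINI_API_KEY.
    Agent names mirror the .env variables: cv, cover, chat, parser, match, tailor."""
    return (os.getenv(f"GEMINI_API_KEY_{agent.upper()}", "")
            or os.getenv("GEMINI_API_KEY", ""))
```

Giving a high-volume agent its own key lets it draw from a separate quota without code changes.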

Job Aggregation (services/job_aggregator.py, services/jobspy_client.py)

  • Primary Sources: Adzuna API (5,000 jobs/month free tier)
  • JobSpy Integration: Indeed, LinkedIn, Glassdoor aggregation
  • Additional APIs: Remotive, The Muse, GitHub Jobs
  • Smart Deduplication: Title + company matching with fuzzy logic
  • SSL Bypass: Automatic retry for corporate environments
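
The dedup idea can be sketched with stdlib fuzzy matching (the threshold and helper names are illustrative; the real matcher lives in services/job_aggregator.py):

```python
from difflib import SequenceMatcher

def is_duplicate(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Treat two postings as one when their title+company strings are near-identical."""
    key_a = f"{a['title']} {a['company']}".lower()
    key_b = f"{b['title']} {b['company']}".lower()
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold

def dedupe(jobs: list[dict]) -> list[dict]:
    unique: list[dict] = []
    for job in jobs:
        # Keep a posting only if it is not a fuzzy match of one already kept
        if not any(is_duplicate(job, kept) for kept in unique):
            unique.append(job)
    return unique
```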

Document Generation (services/)

  • Word Documents (word_cv.py): 5 professional templates, MCP server integration
  • PowerPoint CVs (powerpoint_cv.py): 4 visual templates for presentations
  • Excel Trackers (excel_tracker.py): 5 analytical sheets with metrics
  • PDF Export: Cross-platform compatibility with formatting preservation

📈 Advanced Features

Pipeline Architecture (agents/pipeline.py)

User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage
                ↓           ↓                ↓              ↓                    ↓            ↓
           Event Log   Profile Cache    Job Cache    Document Cache      Metrics Log   Temporal KG

Memory & Persistence

  • File-backed Storage (memory/store.py): Atomic writes, thread-safe operations
  • Temporal Knowledge Graph: Application tracking with time-stamped relationships
  • Event Sourcing (events.jsonl): Complete audit trail of all agent actions
  • Caching System (utils/cache.py): TTL-based caching with automatic eviction
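
The write-then-rename pattern behind the store's atomic writes, and the append-only event log, can be sketched as follows (function names are illustrative):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target,
    so readers never observe a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise

def append_event(path: str, event: dict) -> None:
    """Append one JSON line to an events.jsonl-style audit trail."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
```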

LangExtract Integration (services/langextract_service.py)

  • Structured Extraction: Job requirements, skills, company culture
  • ATS Optimization: Keyword extraction and scoring
  • Fallback Mechanisms: Regex-based extraction when API unavailable
  • Result Caching: Performance optimization for repeated analyses
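
The fallback path can be sketched as a keyword regex over the posting text (the vocabulary here is illustrative; the real patterns live in services/langextract_service.py):

```python
import re

# Tiny illustrative vocabulary; the real fallback maintains broader patterns
SKILL_PATTERN = re.compile(
    r"\b(Python|SQL|AWS|Docker|Kubernetes|TensorFlow)\b", re.IGNORECASE)

def extract_skills_fallback(posting: str) -> list[str]:
    """Order-preserving, case-insensitively deduplicated skill extraction."""
    found: list[str] = []
    for match in SKILL_PATTERN.finditer(posting):
        skill = match.group(0)
        if skill.lower() not in (s.lower() for s in found):
            found.append(skill)
    return found
```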

🛡️ Security & Configuration

Authentication & Security

  • OAuth 2.0: LinkedIn integration with CSRF protection
  • Input Sanitization: Path traversal and injection prevention
  • Environment Isolation: Secrets management via .env
  • Rate Limiting: API throttling and abuse prevention

Configuration Management

  • Environment Variables: All sensitive data in .env
  • Agent Configuration (utils/config.py): Centralized settings
  • Template System: Customizable document templates
  • Feature Flags: Progressive enhancement based on available services

📁 Project Structure

2096955/
├── agents/               # Multi-agent system components
│   ├── orchestrator.py   # Main orchestration logic
│   ├── cv_owner.py       # Resume generation with guidelines
│   ├── guidelines.py     # UK rules and ATS optimization
│   ├── pipeline.py       # Application pipeline flow
│   └── ...              # Additional specialized agents
├── services/            # External integrations and services
│   ├── llm.py           # Multi-provider LLM client
│   ├── job_aggregator.py # Job source aggregation
│   ├── word_cv.py       # Word document generation
│   └── ...              # Document and API services
├── utils/               # Utility functions and helpers
│   ├── ats.py           # ATS scoring and optimization
│   ├── cache.py         # TTL caching system
│   ├── consistency.py   # Contradiction detection
│   └── ...              # Text processing and helpers
├── models/              # Data models and schemas
│   └── schemas.py       # Pydantic models for type safety
├── mcp/                 # Model Context Protocol servers
│   ├── cv_owner_server.py
│   ├── cover_letter_server.py
│   └── orchestrator_server.py
├── memory/              # Persistent storage
│   ├── store.py         # File-backed memory store
│   └── data/            # Application state and history
├── app.py               # Streamlit interface
├── hf_app.py            # Gradio/HF interface
└── api_llm_integration.py # REST API endpoints

🚀 Performance Optimizations

  • Parallel Processing: Async job handling with asyncio and nest_asyncio
  • Lazy Loading: Dependencies loaded only when needed
  • Smart Caching: Multi-level caching (memory, file, API responses)
  • Batch Operations: Efficient multi-job processing
  • Event-Driven: Asynchronous event handling for responsiveness
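
The caching layer's TTL behaviour can be sketched in a few lines (utils/cache.py adds eviction policies and persistence on top of this idea):

```python
import time

class TTLCache:
    """Minimal time-to-live cache: entries expire ttl_seconds after set()."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._data: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        self._data[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires, value = entry
        if time.monotonic() >= expires:
            del self._data[key]  # lazy eviction on read
            return default
        return value
```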

🧪 Testing & Quality

  • Test Suites: Comprehensive tests in tests/ directory
  • Integration Tests: API and service integration validation
  • Mock Mode: Development without API keys
  • Smoke Tests: Quick validation scripts for deployment
  • Observability: Built-in tracing and metrics collection

Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)

  • Implemented in agents/pipeline.py and exposed via API in api_llm_integration.py (/api/llm/pipeline_run).
  • Agents:
    • RouterAgent: routes based on payload state
    • ProfileAgent: parses CV to structured profile (LLM with fallback)
    • JobAgent: analyzes job posting (LLM with fallback)
    • CVOwnerAgent and CoverLetterAgent: draft documents (Gemini, per-agent keys)
    • Review: contradiction checks and memory persist
  • Temporal tracking: on review, a drafted status is recorded in the temporal KG with issues metadata.

Flow diagram

flowchart TD
  U["User"] --> R["RouterAgent"]
  R -->|cv_text present| P["ProfileAgent (LLM)"]
  R -->|job_posting present| J["JobAgent (LLM)"]
  P --> RESUME["CVOwnerAgent"]
  J --> RESUME
  RESUME --> COVER["CoverLetterAgent"]
  COVER --> REVIEW["Orchestrator Review"]
  REVIEW --> M["MemoryStore (file-backed)"]
  REVIEW --> TKG["Temporal KG (triplets)"]
  subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
    P
    J
    RESUME
    COVER
  end
  subgraph UI["Gradio (HF)"]
    U
  end
  subgraph API["Flask API"]
    PR["/api/llm/pipeline_run"]
  end
  U -. optional .-> PR

Hugging Face / Gradio (interactive controls)

  • In the CV Analysis tab, you can now set:
    • Refinement cycles (1–5)
    • Exact target length (characters) to enforce resume and cover length deterministically
    • Layout preset: classic, modern, minimalist, executive
      • classic: Summary → Skills → Experience → Education (above the fold for Summary/Skills)
      • modern: Summary → Experience → Skills → Projects/Certifications → Education
      • minimalist: concise Summary → Skills → Experience → Education
      • executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications

UK resume/cover rules (built-in)

  • UK English and dates (MMM YYYY)
  • Current role in present tense; previous roles in past tense
  • Digits for numbers; £ and % normalization
  • Remove first‑person pronouns in resume bullets; maintain active voice
  • Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets
  • Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates

These rules are applied by agents/cv_owner.py and validated by checklists.


Checklists and observability

  • Checklists integrate guidance from:
    • Reed: CV layout and mistakes
    • The Muse: action verbs and layout basics
    • Novorésumé: one‑page bias, clean sections, links
    • StandOut CV: quantification, bullet density, recent‑role focus
  • Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in memory/data/events.jsonl.

Scripts (headless runs)

  • Capco (Anthony Lui → Capco):
python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
  • Anthropic (Anthony Lui → Anthropic):
python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
  • Pipeline (Router + Agents + Review + Events):
python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py

These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set .env with LLM_PROVIDER=gemini, LLM_MODEL=gemini-2.5-flash, and GEMINI_API_KEY.


Temporal knowledge graph (micro‑memory)

  • agents/temporal_tracker.py stores time‑stamped triplets with non‑destructive invalidation.
  • Integrated in pipeline review to track job application states and history.
  • Utilities for timelines, active applications, and pattern analysis included.
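
The non-destructive invalidation idea can be sketched as follows (class and field names are illustrative; the real implementation is agents/temporal_tracker.py):

```python
from datetime import datetime, timezone

class TemporalKG:
    """Time-stamped triplets: superseded facts are never deleted,
    they just receive a valid_to stamp so history stays queryable."""
    def __init__(self):
        self.triplets: list[dict] = []

    def assert_fact(self, subj: str, pred: str, obj: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        # Invalidate (not delete) any currently-active fact with same subj/pred
        for t in self.triplets:
            if t["subj"] == subj and t["pred"] == pred and t["valid_to"] is None:
                t["valid_to"] = now
        self.triplets.append({"subj": subj, "pred": pred, "obj": obj,
                              "valid_from": now, "valid_to": None})

    def current(self, subj: str, pred: str):
        for t in reversed(self.triplets):
            if t["subj"] == subj and t["pred"] == pred and t["valid_to"] is None:
                return t["obj"]
        return None
```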

Parallel agents + meta‑agent demo

  • Notebook: notebooks/agents_parallel_demo.ipynb
  • Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot.
  • Uses the central LLM client (services/llm.py) with LLM_PROVIDER=gemini and LLM_MODEL=gemini-2.5-flash.

Run (Jupyter/VSCode):

%pip install nest_asyncio matplotlib
# Ensure GEMINI_API_KEY is set in your environment

Open and run the notebook cells.


LinkedIn OAuth (optional)

  1. Create a LinkedIn Developer App, then add redirect URLs:
http://localhost:8501
http://localhost:8501/callback
  2. Products: enable “Sign In with LinkedIn using OpenID Connect”.
  3. Update .env and set MOCK_MODE=false.
  4. In the UI, use the “LinkedIn Authentication” section to kick off the flow.

Notes:

  • LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data.

Job sources

  • Adzuna: global coverage, 5,000 free jobs/month
  • Resilient aggregator and optional JobSpy MCP for broader search
  • Custom jobs: add your own postings in the UI
  • Corporate SSL environments: Adzuna calls automatically retry with a verify=False fallback
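
The verify=False fallback can be sketched with stdlib urllib (function name and retry shape are illustrative; never disable verification for credentialed calls):

```python
import ssl
import urllib.error
import urllib.request

def fetch_with_ssl_fallback(url: str, opener=urllib.request.urlopen):
    """Retry with certificate verification disabled when a corporate
    TLS-intercepting proxy breaks the chain. Read-only job feeds only."""
    try:
        return opener(url)
    except (ssl.SSLError, urllib.error.URLError):
        insecure = ssl.create_default_context()
        insecure.check_hostname = False
        insecure.verify_mode = ssl.CERT_NONE
        return opener(url, context=insecure)
```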

LLMs and configuration

  • Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (services/llm.py).
  • Recommended defaults for this project:
    • LLM_PROVIDER=gemini
    • LLM_MODEL=gemini-2.5-flash
  • Agents pass agent="cv|cover|parser|match|tailor|chat" to use per‑agent keys when provided.

Advanced agents (built‑in)

  • Parallel processing: 3–5× faster multi‑job drafting
  • Temporal tracking: time‑stamped history and pattern analysis
  • Observability: tracing, metrics, timeline visualization
  • Context engineering: flywheel learning, L1/L2/L3 memory, scalable context

Toggle these in the HF app under “🚀 Advanced AI Features”.


LangExtract + Gemini

  • Uses the same GEMINI_API_KEY (auto‑applied to LANGEXTRACT_API_KEY when empty)
  • Official langextract.extract(...) requires examples; the UI also exposes a robust regex‑based fallback (services/langextract_service.py) so features work even when cloud extraction is constrained
  • In HF app (“🔍 Enhanced Job Analysis”), you can:
    • Analyze job postings (structured fields + skills)
    • Optimize resume for ATS (score + missing keywords)
    • Bulk analyze multiple jobs

Office exports

  • Word (services/word_cv.py): resumes + cover letters (5 templates; python‑docx fallback)
  • PowerPoint (services/powerpoint_cv.py): visual CV (4 templates; python‑pptx fallback)
  • Excel (services/excel_tracker.py): tracker with 5 analytical sheets (openpyxl fallback)
  • MCP servers supported when available; local libraries are used otherwise
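
That fallback chain can be sketched like this (function name is illustrative; the real services also try the MCP servers before local libraries):

```python
def export_word_resume(resume_text: str, path: str) -> str:
    """Try python-docx; fall back to plain text so export never hard-fails.
    Returns which format was written ('docx' or 'txt')."""
    try:
        import docx  # python-docx, an optional dependency
    except ImportError:
        docx = None
    if docx is not None:
        doc = docx.Document()
        for line in resume_text.splitlines():
            doc.add_paragraph(line)
        doc.save(path if path.endswith(".docx") else path + ".docx")
        return "docx"
    # Last-resort fallback keeps the content recoverable as plain text
    with open(path + ".txt", "w", encoding="utf-8") as f:
        f.write(resume_text)
    return "txt"
```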

In HF app, after generation, expand:

  • “📊 Export to PowerPoint CV”
  • “📝 Export to Word Documents”
  • “📈 Export Excel Tracker”

Hugging Face minimal Space branch

  • Clean branch containing only app.py and requirements.txt for Spaces.
  • Branch name: hf-space-min (push from a clean worktree).
  • .gitignore includes .env and .env.* to avoid leaking secrets.

Tests & scripts

  • Run test suites in tests/
  • Useful scripts: test_* files in project root (integration checks)

Security

  • OAuth state validation, input/path/url sanitization
  • Sensitive data via environment variables; avoid committing secrets
  • Atomic writes in memory store

Run summary

  • Streamlit: python -m streamlit run app.py --server.port 8501
  • Gradio/HF: PORT=7861 python hf_app.py

The system is documented here in one place and is ready for local or Hugging Face deployment.