---
title: Multi-Agent Job Application Assistant
emoji: 🚀
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)
A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).
What you get
- Two UIs: Streamlit (app.py) and Gradio/HF (hf_app.py)
- LinkedIn OAuth 2.0 (optional; CSRF‑safe state validation)
- Job aggregation: Adzuna (5k/month) plus resilient fallbacks
- ATS‑optimized drafting: resumes + cover letters (Gemini)
- Office exports:
- Word resumes and cover letters (5 templates)
- PowerPoint CV (4 templates)
- Excel application tracker (5 analytical sheets)
- Advanced agents: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
- LangExtract integration: structured extraction with Gemini key; robust regex fallback in constrained environments
- New: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch
- New (Aug 2025): UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets)
Quickstart
1) Environment (.env)
Create a UTF‑8 .env (values optional if you want mock mode). See .env.example for the full list of variables:
```
# Behavior
MOCK_MODE=true
PORT=7860

# LLM / Research
LLM_PROVIDER=gemini
LLM_MODEL=gemini-2.5-flash
GEMINI_API_KEY=

# Optional per-agent Gemini keys
GEMINI_API_KEY_CV=
GEMINI_API_KEY_COVER=
GEMINI_API_KEY_CHAT=
GEMINI_API_KEY_PARSER=
GEMINI_API_KEY_MATCH=
GEMINI_API_KEY_TAILOR=

OPENAI_API_KEY=
ANTHROPIC_API_KEY=
TAVILY_API_KEY=

# Job APIs
ADZUNA_APP_ID=
ADZUNA_APP_KEY=

# Office MCP (optional)
POWERPOINT_MCP_URL=http://localhost:3000
WORD_MCP_URL=http://localhost:3001
EXCEL_MCP_URL=http://localhost:3002

# LangExtract uses GEMINI key by default
LANGEXTRACT_API_KEY=
```
Hardcoded keys have been removed from utility scripts. Use switch_api_key.py to safely set keys into .env without embedding them in code.
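For reference, the .env format above can be read with a few lines of stdlib Python. This is only an illustrative loader (the app may use python-dotenv or similar); the parse/apply split is just to keep it testable:

```python
import os

def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def load_env(path: str = ".env") -> None:
    """Apply a .env file to os.environ without overriding existing vars."""
    try:
        text = open(path, encoding="utf-8").read()
    except FileNotFoundError:
        return  # e.g. on HF Spaces, secrets arrive via the environment
    for key, value in parse_env(text).items():
        os.environ.setdefault(key, value)
```

Using `setdefault` means real environment variables (such as Space secrets) always win over file values.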
2) Install
- Windows PowerShell

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

- Linux/macOS

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
3) Run the apps
- Streamlit (PATH‑safe)

```bash
python -m streamlit run app.py --server.port 8501
```

- Gradio / Hugging Face (avoid port conflicts)

```powershell
# Windows PowerShell
$env:PORT=7861; python hf_app.py
```

```bash
# Linux/macOS
PORT=7861 python hf_app.py
```

The HF app binds on 0.0.0.0:$PORT.
📊 System Architecture Overview
This is a production-ready, multi-agent job application system with sophisticated AI capabilities and enterprise-grade features:
🏗️ Core Architecture
Dual Interface Design
- Streamlit Interface (app.py): traditional web application for desktop use
- Gradio/HF Interface (hf_app.py): modern, mobile-friendly, deployable to Hugging Face Spaces
Multi-Agent System (15 Specialized Agents)
Core Processing Agents:
- OrchestratorAgent: central coordinator managing workflow and job orchestration
- CVOwnerAgent: ATS-optimized resume generation with UK-specific formatting rules
- CoverLetterAgent: personalized cover letter generation with keyword optimization
- ProfileAgent: intelligent CV parsing and structured profile extraction
- JobAgent: job posting analysis and requirement extraction
- RouterAgent: dynamic routing based on payload state and workflow stage
Advanced AI Agents:
- ParallelExecutor: concurrent processing for 3–5x faster multi-job handling
- TemporalTracker: time-stamped application history and pattern analysis
- ObservabilityAgent: real-time tracing, metrics collection, and monitoring
- ContextEngineer: flywheel learning and context optimization
- ContextScaler: L1/L2/L3 memory management for scalable context handling
- LinkedInManager: OAuth 2.0 integration and profile synchronization
- MetaAgent: combines outputs from multiple specialized analysis agents
- TriageAgent: intelligent task prioritization and routing
Guidelines Enforcement System (agents/guidelines.py)
Comprehensive rule engine ensuring document quality:
- UK Compliance: British English, UK date formats (MMM YYYY), £ currency normalization
- ATS Optimization: Plain text formatting, keyword density, section structure
- Content Quality: Anti-buzzword filtering, action verb strengthening, first-person removal
- Layout Rules: Exact length enforcement, heading validation, bullet point formatting
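To make the rule engine concrete, here are two hypothetical normalisers in the spirit of the UK-compliance rules above (illustrative only, not the actual agents/guidelines.py code):

```python
import re

MONTHS = {"01": "Jan", "02": "Feb", "03": "Mar", "04": "Apr",
          "05": "May", "06": "Jun", "07": "Jul", "08": "Aug",
          "09": "Sep", "10": "Oct", "11": "Nov", "12": "Dec"}

def uk_dates(text: str) -> str:
    """Rewrite numeric MM/YYYY dates into the UK-preferred 'MMM YYYY' form."""
    return re.sub(r"\b(0[1-9]|1[0-2])/(\d{4})\b",
                  lambda m: f"{MONTHS[m.group(1)]} {m.group(2)}", text)

def uk_currency(text: str) -> str:
    """Normalise 'GBP 50,000'-style amounts to the £ symbol."""
    return re.sub(r"\bGBP\s*([\d,]+)", r"£\1", text)
```

The real engine chains many such passes (dates, currency, buzzwords, tense) over each document section.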
🔌 Integration Ecosystem
LLM Integration (services/llm.py)
- Multi-Provider Support: OpenAI, Anthropic Claude, Google Gemini
- Per-Agent API Keys: Cost optimization through agent-specific key allocation
- Intelligent Fallbacks: Graceful degradation when providers unavailable
- Configurable Models: Per-agent model selection for optimal performance/cost
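The per-agent key allocation can be sketched as a simple lookup with fallback to the shared key. This is a hypothetical helper; the actual resolution logic lives in services/llm.py:

```python
import os

def key_for_agent(agent: str) -> str:
    """Resolve the Gemini key for an agent role, falling back to the
    shared GEMINI_API_KEY. Roles mirror the .env names: cv, cover,
    chat, parser, match, tailor."""
    specific = os.environ.get(f"GEMINI_API_KEY_{agent.upper()}")
    return specific or os.environ.get("GEMINI_API_KEY", "")
```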
Job Aggregation (services/job_aggregator.py, services/jobspy_client.py)
- Primary Sources: Adzuna API (5,000 jobs/month free tier)
- JobSpy Integration: Indeed, LinkedIn, Glassdoor aggregation
- Additional APIs: Remotive, The Muse, GitHub Jobs
- Smart Deduplication: Title + company matching with fuzzy logic
- SSL Bypass: Automatic retry for corporate environments
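A minimal sketch of title + company fuzzy deduplication, using stdlib difflib as a stand-in for whatever matcher the aggregator actually uses:

```python
from difflib import SequenceMatcher

def _norm(s: str) -> str:
    return " ".join(s.lower().split())

def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Treat two postings as duplicates when their normalised
    title|company keys are near-identical under a fuzzy ratio."""
    key_a = _norm(a["title"]) + "|" + _norm(a["company"])
    key_b = _norm(b["title"]) + "|" + _norm(b["company"])
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold

def dedupe(jobs: list[dict]) -> list[dict]:
    """Keep the first occurrence of each fuzzy-unique posting."""
    unique: list[dict] = []
    for job in jobs:
        if not any(is_duplicate(job, kept) for kept in unique):
            unique.append(job)
    return unique
```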
Document Generation (services/)
- Word Documents (word_cv.py): 5 professional templates, MCP server integration
- PowerPoint CVs (powerpoint_cv.py): 4 visual templates for presentations
- Excel Trackers (excel_tracker.py): 5 analytical sheets with metrics
- PDF Export: cross-platform compatibility with formatting preservation
📈 Advanced Features
Pipeline Architecture (agents/pipeline.py)
```
User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage
     ↓           ↓            ↓               ↓                ↓              ↓
 Event Log  Profile Cache  Job Cache   Document Cache    Metrics Log   Temporal KG
```
Memory & Persistence
- File-backed Storage (memory/store.py): atomic writes, thread-safe operations
- Temporal Knowledge Graph: application tracking with time-stamped relationships
- Event Sourcing (events.jsonl): complete audit trail of all agent actions
- Caching System (utils/cache.py): TTL-based caching with automatic eviction
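A toy TTL cache with lazy eviction, illustrating the idea behind utils/cache.py (not its actual implementation):

```python
import time

class TTLCache:
    """Entries expire ttl seconds after being set and are evicted
    lazily the next time they are read."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._data: dict = {}

    def set(self, key, value) -> None:
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]  # lazy eviction on access
            return default
        return value
```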
LangExtract Integration (services/langextract_service.py)
- Structured Extraction: Job requirements, skills, company culture
- ATS Optimization: Keyword extraction and scoring
- Fallback Mechanisms: Regex-based extraction when API unavailable
- Result Caching: Performance optimization for repeated analyses
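The regex fallback path can be illustrated like this (hypothetical skill list; the real service defines its own patterns):

```python
import re

# Illustrative vocabulary — the actual fallback would carry a much
# larger, curated list.
COMMON_SKILLS = ["python", "sql", "aws", "docker", "kubernetes",
                 "machine learning", "react", "java", "excel"]

def extract_skills_fallback(posting: str) -> list[str]:
    """Scan a posting for known skill terms when the LangExtract/
    Gemini path is unavailable."""
    lowered = posting.lower()
    return [skill for skill in COMMON_SKILLS
            if re.search(r"\b" + re.escape(skill) + r"\b", lowered)]
```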
🛡️ Security & Configuration
Authentication & Security
- OAuth 2.0: LinkedIn integration with CSRF protection
- Input Sanitization: Path traversal and injection prevention
- Environment Isolation: secrets management via .env
- Rate Limiting: API throttling and abuse prevention
Configuration Management
- Environment Variables: all sensitive data in .env
- Agent Configuration (utils/config.py): centralized settings
- Template System: customizable document templates
- Feature Flags: Progressive enhancement based on available services
📁 Project Structure
```
2096955/
├── agents/                  # Multi-agent system components
│   ├── orchestrator.py      # Main orchestration logic
│   ├── cv_owner.py          # Resume generation with guidelines
│   ├── guidelines.py        # UK rules and ATS optimization
│   ├── pipeline.py          # Application pipeline flow
│   └── ...                  # Additional specialized agents
├── services/                # External integrations and services
│   ├── llm.py               # Multi-provider LLM client
│   ├── job_aggregator.py    # Job source aggregation
│   ├── word_cv.py           # Word document generation
│   └── ...                  # Document and API services
├── utils/                   # Utility functions and helpers
│   ├── ats.py               # ATS scoring and optimization
│   ├── cache.py             # TTL caching system
│   ├── consistency.py       # Contradiction detection
│   └── ...                  # Text processing and helpers
├── models/                  # Data models and schemas
│   └── schemas.py           # Pydantic models for type safety
├── mcp/                     # Model Context Protocol servers
│   ├── cv_owner_server.py
│   ├── cover_letter_server.py
│   └── orchestrator_server.py
├── memory/                  # Persistent storage
│   ├── store.py             # File-backed memory store
│   └── data/                # Application state and history
├── app.py                   # Streamlit interface
├── hf_app.py                # Gradio/HF interface
└── api_llm_integration.py   # REST API endpoints
```
🚀 Performance Optimizations
- Parallel Processing: async job handling with asyncio and nest_asyncio
- Lazy Loading: dependencies loaded only when needed
- Smart Caching: Multi-level caching (memory, file, API responses)
- Batch Operations: Efficient multi-job processing
- Event-Driven: Asynchronous event handling for responsiveness
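The parallel fan-out pattern with asyncio.gather looks roughly like this (a stand-in coroutine replaces the real per-job LLM calls):

```python
import asyncio

async def draft_for_job(job: dict) -> dict:
    """Stand-in for one resume+cover drafting task; in the real app
    this awaits network-bound LLM calls."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"job": job["title"], "status": "drafted"}

async def draft_all(jobs: list[dict]) -> list[dict]:
    """Fan out drafting across all jobs concurrently."""
    return await asyncio.gather(*(draft_for_job(j) for j in jobs))

results = asyncio.run(draft_all([{"title": "Data Engineer"},
                                 {"title": "Analyst"}]))
```

Because the tasks are I/O-bound, gathering them concurrently is where the 3–5x multi-job speedup comes from.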
🧪 Testing & Quality
- Test Suites: comprehensive tests in the tests/ directory
- Integration Tests: API and service integration validation
- Mock Mode: Development without API keys
- Smoke Tests: Quick validation scripts for deployment
- Observability: Built-in tracing and metrics collection
Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)
- Implemented in agents/pipeline.py and exposed via API in api_llm_integration.py (/api/llm/pipeline_run).
- Agents:
  - RouterAgent: routes based on payload state
  - ProfileAgent: parses CV to structured profile (LLM with fallback)
  - JobAgent: analyzes job posting (LLM with fallback)
  - CVOwnerAgent and CoverLetterAgent: draft documents (Gemini, per-agent keys)
  - Review: contradiction checks and memory persistence
- Temporal tracking: on review, a "drafted" status is recorded in the temporal KG with issues metadata.
Flow diagram
```mermaid
flowchart TD
    U["User"] --> R["RouterAgent"]
    R -->|cv_text present| P["ProfileAgent (LLM)"]
    R -->|job_posting present| J["JobAgent (LLM)"]
    P --> RESUME["CVOwnerAgent"]
    J --> RESUME
    RESUME --> COVER["CoverLetterAgent"]
    COVER --> REVIEW["Orchestrator Review"]
    REVIEW --> M["MemoryStore (file-backed)"]
    REVIEW --> TKG["Temporal KG (triplets)"]
    subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
        P
        J
        RESUME
        COVER
    end
    subgraph UI["Gradio (HF)"]
        U
    end
    subgraph API["Flask API"]
        PR["/api/llm/pipeline_run"]
    end
    U -. optional .-> PR
```
Hugging Face / Gradio (interactive controls)
- In the CV Analysis tab, you can now set:
- Refinement cycles (1–5)
- Exact target length (characters) to enforce resume and cover length deterministically
  - Layout preset: classic, modern, minimalist, executive
    - classic: Summary → Skills → Experience → Education (Summary/Skills above the fold)
    - modern: Summary → Experience → Skills → Projects/Certifications → Education
    - minimalist: concise Summary → Skills → Experience → Education
    - executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications
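One way to model these presets is a simple section-order table. This is illustrative only; the agents hold the real layout structures:

```python
# Hypothetical mapping of the layout presets above to section order.
LAYOUT_PRESETS = {
    "classic":    ["Summary", "Skills", "Experience", "Education"],
    "modern":     ["Summary", "Experience", "Skills",
                   "Projects/Certifications", "Education"],
    "minimalist": ["Summary", "Skills", "Experience", "Education"],
    "executive":  ["Summary", "Selected Achievements", "Experience",
                   "Skills", "Education", "Certifications"],
}

def order_sections(sections: dict, preset: str = "classic") -> list[tuple]:
    """Return (heading, body) pairs in the preset's order, skipping
    sections the profile does not have."""
    return [(name, sections[name]) for name in LAYOUT_PRESETS[preset]
            if name in sections]
```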
UK resume/cover rules (built-in)
- UK English and dates (MMM YYYY)
- Current role in present tense; previous roles in past tense
- Digits for numbers; £ and % normalization
- Remove first‑person pronouns in resume bullets; maintain active voice
- Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets
- Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates
These rules are applied by agents/cv_owner.py and validated by checklists.
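To make the bullet-level rules concrete, here is a hypothetical scrubber combining first-person removal with the anti-buzzword filter (illustrative, not the agents/cv_owner.py code):

```python
import re

# Illustrative buzzword set — the real filter is broader.
BUZZWORDS = {"synergy", "go-getter", "guru", "rockstar", "ninja"}

def scrub_bullet(bullet: str) -> str:
    # Drop leading first-person openers ("I have", "We had", ...)
    bullet = re.sub(r"^(I|We)\s+(am|was|have|had)?\s*", "",
                    bullet, flags=re.I).strip()
    # Remove flagged buzzwords outright
    for word in BUZZWORDS:
        bullet = re.sub(r"\b" + re.escape(word) + r"\b", "",
                        bullet, flags=re.I)
    bullet = re.sub(r"\s{2,}", " ", bullet).strip()
    # Re-capitalise the leading action verb
    return bullet[:1].upper() + bullet[1:] if bullet else bullet
```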
Checklists and observability
- Checklists integrate guidance from:
- Reed: CV layout and mistakes
- The Muse: action verbs and layout basics
- Novorésumé: one‑page bias, clean sections, links
- StandOut CV: quantification, bullet density, recent‑role focus
- Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in memory/data/events.jsonl.
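Since events.jsonl holds one JSON object per line, a per-agent summary takes only a few lines (this sketch assumes each event carries an "agent" field):

```python
import json

def agent_counts(lines) -> dict:
    """Summarise per-agent event counts from JSONL lines."""
    counts: dict = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        agent = event.get("agent", "unknown")
        counts[agent] = counts.get(agent, 0) + 1
    return counts
```

An open file object iterates line by line, so `agent_counts(open("memory/data/events.jsonl"))` works directly.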
Scripts (headless runs)
- Capco (Anthony Lui → Capco):
python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
- Anthropic (Anthony Lui → Anthropic):
python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
- Pipeline (Router + Agents + Review + Events):
python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py
These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set .env with LLM_PROVIDER=gemini, LLM_MODEL=gemini-2.5-flash, and GEMINI_API_KEY.
Temporal knowledge graph (micro‑memory)
- agents/temporal_tracker.py stores time‑stamped triplets with non‑destructive invalidation.
- Integrated in pipeline review to track job application states and history.
- Utilities for timelines, active applications, and pattern analysis included.
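The non-destructive invalidation idea can be sketched as a tiny triplet store: superseded facts are closed with a valid_to timestamp, never deleted. Illustrative only; the real store is agents/temporal_tracker.py:

```python
from datetime import datetime, timezone

class TemporalKG:
    """Time-stamped triplets with non-destructive invalidation."""

    def __init__(self):
        self.triplets: list[dict] = []

    def assert_fact(self, subject: str, predicate: str, obj: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        # Close any currently-valid triplet for this subject+predicate
        for t in self.triplets:
            if (t["subject"], t["predicate"]) == (subject, predicate) \
                    and t["valid_to"] is None:
                t["valid_to"] = now
        self.triplets.append({"subject": subject, "predicate": predicate,
                              "object": obj, "valid_from": now,
                              "valid_to": None})

    def current(self, subject: str, predicate: str):
        """Latest still-valid object for subject+predicate, if any."""
        for t in reversed(self.triplets):
            if (t["subject"], t["predicate"]) == (subject, predicate) \
                    and t["valid_to"] is None:
                return t["object"]
        return None
```

Keeping the closed triplets is what makes timelines and pattern analysis over past application states possible.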
Parallel agents + meta‑agent demo
- Notebook: notebooks/agents_parallel_demo.ipynb
- Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot.
- Uses the central LLM client (services/llm.py) with LLM_PROVIDER=gemini and LLM_MODEL=gemini-2.5-flash.
Run (Jupyter/VSCode):
```
%pip install nest_asyncio matplotlib
# Ensure GEMINI_API_KEY is set in your environment
```
Open and run the notebook cells.
LinkedIn OAuth (optional)
- Create a LinkedIn Developer App, then add redirect URLs:
http://localhost:8501
http://localhost:8501/callback
- Products: enable “Sign In with LinkedIn using OpenID Connect”.
- Update .env and set MOCK_MODE=false.
- In the UI, use the “LinkedIn Authentication” section to kick off the flow.
Notes:
- LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data.
Job sources
- Adzuna: global coverage, 5,000 free jobs/month
- Resilient aggregator and optional JobSpy MCP for broader search
- Custom jobs: add your own postings in the UI
- Corporate SSL environments: Adzuna calls automatically retry with a verify=False fallback
LLMs and configuration
- Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (services/llm.py).
- Recommended defaults for this project: LLM_PROVIDER=gemini, LLM_MODEL=gemini-2.5-flash
- Agents pass agent="cv|cover|parser|match|tailor|chat" to use per‑agent keys when provided.
Advanced agents (built‑in)
- Parallel processing: 3–5× faster multi‑job drafting
- Temporal tracking: time‑stamped history and pattern analysis
- Observability: tracing, metrics, timeline visualization
- Context engineering: flywheel learning, L1/L2/L3 memory, scalable context
Toggle these in the HF app under “🚀 Advanced AI Features”.
LangExtract + Gemini
- Uses the same GEMINI_API_KEY (auto‑applied to LANGEXTRACT_API_KEY when empty)
- The official langextract.extract(...) requires examples; the UI also exposes a robust regex‑based fallback (services/langextract_service.py) so features work even when cloud extraction is constrained
- In the HF app (“🔍 Enhanced Job Analysis”), you can:
- Analyze job postings (structured fields + skills)
- Optimize resume for ATS (score + missing keywords)
- Bulk analyze multiple jobs
Office exports
- Word (services/word_cv.py): resumes + cover letters (5 templates; python‑docx fallback)
- PowerPoint (services/powerpoint_cv.py): visual CV (4 templates; python‑pptx fallback)
- Excel (services/excel_tracker.py): tracker with 5 analytical sheets (openpyxl fallback)
- MCP servers are supported when available; local libraries are used otherwise
In HF app, after generation, expand:
- “📊 Export to PowerPoint CV”
- “📝 Export to Word Documents”
- “📈 Export Excel Tracker”
Hugging Face minimal Space branch
- Clean branch containing only app.py and requirements.txt for Spaces.
- Branch name: hf-space-min (push from a clean worktree).
- .gitignore includes .env and .env.* to avoid leaking secrets.
Tests & scripts
- Run test suites in tests/
- Useful scripts: test_* files in the project root (integration checks)
Security
- OAuth state validation, input/path/url sanitization
- Sensitive data via environment variables; avoid committing secrets
- Atomic writes in memory store
Run summary
- Streamlit: python -m streamlit run app.py --server.port 8501
- Gradio/HF: PORT=7861 python hf_app.py
Your system is fully documented here in one place and ready for local or HF deployment.