| title: Multi-Agent Job Application Assistant | |
| emoji: 🚀 | |
| colorFrom: purple | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: "4.44.0" | |
| app_file: app.py | |
| pinned: false | |
| ## Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face) | |
| A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering). | |
| --- | |
| ### What you get | |
| - **Two UIs**: Streamlit (`app.py`) and Gradio/HF (`hf_app.py`) | |
| - **LinkedIn OAuth 2.0** (optional; CSRF‑safe state validation) | |
| - **Job aggregation**: Adzuna (free tier: 5,000 jobs/month) plus resilient fallbacks | |
| - **ATS‑optimized drafting**: resumes + cover letters (Gemini) | |
| - **Office exports**: | |
| - Word resumes and cover letters (5 templates) | |
| - PowerPoint CV (4 templates) | |
| - Excel application tracker (5 analytical sheets) | |
| - **Advanced agents**: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel | |
| - **LangExtract integration**: structured extraction with Gemini key; robust regex fallback in constrained environments | |
| - **New**: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch | |
| - **New (Aug 2025)**: UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, Muse/Reed/Novorésumé/StandOut CV checklists, and interactive output controls (exact length, cycles, layout presets) | |
| --- | |
| ## Quickstart | |
| ### 1) Environment (.env) | |
| Create a UTF‑8 encoded `.env` file. Values can be left blank if you only want to run in mock mode. See `.env.example` for the full list of variables: | |
| ```ini | |
| # Behavior | |
| MOCK_MODE=true | |
| PORT=7860 | |
| # LLM / Research | |
| LLM_PROVIDER=gemini | |
| LLM_MODEL=gemini-2.5-flash | |
| GEMINI_API_KEY= | |
| # Optional per-agent Gemini keys | |
| GEMINI_API_KEY_CV= | |
| GEMINI_API_KEY_COVER= | |
| GEMINI_API_KEY_CHAT= | |
| GEMINI_API_KEY_PARSER= | |
| GEMINI_API_KEY_MATCH= | |
| GEMINI_API_KEY_TAILOR= | |
| OPENAI_API_KEY= | |
| ANTHROPIC_API_KEY= | |
| TAVILY_API_KEY= | |
| # Job APIs | |
| ADZUNA_APP_ID= | |
| ADZUNA_APP_KEY= | |
| # Office MCP (optional) | |
| POWERPOINT_MCP_URL=http://localhost:3000 | |
| WORD_MCP_URL=http://localhost:3001 | |
| EXCEL_MCP_URL=http://localhost:3002 | |
| # LangExtract uses GEMINI key by default | |
| LANGEXTRACT_API_KEY= | |
| ``` | |
| Hardcoded keys have been removed from the utility scripts. Use `switch_api_key.py` to write keys into `.env` instead of embedding them in code. | |
| ### 2) Install | |
| - Windows PowerShell | |
| ```powershell | |
| python -m venv .venv | |
| .\.venv\Scripts\Activate.ps1 | |
| pip install -r requirements.txt | |
| ``` | |
| - Linux/macOS | |
| ```bash | |
| python3 -m venv .venv | |
| source .venv/bin/activate | |
| pip install -r requirements.txt | |
| ``` | |
| ### 3) Run the apps | |
| - Streamlit (PATH‑safe) | |
| ```powershell | |
| python -m streamlit run app.py --server.port 8501 | |
| ``` | |
| - Gradio / Hugging Face (avoid port conflicts) | |
| ```powershell | |
| $env:PORT=7861; python hf_app.py | |
| ``` | |
| ```bash | |
| PORT=7861 python hf_app.py | |
| ``` | |
| The HF app binds to `0.0.0.0:$PORT`. | |
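As a rough illustration of the binding behaviour above, the port could be resolved like this. The helper name `resolve_bind` is hypothetical and is not taken from `hf_app.py`; the fallback of 7860 mirrors the `PORT=7860` value in the `.env` example.

```python
import os


def resolve_bind(default_port: int = 7860) -> tuple[str, int]:
    """Return the (host, port) pair to bind, reading PORT from the environment."""
    raw = os.environ.get("PORT", "").strip()
    # Fall back to the default when PORT is unset or not a plain integer.
    port = int(raw) if raw.isdigit() else default_port
    return "0.0.0.0", port


# With Gradio this would typically be passed to launch(), e.g.:
#   host, port = resolve_bind()
#   demo.launch(server_name=host, server_port=port)
```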
| --- | |
| ## 📊 System Architecture Overview | |
| This is a **production-ready, multi-agent job application system** built from two UIs, a set of specialized agents, shared service integrations, and file-backed memory: | |
| ### 🏗️ Core Architecture | |
| #### **Dual Interface Design** | |
| - **Streamlit Interface** (`app.py`) - Traditional web application for desktop use | |
| - **Gradio/HF Interface** (`hf_app.py`) - Modern, mobile-friendly, deployable to Hugging Face Spaces | |
| #### **Multi-Agent System** (15 Specialized Agents; 14 are listed below) | |
| **Core Processing Agents:** | |
| - **`OrchestratorAgent`** - Central coordinator managing workflow and job orchestration | |
| - **`CVOwnerAgent`** - ATS-optimized resume generation with UK-specific formatting rules | |
| - **`CoverLetterAgent`** - Personalized cover letter generation with keyword optimization | |
| - **`ProfileAgent`** - Intelligent CV parsing and structured profile extraction | |
| - **`JobAgent`** - Job posting analysis and requirement extraction | |
| - **`RouterAgent`** - Dynamic routing based on payload state and workflow stage | |
| **Advanced AI Agents:** | |
| - **`ParallelExecutor`** - Concurrent processing for 3-5x faster multi-job handling | |
| - **`TemporalTracker`** - Time-stamped application history and pattern analysis | |
| - **`ObservabilityAgent`** - Real-time tracing, metrics collection, and monitoring | |
| - **`ContextEngineer`** - Flywheel learning and context optimization | |
| - **`ContextScaler`** - L1/L2/L3 memory management for scalable context handling | |
| - **`LinkedInManager`** - OAuth 2.0 integration and profile synchronization | |
| - **`MetaAgent`** - Combines outputs from multiple specialized analysis agents | |
| - **`TriageAgent`** - Intelligent task prioritization and routing | |
| #### **Guidelines Enforcement System** (`agents/guidelines.py`) | |
| Comprehensive rule engine ensuring document quality: | |
| - **UK Compliance**: British English, UK date formats (MMM YYYY), £ currency normalization | |
| - **ATS Optimization**: Plain text formatting, keyword density, section structure | |
| - **Content Quality**: Anti-buzzword filtering, action verb strengthening, first-person removal | |
| - **Layout Rules**: Exact length enforcement, heading validation, bullet point formatting | |
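To make the content-quality rules more concrete, here is a minimal sketch of an anti-buzzword and first-person scrub for a single resume bullet. It is not the code in `agents/guidelines.py`; the function name `scrub_bullet` and the buzzword list are assumptions chosen for illustration.

```python
import re

# Illustrative list only; the project's real buzzword list is not shown in this README.
BUZZWORDS = ("synergy", "go-getter", "team player")

# Leading "I" or "We" followed by whitespace, e.g. "I led ..." or "We built ...".
_FIRST_PERSON = re.compile(r"^\s*(?:I|We)\s+", re.IGNORECASE)


def scrub_bullet(bullet: str) -> str:
    """Drop a leading first-person pronoun and any listed buzzwords."""
    text = _FIRST_PERSON.sub("", bullet)
    for word in BUZZWORDS:
        text = re.sub(rf"\b{re.escape(word)}\b", "", text, flags=re.IGNORECASE)
    # Collapse the gaps left by removed words, then restore sentence case.
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text[:1].upper() + text[1:] if text else text
```

For example, `scrub_bullet("I led a team of 5 engineers")` yields `"Led a team of 5 engineers"`.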
| ### 🔌 Integration Ecosystem | |
| #### **LLM Integration** (`services/llm.py`) | |
| - **Multi-Provider Support**: OpenAI, Anthropic Claude, Google Gemini | |
| - **Per-Agent API Keys**: Cost optimization through agent-specific key allocation | |
| - **Intelligent Fallbacks**: Graceful degradation when providers unavailable | |
| - **Configurable Models**: Per-agent model selection for optimal performance/cost | |
| #### **Job Aggregation** (`services/job_aggregator.py`, `services/jobspy_client.py`) | |
| - **Primary Sources**: Adzuna API (5,000 jobs/month free tier) | |
| - **JobSpy Integration**: Indeed, LinkedIn, Glassdoor aggregation | |
| - **Additional APIs**: Remotive, The Muse, GitHub Jobs (GitHub retired the Jobs service in 2021, so expect no results from this source) | |
| - **Smart Deduplication**: Title + company matching with fuzzy logic | |
| - **SSL Bypass**: On SSL errors in corporate environments, requests are retried with certificate verification disabled (`verify=False`) | |
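The title + company deduplication can be approximated with the standard library. This is a sketch under assumptions, not the logic in `services/job_aggregator.py`: the job dictionaries, the `is_duplicate`/`dedupe` names, and the 0.9 similarity threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def _norm(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Compare 'title|company' keys with a fuzzy similarity ratio."""
    key_a = f"{_norm(a['title'])}|{_norm(a['company'])}"
    key_b = f"{_norm(b['title'])}|{_norm(b['company'])}"
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold


def dedupe(jobs: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep the first occurrence of each job; drop later near-duplicates."""
    unique: list[dict] = []
    for job in jobs:
        if not any(is_duplicate(job, kept, threshold) for kept in unique):
            unique.append(job)
    return unique
```

The pairwise comparison is quadratic in the number of jobs, which is acceptable for a few hundred postings; larger batches would need blocking on a cheaper key first.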
| #### **Document Generation** (`services/`) | |
| - **Word Documents** (`word_cv.py`): 5 professional templates, MCP server integration | |
| - **PowerPoint CVs** (`powerpoint_cv.py`): 4 visual templates for presentations | |
| - **Excel Trackers** (`excel_tracker.py`): 5 analytical sheets with metrics | |
| - **PDF Export**: Cross-platform compatibility with formatting preservation | |
| ### 📈 Advanced Features | |
| #### **Pipeline Architecture** (`agents/pipeline.py`) | |
| ``` | |
| User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage | |
| ↓ ↓ ↓ ↓ ↓ ↓ | |
| Event Log Profile Cache Job Cache Document Cache Metrics Log Temporal KG | |
| ``` | |
| #### **Memory & Persistence** | |
| - **File-backed Storage** (`memory/store.py`): Atomic writes, thread-safe operations | |
| - **Temporal Knowledge Graph**: Application tracking with time-stamped relationships | |
| - **Event Sourcing** (`events.jsonl`): Complete audit trail of all agent actions | |
| - **Caching System** (`utils/cache.py`): TTL-based caching with automatic eviction | |
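One common way to get atomic, thread-safe file writes plus an append-only event log is sketched below. It assumes nothing about the actual `memory/store.py` implementation; `atomic_write_json` and `append_event` are hypothetical names.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_WRITE_LOCK = threading.Lock()  # serialises writers within one process


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with _WRITE_LOCK:
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(payload, handle)
                handle.flush()
                os.fsync(handle.fileno())
            # os.replace swaps the file in one step, so readers never see a partial file.
            os.replace(tmp_name, path)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise


def append_event(log_path: Path, event: dict) -> None:
    """Append one JSON object per line (JSONL) to the event log."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with _WRITE_LOCK, log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")
```

The temp file lives next to the target because `os.replace` is only atomic within a single filesystem.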
| #### **LangExtract Integration** (`services/langextract_service.py`) | |
| - **Structured Extraction**: Job requirements, skills, company culture | |
| - **ATS Optimization**: Keyword extraction and scoring | |
| - **Fallback Mechanisms**: Regex-based extraction when API unavailable | |
| - **Result Caching**: Performance optimization for repeated analyses | |
| ### 🛡️ Security & Configuration | |
| #### **Authentication & Security** | |
| - **OAuth 2.0**: LinkedIn integration with CSRF protection | |
| - **Input Sanitization**: Path traversal and injection prevention | |
| - **Environment Isolation**: Secrets management via `.env` | |
| - **Rate Limiting**: API throttling and abuse prevention | |
| #### **Configuration Management** | |
| - **Environment Variables**: All sensitive data in `.env` | |
| - **Agent Configuration** (`utils/config.py`): Centralized settings | |
| - **Template System**: Customizable document templates | |
| - **Feature Flags**: Progressive enhancement based on available services | |
| ### 📁 Project Structure | |
| ``` | |
| 2096955/ | |
| ├── agents/ # Multi-agent system components | |
| │ ├── orchestrator.py # Main orchestration logic | |
| │ ├── cv_owner.py # Resume generation with guidelines | |
| │ ├── guidelines.py # UK rules and ATS optimization | |
| │ ├── pipeline.py # Application pipeline flow | |
| │ └── ... # Additional specialized agents | |
| ├── services/ # External integrations and services | |
| │ ├── llm.py # Multi-provider LLM client | |
| │ ├── job_aggregator.py # Job source aggregation | |
| │ ├── word_cv.py # Word document generation | |
| │ └── ... # Document and API services | |
| ├── utils/ # Utility functions and helpers | |
| │ ├── ats.py # ATS scoring and optimization | |
| │ ├── cache.py # TTL caching system | |
| │ ├── consistency.py # Contradiction detection | |
| │ └── ... # Text processing and helpers | |
| ├── models/ # Data models and schemas | |
| │ └── schemas.py # Pydantic models for type safety | |
| ├── mcp/ # Model Context Protocol servers | |
| │ ├── cv_owner_server.py | |
| │ ├── cover_letter_server.py | |
| │ └── orchestrator_server.py | |
| ├── memory/ # Persistent storage | |
| │ ├── store.py # File-backed memory store | |
| │ └── data/ # Application state and history | |
| ├── app.py # Streamlit interface | |
| ├── hf_app.py # Gradio/HF interface | |
| └── api_llm_integration.py # REST API endpoints | |
| ``` | |
| ### 🚀 Performance Optimizations | |
| - **Parallel Processing**: Async job handling with `asyncio` and `nest_asyncio` | |
| - **Lazy Loading**: Dependencies loaded only when needed | |
| - **Smart Caching**: Multi-level caching (memory, file, API responses) | |
| - **Batch Operations**: Efficient multi-job processing | |
| - **Event-Driven**: Asynchronous event handling for responsiveness | |
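The TTL-based caching mentioned for `utils/cache.py` can be pictured with a small in-memory cache that evicts entries lazily on read. This is an illustrative sketch; the `TTLCache` class and its injectable `clock` parameter are assumptions, not the project's API.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: dict[Hashable, tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Evict lazily when an expired entry is read.
            del self._items[key]
            return default
        return value
```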
| ### 🧪 Testing & Quality | |
| - **Test Suites**: Comprehensive tests in `tests/` directory | |
| - **Integration Tests**: API and service integration validation | |
| - **Mock Mode**: Development without API keys | |
| - **Smoke Tests**: Quick validation scripts for deployment | |
| - **Observability**: Built-in tracing and metrics collection | |
| --- | |
| ## Router pipeline (User → Router → Profile → Job → Resume → Cover → Review) | |
| - Implemented in `agents/pipeline.py` and exposed via API in `api_llm_integration.py` (`/api/llm/pipeline_run`). | |
| - Agents: | |
| - `RouterAgent`: routes based on payload state | |
| - `ProfileAgent`: parses CV to structured profile (LLM with fallback) | |
| - `JobAgent`: analyzes job posting (LLM with fallback) | |
| - `CVOwnerAgent` and `CoverLetterAgent`: draft documents (Gemini, per-agent keys) | |
| - Review: contradiction checks and memory persistence | |
| - Temporal tracking: on review, a `drafted` status is recorded in the temporal KG with issues metadata. | |
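Routing on payload state can be sketched as a function that inspects which fields are already filled in and returns the next stage. Only `cv_text` and `job_posting` come from this README; the other field names (`profile`, `job_analysis`, `resume_draft`, `cover_letter_draft`) and the `route` function itself are hypothetical.

```python
def route(payload: dict) -> str:
    """Return the next pipeline stage for the given payload state."""
    if payload.get("cv_text") and not payload.get("profile"):
        return "profile"   # CV supplied but not yet parsed
    if payload.get("job_posting") and not payload.get("job_analysis"):
        return "job"       # posting supplied but not yet analysed
    if not payload.get("resume_draft"):
        return "resume"
    if not payload.get("cover_letter_draft"):
        return "cover"
    return "review"        # everything drafted; run checks and persist
```

A driver would call `route` in a loop, run the matching agent, store its output on the payload, and stop once `"review"` has completed.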
| **Flow diagram** | |
| ```mermaid | |
| flowchart TD | |
| U["User"] --> R["RouterAgent"] | |
| R -->|cv_text present| P["ProfileAgent (LLM)"] | |
| R -->|job_posting present| J["JobAgent (LLM)"] | |
| P --> RESUME["CVOwnerAgent"] | |
| J --> RESUME | |
| RESUME --> COVER["CoverLetterAgent"] | |
| COVER --> REVIEW["Orchestrator Review"] | |
| REVIEW --> M["MemoryStore (file-backed)"] | |
| REVIEW --> TKG["Temporal KG (triplets)"] | |
| subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"] | |
| P | |
| J | |
| RESUME | |
| COVER | |
| end | |
| subgraph UI["Gradio (HF)"] | |
| U | |
| end | |
| subgraph API["Flask API"] | |
| PR["/api/llm/pipeline_run"] | |
| end | |
| U -. optional .-> PR | |
| ``` | |
| --- | |
| ## Hugging Face / Gradio (interactive controls) | |
| - In the CV Analysis tab, you can now set: | |
| - **Refinement cycles** (1–5) | |
| - **Exact target length** (characters) to enforce resume and cover letter length deterministically | |
| - **Layout preset**: `classic`, `modern`, `minimalist`, `executive` | |
| - classic: Summary → Skills → Experience → Education (above the fold for Summary/Skills) | |
| - modern: Summary → Experience → Skills → Projects/Certifications → Education | |
| - minimalist: concise Summary → Skills → Experience → Education | |
| - executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications | |
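The presets above amount to a fixed section order per layout. A minimal sketch of how that mapping might be represented follows; the `LAYOUT_PRESETS` dictionary and `order_sections` helper are illustrative names, and only the section orders are taken from the list above.

```python
LAYOUT_PRESETS: dict[str, list[str]] = {
    "classic": ["Summary", "Skills", "Experience", "Education"],
    "modern": ["Summary", "Experience", "Skills", "Projects/Certifications", "Education"],
    "minimalist": ["Summary", "Skills", "Experience", "Education"],
    "executive": [
        "Summary", "Selected Achievements", "Experience",
        "Skills", "Education", "Certifications",
    ],
}


def order_sections(sections: dict[str, str], preset: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs in preset order, skipping absent sections."""
    order = LAYOUT_PRESETS[preset]  # raises KeyError for an unknown preset
    return [(name, sections[name]) for name in order if name in sections]
```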
| --- | |
| ## UK resume/cover rules (built-in) | |
| - UK English and dates (MMM YYYY) | |
| - Current role in present tense; previous roles in past tense | |
| - Digits for numbers; £ and % normalization | |
| - Remove first‑person pronouns in resume bullets; maintain active voice | |
| - Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets | |
| - Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates | |
| These rules are applied by `agents/cv_owner.py` and validated by checklists. | |
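Two of these rules, short month names and £/% normalization, lend themselves to simple regular expressions. The sketch below is an assumption about how such normalization could look, not the code in `agents/cv_owner.py`; `normalise_uk` is a hypothetical name.

```python
import re

_MONTHS = {
    "january": "Jan", "february": "Feb", "march": "Mar", "april": "Apr",
    "may": "May", "june": "Jun", "july": "Jul", "august": "Aug",
    "september": "Sep", "october": "Oct", "november": "Nov", "december": "Dec",
}
_MONTH_RE = re.compile(r"\b(" + "|".join(_MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def normalise_uk(text: str) -> str:
    """Apply MMM YYYY dates, the £ symbol, and % signs to a line of text."""
    # "September 2021" -> "Sep 2021"
    text = _MONTH_RE.sub(lambda m: f"{_MONTHS[m.group(1).lower()]} {m.group(2)}", text)
    # "GBP 40,000" -> "£40,000"
    text = re.sub(r"\bGBP\s?(\d)", r"£\1", text)
    # "15 percent" / "15 per cent" -> "15%"
    text = re.sub(r"(\d)\s?(?:percent|per cent)\b", r"\1%", text, flags=re.IGNORECASE)
    return text
```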
| --- | |
| ## Checklists and observability | |
| - Checklists integrate guidance from: | |
| - Reed: CV layout and mistakes | |
| - The Muse: action verbs and layout basics | |
| - Novorésumé: one‑page bias, clean sections, links | |
| - StandOut CV: quantification, bullet density, recent‑role focus | |
| - Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in `memory/data/events.jsonl`. | |
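Per-agent aggregation over the event log could look roughly like this. The event schema is not documented here, so the `agent` field and the `count_events_by_agent` helper are assumptions.

```python
import json
from collections import Counter
from pathlib import Path


def count_events_by_agent(log_path: Path) -> Counter:
    """Count JSONL events per agent, skipping blank or malformed lines."""
    counts: Counter = Counter()
    if not log_path.exists():
        return counts
    with log_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written last line
            if isinstance(event, dict):
                counts[event.get("agent", "unknown")] += 1
    return counts


# Example: count_events_by_agent(Path("memory/data/events.jsonl")).most_common()
```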
| --- | |
| ## Scripts (headless runs) | |
| - Capco (Anthony Lui → Capco): | |
| ```powershell | |
| python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py | |
| ``` | |
| - Anthropic (Anthony Lui → Anthropic): | |
| ```powershell | |
| python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py | |
| ``` | |
| - Pipeline (Router + Agents + Review + Events): | |
| ```powershell | |
| python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py | |
| ``` | |
| These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Before running them, set `LLM_PROVIDER=gemini`, `LLM_MODEL=gemini-2.5-flash`, and `GEMINI_API_KEY` in `.env`. | |
| --- | |
| ## Temporal knowledge graph (micro‑memory) | |
| - `agents/temporal_tracker.py` stores time‑stamped triplets with non‑destructive invalidation. | |
| - Integrated in pipeline review to track job application states and history. | |
| - Utilities for timelines, active applications, and pattern analysis are included. | |
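"Non-destructive invalidation" means an outdated fact is closed with an end timestamp rather than deleted, so the full history stays queryable. The sketch below illustrates the idea; the `Triplet` and `TemporalStore` classes are hypothetical and do not reflect the actual interface of `agents/temporal_tracker.py`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Triplet:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still current


class TemporalStore:
    def __init__(self) -> None:
        self._facts: list[Triplet] = []

    def assert_fact(self, subject: str, predicate: str, obj: str,
                    at: Optional[datetime] = None) -> Triplet:
        """Record a new fact and close (not delete) the one it supersedes."""
        at = at or datetime.now(timezone.utc)
        for fact in self._facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.valid_to is None:
                fact.valid_to = at
        new_fact = Triplet(subject, predicate, obj, valid_from=at)
        self._facts.append(new_fact)
        return new_fact

    def current(self, subject: str, predicate: str) -> Optional[str]:
        for fact in self._facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.valid_to is None:
                return fact.obj
        return None

    def history(self, subject: str, predicate: str) -> list[Triplet]:
        return [f for f in self._facts if (f.subject, f.predicate) == (subject, predicate)]
```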
| --- | |
| ## Parallel agents + meta‑agent demo | |
| - Notebook: `notebooks/agents_parallel_demo.ipynb` | |
| - Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot. | |
| - Uses the central LLM client (`services/llm.py`) with `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-flash`. | |
| Run (Jupyter/VSCode): | |
| ```python | |
| %pip install nest_asyncio matplotlib | |
| # Ensure GEMINI_API_KEY is set in your environment | |
| ``` | |
| Open and run the notebook cells. | |
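For readers who want the shape of the demo without opening the notebook, the fan-out/fan-in pattern can be sketched with `asyncio.gather`. The agent names, the `run_agent` stub, and the `meta_agent` combiner are placeholders; the notebook calls the real LLM client instead of sleeping.

```python
import asyncio


async def run_agent(name: str, text: str, delay: float = 0.05) -> dict:
    """Stand-in for one analysis agent; the sleep simulates an LLM call."""
    await asyncio.sleep(delay)
    return {"agent": name, "finding": f"{name} reviewed {len(text)} characters"}


def meta_agent(results: list[dict]) -> dict:
    """Combine the individual findings into a single summary."""
    return {
        "agents": [r["agent"] for r in results],
        "summary": " | ".join(r["finding"] for r in results),
    }


async def run_parallel(text: str) -> dict:
    names = ["skills", "culture", "seniority", "keywords"]
    # gather() runs the four coroutines concurrently and preserves input order.
    results = await asyncio.gather(*(run_agent(name, text) for name in names))
    return meta_agent(list(results))


# In a script: asyncio.run(run_parallel("job posting text"))
# In a notebook, where a loop is already running: await run_parallel("...")
```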
| --- | |
| ## LinkedIn OAuth (optional) | |
| 1) Create a LinkedIn Developer App, then add redirect URLs: | |
| ``` | |
| http://localhost:8501 | |
| http://localhost:8501/callback | |
| ``` | |
| 2) Products: enable “Sign In with LinkedIn using OpenID Connect”. | |
| 3) Update `.env` and set `MOCK_MODE=false`. | |
| 4) In the UI, use the “LinkedIn Authentication” section to kick off the flow. | |
| Notes: | |
| - LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data. | |
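The CSRF-safe state validation used in the OAuth flow generally follows the pattern below: generate an unguessable `state` value before redirecting, store it in the user's session, and compare it in constant time when the callback arrives. The function names are hypothetical and the session handling is left out.

```python
import hmac
import secrets
from typing import Optional


def new_state() -> str:
    """Create an unguessable value to send as the OAuth `state` parameter."""
    return secrets.token_urlsafe(32)


def state_matches(expected: Optional[str], returned: Optional[str]) -> bool:
    """Constant-time comparison of the stored state with the callback's state."""
    if not expected or not returned:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), returned.encode("utf-8"))
```

A callback whose `state` fails this check should be rejected before the authorization code is exchanged for a token.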
| --- | |
| ## Job sources | |
| - **Adzuna**: global coverage, 5,000 free jobs/month | |
| - **Resilient aggregator** and optional **JobSpy MCP** for broader search | |
| - **Custom jobs**: add your own postings in the UI | |
| - Corporate SSL environments: Adzuna calls automatically retry with `verify=False` as a fallback | |
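The retry-without-verification fallback can be sketched as a thin wrapper around an HTTP `get` callable. The helper name and its parameters are assumptions; with `requests`, the callable would be `requests.get` and the error type `requests.exceptions.SSLError`. Disabling verification removes protection against intercepted traffic, so it is only appropriate behind a trusted corporate proxy.

```python
from typing import Any, Callable


def get_with_ssl_fallback(get: Callable[..., Any], url: str,
                          ssl_error: type, **kwargs: Any) -> Any:
    """Call get(url); on an SSL error, retry once with verify=False."""
    try:
        return get(url, **kwargs)
    except ssl_error:
        # Retry with certificate verification disabled (overrides any caller value).
        retry_kwargs = {**kwargs, "verify": False}
        return get(url, **retry_kwargs)


# With requests installed, usage would look like:
#   import requests
#   resp = get_with_ssl_fallback(requests.get, url,
#                                requests.exceptions.SSLError, timeout=10)
```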
| --- | |
| ## LLMs and configuration | |
| - Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (`services/llm.py`). | |
| - Recommended defaults for this project: | |
| - `LLM_PROVIDER=gemini` | |
| - `LLM_MODEL=gemini-2.5-flash` | |
| - Agents pass an `agent` argument set to one of `cv`, `cover`, `parser`, `match`, `tailor`, or `chat` to use the matching per‑agent key when one is provided. | |
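Per-agent key selection presumably reduces to "use the agent-specific variable if set, otherwise the shared key". The sketch below shows that lookup using the variable names from the `.env` example; the `resolve_gemini_key` function is a hypothetical helper, not the actual code in `services/llm.py`.

```python
import os
from typing import Mapping, Optional

AGENT_KEY_VARS = {
    "cv": "GEMINI_API_KEY_CV",
    "cover": "GEMINI_API_KEY_COVER",
    "chat": "GEMINI_API_KEY_CHAT",
    "parser": "GEMINI_API_KEY_PARSER",
    "match": "GEMINI_API_KEY_MATCH",
    "tailor": "GEMINI_API_KEY_TAILOR",
}


def resolve_gemini_key(agent: Optional[str],
                       env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the agent-specific key; fall back to the shared GEMINI_API_KEY."""
    if agent:
        specific = env.get(AGENT_KEY_VARS.get(agent, ""), "").strip()
        if specific:
            return specific
    return env.get("GEMINI_API_KEY", "").strip() or None
```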
| --- | |
| ## Advanced agents (built‑in) | |
| - **Parallel processing**: 3–5× faster multi‑job drafting | |
| - **Temporal tracking**: time‑stamped history and pattern analysis | |
| - **Observability**: tracing, metrics, timeline visualization | |
| - **Context engineering**: flywheel learning, L1/L2/L3 memory, scalable context | |
| Toggle these in the HF app under “🚀 Advanced AI Features”. | |
| --- | |
| ## LangExtract + Gemini | |
| - Uses the same `GEMINI_API_KEY` (auto‑applied to `LANGEXTRACT_API_KEY` when empty) | |
| - Official `langextract.extract(...)` requires examples; the UI also exposes a robust regex‑based fallback (`services/langextract_service.py`) so features work even when cloud extraction is constrained | |
| - In HF app (“🔍 Enhanced Job Analysis”), you can: | |
| - Analyze job postings (structured fields + skills) | |
| - Optimize resume for ATS (score + missing keywords) | |
| - Bulk analyze multiple jobs | |
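An ATS score with missing keywords can be approximated as simple keyword coverage. This sketch is not the scoring used by `utils/ats.py` or the LangExtract service; the `ats_keyword_report` function and the percentage formula are assumptions.

```python
import re


def ats_keyword_report(resume: str, keywords: list[str]) -> dict:
    """Report which job keywords appear in the resume and a coverage score (0-100)."""
    resume_lower = resume.lower()
    found: list[str] = []
    missing: list[str] = []
    for keyword in keywords:
        # Match whole tokens only, so "SQL" does not match inside "NoSQL".
        pattern = rf"(?<!\w){re.escape(keyword.lower())}(?!\w)"
        (found if re.search(pattern, resume_lower) else missing).append(keyword)
    score = round(100 * len(found) / len(keywords)) if keywords else 0
    return {"score": score, "found": found, "missing": missing}
```

For a resume containing "Python", "SQL", and "AWS" checked against `["Python", "SQL", "Kubernetes", "AWS"]`, the report gives a score of 75 with `Kubernetes` listed as missing.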
| --- | |
| ## Office exports | |
| - **Word** (`services/word_cv.py`): resumes + cover letters (5 templates; `python-docx` fallback) | |
| - **PowerPoint** (`services/powerpoint_cv.py`): visual CV (4 templates; `python-pptx` fallback) | |
| - **Excel** (`services/excel_tracker.py`): tracker with 5 analytical sheets (`openpyxl` fallback) | |
| - MCP servers supported when available; local libraries are used otherwise | |
| In HF app, after generation, expand: | |
| - “📊 Export to PowerPoint CV” | |
| - “📝 Export to Word Documents” | |
| - “📈 Export Excel Tracker” | |
| --- | |
| ## Hugging Face minimal Space branch | |
| - Clean branch containing only `app.py` and `requirements.txt` for Spaces. | |
| - Branch name: `hf-space-min` (push from a clean worktree). | |
| - `.gitignore` includes `.env` and `.env.*` to avoid leaking secrets. | |
| --- | |
| ## Tests & scripts | |
| - Run test suites in `tests/` | |
| - Useful scripts: `test_*` files in project root (integration checks) | |
| --- | |
| ## Security | |
| - OAuth state validation; input, path, and URL sanitization | |
| - Sensitive data via environment variables; avoid committing secrets | |
| - Atomic writes in memory store | |
| --- | |
| ## Run summary | |
| - Streamlit: `python -m streamlit run app.py --server.port 8501` | |
| - Gradio/HF: `PORT=7861 python hf_app.py` | |
| Both apps run locally with the commands above; the Gradio app can also be deployed to a Hugging Face Space. | |