Noo88ear / Job-Application-Assistant (Hugging Face Space), README.md, last commit 5d6d60e "Add HuggingFace configuration header to README.md" (6 months ago)

	---
	title: Multi-Agent Job Application Assistant
	emoji: 🚀
	colorFrom: purple
	colorTo: indigo
	sdk: gradio
	sdk_version: "4.44.0"
	app_file: app.py
	pinned: false
	---

	## Multi‑Agent Job Application Assistant (Streamlit + Gradio/Hugging Face)

	A production‑ready system to discover jobs, generate ATS‑optimized resumes and cover letters, and export documents to Word/PowerPoint/Excel. Includes secure LinkedIn OAuth (optional), multi‑source job aggregation, Gemini‑powered generation, and advanced agent capabilities (parallelism, temporal tracking, observability, context engineering).

	---

	### What you get
	- Two UIs: Streamlit (`app.py`) and Gradio/HF (`hf_app.py`)
	- LinkedIn OAuth 2.0 (optional; CSRF‑safe state validation)
	- Job aggregation: Adzuna (5k/month) plus resilient fallbacks
	- ATS‑optimized drafting: resumes + cover letters (Gemini)
	- Office exports:
	  - Word resumes and cover letters (5 templates)
	  - PowerPoint CV (4 templates)
	  - Excel application tracker (5 analytical sheets)
	- Advanced agents: parallel execution, temporal memory, observability/tracing, and context engineering/flywheel
	- LangExtract integration: structured extraction with Gemini key; robust regex fallback in constrained environments
	- New: Router pipeline, Temporal KG integration, Parallel-agents demo, HF minimal Space branch
	- New (Aug 2025): UK resume rules, action-verb upgrades, anti-buzzword scrub, skills proficiency, remote readiness, checklists based on The Muse, Reed, Novorésumé and StandOut CV guidance, and interactive output controls (exact length, refinement cycles, layout presets)

	---

	## Quickstart

	### 1) Environment (.env)
	Create a UTF-8 encoded `.env` file. Values may be left empty when running in mock mode (`MOCK_MODE=true`). See `.env.example` for the full list of variables:
	```ini
	# Behavior
	MOCK_MODE=true
	PORT=7860

	# LLM / Research
	LLM_PROVIDER=gemini
	LLM_MODEL=gemini-2.5-flash
	GEMINI_API_KEY=
	# Optional per-agent Gemini keys
	GEMINI_API_KEY_CV=
	GEMINI_API_KEY_COVER=
	GEMINI_API_KEY_CHAT=
	GEMINI_API_KEY_PARSER=
	GEMINI_API_KEY_MATCH=
	GEMINI_API_KEY_TAILOR=
	OPENAI_API_KEY=
	ANTHROPIC_API_KEY=

	TAVILY_API_KEY=

	# Job APIs
	ADZUNA_APP_ID=
	ADZUNA_APP_KEY=

	# Office MCP (optional)
	POWERPOINT_MCP_URL=http://localhost:3000
	WORD_MCP_URL=http://localhost:3001
	EXCEL_MCP_URL=http://localhost:3002

	# LangExtract falls back to GEMINI_API_KEY when this is empty
	LANGEXTRACT_API_KEY=
	```

	Hardcoded keys have been removed from utility scripts. Use `switch_api_key.py` to write keys into `.env` without embedding them in code.

	### 2) Install
	- Windows PowerShell
	```powershell
	python -m venv .venv
	.\.venv\Scripts\Activate.ps1
	pip install -r requirements.txt
	```
	- Linux/macOS
	```bash
	python3 -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	```

	### 3) Run the apps
	- Streamlit (`python -m` works even when the `streamlit` executable is not on PATH)
	```powershell
	python -m streamlit run app.py --server.port 8501
	```
	- Gradio / Hugging Face (set `PORT` to avoid clashing with a process already using the default 7860); PowerShell, then bash:
	```powershell
	$env:PORT=7861; python hf_app.py
	```
	```bash
	PORT=7861 python hf_app.py
	```
	The HF app binds to `0.0.0.0:$PORT`.
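As a minimal sketch of that binding, assuming `hf_app.py` reads `PORT` from the environment and falls back to 7860 (the value in the `.env` example), the host/port resolution might look like this. `resolve_bind_address` is a hypothetical helper; the commented `launch` call uses Gradio's `server_name` and `server_port` parameters.

```python
import os
from typing import Mapping, Tuple


def resolve_bind_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port) for the web server, defaulting to 0.0.0.0:7860."""
    # 0.0.0.0 listens on all interfaces, so the server is reachable from outside a container.
    host = "0.0.0.0"
    raw = env.get("PORT", "").strip()
    port = int(raw) if raw else 7860
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return host, port


# Inside a Gradio app the result would feed Blocks.launch, for example:
#   host, port = resolve_bind_address()
#   demo.launch(server_name=host, server_port=port)
```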

	---

	## 📊 System Architecture Overview

	This section summarizes the system's architecture, integrations and supporting features:

	### 🏗️ Core Architecture

	#### Dual Interface Design
	- Streamlit Interface (`app.py`) - Traditional web application for desktop use
	- Gradio/HF Interface (`hf_app.py`) - Modern, mobile-friendly, deployable to Hugging Face Spaces

	#### Multi-Agent System (Specialized Agents)

	Core Processing Agents:
	- `OrchestratorAgent` - Central coordinator managing workflow and job orchestration
	- `CVOwnerAgent` - ATS-optimized resume generation with UK-specific formatting rules
	- `CoverLetterAgent` - Personalized cover letter generation with keyword optimization
	- `ProfileAgent` - Intelligent CV parsing and structured profile extraction
	- `JobAgent` - Job posting analysis and requirement extraction
	- `RouterAgent` - Dynamic routing based on payload state and workflow stage

	Advanced AI Agents:
	- `ParallelExecutor` - Concurrent processing for 3-5x faster multi-job handling
	- `TemporalTracker` - Time-stamped application history and pattern analysis
	- `ObservabilityAgent` - Real-time tracing, metrics collection, and monitoring
	- `ContextEngineer` - Flywheel learning and context optimization
	- `ContextScaler` - L1/L2/L3 memory management for scalable context handling
	- `LinkedInManager` - OAuth 2.0 integration and profile synchronization
	- `MetaAgent` - Combines outputs from multiple specialized analysis agents
	- `TriageAgent` - Intelligent task prioritization and routing

	#### Guidelines Enforcement System (`agents/guidelines.py`)
	Comprehensive rule engine ensuring document quality:
	- UK Compliance: British English, UK date formats (MMM YYYY), £ currency normalization
	- ATS Optimization: Plain text formatting, keyword density, section structure
	- Content Quality: Anti-buzzword filtering, action verb strengthening, first-person removal
	- Layout Rules: Exact length enforcement, heading validation, bullet point formatting

	### 🔌 Integration Ecosystem

	#### LLM Integration (`services/llm.py`)
	- Multi-Provider Support: OpenAI, Anthropic Claude, Google Gemini
	- Per-Agent API Keys: Cost optimization through agent-specific key allocation
	- Intelligent Fallbacks: Graceful degradation when providers unavailable
	- Configurable Models: Per-agent model selection for optimal performance/cost

	#### Job Aggregation (`services/job_aggregator.py`, `services/jobspy_client.py`)
	- Primary Sources: Adzuna API (5,000 jobs/month free tier)
	- JobSpy Integration: Indeed, LinkedIn, Glassdoor aggregation
	- Additional APIs: Remotive, The Muse, GitHub Jobs (GitHub Jobs was shut down in 2021, so this source is likely to return nothing)
	- Smart Deduplication: Title + company matching with fuzzy logic
	- SSL Fallback: Automatic retry with certificate verification disabled (`verify=False`) in corporate environments where verification fails
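Title + company deduplication can be sketched as below, assuming each job is a dict with `title` and `company` keys. The fuzzy matcher here is the standard library's `difflib.SequenceMatcher` with an illustrative 0.9 threshold; the aggregator's actual matcher and threshold may differ.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so trivial formatting differences are ignored.
    return " ".join(text.lower().split())


def dedupe_jobs(jobs: List[Dict[str, str]], threshold: float = 0.9) -> List[Dict[str, str]]:
    """Keep the first posting of each (same company, similar title) group."""
    kept: List[Dict[str, str]] = []
    for job in jobs:
        title = normalize(job.get("title", ""))
        company = normalize(job.get("company", ""))
        is_dup = any(
            # Company must match exactly; only then is the fuzzy title ratio computed.
            company == normalize(k.get("company", ""))
            and SequenceMatcher(None, title, normalize(k.get("title", ""))).ratio() >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(job)
    return kept
```

Because `and` short-circuits, the comparatively expensive `SequenceMatcher` ratio only runs for postings from the same company.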

	#### Document Generation (`services/`)
	- Word Documents (`word_cv.py`): 5 professional templates, MCP server integration
	- PowerPoint CVs (`powerpoint_cv.py`): 4 visual templates for presentations
	- Excel Trackers (`excel_tracker.py`): 5 analytical sheets with metrics
	- PDF Export: Cross-platform compatibility with formatting preservation

	### 📈 Advanced Features

	#### Pipeline Architecture (`agents/pipeline.py`)
	```
	User Input → Router → Profile Analysis → Job Analysis → Resume Generation → Cover Letter → Review → Memory Storage
	Persisted along the way: Event Log, Profile Cache, Job Cache, Document Cache, Metrics Log, Temporal KG
	```

	#### Memory & Persistence
	- File-backed Storage (`memory/store.py`): Atomic writes, thread-safe operations
	- Temporal Knowledge Graph: Application tracking with time-stamped relationships
	- Event Sourcing (`events.jsonl`): Complete audit trail of all agent actions
	- Caching System (`utils/cache.py`): TTL-based caching with automatic eviction
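The TTL behaviour attributed to `utils/cache.py` can be illustrated with a small in-memory class. This is a sketch, not that module's API: names are hypothetical, eviction is lazy on read plus an explicit `purge()`, and the clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        # Store the absolute expiry time alongside the value.
        self._data[key] = (self._clock() + self.ttl, value)

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazy eviction: drop the stale entry when it is read.
            del self._data[key]
            return default
        return value

    def purge(self) -> int:
        """Evict all expired entries and return how many were removed."""
        now = self._clock()
        stale = [k for k, (exp, _) in self._data.items() if now >= exp]
        for k in stale:
            del self._data[k]
        return len(stale)
```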

	#### LangExtract Integration (`services/langextract_service.py`)
	- Structured Extraction: Job requirements, skills, company culture
	- ATS Optimization: Keyword extraction and scoring
	- Fallback Mechanisms: Regex-based extraction when API unavailable
	- Result Caching: Performance optimization for repeated analyses
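A regex fallback of the kind described might look like the sketch below. The skill vocabulary, patterns and function names are illustrative assumptions rather than the contents of `services/langextract_service.py`.

```python
import re
from typing import Iterable, List

# Hypothetical skill vocabulary; a real fallback would load a much larger list.
DEFAULT_SKILLS = ("python", "sql", "aws", "docker", "kubernetes", "machine learning", "c++")


def extract_skills(text: str, vocabulary: Iterable[str] = DEFAULT_SKILLS) -> List[str]:
    """Return vocabulary skills found in `text` (case-insensitive), in vocabulary order."""
    found = []
    for skill in vocabulary:
        # Lookarounds instead of \b so skills ending in symbols (e.g. "c++") still match,
        # while "sql" inside "PostgreSQL" does not.
        pattern = r"(?<![\w+])" + re.escape(skill) + r"(?![\w+])"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(skill)
    return found


def extract_years_experience(text: str) -> List[int]:
    """Find phrases such as '5+ years' or '3 years of experience'."""
    return [int(m) for m in re.findall(r"(\d{1,2})\+?\s*years?", text, flags=re.IGNORECASE)]
```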

	### 🛡️ Security & Configuration

	#### Authentication & Security
	- OAuth 2.0: LinkedIn integration with CSRF protection
	- Input Sanitization: Path traversal and injection prevention
	- Environment Isolation: Secrets management via `.env`
	- Rate Limiting: API throttling and abuse prevention
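Path traversal prevention usually means resolving a user-supplied path and checking that it stays under an allowed directory. A minimal sketch, assuming generated files are confined to a single base directory; `safe_join` is a hypothetical helper and needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def safe_join(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` under `base_dir`, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and symlinks; an absolute user_path replaces base entirely.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```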

	#### Configuration Management
	- Environment Variables: All sensitive data in `.env`
	- Agent Configuration (`utils/config.py`): Centralized settings
	- Template System: Customizable document templates
	- Feature Flags: Progressive enhancement based on available services
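Feature flags driven by available credentials could be derived as follows. The variable names come from the `.env` example above; the `FeatureFlags` type and the rule that `MOCK_MODE=true` disables every live integration are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FeatureFlags:
    mock_mode: bool
    gemini: bool
    adzuna: bool
    tavily: bool


def _truthy(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_flags(env: Mapping[str, str] = os.environ) -> FeatureFlags:
    """Enable a live integration only when its credentials are present and mock mode is off."""
    mock = _truthy(env.get("MOCK_MODE", "false"))

    def has(key: str) -> bool:
        return bool(env.get(key, "").strip())

    return FeatureFlags(
        mock_mode=mock,
        gemini=not mock and has("GEMINI_API_KEY"),
        adzuna=not mock and has("ADZUNA_APP_ID") and has("ADZUNA_APP_KEY"),
        tavily=not mock and has("TAVILY_API_KEY"),
    )
```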

	### 📁 Project Structure

	```
	2096955/
	├── agents/ # Multi-agent system components
	│ ├── orchestrator.py # Main orchestration logic
	│ ├── cv_owner.py # Resume generation with guidelines
	│ ├── guidelines.py # UK rules and ATS optimization
	│ ├── pipeline.py # Application pipeline flow
	│ └── ... # Additional specialized agents
	├── services/ # External integrations and services
	│ ├── llm.py # Multi-provider LLM client
	│ ├── job_aggregator.py # Job source aggregation
	│ ├── word_cv.py # Word document generation
	│ └── ... # Document and API services
	├── utils/ # Utility functions and helpers
	│ ├── ats.py # ATS scoring and optimization
	│ ├── cache.py # TTL caching system
	│ ├── consistency.py # Contradiction detection
	│ └── ... # Text processing and helpers
	├── models/ # Data models and schemas
	│ └── schemas.py # Pydantic models for type safety
	├── mcp/ # Model Context Protocol servers
	│ ├── cv_owner_server.py
	│ ├── cover_letter_server.py
	│ └── orchestrator_server.py
	├── memory/ # Persistent storage
	│ ├── store.py # File-backed memory store
	│ └── data/ # Application state and history
	├── app.py # Streamlit interface
	├── hf_app.py # Gradio/HF interface
	└── api_llm_integration.py # REST API endpoints
	```

	### 🚀 Performance Optimizations

	- Parallel Processing: Async job handling with `asyncio` and `nest_asyncio`
	- Lazy Loading: Dependencies loaded only when needed
	- Smart Caching: Multi-level caching (memory, file, API responses)
	- Batch Operations: Efficient multi-job processing
	- Event-Driven: Asynchronous event handling for responsiveness
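The parallel pattern can be sketched with `asyncio.gather` plus a semaphore to cap concurrency. `draft_for_job` is a hypothetical stand-in that sleeps instead of calling an LLM, so any speed-up it shows applies only to I/O-bound work.

```python
import asyncio
import time
from typing import Dict, List


async def draft_for_job(job_id: str, delay: float) -> Dict[str, str]:
    """Stand-in for an I/O-bound agent call such as an LLM request."""
    await asyncio.sleep(delay)
    return {"job_id": job_id, "status": "drafted"}


async def draft_all(job_ids: List[str], delay: float = 0.2, limit: int = 3) -> List[Dict[str, str]]:
    # The semaphore caps concurrent calls, e.g. to stay within provider rate limits.
    sem = asyncio.Semaphore(limit)

    async def guarded(job_id: str) -> Dict[str, str]:
        async with sem:
            return await draft_for_job(job_id, delay)

    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(guarded(j) for j in job_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(draft_all(["a", "b", "c"]))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

In Jupyter, where an event loop is already running, `asyncio.run` raises `RuntimeError`; calling `nest_asyncio.apply()` first is the usual workaround, which is likely why `nest_asyncio` is listed.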

	### 🧪 Testing & Quality

	- Test Suites: Comprehensive tests in `tests/` directory
	- Integration Tests: API and service integration validation
	- Mock Mode: Development without API keys
	- Smoke Tests: Quick validation scripts for deployment
	- Observability: Built-in tracing and metrics collection

	---

	## Router pipeline (User → Router → Profile → Job → Resume → Cover → Review)
	- Implemented in `agents/pipeline.py` and exposed via API in `api_llm_integration.py` (`/api/llm/pipeline_run`).
	- Agents:
	  - `RouterAgent`: routes based on payload state
	  - `ProfileAgent`: parses the CV into a structured profile (LLM with fallback)
	  - `JobAgent`: analyzes the job posting (LLM with fallback)
	  - `CVOwnerAgent` and `CoverLetterAgent`: draft documents (Gemini, per-agent keys)
	  - Review: contradiction checks and memory persistence
	- Temporal tracking: on review, a `drafted` status is recorded in the temporal KG, with the review issues as metadata.
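Routing on payload state can be expressed as a pure function that inspects which fields are already filled. In this sketch only `cv_text` and `job_posting` come from the flow diagram; the other field names (`profile`, `job_analysis`, `resume`, `cover_letter`, `review`) and the stage labels are hypothetical.

```python
from typing import Any, Dict, List


def route(payload: Dict[str, Any]) -> List[str]:
    """Return the next pipeline stage(s) for a payload, based on which fields are present."""
    pending: List[str] = []
    # Parsing stages are independent, so both can be scheduled together.
    if payload.get("cv_text") and not payload.get("profile"):
        pending.append("profile")
    if payload.get("job_posting") and not payload.get("job_analysis"):
        pending.append("job")
    if pending:
        return pending
    if not (payload.get("profile") and payload.get("job_analysis")):
        return []  # missing inputs: nothing can be drafted yet
    # Drafting and review run strictly in sequence.
    for field, stage in (("resume", "resume"), ("cover_letter", "cover"), ("review", "review")):
        if not payload.get(field):
            return [stage]
    return []  # pipeline complete
```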

	Flow diagram
	```mermaid
	flowchart TD
	U["User"] --> R["RouterAgent"]
	R -->|cv_text present| P["ProfileAgent (LLM)"]
	R -->|job_posting present| J["JobAgent (LLM)"]
	P --> RESUME["CVOwnerAgent"]
	J --> RESUME
	RESUME --> COVER["CoverLetterAgent"]
	COVER --> REVIEW["Orchestrator Review"]
	REVIEW --> M["MemoryStore (file-backed)"]
	REVIEW --> TKG["Temporal KG (triplets)"]
	subgraph LLM["LLM Client (Gemini 2.5 Flash, per-agent keys)"]
	P
	J
	RESUME
	COVER
	end
	subgraph UI["Gradio (HF)"]
	U
	end
	subgraph API["Flask API"]
	PR["/api/llm/pipeline_run"]
	end
	U -. optional .-> PR
	```

	---

	## Hugging Face / Gradio (interactive controls)
	- In the CV Analysis tab, you can set:
	  - Refinement cycles (1–5)
	  - Exact target length (characters), enforced deterministically for both the resume and the cover letter
	  - Layout preset: `classic`, `modern`, `minimalist`, `executive`
	    - classic: Summary → Skills → Experience → Education (Summary and Skills above the fold)
	    - modern: Summary → Experience → Skills → Projects/Certifications → Education
	    - minimalist: concise Summary → Skills → Experience → Education
	    - executive: Summary → Selected Achievements (3–5) → Experience → Skills → Education → Certifications
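The presets above amount to a fixed section order per name. A sketch that encodes them and assembles a resume accordingly; `LAYOUT_PRESETS` and `order_sections` are hypothetical names, and sections are assumed to be keyed by plain heading strings.

```python
from typing import Dict, List

# Section order per preset, transcribed from the list above.
LAYOUT_PRESETS: Dict[str, List[str]] = {
    "classic": ["Summary", "Skills", "Experience", "Education"],
    "modern": ["Summary", "Experience", "Skills", "Projects/Certifications", "Education"],
    "minimalist": ["Summary", "Skills", "Experience", "Education"],
    "executive": ["Summary", "Selected Achievements", "Experience", "Skills", "Education", "Certifications"],
}


def order_sections(sections: Dict[str, str], preset: str) -> str:
    """Join resume sections in the preset's order, skipping any that are missing or empty."""
    try:
        order = LAYOUT_PRESETS[preset]
    except KeyError:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(LAYOUT_PRESETS)}") from None
    blocks = [f"{name}\n{sections[name]}" for name in order if sections.get(name)]
    return "\n\n".join(blocks)
```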

	---

	## UK resume/cover rules (built-in)
	- UK English and dates (MMM YYYY)
	- Current role in present tense; previous roles in past tense
	- Digits for numbers; £ and % normalization
	- Remove first‑person pronouns in resume bullets; maintain active voice
	- Hard skills first (max ~10), then soft skills; verbatim critical JD keywords in bullets
	- Strip DOB/photo lines; compress older roles (>15 years) to title/company/dates

	These rules are applied by `agents/cv_owner.py` and validated by checklists.
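Two of these rules, MMM YYYY dates and first-person removal, can be sketched with the standard library. The accepted input formats and the pronoun list are assumptions; `%b`/`%B` follow the process locale, so English month names (the default C locale) are assumed.

```python
import re
from datetime import datetime

# Input formats this sketch recognizes; anything else is passed through unchanged.
_DATE_FORMATS = ("%Y-%m", "%m/%Y", "%B %Y", "%b %Y")


def to_uk_month_year(value: str) -> str:
    """Normalize dates like '2023-01', '01/2023' or 'January 2023' to 'Jan 2023'."""
    text = value.strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%b %Y")
        except ValueError:
            continue
    return text  # e.g. 'Present' stays as-is


# A pronoun must be followed by whitespace, so words like 'Weekly' or 'Myriad' are untouched.
_PRONOUN_START = re.compile(r"^\s*(?:I|We|My|Our)\s+", flags=re.IGNORECASE)


def strip_leading_pronoun(bullet: str) -> str:
    """Drop a leading first-person pronoun and re-capitalize the bullet."""
    rest = _PRONOUN_START.sub("", bullet, count=1)
    return rest[:1].upper() + rest[1:]
```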

	---

	## Checklists and observability
	- Checklists integrate guidance from:
	  - Reed: CV layout and mistakes
	  - The Muse: action verbs and layout basics
	  - Novorésumé: one‑page bias, clean sections, links
	  - StandOut CV: quantification, bullet density, recent‑role focus
	- Observability tab aggregates per‑agent events and displays checklist outcomes. Events are stored in `memory/data/events.jsonl`.
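An append-only JSON Lines log like `events.jsonl` can be written and filtered as below. The record fields (`ts`, `agent`, `event`) are an assumed schema, not necessarily the one the app uses.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, Iterator


def log_event(path: Path, agent: str, event: str, **data: Any) -> None:
    """Append one JSON object per line; appending never rewrites earlier history."""
    record = {"ts": time.time(), "agent": agent, "event": event, **data}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_events(path: Path, agent: str = "") -> Iterator[Dict[str, Any]]:
    """Yield events, optionally filtered to one agent, skipping blank lines."""
    if not path.exists():
        return
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                rec = json.loads(line)
                if not agent or rec.get("agent") == agent:
                    yield rec
```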

	---

	## Scripts (headless runs)
	- Capco (Anthony Lui → Capco):
	```powershell
	python .\scripts\run_with_env.py .\scripts\run_anthony_capco.py
	```
	- Anthropic (Anthony Lui → Anthropic):
	```powershell
	python .\scripts\run_with_env.py .\scripts\run_anthropic_job.py
	```
	- Pipeline (Router + Agents + Review + Events):
	```powershell
	python .\scripts\run_with_env.py .\scripts\pipeline_anthony_capco.py
	```

	These scripts print document lengths, agent diagnostics, and whether Gemini is enabled. Set `LLM_PROVIDER=gemini`, `LLM_MODEL=gemini-2.5-flash`, and `GEMINI_API_KEY` in `.env`.
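The internals of `scripts/run_with_env.py` are not shown here. A plausible minimal version, assuming it loads `.env` into the environment and then runs the target script, could look like this; the `python-dotenv` package's `load_dotenv` is a common alternative to the hand-written parser.

```python
import os
import runpy
import sys
from pathlib import Path
from typing import Any, Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching quotes around the value, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def run_with_env(script: str, env_file: str = ".env") -> Dict[str, Any]:
    """Load env_file into os.environ (existing variables win), then run script as __main__."""
    env_path = Path(env_file)
    if env_path.exists():
        # utf-8-sig also tolerates a BOM written by some Windows editors.
        for key, value in parse_dotenv(env_path.read_text(encoding="utf-8-sig")).items():
            os.environ.setdefault(key, value)
    return runpy.run_path(script, run_name="__main__")


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.argv = sys.argv[1:]  # the target script sees its own path as argv[0]
    run_with_env(sys.argv[0])
```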

	---

	## Temporal knowledge graph (micro‑memory)
	- `agents/temporal_tracker.py` stores time‑stamped triplets with non‑destructive invalidation.
	- Integrated in pipeline review to track job application states and history.
	- Utilities for timelines, active applications, and pattern analysis included.
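Non-destructive invalidation means a superseded fact is closed with an end time rather than deleted, so history stays queryable. A sketch under that reading; `Triplet`, `TemporalKG` and the method names are hypothetical, not the API of `agents/temporal_tracker.py`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Triplet:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means the fact is still current


class TemporalKG:
    def __init__(self) -> None:
        self.triplets: List[Triplet] = []

    def _matching(self, subject: str, predicate: str) -> List[Triplet]:
        return [t for t in self.triplets if t.subject == subject and t.predicate == predicate]

    def assert_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> None:
        """Close the current fact for (subject, predicate) instead of deleting it, then add the new one."""
        for t in self._matching(subject, predicate):
            if t.valid_to is None:
                t.valid_to = at
        self.triplets.append(Triplet(subject, predicate, obj, at))

    def current(self, subject: str, predicate: str) -> Optional[str]:
        return next((t.obj for t in self._matching(subject, predicate) if t.valid_to is None), None)

    def as_of(self, subject: str, predicate: str, when: datetime) -> Optional[str]:
        """Return the value that was valid at `when`, if any."""
        for t in self._matching(subject, predicate):
            if t.valid_from <= when and (t.valid_to is None or when < t.valid_to):
                return t.obj
        return None

    def history(self, subject: str, predicate: str) -> List[Triplet]:
        return sorted(self._matching(subject, predicate), key=lambda t: t.valid_from)
```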

	---

	## Parallel agents + meta‑agent demo
	- Notebook: `notebooks/agents_parallel_demo.ipynb`
	- Runs 4 analysis agents in parallel and combines outputs via a meta‑agent, with a timeline plot.
	- Uses the central LLM client (`services/llm.py`) with `LLM_PROVIDER=gemini` and `LLM_MODEL=gemini-2.5-flash`.

	Run (Jupyter/VSCode):
	```python
	%pip install nest_asyncio matplotlib
	# Ensure GEMINI_API_KEY is set in your environment
	```
	Open and run the notebook cells.

	---

	## LinkedIn OAuth (optional)
	1) Create a LinkedIn Developer App, then add redirect URLs:
	```
	http://localhost:8501
	http://localhost:8501/callback
	```
	2) Products: enable “Sign In with LinkedIn using OpenID Connect”.
	3) Update `.env` and set `MOCK_MODE=false`.
	4) In the UI, use the “LinkedIn Authentication” section to kick off the flow.

	Notes:
	- LinkedIn Jobs API is enterprise‑only. The system uses Adzuna + other sources for job data.
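The CSRF protection mentioned above rests on the OAuth `state` parameter: generate an unguessable value, store it server-side, and require the callback to echo it back. A sketch, assuming a dict-like per-user session; the endpoint and the `openid profile email` scopes are LinkedIn's OpenID Connect values, while the function names are hypothetical.

```python
import hmac
import secrets
from typing import MutableMapping, Optional
from urllib.parse import urlencode

AUTH_URL = "https://www.linkedin.com/oauth/v2/authorization"


def build_auth_url(session: MutableMapping[str, str], client_id: str, redirect_uri: str) -> str:
    """Create an unguessable state, remember it server-side, and embed it in the URL."""
    state = secrets.token_urlsafe(32)
    session["oauth_state"] = state
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",
        "state": state,
    }
    return f"{AUTH_URL}?{urlencode(params)}"


def validate_state(session: MutableMapping[str, str], returned: Optional[str]) -> bool:
    """Compare in constant time and consume the stored state so it cannot be replayed."""
    expected = session.pop("oauth_state", None)
    if not expected or not returned:
        return False
    return hmac.compare_digest(expected.encode(), returned.encode())
```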

	---

	## Job sources
	- Adzuna: global coverage, 5,000 free jobs/month
	- Resilient aggregator and optional JobSpy MCP for broader search
	- Custom jobs: add your own postings in the UI
	- Corporate SSL environments: Adzuna calls automatically retry with `verify=False` when certificate verification fails (this disables TLS certificate checks for the retried request)
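The retry-without-verification fallback can be sketched generically, with the HTTP function injected so the example runs without network access. With `requests`, `get` would be `requests.get` and the error tuple `(requests.exceptions.SSLError,)`. Because `verify=False` disables certificate checks entirely, the sketch logs every fallback.

```python
import logging
from typing import Any, Callable, Tuple, Type

log = logging.getLogger("job_aggregator")


def get_with_ssl_fallback(
    get: Callable[..., Any],
    url: str,
    ssl_errors: Tuple[Type[BaseException], ...],
    **kwargs: Any,
) -> Any:
    """Call get(url, verify=True); on an SSL error, retry once with verify=False."""
    try:
        return get(url, verify=True, **kwargs)
    except ssl_errors as exc:
        # Disabling verification exposes the request to interception, so make it visible.
        log.warning("SSL verification failed for %s (%s); retrying with verify=False", url, exc)
        return get(url, verify=False, **kwargs)


# With requests (third-party), usage would look like:
#   import requests
#   resp = get_with_ssl_fallback(requests.get, url, (requests.exceptions.SSLError,),
#                                params=params, timeout=10)
```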

	---

	## LLMs and configuration
	- Central client supports OpenAI, Anthropic, and Gemini with per‑agent Gemini keys (`services/llm.py`).
	- Recommended defaults for this project:
	  - `LLM_PROVIDER=gemini`
	  - `LLM_MODEL=gemini-2.5-flash`
	- Agents pass `agent=` set to one of `cv`, `cover`, `parser`, `match`, `tailor` or `chat` to use the matching per‑agent key when one is provided.
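Per-agent key selection can be sketched as a lookup of `GEMINI_API_KEY_<AGENT>` with the shared key as fallback. The variable names match the `.env` example; the fallback order and error handling are assumptions about `services/llm.py`.

```python
import os
from typing import Mapping, Optional

AGENTS = {"cv", "cover", "parser", "match", "tailor", "chat"}


def gemini_key_for(agent: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """Prefer GEMINI_API_KEY_<AGENT>, falling back to the shared GEMINI_API_KEY."""
    if agent is not None and agent not in AGENTS:
        raise ValueError(f"unknown agent {agent!r}")
    if agent:
        specific = env.get(f"GEMINI_API_KEY_{agent.upper()}", "").strip()
        if specific:
            return specific
    shared = env.get("GEMINI_API_KEY", "").strip()
    if not shared:
        raise RuntimeError("GEMINI_API_KEY is not set (or enable MOCK_MODE)")
    return shared
```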

	---

	## Advanced agents (built‑in)
	- Parallel processing: 3–5× faster multi‑job drafting
	- Temporal tracking: time‑stamped history and pattern analysis
	- Observability: tracing, metrics, timeline visualization
	- Context engineering: flywheel learning, L1/L2/L3 memory, scalable context

	Toggle these in the HF app under “🚀 Advanced AI Features”.

	---

	## LangExtract + Gemini
	- Uses the same `GEMINI_API_KEY` (auto‑applied to `LANGEXTRACT_API_KEY` when empty)
	- Official `langextract.extract(...)` requires examples; the UI also exposes a robust regex‑based fallback (`services/langextract_service.py`) so features work even when cloud extraction is constrained
	- In the HF app (“🔍 Enhanced Job Analysis”), you can:
	  - Analyze job postings (structured fields + skills)
	  - Optimize a resume for ATS (score + missing keywords)
	  - Bulk-analyze multiple jobs
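A simple ATS coverage score, the share of job-description keywords that appear verbatim in the resume, can be computed as below. The scoring formula is an assumption; the service may weight keywords differently.

```python
import re
from typing import Iterable, List, Tuple


def ats_keyword_score(resume: str, keywords: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (percentage of keywords present, keywords missing from the resume)."""
    text = resume.lower()
    # Normalize and de-duplicate keywords while preserving their original order.
    wanted = list(dict.fromkeys(k.strip().lower() for k in keywords if k.strip()))
    if not wanted:
        return 100.0, []
    missing = [
        k for k in wanted
        if not re.search(r"(?<!\w)" + re.escape(k) + r"(?!\w)", text)
    ]
    score = 100.0 * (len(wanted) - len(missing)) / len(wanted)
    return round(score, 1), missing
```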

	---

	## Office exports
	- Word (`services/word_cv.py`): resumes + cover letters (5 templates; `python-docx` fallback)
	- PowerPoint (`services/powerpoint_cv.py`): visual CV (4 templates; `python-pptx` fallback)
	- Excel (`services/excel_tracker.py`): tracker with 5 analytical sheets (`openpyxl` fallback)
	- MCP servers supported when available; local libraries are used otherwise

	In the HF app, after generation, expand:
	- “📊 Export to PowerPoint CV”
	- “📝 Export to Word Documents”
	- “📈 Export Excel Tracker”

	---

	## Hugging Face minimal Space branch
	- Clean branch containing only `app.py` and `requirements.txt` for Spaces.
	- Branch name: `hf-space-min` (push from a clean worktree).
	- `.gitignore` includes `.env` and `.env.*` to avoid leaking secrets.

	---

	## Tests & scripts
	- Run test suites in `tests/`
	- Useful scripts: `test_*` files in project root (integration checks)

	---

	## Security
	- OAuth state validation, input/path/url sanitization
	- Sensitive data via environment variables; avoid committing secrets
	- Atomic writes in memory store
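Atomic writes are commonly implemented by writing to a temporary file in the same directory and renaming it over the target. A sketch of that pattern; `atomic_write_json` is hypothetical, and the lock only serializes writers within one process.

```python
import json
import os
import tempfile
import threading
from pathlib import Path
from typing import Any

_lock = threading.Lock()


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with _lock:  # serialize writers within this process
        # Same directory as the target, so the final rename stays on one filesystem.
        fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(data, fh, ensure_ascii=False, indent=2)
                fh.flush()
                os.fsync(fh.fileno())
            # os.replace swaps the file in with a single rename (atomic on POSIX filesystems).
            os.replace(tmp, path)
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise
```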

	---

	## Run summary
	- Streamlit: `python -m streamlit run app.py --server.port 8501`
	- Gradio/HF: `PORT=7861 python hf_app.py`

	This README covers setup, architecture and usage for both local runs and Hugging Face deployment.