Spaces:
Sleeping
Sleeping
| # Full Software Guide | |
| ## Purpose and Reader | |
| This guide is the canonical technical walkthrough of the `thereisnohr` codebase for engineers who need to understand, operate, and extend the system. It describes the complete implementation as of the final stage. | |
| ## System Snapshot | |
| `thereisnohr` is a complete Applicant Tracking System (ATS) featuring: | |
| - **Async API**: FastAPI-based REST backend with database-backed background task management. | |
| - **Modern UI**: Multi-page Streamlit application for recruiters. | |
| - **Advanced Ingestion**: Multi-stage PDF parsing, identity resolution, and structured extraction. | |
| - **Hybrid Ranking**: Vector retrieval (`pgvector`) combined with deterministic heuristics and LLM reranking. | |
| - **Full Persitence**: Robust SQLAlchemy models and Alembic migrations. | |
| ## Repository Map | |
| - `src/api/`: REST API surface and async task framework. | |
| - `ui/`: Streamlit frontend application. | |
| - `src/ingest/`: PDF parsing, candidate identity resolution, and Metaflow flows. | |
| - `src/extract/`: Signal extraction logic for resumes and job descriptions. | |
| - `src/retrieval/`: Semantic vector retrieval service. | |
| - `src/ranking/`: Hybrid scoring and LLM reranking services. | |
| - `src/storage/`: Database models, repositories, and migration runtime. | |
| - `src/llm/`: Provider-agnostic client and model alias registry. | |
| - `src/core/`: Runtime configuration and structured logging. | |
| - `config/`: Model routing aliases and fallback policies. | |
| - `tests/`: Unit and integration test suites. | |
| --- | |
| ## Architecture Layers | |
| ### 1) Runtime & Configuration | |
| - **Settings**: Defined in `src/core/config.py` using `pydantic-settings`. Loads from `.env`. | |
| - **Logging**: Structured JSON logging with `run_id` correlation for all internal and external (LLM) calls. | |
| ### 2) Persistence Layer | |
| - **Database**: PostgreSQL with `pgvector` for semantic search. | |
| - **Models**: SQLAlchemy 2.0 `Mapped` styles in `src/storage/models.py`. | |
| - **Repositories**: Standardized CRUD and complex query logic in `src/storage/repositories.py`. | |
| - **Migrations**: Alembic-managed schema revisions. | |
| ### 3) LLM Abstraction | |
| - **LLMClient**: Abstract contract for structured generation and embeddings. | |
| - **LiteLLM**: Concrete implementation supporting hundreds of providers (OpenAI, Ollama, Anthropic, etc.). | |
| - **Alias Registry**: Decouples feature logic from specific models. Features request an alias (e.g., `ranker_default`), and the registry routes it to the configured provider. | |
| ### 4) Ingestion & Parsing | |
| - **Parser**: Uses `pymupdf4llm` to convert PDFs to markdown, followed by custom cleaning and heading detection. | |
| - **Identity Resolution**: Determines if a resume belongs to an existing candidate using deterministic signals (email, phone) and LLM fallback for names. | |
| - **Ingestion Service**: Orchestrates the per-file pipeline (parse -> identify -> extract signals -> persist). | |
| - **Metaflow**: Used for high-volume batch ingestion from local directories. | |
| ### 5) Retrieval & Ranking | |
| - **Stage 1: Vector Retrieval**: Wide-net search using cosine similarity on resume section embeddings. | |
| - **Stage 2: Deterministic Scorer**: Heuristic filter based on explicit hard-skill overlap between the JD and candidate signals. | |
| - **Stage 3: LLM Reranking**: Qualitative refinement of top candidates, generating fit summaries and gap/risk analysis. | |
| ### 6) API & UI (Interface Layer) | |
| - **FastAPI**: Provides endpoints for job management, resume uploads, and ranking triggers. | |
| - **AsyncTask Runner**: A zero-dependency worker that executes long-running functions using FastAPI's `BackgroundTasks` while updating the `async_tasks` table for UI polling. | |
| - **Streamlit**: A dashboard for recruiters to upload files, manage job postings, and review/export candidate rankings. | |
| --- | |
| ## Developer Runbook | |
| ### Setup | |
| ```bash | |
| uv sync --all-extras | |
| cp .env.example .env | |
| # Edit .env with your database and API keys | |
| ``` | |
| ### Database Operations | |
| ```bash | |
| uv run alembic upgrade head | |
| ``` | |
| ### Running the App | |
| 1. **Start Backend**: `uv run uvicorn src.api.app:app` | |
| 2. **Start Frontend**: `uv run streamlit run ui/app.py` | |
| ### Testing | |
| ```bash | |
| uv run pytest -q | |
| # For database integration tests (requires Docker) | |
| uv run pytest -q -m integration | |
| ``` | |
| --- | |
| ## Technical Contracts | |
| ### Public REST API | |
| - `POST /api/ingest/upload`: Upload PDF resumes. | |
| - `POST /api/jobs`: Create a job and extract requirements. | |
| - `POST /api/jobs/{id}/rank`: Start the ranking process. | |
| - `GET /api/tasks/{id}`: Poll for background task status. | |
| - `GET /api/matches`: List ranked candidates for a job. | |
| ### Model Alias System | |
| Aliases are defined in `config/model_aliases.yaml`: | |
| - `embedding_default`: Used for semantic indexing. | |
| - `extractor_default`: Used for structured signal extraction. | |
| - `ranker_default`: Used for qualitative reranking and explanations. | |
| - `explainer_default`: Used for interview prep pack generation. | |