thereisnohr / docs /full-software-guide.md
jbeiroa's picture
Initial clean deploy of demo app
74711df

Full Software Guide

Purpose and Reader

This guide is the canonical technical walkthrough of the thereisnohr codebase for engineers who need to understand, operate, and extend the system. It describes the complete implementation as of the final stage.

System Snapshot

thereisnohr is a complete Applicant Tracking System (ATS) featuring:

  • Async API: FastAPI-based REST backend with database-backed background task management.
  • Modern UI: Multi-page Streamlit application for recruiters.
  • Advanced Ingestion: Multi-stage PDF parsing, identity resolution, and structured extraction.
  • Hybrid Ranking: Vector retrieval (pgvector) combined with deterministic heuristics and LLM reranking.
  • Full Persitence: Robust SQLAlchemy models and Alembic migrations.

Repository Map

  • src/api/: REST API surface and async task framework.
  • ui/: Streamlit frontend application.
  • src/ingest/: PDF parsing, candidate identity resolution, and Metaflow flows.
  • src/extract/: Signal extraction logic for resumes and job descriptions.
  • src/retrieval/: Semantic vector retrieval service.
  • src/ranking/: Hybrid scoring and LLM reranking services.
  • src/storage/: Database models, repositories, and migration runtime.
  • src/llm/: Provider-agnostic client and model alias registry.
  • src/core/: Runtime configuration and structured logging.
  • config/: Model routing aliases and fallback policies.
  • tests/: Unit and integration test suites.

Architecture Layers

1) Runtime & Configuration

  • Settings: Defined in src/core/config.py using pydantic-settings. Loads from .env.
  • Logging: Structured JSON logging with run_id correlation for all internal and external (LLM) calls.

2) Persistence Layer

  • Database: PostgreSQL with pgvector for semantic search.
  • Models: SQLAlchemy 2.0 Mapped styles in src/storage/models.py.
  • Repositories: Standardized CRUD and complex query logic in src/storage/repositories.py.
  • Migrations: Alembic-managed schema revisions.

3) LLM Abstraction

  • LLMClient: Abstract contract for structured generation and embeddings.
  • LiteLLM: Concrete implementation supporting hundreds of providers (OpenAI, Ollama, Anthropic, etc.).
  • Alias Registry: Decouples feature logic from specific models. Features request an alias (e.g., ranker_default), and the registry routes it to the configured provider.

4) Ingestion & Parsing

  • Parser: Uses pymupdf4llm to convert PDFs to markdown, followed by custom cleaning and heading detection.
  • Identity Resolution: Determines if a resume belongs to an existing candidate using deterministic signals (email, phone) and LLM fallback for names.
  • Ingestion Service: Orchestrates the per-file pipeline (parse -> identify -> extract signals -> persist).
  • Metaflow: Used for high-volume batch ingestion from local directories.

5) Retrieval & Ranking

  • Stage 1: Vector Retrieval: Wide-net search using cosine similarity on resume section embeddings.
  • Stage 2: Deterministic Scorer: Heuristic filter based on explicit hard-skill overlap between the JD and candidate signals.
  • Stage 3: LLM Reranking: Qualitative refinement of top candidates, generating fit summaries and gap/risk analysis.

6) API & UI (Interface Layer)

  • FastAPI: Provides endpoints for job management, resume uploads, and ranking triggers.
  • AsyncTask Runner: A zero-dependency worker that executes long-running functions using FastAPI's BackgroundTasks while updating the async_tasks table for UI polling.
  • Streamlit: A dashboard for recruiters to upload files, manage job postings, and review/export candidate rankings.

Developer Runbook

Setup

uv sync --all-extras
cp .env.example .env
# Edit .env with your database and API keys

Database Operations

uv run alembic upgrade head

Running the App

  1. Start Backend: uv run uvicorn src.api.app:app
  2. Start Frontend: uv run streamlit run ui/app.py

Testing

uv run pytest -q
# For database integration tests (requires Docker)
uv run pytest -q -m integration

Technical Contracts

Public REST API

  • POST /api/ingest/upload: Upload PDF resumes.
  • POST /api/jobs: Create a job and extract requirements.
  • POST /api/jobs/{id}/rank: Start the ranking process.
  • GET /api/tasks/{id}: Poll for background task status.
  • GET /api/matches: List ranked candidates for a job.

Model Alias System

Aliases are defined in config/model_aliases.yaml:

  • embedding_default: Used for semantic indexing.
  • extractor_default: Used for structured signal extraction.
  • ranker_default: Used for qualitative reranking and explanations.
  • explainer_default: Used for interview prep pack generation.