Full Software Guide
Purpose and Reader
This guide is the canonical technical walkthrough of the thereisnohr codebase for engineers who need to understand, operate, and extend the system. It describes the complete implementation as of the final stage.
System Snapshot
thereisnohr is a complete Applicant Tracking System (ATS) featuring:
- Async API: FastAPI-based REST backend with database-backed background task management.
- Modern UI: Multi-page Streamlit application for recruiters.
- Advanced Ingestion: Multi-stage PDF parsing, identity resolution, and structured extraction.
- Hybrid Ranking: Vector retrieval (pgvector) combined with deterministic heuristics and LLM reranking.
- Full Persistence: Robust SQLAlchemy models and Alembic migrations.
Repository Map
- src/api/: REST API surface and async task framework.
- ui/: Streamlit frontend application.
- src/ingest/: PDF parsing, candidate identity resolution, and Metaflow flows.
- src/extract/: Signal extraction logic for resumes and job descriptions.
- src/retrieval/: Semantic vector retrieval service.
- src/ranking/: Hybrid scoring and LLM reranking services.
- src/storage/: Database models, repositories, and migration runtime.
- src/llm/: Provider-agnostic client and model alias registry.
- src/core/: Runtime configuration and structured logging.
- config/: Model routing aliases and fallback policies.
- tests/: Unit and integration test suites.
Architecture Layers
1) Runtime & Configuration
- Settings: Defined in src/core/config.py using pydantic-settings. Loads from .env.
- Logging: Structured JSON logging with run_id correlation for all internal and external (LLM) calls.
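The logging bullet above can be illustrated with a minimal, standard-library sketch of JSON log lines that carry a run_id. This is not the code in src/core/; the names run_id_var, JsonFormatter, and start_run are hypothetical, and the field layout is an assumption.

```python
import contextvars
import json
import logging
import uuid

# Hypothetical context variable holding the correlation ID of the current run.
run_id_var = contextvars.ContextVar("run_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Every record picks up the run_id bound to the current context.
            "run_id": run_id_var.get(),
        }
        return json.dumps(payload)


def start_run() -> str:
    """Create a fresh run_id and bind it to the current context."""
    run_id = uuid.uuid4().hex
    run_id_var.set(run_id)
    return run_id


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    start_run()
    log.info("calling LLM provider")  # emitted as JSON with the run_id attached
```

Binding the ID through a context variable means call sites do not need to pass run_id explicitly, which is one plausible way to get correlation across internal and LLM calls.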
2) Persistence Layer
- Database: PostgreSQL with pgvector for semantic search.
- Models: SQLAlchemy 2.0 Mapped styles in src/storage/models.py.
- Repositories: Standardized CRUD and complex query logic in src/storage/repositories.py.
- Migrations: Alembic-managed schema revisions.
3) LLM Abstraction
- LLMClient: Abstract contract for structured generation and embeddings.
- LiteLLM: Concrete implementation supporting hundreds of providers (OpenAI, Ollama, Anthropic, etc.).
- Alias Registry: Decouples feature logic from specific models. Features request an alias (e.g., ranker_default), and the registry routes it to the configured provider.
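As a rough sketch of how an abstract client and an alias lookup can fit together, consider the following. The method names generate_structured and embed, the EchoClient stand-in, and the model strings in ALIASES are all assumptions made for illustration; the real contract lives in src/llm/ and may differ.

```python
from abc import ABC, abstractmethod


class LLMClient(ABC):
    """Hypothetical contract: structured generation plus embeddings."""

    @abstractmethod
    def generate_structured(self, model: str, prompt: str) -> dict: ...

    @abstractmethod
    def embed(self, model: str, texts: list[str]) -> list[list[float]]: ...


class EchoClient(LLMClient):
    """Stand-in provider for illustration; it makes no network calls."""

    def generate_structured(self, model: str, prompt: str) -> dict:
        return {"model": model, "prompt": prompt}

    def embed(self, model: str, texts: list[str]) -> list[list[float]]:
        # Fake one-dimensional "embedding": the length of each text.
        return [[float(len(t))] for t in texts]


# Illustrative alias table; the model names are placeholders.
ALIASES = {
    "ranker_default": "provider-a/model-x",
    "embedding_default": "provider-b/model-y",
}


def resolve_alias(alias: str) -> str:
    """Map an alias to its configured model, failing loudly on unknown names."""
    try:
        return ALIASES[alias]
    except KeyError:
        raise KeyError(f"unknown model alias: {alias}") from None


def rerank(client: LLMClient, prompt: str) -> dict:
    # Feature code names only the alias, never a concrete model.
    return client.generate_structured(resolve_alias("ranker_default"), prompt)
```

Because rerank depends only on the alias, swapping the model behind ranker_default is a configuration change rather than a code change.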
4) Ingestion & Parsing
- Parser: Uses pymupdf4llm to convert PDFs to markdown, followed by custom cleaning and heading detection.
- Identity Resolution: Determines if a resume belongs to an existing candidate using deterministic signals (email, phone) and LLM fallback for names.
- Ingestion Service: Orchestrates the per-file pipeline (parse -> identify -> extract signals -> persist).
- Metaflow: Used for high-volume batch ingestion from local directories.
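The deterministic part of identity resolution can be sketched as below. The Candidate shape, the normalization rules, and the choice to check email before phone are assumptions; the LLM name fallback is deliberately left out and represented by returning None.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical minimal view of a stored candidate."""
    id: int
    email: Optional[str] = None
    phone: Optional[str] = None


def normalize_email(email: Optional[str]) -> Optional[str]:
    """Lowercase and trim; empty values become None."""
    return email.strip().lower() if email and email.strip() else None


def normalize_phone(phone: Optional[str]) -> Optional[str]:
    """Keep digits only, so formatting differences do not block a match."""
    digits = re.sub(r"\D", "", phone or "")
    return digits or None


def resolve_identity(
    email: Optional[str], phone: Optional[str], existing: list[Candidate]
) -> Optional[Candidate]:
    """Return the matching candidate, or None when no deterministic signal matches."""
    email_n, phone_n = normalize_email(email), normalize_phone(phone)
    # Email is checked first because it is assumed to be the stronger signal.
    for cand in existing:
        if email_n and email_n == normalize_email(cand.email):
            return cand
    for cand in existing:
        if phone_n and phone_n == normalize_phone(cand.phone):
            return cand
    # No deterministic match: this is where an LLM name comparison would run.
    return None
```

A None result here means "undecided", not "new candidate"; the caller would decide whether to invoke the fallback or create a new record.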
5) Retrieval & Ranking
- Stage 1: Vector Retrieval: Wide-net search using cosine similarity on resume section embeddings.
- Stage 2: Deterministic Scorer: Heuristic filter based on explicit hard-skill overlap between the JD and candidate signals.
- Stage 3: LLM Reranking: Qualitative refinement of top candidates, generating fit summaries and gap/risk analysis.
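The first two stages can be sketched in plain Python as follows. This is a simplification under stated assumptions: in the real system stage 1 runs in PostgreSQL via pgvector over resume section embeddings, whereas here each candidate carries a single in-memory vector; the top_k and min_overlap parameters and the candidate dict layout are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def skill_overlap(jd_skills: set[str], candidate_skills: set[str]) -> float:
    """Fraction of the JD's hard skills that appear in the candidate's signals."""
    if not jd_skills:
        return 0.0
    return len(jd_skills & candidate_skills) / len(jd_skills)


def shortlist(jd_vec, jd_skills, candidates, top_k=20, min_overlap=0.5):
    """Run stage 1 then stage 2; survivors would be passed to LLM reranking."""
    # Stage 1: wide-net retrieval, keep the top_k most similar candidates.
    by_similarity = sorted(
        candidates,
        key=lambda c: cosine_similarity(jd_vec, c["vector"]),
        reverse=True,
    )[:top_k]
    # Stage 2: deterministic filter on explicit hard-skill overlap.
    return [
        c for c in by_similarity
        if skill_overlap(jd_skills, c["skills"]) >= min_overlap
    ]
```

Ordering the stages from cheapest to most expensive keeps the LLM call in stage 3 limited to a small, pre-filtered set.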
6) API & UI (Interface Layer)
- FastAPI: Provides endpoints for job management, resume uploads, and ranking triggers.
- AsyncTask Runner: A zero-dependency worker that executes long-running functions using FastAPI's BackgroundTasks while updating the async_tasks table for UI polling.
- Streamlit: A dashboard for recruiters to upload files, manage job postings, and review/export candidate rankings.
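The task-runner pattern described above (run a function in the background while recording state for pollers) can be sketched without any framework. The in-memory TASKS dict stands in for the async_tasks table, and the status names pending, running, succeeded, and failed are assumptions rather than the project's actual values.

```python
import traceback
import uuid
from typing import Any, Callable

# In-memory stand-in for the async_tasks table (assumption for this sketch).
TASKS: dict[str, dict[str, Any]] = {}


def create_task() -> str:
    """Register a new task in the pending state and return its ID."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "pending", "result": None, "error": None}
    return task_id


def run_task(task_id: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
    """Execute fn and record each state transition so a poller can observe it."""
    TASKS[task_id]["status"] = "running"
    try:
        TASKS[task_id]["result"] = fn(*args, **kwargs)
        TASKS[task_id]["status"] = "succeeded"
    except Exception:
        # Store the traceback instead of raising: nobody awaits a background task.
        TASKS[task_id]["error"] = traceback.format_exc()
        TASKS[task_id]["status"] = "failed"
```

In a FastAPI endpoint, a wrapper like this would typically be scheduled with background_tasks.add_task(run_task, task_id, fn, ...) and the task_id returned to the caller immediately.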
Developer Runbook
Setup
uv sync --all-extras
cp .env.example .env
# Edit .env with your database and API keys
Database Operations
uv run alembic upgrade head
Running the App
- Start Backend: uv run uvicorn src.api.app:app
- Start Frontend: uv run streamlit run ui/app.py
Testing
uv run pytest -q
# For database integration tests (requires Docker)
uv run pytest -q -m integration
Technical Contracts
Public REST API
- POST /api/ingest/upload: Upload PDF resumes.
- POST /api/jobs: Create a job and extract requirements.
- POST /api/jobs/{id}/rank: Start the ranking process.
- GET /api/tasks/{id}: Poll for background task status.
- GET /api/matches: List ranked candidates for a job.
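A client that starts a ranking run and then polls GET /api/tasks/{id} needs a loop along these lines. The response shape (a JSON object with a status field) and the terminal status names are assumptions, since the guide does not specify them; fetch is a caller-supplied function that performs the actual HTTP request, which keeps the sketch independent of any HTTP library.

```python
import time
from typing import Callable


def poll_task(
    fetch: Callable[[str], dict],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Call fetch(task_id) until the task is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        # Assumed terminal states; adjust to whatever the API actually returns.
        if task.get("status") in {"succeeded", "failed"}:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {task.get('status')!r} after {timeout}s"
            )
        time.sleep(interval)
```

Here fetch would wrap a GET request to /api/tasks/{id} and return the decoded JSON body.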
Model Alias System
Aliases are defined in config/model_aliases.yaml:
- embedding_default: Used for semantic indexing.
- extractor_default: Used for structured signal extraction.
- ranker_default: Used for qualitative reranking and explanations.
- explainer_default: Used for interview prep pack generation.
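Since every feature depends on one of these four aliases, a startup check that all of them are configured can catch a broken config early. The sketch below assumes the YAML file has already been parsed into a flat mapping of alias to model name; the actual schema of config/model_aliases.yaml is not shown in this guide, and validate_aliases is a hypothetical helper.

```python
# The four aliases listed in this guide.
REQUIRED_ALIASES = {
    "embedding_default",
    "extractor_default",
    "ranker_default",
    "explainer_default",
}


def validate_aliases(config: dict[str, str]) -> dict[str, str]:
    """Raise ValueError if a required alias is missing or maps to an empty name."""
    missing = sorted(REQUIRED_ALIASES - config.keys())
    if missing:
        raise ValueError(f"missing model aliases: {', '.join(missing)}")
    empty = sorted(a for a in REQUIRED_ALIASES if not str(config[a]).strip())
    if empty:
        raise ValueError(f"empty model aliases: {', '.join(empty)}")
    return config
```

Extra aliases beyond the required four are allowed, so new features can add their own entries without changing the check.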