# PROJECT LOG - Multimodal AI Interview Simulator
## Project Title
Multimodal AI Interview Simulator
(Backend-first system with scoring, analytics, decision engine, and future frontend)
---
## Project Summary
The Multimodal AI Interview Simulator is a full-stack interview evaluation system designed to:
- Parse ATS-friendly resumes
- Accept job role, job description, experience, and education
- Dynamically generate interview questions
- Conduct a **stateful, multi-stage interview**
- Score answers using **Sentence Transformers + TF-IDF explainability**
- Insert **follow-up questions automatically** when answers lack depth
- Aggregate scores into a final report
- Generate analytics and a readiness index
- Produce a final hiring decision (PASS / BORDERLINE / HOLD)
- Persist **all artifacts** for audit and review
The backend is deterministic, explainable, and LLM-free by design (LLMs planned later).
---
## Architecture Overview
### Backend Stack
- **Framework**: FastAPI
- **ML**: sentence-transformers, scikit-learn
- **ASR**: Whisper (via transformers)
- **Storage**: filesystem (`storage/<session_id>/`)
- **State management**: JSON-based interview state machine
- **Scoring**: cosine similarity + heuristics
- **Analytics**: post-interview signal analysis
- **Decision engine**: rule-based, conservative by default
---
## Key Directories & Files
```
backend/
└── app/
    ├── main.py
    ├── api/
    │   └── routes/
    │       ├── health.py
    │       ├── session.py
    │       ├── upload.py
    │       ├── parse_resume.py
    │       ├── interview_plan.py
    │       ├── generate_question.py
    │       ├── score_text.py
    │       ├── answer_audio.py
    │       ├── aggregate_scores.py   # Phase 6.1
    │       ├── analytics_report.py   # Phase 6.2
    │       └── decision.py           # Phase 6.3
    ├── core/
    │   ├── interview_flow.py
    │   ├── interview_state.py
    │   ├── interview_reasoning.py
    │   ├── scoring_config.py
    │   └── ml_models.py
docs/
├── api_contracts.md
├── project_log.md
storage/
└── <session_id>/
```
---
## Interview Flow (Authoritative)
```
intro
→ project
→ technical (resume + JD driven)
→ follow-ups (conditional, dynamic)
→ HR / behavioral
→ critical thinking
→ warmup
→ wrapup
→ candidate questions (loop until none)
→ end
```
Rules:
- Intro and project are fixed stages
- Follow-ups can occur **anywhere except between intro → project**
- Interview never ends abruptly after wrap-up
- Candidate questions are handled before final termination
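
The stage order and the follow-up rule above can be sketched as a small transition helper. This is a minimal illustration, not the contents of `interview_flow.py`: the names `STAGES`, `next_stage`, and `can_insert_follow_up` are hypothetical, and follow-ups are modeled as a conditional insertion rather than a stage of their own.

```python
# Fixed stage order taken from the flow above. Follow-ups are inserted
# conditionally, so they are not listed as a separate stage.
STAGES = [
    "intro", "project", "technical", "hr_behavioral",
    "critical_thinking", "warmup", "wrapup", "candidate_questions", "end",
]


def can_insert_follow_up(current_stage: str) -> bool:
    """Follow-ups are allowed anywhere except between intro and project."""
    return current_stage not in ("intro", "end")


def next_stage(current_stage: str, candidate_has_questions: bool = False) -> str:
    """Return the stage that follows current_stage."""
    if current_stage == "candidate_questions" and candidate_has_questions:
        return "candidate_questions"  # loop until the candidate has none left
    if current_stage == "end":
        return "end"
    return STAGES[STAGES.index(current_stage) + 1]
```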
---
## Chronological Development Log
### Phase 0 - Foundation
- Repository initialized
- FastAPI skeleton created
- Health check endpoint added
- Session creation with isolated storage implemented
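
Session isolation of this kind might look like the sketch below, which assumes one directory per session under `storage/`. The function name `create_session` and the use of a UUID as the session ID are assumptions; the log does not say how IDs are generated.

```python
import uuid
from pathlib import Path


def create_session(storage_root: Path = Path("storage")) -> str:
    """Create an isolated directory for a new session and return its ID."""
    session_id = uuid.uuid4().hex  # assumed ID scheme
    # exist_ok=False: a collision fails loudly instead of sharing storage
    (storage_root / session_id).mkdir(parents=True, exist_ok=False)
    return session_id
```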
---
### Phase 1 - Resume Handling
- Resume upload (`/api/upload/resume`)
- PDF/DOCX text extraction
- Resume parsing (`/api/parse/resume/{session_id}`)
- Extracted: name, email, skills
- Parsing hardened against encoding errors
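
As a rough illustration of encoding-tolerant field extraction, the sketch below decodes the raw text with replacement characters and pulls out an email address and known skills. The `KNOWN_SKILLS` vocabulary and the `extract_fields` name are hypothetical, and name extraction is left out because the log does not describe how it is done.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical skill vocabulary; the real list is not given in this log.
KNOWN_SKILLS = ("python", "fastapi", "docker", "sql", "scikit-learn")


def extract_fields(raw: bytes) -> dict:
    """Extract an email address and known skills from raw resume bytes."""
    # errors="replace" keeps parsing alive on malformed byte sequences
    text = raw.decode("utf-8", errors="replace")
    email = EMAIL_RE.search(text)
    lowered = text.lower()
    skills = [
        skill for skill in KNOWN_SKILLS
        # Match whole tokens only, so "sql" does not match inside "mysql"
        if re.search(rf"(?<![\w-]){re.escape(skill)}(?![\w-])", lowered)
    ]
    return {"email": email.group(0) if email else None, "skills": skills}
```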
---
### Phase 2 - Interview Planning
- Interview plan generation using:
  - Resume
  - Job role
  - Job description
  - Candidate profile
- Deterministic question generation (no LLM)
- Output stored as `interview_plan.json`
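
One way to get deterministic, LLM-free question generation is template filling over a sorted skill list, as sketched below. The templates, field names, and `generate_questions` function are illustrative assumptions; the project's actual question bank is not shown in this log.

```python
# Hypothetical templates; the project's real question bank is not shown here.
TEMPLATES = [
    "Explain how you have used {skill} in a recent project.",
    "What trade-offs do you consider when working with {skill}?",
]


def generate_questions(skills: list[str], job_role: str, limit: int = 5) -> list[dict]:
    """Build a repeatable question list: the same inputs give the same plan."""
    questions = []
    # Sorting and de-duplicating removes any dependence on input order
    for skill in sorted({s.lower() for s in skills}):
        for template in TEMPLATES:
            questions.append({
                "id": f"q{len(questions) + 1}",
                "type": "technical",
                "skill": skill,
                "text": template.format(skill=skill),
                "job_role": job_role,
            })
    return questions[:limit]
```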
---
### Phase 3 - Interview State Machine
- Implemented persistent interview state
- Added:
  - `start_interview`
  - `next_question`
- State transitions tracked in `interview_state.json`
- Prevents repeated questions
- Supports dynamic follow-ups
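
A minimal sketch of a JSON-backed state machine with these properties follows. The function names match the ones listed above, but the signatures, the state fields (`pending`, `asked`, `status`), and the follow-up handling are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def _state_path(session_dir: Path) -> Path:
    return session_dir / "interview_state.json"


def start_interview(session_dir: Path, question_ids: list[str]) -> dict:
    """Create and persist a fresh state for the planned questions."""
    state = {"pending": list(question_ids), "asked": [], "status": "in_progress"}
    _state_path(session_dir).write_text(json.dumps(state, indent=2))
    return state


def next_question(session_dir: Path, follow_up_id: Optional[str] = None) -> Optional[str]:
    """Return the next unasked question ID, or None when nothing is left."""
    path = _state_path(session_dir)
    state = json.loads(path.read_text())
    # A dynamic follow-up jumps the queue unless it was already asked
    if follow_up_id and follow_up_id not in state["asked"]:
        state["pending"].insert(0, follow_up_id)
    # Skip anything already asked so no question repeats
    while state["pending"] and state["pending"][0] in state["asked"]:
        state["pending"].pop(0)
    if not state["pending"]:
        state["status"] = "completed"
        path.write_text(json.dumps(state, indent=2))
        return None
    question_id = state["pending"].pop(0)
    state["asked"].append(question_id)
    path.write_text(json.dumps(state, indent=2))
    return question_id
```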
---
### Phase 4 - Answer Scoring
- Text answer scoring (`/api/score/text`)
- SentenceTransformer embeddings
- Cosine similarity normalized to score [0–10]
- TF-IDF token extraction for explainability
- Human-review flags when:
  - Score below threshold
  - Answer too short
  - Depth insufficient
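
The scoring step can be approximated as follows, given embedding vectors that have already been computed (in the backend these would come from a SentenceTransformer model). The normalization rule, the thresholds `min_score` and `min_words`, and the function names are assumptions; the depth check and TF-IDF explanation are not modeled here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_answer(answer: str, answer_vec: list[float], reference_vec: list[float],
                 min_score: float = 5.0, min_words: int = 8) -> dict:
    """Map cosine similarity to 0-10 and flag answers needing human review."""
    similarity = cosine_similarity(answer_vec, reference_vec)
    # Assumed normalization: clamp to [0, 1], then scale to [0, 10]
    score = round(max(0.0, min(1.0, similarity)) * 10, 2)
    reasons = []
    if score < min_score:
        reasons.append("score_below_threshold")
    if len(answer.split()) < min_words:
        reasons.append("answer_too_short")
    return {"score": score, "needs_human_review": bool(reasons), "reasons": reasons}
```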
---
### Phase 5 - Audio Answers (ASR)
- Audio upload endpoint
- Whisper ASR integration
- Transcript auto-scored using same pipeline
- Audio persisted for audit
---
### Phase 6.1 - Score Aggregation
- Aggregates all question scores
- Weighted scoring by question type
- Coverage validation against interview plan
- Generates `final_report.json`
- Updates interview state with final metadata
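
Weighted aggregation with a coverage check could be sketched like this. The weights in `TYPE_WEIGHTS`, the result fields, and the `aggregate_scores` signature are hypothetical; the real per-type weights are not given in this log.

```python
# Hypothetical per-type weights; the real values are not given in this log.
TYPE_WEIGHTS = {"technical": 2.0, "project": 1.5, "hr": 1.0}


def aggregate_scores(planned_ids: list[str], results: list[dict]) -> dict:
    """Compute a weighted mean score and check coverage against the plan."""
    weighted_sum = 0.0
    total_weight = 0.0
    for result in results:
        weight = TYPE_WEIGHTS.get(result["type"], 1.0)  # unknown types weigh 1.0
        weighted_sum += weight * result["score"]
        total_weight += weight
    answered = {result["question_id"] for result in results}
    missing = [q for q in planned_ids if q not in answered]
    return {
        "final_score": round(weighted_sum / total_weight, 2) if total_weight else 0.0,
        "coverage": round(1 - len(missing) / len(planned_ids), 2) if planned_ids else 1.0,
        "missing_questions": missing,
    }
```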
---
### Phase 6.2 - Analytics
- Skill-wise performance analysis
- Follow-up pressure metrics
- Answer consistency signals
- Risk classification (LOW / MEDIUM / HIGH)
- Readiness index generation
- Output: `analytics_report.json`
Cleanup iteration performed:
- Removed stop-word noise
- Reduced token pollution
- Limited skills to meaningful technical signals
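
To make the analytics signals concrete, the sketch below computes per-skill averages, a follow-up pressure ratio, and a risk level. The thresholds, the definition of pressure as the share of follow-up questions, and the `build_analytics` name are assumptions for illustration; the readiness index is not modeled because its formula is not described here.

```python
from collections import defaultdict


def build_analytics(results: list[dict]) -> dict:
    """Summarize per-skill averages, follow-up pressure, and a risk level."""
    by_skill = defaultdict(list)
    for result in results:
        by_skill[result["skill"]].append(result["score"])
    skill_avg = {skill: round(sum(v) / len(v), 2) for skill, v in by_skill.items()}

    follow_ups = sum(1 for result in results if result.get("is_follow_up"))
    # Share of questions that were follow-ups: higher means more probing was needed
    pressure = round(follow_ups / len(results), 2) if results else 0.0

    weakest = min(skill_avg.values(), default=0.0)
    # Assumed thresholds, for illustration only
    if weakest < 4.0 or pressure > 0.5:
        risk = "HIGH"
    elif weakest < 6.0 or pressure > 0.25:
        risk = "MEDIUM"
    else:
        risk = "LOW"
    return {"skill_scores": skill_avg, "follow_up_pressure": pressure, "risk": risk}
```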
---
### Phase 6.3 - Decision Engine
- Final hiring decision endpoint
- Inputs:
  - Final score
  - Analytics signals
  - Human-review flags
- Conservative rules:
  - Human review → BORDERLINE or HOLD
- Output: `decision.json`
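
A conservative rule set of this shape might be written as below. The cut-offs (7.0 and 5.0) and the `decide` signature are assumptions; the only rule taken from the log is that a human-review flag never results in PASS.

```python
def decide(final_score: float, risk: str, needs_human_review: bool) -> str:
    """Return PASS, BORDERLINE, or HOLD using conservative rules."""
    # Assumed cut-offs (7.0 and 5.0), for illustration only
    if final_score >= 7.0 and risk == "LOW":
        outcome = "PASS"
    elif final_score >= 5.0 and risk != "HIGH":
        outcome = "BORDERLINE"
    else:
        outcome = "HOLD"
    # A human-review flag can never result in PASS
    if needs_human_review and outcome == "PASS":
        outcome = "BORDERLINE"
    return outcome
```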
---
## Known Issues & Resolutions
### Docs UI freezing
- Cause: heavy ML loads during startup
- Fix: lazy model loading
### Resume parsing crashes
- Cause: malformed PDFs
- Fix: error-tolerant text extraction
### Model reload issues (uvicorn)
- Cause: reload + model load race
- Fix: centralized model loader with fallback
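
The lazy-loading fix can be illustrated with a small wrapper that defers the expensive load until first use and then caches it. `LazyModel` is a hypothetical name, the loader is passed in as a plain callable so the sketch stays independent of any ML library, and the fallback behavior mentioned above is not modeled.

```python
import threading
from typing import Any, Callable


class LazyModel:
    """Defer an expensive model load until first use, then cache the result."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        if self._model is None:
            # Lock so concurrent first requests trigger only one load
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model
```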
---
## Current Status (Confirmed)
✅ Backend complete
✅ Interview lifecycle stable
✅ Scoring, aggregation, analytics, decision working
✅ API contract finalized
⏳ LLM enhancement deferred
⏳ Security hardening pending
⏳ Frontend not started
---
## Deferred / Future Enhancements
- LLM-based question generation
- LLM-assisted answer evaluation
- Skill taxonomy + ontology
- JWT authentication & RBAC
- Performance optimizations
- Vector DB for answer history
- Admin dashboard
---
## How to Run (Local)
```bash
python -m venv intrv-sim
source intrv-sim/bin/activate
pip install -r requirements.txt
uvicorn backend.app.main:app --reload --reload-exclude 'storage/*'
```
---
## Recommended Next Steps
1. Freeze backend (tag release)
2. Finalize API contract (done)
3. Build frontend (React/Vite)
4. Add admin dashboard
5. Harden security
---
## Changelog
* **2026-02-02**
  * Phase 6.1, 6.2, 6.3 completed
  * Backend feature-complete (non-LLM)