# Changelog

All notable changes to this project are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.0.0] - 2026-04-05

### Added

- **Models**: Complete Pydantic v2 models (`TaskId`, `Action`, `Scenario`, `EpisodeResult`, etc.)
- **Scenarios**: 30 synthetic PR scenarios (10 per task) with realistic Python diffs
- **Env**: Full episode state machine with noise budget, reward calculation, and history tracking
- **Graders**:
  - `bug_grader.py`: Coverage + precision + severity-weighted scoring
  - `security_grader.py`: Severity-accuracy-weighted scoring (CRITICAL misclassification penalized)
  - `arch_grader.py`: Binary issue detection + verdict scoring + detail quality bonus
- **Config**: Pydantic-settings config with all options documented in `.env.example`
- **Database**: SQLModel persistence (`EpisodeRecord`, `LeaderboardRecord`, helpers)
- **API Endpoints**:
  - `GET /stats`: Aggregate metrics across all recorded episodes
  - `GET /episodes/{id}/replay`: Full action-by-action replay for completed episodes
  - `GET /episodes`: List active episodes with metadata
  - `GET /dashboard`: Web dashboard (dark theme, live leaderboard, WebSocket event feed, stats cards)
- **Security**:
  - Rate limiting via `slowapi`: 60 req/min per IP (configurable)
  - API key authentication: optional, off by default, enabled via `API_KEY_ENABLED=true`
  - Added `TrustedHostMiddleware` and `Security Headers` (XSS, Frame protection)
- **Episode Lifecycle**: Auto-cleanup of expired episodes every 5 minutes (default 1hr)
- **Leaderboard**: Paginated `/leaderboard?limit=N&offset=M&task_id=X`
- **Baseline Agent**: Full rewrite with argparse CLI, `KeywordAgent` (35 rules), `LLMAgent` (Claude)
- **Evaluation**: `scripts/evaluate.py` for batch evaluation of all 30 scenarios with summary report and progress bars
- **Testing**: 155+ parametrized tests with full coverage reporting.
- **Dockerization**: Multi-stage `builder` + `production` builds with non-root user security.
- **CI/CD**: Unified 5-job pipeline (`lint`, `test`, `validate`, `docker-build`, `publish` to GHCR).
- **Branding**: Full rebrand to **CodeLens.**, including signature iconography.

### Fixed

- **CLI**: Port mismatch in `baseline.py` (8000 → 7860) and added `--url`, `--task`, `--seed` CLI flags.
- **Crash Fixes**: Leaderboard submit crash after list slicing (captured rank before slice).
- **WebSocket**: Disconnect now handled with typed `WebSocketDisconnect` and `clients.discard()`.
- **Metadata**: Incoherent weight structure in `openenv.yaml` replaced with named, accurate pairs.
- **Security**: Implemented `TrustedHostMiddleware` and hardened headers.

## [0.1.0] - Initial Baseline Fork

- Initial FastAPI skeleton.
- In-memory episode storage.
- Basic Dockerfile and Pylint-only CI.