# Changelog
All notable changes to this project are documented here.
Format follows [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [1.0.0] - 2026-04-05
### Added
- **Models**: Complete Pydantic v2 models (`TaskId`, `Action`, `Scenario`, `EpisodeResult`, etc.)
- **Scenarios**: 30 synthetic PR scenarios (10 per task) with realistic Python diffs
- **Env**: Full episode state machine with noise budget, reward calculation, and history tracking
- **Graders**:
  - `bug_grader.py`: Coverage + precision + severity-weighted scoring
  - `security_grader.py`: Severity-accuracy-weighted scoring (misclassifying a CRITICAL finding is penalized)
  - `arch_grader.py`: Binary issue detection + verdict scoring + detail quality bonus
- **Config**: Pydantic-settings config with all options documented in `.env.example`
- **Database**: SQLModel persistence (`EpisodeRecord`, `LeaderboardRecord`, helpers)
- **API Endpoints**:
  - `GET /stats`: Aggregate metrics across all recorded episodes
  - `GET /episodes/{id}/replay`: Full action-by-action replay for completed episodes
  - `GET /episodes`: List active episodes with metadata
  - `GET /dashboard`: Web dashboard (dark theme, live leaderboard, WebSocket event feed, stats cards)
- **Security**:
  - Rate limiting via `slowapi`: 60 req/min per IP (configurable)
  - API key authentication: optional, off by default, enabled via `API_KEY_ENABLED=true`
  - `TrustedHostMiddleware` and security headers (XSS and frame protection)
- **Episode Lifecycle**: Expired episodes are cleaned up automatically every 5 minutes (episodes expire after 1 hour by default)
- **Leaderboard**: Paginated `/leaderboard?limit=N&offset=M&task_id=X`
- **Baseline Agent**: Full rewrite with argparse CLI, `KeywordAgent` (35 rules), `LLMAgent` (Claude)
- **Evaluation**: `scripts/evaluate.py` for batch evaluation of all 30 scenarios with summary report and progress bars
- **Testing**: 155+ parametrized tests with full coverage reporting.
- **Dockerization**: Multi-stage build (`builder` and `production` stages) that runs as a non-root user.
- **CI/CD**: Unified 5-job pipeline (`lint`, `test`, `validate`, `docker-build`, `publish` to GHCR).
- **Branding**: Full rebrand to **CodeLens.**, including signature iconography.
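The `limit`, `offset`, and `task_id` query parameters on `/leaderboard` presumably follow the usual filter, rank, then slice pattern. The sketch below illustrates that pattern on an in-memory list; `LeaderboardEntry`, its fields, the `paginate_leaderboard` helper, and the score-descending ordering are all hypothetical assumptions, not the project's actual code (the real endpoint may query the database instead).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderboardEntry:
    # Hypothetical record shape, for illustration only.
    agent: str
    task_id: str
    score: float


def paginate_leaderboard(
    entries: list[LeaderboardEntry],
    limit: int = 10,
    offset: int = 0,
    task_id: Optional[str] = None,
) -> list[LeaderboardEntry]:
    """Filter by task (if given), rank by score (highest first), then apply offset/limit."""
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    # Build a new list so the caller's data is never mutated by the sort.
    rows = [e for e in entries if task_id is None or e.task_id == task_id]
    rows.sort(key=lambda e: e.score, reverse=True)
    # Slicing past the end simply yields an empty page rather than an error.
    return rows[offset : offset + limit]
```

For example, `paginate_leaderboard(board, limit=2, offset=2, task_id="bug")` would return the third and fourth highest-scoring entries for the hypothetical `"bug"` task.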
### Fixed
- **CLI**: Fixed the port mismatch in `baseline.py` (8000 → 7860); added `--url`, `--task`, and `--seed` CLI flags.
- **Crash Fixes**: Leaderboard submission no longer crashes after list slicing; the entry's rank is now captured before the list is sliced.
- **WebSocket**: Disconnect now handled with typed `WebSocketDisconnect` and `clients.discard()`.
- **Metadata**: Incoherent weight structure in `openenv.yaml` replaced with named, accurate pairs.
- **Security**: Implemented `TrustedHostMiddleware` and hardened headers.
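The leaderboard crash fix comes down to an ordering rule: compute the new entry's rank on the full sorted list, then truncate. One plausible way such a crash arises is looking the entry up in the already-sliced list (for example with `list.index`), which raises `ValueError` when the entry falls outside the kept top N. The sketch below is a minimal illustration under that assumption; `Submission`, `submit`, and the `max_size` cap are hypothetical names, not the project's code.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical record shape, for illustration only.
    agent: str
    score: float


def submit(
    board: list[Submission], entry: Submission, max_size: int = 3
) -> tuple[list[Submission], int]:
    """Insert an entry and return (trimmed board, 1-based rank of the entry).

    The rank is taken from the full sorted list *before* truncation, so an
    entry that does not make the top `max_size` still gets a valid rank.
    """
    ranked = sorted(board + [entry], key=lambda s: s.score, reverse=True)
    # Identity check (`is`) finds this exact object even if another
    # submission has identical field values.
    rank = next(i for i, s in enumerate(ranked, start=1) if s is entry)
    return ranked[:max_size], rank
```

Because `sorted` is stable and the new entry is appended last, it ranks below existing entries with an equal score; a real implementation might choose a different tie-breaking rule.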
## [0.1.0] - Initial Baseline Fork
- Initial FastAPI skeleton.
- In-memory episode storage.
- Basic Dockerfile and Pylint-only CI.