support-ops-env / docs /IMPLEMENTATION_CHECKLIST.md
raj921
clean rewrite: shorter vars, no comments, same logic
cbe3d95
|
Raw
History Blame Contribute Delete
960 Bytes
# Implementation Checklist
## Slice 1: Submission reliability
- [ ] `pip install -e '.[dev]'`
- [ ] `pytest -q`
- [ ] `openenv validate .`
- [ ] `scripts/validate-local.sh --skip-docker`
- [ ] `scripts/validate-local.sh` on a Docker-enabled machine
## Slice 2: Benchmark quality
- [ ] verify each task has a strong-score happy path
- [ ] verify unsafe or irrelevant actions reduce final score
- [ ] verify duplicate handling is required for medium and hard tasks
- [ ] verify hard task reply does not overclaim a confirmed breach
## Slice 3: Docs and demo polish
- [ ] README reflects final baseline and deployment instructions
- [ ] benchmark brief and glossary are present
- [ ] Hugging Face Space secrets are configured
- [ ] remote `/health` and `/reset` checks succeed
## Slice 4: Final submission
- [ ] push to Hugging Face Space
- [ ] run remote validator against the Space URL
- [ ] record final baseline scores and URL in the submission notes