
Implementation Checklist

Slice 1: Submission reliability

  • pip install -e '.[dev]'
  • pytest -q
  • openenv validate .
  • scripts/validate-local.sh --skip-docker
  • scripts/validate-local.sh on a Docker-enabled machine
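
The Slice 1 commands can be run in order with a small driver that stops at the first failure. This is a minimal sketch, not part of the repository: `first_failure` and `run_command` are hypothetical helper names, and it assumes the commands are run from the repository root with the tools already on `PATH`.

```python
import shlex
import subprocess
from typing import Callable, Iterable, Optional

# Commands copied from the Slice 1 checklist, in the same order.
# The last entry needs a Docker-enabled machine.
SLICE_1_COMMANDS = [
    "pip install -e '.[dev]'",
    "pytest -q",
    "openenv validate .",
    "scripts/validate-local.sh --skip-docker",
    "scripts/validate-local.sh",
]


def run_command(command: str) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(shlex.split(command)).returncode


def first_failure(
    commands: Iterable[str],
    runner: Callable[[str], int] = run_command,
) -> Optional[str]:
    """Run commands in order; return the first one that fails, or None."""
    for command in commands:
        if runner(command) != 0:
            return command
    return None
```

Calling `first_failure(SLICE_1_COMMANDS)` returns `None` when every step exits with status 0. The `runner` parameter exists only so the loop can be exercised without actually installing or validating anything.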

Slice 2: Benchmark quality

  • verify each task has a strong-score happy path
  • verify unsafe or irrelevant actions reduce final score
  • verify duplicate handling is required for medium and hard tasks
  • verify the hard task's reply does not overclaim by describing the incident as a confirmed breach
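
The first three checks above can be written as pytest-style assertions. The sketch below shows only the shape of such tests: `toy_score`, the action names, and the `PENALTY` value are invented stand-ins, not the environment's real grader or action set, and the breach-overclaim check is left out because it depends on reply-text grading that this checklist does not describe.

```python
from typing import List, Set

PENALTY = 0.25  # hypothetical cost per unsafe or irrelevant action


def required_actions(needs_dedup: bool) -> Set[str]:
    """Hypothetical required steps; dedup is required only when flagged."""
    base = {"classify", "reply"}
    return (base | {"merge_duplicate"}) if needs_dedup else base


def toy_score(actions: List[str], needs_dedup: bool) -> float:
    """Stand-in grader: fraction of required steps done, minus penalties."""
    required = required_actions(needs_dedup)
    completed = required & set(actions)
    extras = [a for a in actions if a not in required]
    raw = len(completed) / len(required) - PENALTY * len(extras)
    return max(0.0, min(1.0, raw))


def test_happy_path_scores_high():
    actions = ["classify", "merge_duplicate", "reply"]
    assert toy_score(actions, needs_dedup=True) == 1.0


def test_unsafe_action_reduces_score():
    safe = ["classify", "merge_duplicate", "reply"]
    unsafe = safe + ["promise_refund"]  # hypothetical unsafe action
    assert toy_score(unsafe, needs_dedup=True) < toy_score(safe, needs_dedup=True)


def test_duplicate_handling_is_required():
    skipped = ["classify", "reply"]
    assert toy_score(skipped, needs_dedup=True) < 1.0
```

In the real suite, `toy_score` would be replaced by a full episode run against each task, with the same three comparisons applied to the final score.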

Slice 3: Docs and demo polish

  • README reflects the final baseline scores and deployment instructions
  • benchmark brief and glossary are present
  • Hugging Face Space secrets are configured
  • remote /health and /reset checks succeed
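
The remote `/health` and `/reset` checks can be scripted with the standard library. This is a sketch under assumptions the checklist does not state: that `/health` answers a GET, that `/reset` accepts a POST with an empty JSON body, and that HTTP 200 means success. `check_space` and `http_status` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

Fetch = Callable[[str, str, Optional[bytes]], int]


def http_status(url: str, method: str, body: Optional[bytes]) -> int:
    """Send one request and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; report the code instead of crashing.
        return err.code


def check_space(base_url: str, fetch: Fetch = http_status) -> Dict[str, bool]:
    """Return a pass/fail flag for each endpoint named in the checklist."""
    base = base_url.rstrip("/")
    return {
        # Assumed: GET /health returns 200 when the Space is up.
        "/health": fetch(base + "/health", "GET", None) == 200,
        # Assumed: POST /reset with an empty JSON object returns 200.
        "/reset": fetch(base + "/reset", "POST", json.dumps({}).encode()) == 200,
    }
```

Passing the Space URL to `check_space` returns a dictionary that is all `True` when both endpoints respond as assumed. If the server expects a different method or payload for `/reset`, adjust the second call accordingly.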

Slice 4: Final submission

  • push to Hugging Face Space
  • run remote validator against the Space URL
  • record final baseline scores and URL in the submission notes