
ID3QNE Sepsis OpenEnv Results

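| Policy            | Mean Score | Reward Density | Mean Steps | Safety |
|-------------------|------------|----------------|------------|--------|
| Heuristic         | 0.9867     | 1.00           | 9.7        | 100%   |
| LLM (gpt-4o-mini) | 0.9867     | 1.00           | 9.7        | 100%   |
| ID3QNE            | 0.9867     | 1.00           | 9.7        | 100%   |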

Statistical Validation

  • LLM 10-episode mean score: 0.9867
  • LLM standard deviation of the score across the 10 episode means: 0.0
  • LLM global reward density: 1.0
  • LLM safety violation rate: 0.0
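
The summary statistics above could be computed from per-episode logs roughly as follows. This is a minimal sketch, not the benchmark's actual evaluation code: the record layout (`score`, `rewards`, `violations`) and the function name `summarize` are hypothetical, reward density is assumed to mean the fraction of steps with a nonzero reward, the safety violation rate is assumed to be the fraction of episodes with at least one violation, and the standard deviation is taken as the population standard deviation.

```python
from statistics import mean, pstdev


def summarize(episodes):
    """Aggregate per-episode records into run-level statistics.

    Each record is a dict with hypothetical keys:
      "score":      float, the episode's mean score
      "rewards":    list of float, one reward per step
      "violations": int, number of safety violations in the episode
    """
    scores = [ep["score"] for ep in episodes]
    total_steps = sum(len(ep["rewards"]) for ep in episodes)
    # Assumption: a step counts as "rewarded" when its reward is nonzero.
    rewarded_steps = sum(1 for ep in episodes for r in ep["rewards"] if r != 0)
    # Assumption: the violation rate is measured per episode, not per step.
    violating_episodes = sum(1 for ep in episodes if ep["violations"] > 0)

    return {
        "mean_score": mean(scores),
        "score_std": pstdev(scores),  # population std across episode scores
        "reward_density": rewarded_steps / total_steps,
        "mean_steps": total_steps / len(episodes),
        "safety_violation_rate": violating_episodes / len(episodes),
    }
```

Under these assumptions, a score standard deviation of 0.0 means every episode produced the same episode-level score, and a reward density of 1.0 means no step returned a zero reward.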

Key Result

All three verified policies (Heuristic, LLM, and ID3QNE) reached a mean score of 0.9867 and a reward density of 1.00, with zero safety violations, in the local OpenEnv sepsis benchmark.

Notes

  • The OpenAI-backed policy was constrained to the environment action schema and guarded against unsupported outputs.
  • In this environment, the observed performance ceiling is 0.9867. The LLM-controlled run and ID3QNE both matched it, as did the heuristic baseline.
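
One way to constrain a model-backed policy to an action schema and guard against unsupported outputs is to validate each response before it reaches the environment. The sketch below illustrates that idea only; it is not the guard used in this project. The action names, the JSON response shape (`{"action": ...}`), the fallback action, and the function name `parse_action` are all hypothetical.

```python
import json

# Hypothetical action names; the real environment's schema may differ.
ALLOWED_ACTIONS = {"administer_fluids", "administer_vasopressor", "monitor"}
# Hypothetical fallback used whenever the model output cannot be trusted.
FALLBACK_ACTION = "monitor"


def parse_action(raw_output):
    """Return a schema-valid action name, falling back on any invalid output."""
    try:
        payload = json.loads(raw_output)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string at all): reject the output.
        return FALLBACK_ACTION

    if not isinstance(payload, dict):
        # Valid JSON, but not the expected object shape.
        return FALLBACK_ACTION

    action = payload.get("action")
    if isinstance(action, str) and action in ALLOWED_ACTIONS:
        return action
    # Missing, non-string, or unsupported action name.
    return FALLBACK_ACTION
```

With a guard of this kind, malformed or out-of-schema model responses are replaced by a known action instead of being passed to the environment.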