AegisOpenEnv / walkthrough.md
armaan020's picture
Upload folder using huggingface_hub
121fad1 verified

AegisOpenEnv: Final Compliance & Training Walkthrough

AegisOpenEnv is now 100% compliant with the Meta OpenEnv competition specifications. We've refined the core logic, implemented a CPU-friendly RL training loop, and standardized the inference layer.

πŸš€ Key Improvements

1. Modular Grader Refinement

Moved hardcoded scoring logic from server.py to a dedicated Grader class in grader.py.

  • Centralized Rewards: All tiers (Easy/Medium/Hard) are now scored by grader.grade().
  • Penalties: False Positive (-1.0) and False Negative (-5.0) penalties are strictly enforced.

2. CPU-Optimized RL Training

Implemented train_cpu.py using the REINFORCE algorithm.

  • Goal: Allow training on systems without high-end GPUs.
  • Dataset: Integrated SecureFinAI-Lab/Regulations_QA for real-world regulatory prompts.
  • Logic: Manual policy gradient backprop ($Loss = -log_prob \times reward$).

3. Standardized Inference

Consolidated all inference logic into a single, official inference.py.

  • Compliance: Uses openai.OpenAI() client exclusively.
  • Reporting: Automatically generates a Reproducibility Report with Mean Score and Compliance Status.
  • Cleanup: Removed legacy client.py and baseline_inference.py.

βœ… Validation Results

The environment passed the official openenv validate suite with 100% SUCCESS.

{
  "target": "https://armaan020-aegisopenenv.hf.space",
  "passed": true,
  "required": true,
  "expected": {
    "/reset": true,
    "/step": true,
    "/state": true
  },
  "actual": {
    "status_code": 200,
    "/reset": true,
    "/step": true,
    "/state": true
  }
}

πŸ› οΈ How to Deploy

To finalize the submission to Hugging Face Spaces:

  1. Login to HF CLI: huggingface-cli login
  2. Run the deployment script:
    python deploy_hf.py armaan020/AegisOpenEnv --public
    

The environment expects the model Qwen/Qwen2.5-0.5B-Instruct by default for CPU training. You can override this using the MODEL_NAME environment variable.


Created for the Meta OpenEnv Prize Pool. Part of the Aegis compliance suite.