Spaces:
Sleeping
AegisOpenEnv: Final Compliance & Training Walkthrough
AegisOpenEnv is now 100% compliant with the Meta OpenEnv competition specifications. We've refined the core logic, implemented a CPU-friendly RL training loop, and standardized the inference layer.
π Key Improvements
1. Modular Grader Refinement
Moved hardcoded scoring logic from server.py to a dedicated Grader class in grader.py.
- Centralized Rewards: All tiers (Easy/Medium/Hard) are now scored by
grader.grade(). - Penalties: False Positive (-1.0) and False Negative (-5.0) penalties are strictly enforced.
2. CPU-Optimized RL Training
Implemented train_cpu.py using the REINFORCE algorithm.
- Goal: Allow training on systems without high-end GPUs.
- Dataset: Integrated
SecureFinAI-Lab/Regulations_QAfor real-world regulatory prompts. - Logic: Manual policy gradient backprop ($Loss = -log_prob \times reward$).
3. Standardized Inference
Consolidated all inference logic into a single, official inference.py.
- Compliance: Uses
openai.OpenAI()client exclusively. - Reporting: Automatically generates a Reproducibility Report with Mean Score and Compliance Status.
- Cleanup: Removed legacy
client.pyandbaseline_inference.py.
β Validation Results
The environment passed the official openenv validate suite with 100% SUCCESS.
{
"target": "https://armaan020-aegisopenenv.hf.space",
"passed": true,
"required": true,
"expected": {
"/reset": true,
"/step": true,
"/state": true
},
"actual": {
"status_code": 200,
"/reset": true,
"/step": true,
"/state": true
}
}
π οΈ How to Deploy
To finalize the submission to Hugging Face Spaces:
- Login to HF CLI:
huggingface-cli login - Run the deployment script:
python deploy_hf.py armaan020/AegisOpenEnv --public
The environment expects the model
Qwen/Qwen2.5-0.5B-Instructby default for CPU training. You can override this using theMODEL_NAMEenvironment variable.
Created for the Meta OpenEnv Prize Pool. Part of the Aegis compliance suite.