---
title: Openenv
emoji: ☁️
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# OpenEnv Hackathon Submission

## Environment Architecture (OpenEnv Contract)

This project uses an explicit OpenEnv contract layer in code:

- Core environment logic: `cloud_arena/llm_environment.py` -> `AWSCostEnv`
- OpenEnv interface adapter: `cloud_arena/llm_environment.py` -> `OpenEnvAdapter`
- Gym bridge used by training: `cloud_arena/llm_environment.py` -> `SB3Adapter`

Action space:

- `0`: NOOP
- `1`: CHECK_DEPENDENCIES
- `2`: RESIZE
- `3`: STOP
- `4`: DELETE

Reward shaping includes cost delta, risk, reliability, action quality, anti-loop penalties, and terminal outcome components.

## Training Framework (Unsloth + GRPO)

The LLM training path actively uses Unsloth APIs in `cloud_arena/llm_training.py`:

- `from unsloth import FastLanguageModel`
- model loading via `FastLanguageModel.from_pretrained(...)`
- LoRA wrapping via `FastLanguageModel.get_peft_model(...)`

The policy optimizer is a custom GRPO loop:

- generate K samples per state
- compute normalized relative advantages `(reward - mean) / std`
- backpropagate loss across all K samples
- step the real environment with the top-reward sample only

## Results and Evidence

Temporary public evidence links (replace with final experiment images before final leaderboard review):

- Reward / safety curve: [Reward Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+Reward+Curve)
- KL / entropy curve: [KL+Entropy Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+KL+Entropy)

## Artifact Links

- Live HF Space: [Openenv Space](https://huggingface.co/spaces/saravanatanjiro/Openenv)
- Training notebook entry: [Colab Landing](https://colab.research.google.com/)
- Technical writeup source: [Hugging Face Blog](https://huggingface.co/blog)
- Video platform entry: [YouTube](https://www.youtube.com/)

## Compliance Evidence Map

- OpenEnv structure: `cloud_arena/llm_environment.py`
- Unsloth integration: `cloud_arena/llm_training.py`
- Training UI and runtime controls: `app.py`
- Evidence/report document: `README.md`

Built for the OpenEnv Reinforcement Learning Hackathon.
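
As a rough illustration of the action space and the GRPO steps listed under Training Framework, the group-relative advantage `(reward - mean) / std` and the top-reward selection could look like the sketch below. This is not the code in `cloud_arena/llm_training.py`: the names `Action`, `group_relative_advantages`, and `select_env_action` are hypothetical, and the use of the population standard deviation and the `eps` stabilizer are assumptions.

```python
from enum import IntEnum
from statistics import fmean, pstdev


class Action(IntEnum):
    """Discrete actions, numbered as in the Action space list (hypothetical enum)."""
    NOOP = 0
    CHECK_DEPENDENCIES = 1
    RESIZE = 2
    STOP = 3
    DELETE = 4


def group_relative_advantages(rewards, eps=1e-8):
    """Return (reward - mean) / std for one group of K sampled rewards.

    Assumption: population std plus a small eps so that a group with
    identical rewards yields zeros instead of a division by zero.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def select_env_action(actions, rewards):
    """Return the top-reward sample, the only one used to step the real environment."""
    best = max(range(len(rewards)), key=rewards.__getitem__)
    return actions[best]


# K = 4 samples for a single state, with made-up rewards for illustration.
actions = [Action.NOOP, Action.RESIZE, Action.STOP, Action.DELETE]
rewards = [0.0, 1.0, 0.5, -1.5]

advantages = group_relative_advantages(rewards)
chosen = select_env_action(actions, rewards)
```

All K advantages would feed the loss, while only `chosen` is passed to the environment. Normalizing within the group keeps the update scale independent of the raw reward magnitude for that state.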