---
title: Openenv
emoji: ☁️
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# OpenEnv Hackathon Submission

## Environment Architecture (OpenEnv Contract)

This project uses an explicit OpenEnv contract layer in code:

- Core environment logic: `cloud_arena/llm_environment.py` -> `AWSCostEnv`
- OpenEnv interface adapter: `cloud_arena/llm_environment.py` -> `OpenEnvAdapter`
- Gym bridge used by training: `cloud_arena/llm_environment.py` -> `SB3Adapter`

Action space:

- `0`: NOOP
- `1`: CHECK_DEPENDENCIES
- `2`: RESIZE
- `3`: STOP
- `4`: DELETE
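
The action ids above map naturally onto an integer enum. The following is a minimal sketch, not the project's actual code: the names `Action` and `decode_action`, and the choice to fall back to NOOP for out-of-range ids, are assumptions made for illustration.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete action ids, mirroring the action-space list in this README."""

    NOOP = 0
    CHECK_DEPENDENCIES = 1
    RESIZE = 2
    STOP = 3
    DELETE = 4


def decode_action(raw: int) -> Action:
    """Map a raw policy output to an Action.

    Falling back to NOOP for unknown ids is an assumption of this sketch,
    not documented behavior of the environment.
    """
    try:
        return Action(raw)
    except ValueError:
        return Action.NOOP
```

Decoding through a single helper keeps malformed LLM outputs from reaching the environment as arbitrary integers.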

The shaped reward combines six components: cost delta, risk, reliability, action quality, anti-loop penalties, and terminal outcome.
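
One common way to combine such components is a weighted sum. The sketch below assumes that form. The README does not state the weights or sign conventions, so `RewardComponents`, `WEIGHTS`, and `shaped_reward` are hypothetical names with placeholder values, where risk and anti-loop terms are treated as non-negative magnitudes that carry negative weights.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RewardComponents:
    """One value per reward component named in this README."""

    cost_delta: float      # cost saved by the action (positive is good)
    risk: float            # magnitude of risk incurred
    reliability: float     # reliability score after the action
    action_quality: float  # quality score of the chosen action
    anti_loop: float       # magnitude of the repeated-action penalty
    terminal: float        # terminal outcome bonus or penalty


# Hypothetical weights, chosen only to make the example concrete.
WEIGHTS = RewardComponents(
    cost_delta=1.0,
    risk=-1.0,
    reliability=0.5,
    action_quality=0.25,
    anti_loop=-0.5,
    terminal=2.0,
)


def shaped_reward(c: RewardComponents, w: RewardComponents = WEIGHTS) -> float:
    """Return the weighted sum of all components."""
    return sum(getattr(c, f.name) * getattr(w, f.name) for f in fields(c))
```

Keeping the components in a named structure makes it straightforward to log each term separately when diagnosing reward hacking or looping behavior.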

## Training Framework (Unsloth + GRPO)

The LLM training path actively uses Unsloth APIs in `cloud_arena/llm_training.py`:

- `from unsloth import FastLanguageModel`
- model loading via `FastLanguageModel.from_pretrained(...)`
- LoRA wrapping via `FastLanguageModel.get_peft_model(...)`
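
The calls listed above typically fit together as follows. This is a sketch under assumptions, not the contents of `cloud_arena/llm_training.py`: the helper name `load_policy_model`, the sequence length, the LoRA rank, and the target modules are illustrative choices, and the caller must supply the model name.

```python
def load_policy_model(model_name: str, max_seq_length: int = 2048, lora_rank: int = 16):
    """Load a base model with Unsloth and wrap it with LoRA adapters.

    All default values here are illustrative assumptions.
    """
    # Imported lazily because unsloth needs a supported GPU runtime.
    from unsloth import FastLanguageModel

    # Load the base model and tokenizer in 4-bit to reduce memory use.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        load_in_4bit=True,
    )

    # Attach LoRA adapters to the attention projections only (an assumption).
    model = FastLanguageModel.get_peft_model(
        model,
        r=lora_rank,
        lora_alpha=lora_rank,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )
    return model, tokenizer
```

Only the adapter weights are trainable after wrapping, which is what keeps a GRPO loop with K samples per state affordable on a single GPU.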

The policy optimizer is a custom GRPO loop:

- generate K samples per state
- compute normalized relative advantages `(reward - mean) / std`, where `mean` and `std` are taken over the K rewards for that state
- backpropagate the loss across all K samples
- step the real environment with the top-reward sample only
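
The advantage and sample-selection steps of the loop above can be sketched in plain Python. The function names, the use of the population standard deviation, and the `eps` term that guards against division by zero are assumptions of this sketch. The project's loop may differ in those details.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize the K rewards of one state: (reward - mean) / (std + eps)."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std over the K samples (assumption)
    return [(r - mean) / (std + eps) for r in rewards]


def top_sample_index(rewards: Sequence[float]) -> int:
    """Index of the highest-reward sample, used to step the real environment."""
    return max(range(len(rewards)), key=rewards.__getitem__)


# Example with K = 3 sampled actions for one state.
rewards = [1.0, 2.0, 3.0]
advantages = group_relative_advantages(rewards)
best = top_sample_index(rewards)
```

When all K rewards are equal, every advantage is zero, so that state contributes no gradient. Samples above the group mean receive positive advantages and samples below it receive negative ones.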

## Results and Evidence

The links below are temporary public placeholders. Replace them with the final experiment images before the final leaderboard review:

- Reward / safety curve: [Reward Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+Reward+Curve)
- KL / entropy curve: [KL+Entropy Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+KL+Entropy)

## Artifact Links

- Live HF Space: [Openenv Space](https://huggingface.co/spaces/saravanatanjiro/Openenv)
- Training notebook entry: [Colab Landing](https://colab.research.google.com/)
- Technical writeup source: [Hugging Face Blog](https://huggingface.co/blog)
- Video platform entry: [YouTube](https://www.youtube.com/)

## Compliance Evidence Map

- OpenEnv structure: `cloud_arena/llm_environment.py`
- Unsloth integration: `cloud_arena/llm_training.py`
- Training UI and runtime controls: `app.py`
- Evidence/report document: `README.md`

Built for the OpenEnv Reinforcement Learning Hackathon.