---
title: Openenv
emoji: ☁️
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# OpenEnv Hackathon Submission
## Environment Architecture (OpenEnv Contract)
This project uses an explicit OpenEnv contract layer in code:
- Core environment logic: `cloud_arena/llm_environment.py` -> `AWSCostEnv`
- OpenEnv interface adapter: `cloud_arena/llm_environment.py` -> `OpenEnvAdapter`
- Gym bridge used by training: `cloud_arena/llm_environment.py` -> `SB3Adapter`
Action space:
- `0`: NOOP
- `1`: CHECK_DEPENDENCIES
- `2`: RESIZE
- `3`: STOP
- `4`: DELETE
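
The action ids above can be mirrored in code as an enum. This is a minimal sketch, not the definition in `cloud_arena/llm_environment.py`: the `Action` class, the `decode_action` helper, and the fallback to NOOP for unknown ids are all assumptions made for illustration.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete action ids, matching the list in this README."""
    NOOP = 0
    CHECK_DEPENDENCIES = 1
    RESIZE = 2
    STOP = 3
    DELETE = 4


def decode_action(action_id: int) -> Action:
    """Map a raw integer (e.g. parsed from model output) to an Action.

    Falling back to NOOP for out-of-range ids is an assumption of this
    sketch, not documented behavior of the environment.
    """
    try:
        return Action(action_id)
    except ValueError:
        return Action.NOOP
```

Using an `IntEnum` keeps the ids usable as plain integers while giving log output readable names.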
Reward shaping includes cost delta, risk, reliability, action quality, anti-loop penalties, and terminal outcome components.
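
One plausible way to combine these components is a weighted sum. The sketch below only illustrates that idea: the field names, the sign convention (each term arrives already signed, so penalties are negative), and the unit weights are hypothetical and are not taken from `AWSCostEnv`.

```python
from dataclasses import dataclass


@dataclass
class RewardTerms:
    """One value per component named in this README (illustrative names)."""
    cost_delta: float = 0.0
    risk: float = 0.0
    reliability: float = 0.0
    action_quality: float = 0.0
    anti_loop: float = 0.0
    terminal: float = 0.0


# Hypothetical weights; unit values keep the example easy to follow.
DEFAULT_WEIGHTS = {
    "cost_delta": 1.0,
    "risk": 1.0,
    "reliability": 1.0,
    "action_quality": 1.0,
    "anti_loop": 1.0,
    "terminal": 1.0,
}


def shaped_reward(terms: RewardTerms, weights=None) -> float:
    """Weighted sum of the already-signed reward components."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    return sum(w * getattr(terms, name) for name, w in weights.items())
```

For example, `shaped_reward(RewardTerms(cost_delta=2.0, anti_loop=-0.5))` adds a cost saving to a loop penalty under these assumed weights.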
## Training Framework (Unsloth + GRPO)
The LLM training path actively uses Unsloth APIs in `cloud_arena/llm_training.py`:
- `from unsloth import FastLanguageModel`
- model loading via `FastLanguageModel.from_pretrained(...)`
- LoRA wrapping via `FastLanguageModel.get_peft_model(...)`
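
The two Unsloth calls above typically fit together as in the sketch below. The helper name `load_lora_policy`, the model id, the sequence length, and the LoRA hyperparameters are assumptions chosen for illustration; the values actually used in `cloud_arena/llm_training.py` may differ.

```python
def load_lora_policy(model_name: str = "unsloth/Llama-3.2-1B-Instruct",
                     max_seq_length: int = 2048):
    """Load a 4-bit base model with Unsloth and wrap it with LoRA adapters.

    The model id and every hyperparameter here are placeholders.
    """
    # Imported lazily so this module can be imported without a GPU runtime.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,  # LoRA rank (assumed value)
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        lora_alpha=16,
        lora_dropout=0.0,
        bias="none",
    )
    return model, tokenizer
```

Only the LoRA adapter weights are trainable after wrapping, which is what keeps the memory footprint small enough for a single GPU.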
The policy optimizer is a custom GRPO loop:
- generate K samples per state
- compute group-normalized relative advantages `(reward - mean) / std`, with mean and std taken over the K samples
- backpropagate loss across all K samples
- step the real environment with the top-reward sample only
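
The advantage computation and sample selection in the steps above can be sketched in plain Python. This is a simplified illustration, not the project's training loop: the function names, the `eps` term added to the denominator, the use of population standard deviation, and the unclipped surrogate loss with floats standing in for tensors are all assumptions.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the K rewards from one state: (reward - mean) / (std + eps).

    eps guards against division by zero when all rewards are equal;
    it is an addition of this sketch.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def top_sample_index(rewards):
    """Index of the highest-reward sample, the one used to step the env."""
    return max(range(len(rewards)), key=rewards.__getitem__)


def grpo_surrogate_loss(log_probs, advantages):
    """Policy-gradient surrogate averaged over all K samples.

    Plain floats stand in for tensors; a real loop would use the model's
    sequence log-probabilities and call backward() on the result.
    """
    return -fmean([lp * adv for lp, adv in zip(log_probs, advantages)])
```

Because advantages are centered within each group, samples that beat the group mean are reinforced and the rest are pushed down, without needing a separate value network.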
## Results and Evidence
Placeholder evidence links (to be replaced with final experiment images before the final leaderboard review):
- Reward / safety curve: [Reward Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+Reward+Curve)
- KL / entropy curve: [KL+Entropy Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+KL+Entropy)
## Artifact Links
- Live HF Space: [Openenv Space](https://huggingface.co/spaces/saravanatanjiro/Openenv)
- Training notebook entry: [Colab Landing](https://colab.research.google.com/)
- Technical writeup source: [Hugging Face Blog](https://huggingface.co/blog)
- Video platform entry: [YouTube](https://www.youtube.com/)
## Compliance Evidence Map
- OpenEnv structure: `cloud_arena/llm_environment.py`
- Unsloth integration: `cloud_arena/llm_training.py`
- Training UI and runtime controls: `app.py`
- Evidence/report document: `README.md`
Built for the OpenEnv Reinforcement Learning Hackathon.