---
title: Openenv
emoji: ☁️
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---
# OpenEnv Hackathon Submission
## Environment Architecture (OpenEnv Contract)
This project uses an explicit OpenEnv contract layer in code:
- Core environment logic: `cloud_arena/llm_environment.py` -> `AWSCostEnv`
- OpenEnv interface adapter: `cloud_arena/llm_environment.py` -> `OpenEnvAdapter`
- Gym bridge used by training: `cloud_arena/llm_environment.py` -> `SB3Adapter`
Action space:
- `0`: NOOP
- `1`: CHECK_DEPENDENCIES
- `2`: RESIZE
- `3`: STOP
- `4`: DELETE
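As an illustration, the action IDs above can be written as an `IntEnum`. This is a minimal sketch, not the code in `cloud_arena/llm_environment.py`: the names `Action` and `decode_action` are hypothetical, and falling back to NOOP on an out-of-range ID is an assumption about how invalid policy outputs might be handled.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete action IDs, mirroring the action-space list in this README."""
    NOOP = 0
    CHECK_DEPENDENCIES = 1
    RESIZE = 2
    STOP = 3
    DELETE = 4


def decode_action(action_id: int) -> Action:
    """Map a raw integer to an Action (hypothetical helper).

    Assumption: an ID outside 0..4 is treated as NOOP rather than raising.
    """
    try:
        return Action(action_id)
    except ValueError:
        return Action.NOOP
```

Because `IntEnum` members compare equal to plain integers, code that already passes raw action IDs around would keep working with a mapping like this.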
Reward shaping includes cost delta, risk, reliability, action quality, anti-loop penalties, and terminal outcome components.
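One way to picture how these components could combine is a weighted sum. The sketch below is an assumption, not the project's actual formula: the README does not state how the terms are aggregated, and the `RewardTerms` class, the `total_reward` function, and the unit weights are all hypothetical. Penalties (risk, anti-loop) are assumed to be passed in as negative values.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RewardTerms:
    """One value per reward component named in this README (hypothetical container)."""
    cost_delta: float = 0.0
    risk: float = 0.0
    reliability: float = 0.0
    action_quality: float = 0.0
    anti_loop: float = 0.0
    terminal: float = 0.0


# Hypothetical weights: every component counts equally in this sketch.
UNIT_WEIGHTS = RewardTerms(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)


def total_reward(terms: RewardTerms, weights: RewardTerms = UNIT_WEIGHTS) -> float:
    """Weighted sum of all components (assumed aggregation rule)."""
    return sum(
        getattr(weights, f.name) * getattr(terms, f.name)
        for f in fields(RewardTerms)
    )
```

For example, `total_reward(RewardTerms(cost_delta=2.0, anti_loop=-0.5))` adds a cost saving of 2.0 to a loop penalty of -0.5 under the unit weights.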
## Training Framework (Unsloth + GRPO)
The LLM training path in `cloud_arena/llm_training.py` uses the following Unsloth APIs:
- `from unsloth import FastLanguageModel`
- model loading via `FastLanguageModel.from_pretrained(...)`
- LoRA wrapping via `FastLanguageModel.get_peft_model(...)`
The policy optimizer is a custom GRPO loop:
- generate K samples per state
- compute group-normalized advantages `(reward - mean) / std`, where `mean` and `std` are taken over the K samples for that state
- backpropagate loss across all K samples
- step the real environment with the top-reward sample only
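The advantage and selection steps of the loop above can be sketched in plain Python. This is an illustrative sketch, not the code in `cloud_arena/llm_training.py`: the function names are hypothetical, and two details are assumptions the README does not specify, namely the use of the population standard deviation and the small `eps` added to avoid division by zero when all K rewards are equal.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Return (reward - mean) / std for each of the K samples in one group.

    Assumptions: population std over the group, plus eps in the denominator
    so that a group with identical rewards yields all-zero advantages.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def best_sample_index(rewards: Sequence[float]) -> int:
    """Index of the top-reward sample; ties resolve to the earliest index."""
    return max(range(len(rewards)), key=rewards.__getitem__)
```

In a loop of this shape, the advantages would weight the loss for all K samples, while only the action at `best_sample_index(rewards)` is sent to the environment's step function.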
## Results and Evidence
The links below are temporary placeholders, not experiment output. Replace them with the final experiment images before the final leaderboard review:
- Reward / safety curve: [Reward Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+Reward+Curve)
- KL / entropy curve: [KL+Entropy Dashboard Image](https://placehold.co/1400x800/png?text=OpenEnv+GRPO+KL+Entropy)
## Artifact Links
- Live HF Space: [Openenv Space](https://huggingface.co/spaces/saravanatanjiro/Openenv)
- Training notebook entry: [Colab Landing](https://colab.research.google.com/)
- Technical writeup source: [Hugging Face Blog](https://huggingface.co/blog)
- Video platform entry: [YouTube](https://www.youtube.com/)
## Compliance Evidence Map
- OpenEnv structure: `cloud_arena/llm_environment.py`
- Unsloth integration: `cloud_arena/llm_training.py`
- Training UI and runtime controls: `app.py`
- Evidence/report document: `README.md`
Built for the OpenEnv Reinforcement Learning Hackathon.