---
title: Openenv
emoji: ☁️
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
OpenEnv Hackathon Submission
Environment Architecture (OpenEnv Contract)
This project uses an explicit OpenEnv contract layer in code:
- Core environment logic: cloud_arena/llm_environment.py -> AWSCostEnv
- OpenEnv interface adapter: cloud_arena/llm_environment.py -> OpenEnvAdapter
- Gym bridge used by training: cloud_arena/llm_environment.py -> SB3Adapter
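The three-layer split can be illustrated with a minimal sketch. Everything in it is hypothetical: the toy `CoreEnv` stands in for `AWSCostEnv`, and the method names and the `StepResult` record are assumptions, not the project's code or the OpenEnv library API. The one external convention it mirrors is Gymnasium's, which Stable-Baselines3 (v2 and later) expects: `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. A real SB3 bridge would also subclass `gymnasium.Env` and declare `action_space`/`observation_space`, which are omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StepResult:
    """Hypothetical OpenEnv-style step record; field names are assumptions."""
    observation: list[float]
    reward: float
    done: bool
    info: dict[str, Any] = field(default_factory=dict)


class CoreEnv:
    """Toy stand-in for AWSCostEnv: tracks one monthly cost figure."""

    def __init__(self, start_cost: float = 100.0, max_steps: int = 5) -> None:
        self.start_cost = start_cost
        self.max_steps = max_steps
        self.cost = start_cost
        self.t = 0

    def reset(self) -> list[float]:
        self.cost = self.start_cost
        self.t = 0
        return [self.cost]

    def transition(self, action: int) -> tuple[list[float], float, bool]:
        # Illustrative dynamics only: action 2 (RESIZE) trims cost by 10%.
        old_cost = self.cost
        if action == 2:
            self.cost *= 0.9
        self.t += 1
        return [self.cost], old_cost - self.cost, self.t >= self.max_steps


class OpenEnvAdapter:
    """Exposes the core logic through a reset()/step() -> StepResult contract."""

    def __init__(self, core: CoreEnv) -> None:
        self.core = core

    def reset(self) -> StepResult:
        return StepResult(observation=self.core.reset(), reward=0.0, done=False)

    def step(self, action: int) -> StepResult:
        obs, reward, done = self.core.transition(action)
        return StepResult(observation=obs, reward=reward, done=done)


class SB3Adapter:
    """Converts StepResult records into Gymnasium-style tuples."""

    def __init__(self, env: OpenEnvAdapter) -> None:
        self.env = env

    def reset(self, seed: Optional[int] = None) -> tuple[list[float], dict[str, Any]]:
        # `seed` is accepted for Gymnasium signature compatibility but unused here.
        result = self.env.reset()
        return result.observation, result.info

    def step(self, action: int) -> tuple[list[float], float, bool, bool, dict[str, Any]]:
        result = self.env.step(action)
        # This toy environment never truncates, so `truncated` is always False.
        return result.observation, result.reward, result.done, False, result.info
```

For example, `SB3Adapter(OpenEnvAdapter(CoreEnv())).step(2)` returns the reduced cost as the observation and the saving as the reward. Keeping the trainer-facing tuple format in its own layer means the core logic never has to know which training framework is driving it.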
Action space:
- 0: NOOP
- 1: CHECK_DEPENDENCIES
- 2: RESIZE
- 3: STOP
- 4: DELETE
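Since the policy is an LLM, its text output has to be mapped onto these five discrete actions. A minimal sketch follows; only the index-to-name pairs come from the list above, while the `CloudAction` name, the `parse_action` helper, and the fall-back-to-NOOP rule are illustrative assumptions.

```python
from enum import IntEnum


class CloudAction(IntEnum):
    """Discrete action space (indices and names as listed above)."""
    NOOP = 0
    CHECK_DEPENDENCIES = 1
    RESIZE = 2
    STOP = 3
    DELETE = 4


def parse_action(text: str) -> CloudAction:
    """Map a completion such as 'resize' or '2' to an action.

    Assumed policy: anything unrecognized falls back to NOOP, the safest choice.
    """
    token = text.strip().upper()
    if token.isdigit():
        try:
            return CloudAction(int(token))
        except ValueError:  # out-of-range index
            return CloudAction.NOOP
    return CloudAction.__members__.get(token, CloudAction.NOOP)
```

Falling back to NOOP rather than raising keeps a malformed generation from triggering a destructive action such as DELETE.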
Reward shaping includes cost delta, risk, reliability, action quality, anti-loop penalties, and terminal outcome components.
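One common way to combine such components is a weighted sum. The sketch below assumes that form; the `RewardTerms` fields mirror the components named above, but the weights, signs, and the repeat-window rule in `loop_penalty` are hypothetical choices for illustration, not the project's values.

```python
from dataclasses import dataclass


@dataclass
class RewardTerms:
    cost_delta: float      # savings this step (positive = cheaper)
    risk: float            # estimated risk of the action, >= 0
    reliability: float     # reliability change (negative = degraded)
    action_quality: float  # bonus or penalty for sensible, well-formed actions
    loop_penalty: float    # >= 0 when the agent repeats itself
    terminal: float        # non-zero only on the final step


# Hypothetical weights; penalties carry negative weights.
WEIGHTS = {
    "cost_delta": 1.0,
    "risk": -0.5,
    "reliability": 0.8,
    "action_quality": 0.2,
    "loop_penalty": -1.0,
    "terminal": 1.0,
}


def shaped_reward(terms: RewardTerms) -> float:
    """Weighted sum of all reward components."""
    return sum(w * getattr(terms, name) for name, w in WEIGHTS.items())


def loop_penalty(history: list[int], window: int = 3) -> float:
    """Assumed anti-loop rule: 1.0 if the last `window` actions are identical."""
    recent = history[-window:]
    return 1.0 if len(recent) == window and len(set(recent)) == 1 else 0.0
```

Keeping each component as a named field makes it straightforward to log them separately, which helps diagnose which term is driving the policy.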
Training Framework (Unsloth + GRPO)
The LLM training path actively uses Unsloth APIs in cloud_arena/llm_training.py:
- from unsloth import FastLanguageModel
- model loading via FastLanguageModel.from_pretrained(...)
- LoRA wrapping via FastLanguageModel.get_peft_model(...)
The policy optimizer is a custom GRPO loop:
- generate K samples per state
- compute normalized relative advantages: (reward - mean) / std
- backpropagate the loss across all K samples
- step the real environment with the top-reward sample only
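The loop above can be sketched framework-free, with plain floats standing in for model outputs. Assumptions: `sample_fn` and `reward_fn` are hypothetical callables, `reward_fn` is taken to score a candidate without mutating the real environment (for example on a copy), the standard deviation is the population one, and `surrogate_loss` is a bare policy-gradient surrogate that omits the KL penalty and clipping a full GRPO objective usually includes. In a real trainer the log-probabilities would be autograd tensors and the loss would be backpropagated.

```python
from __future__ import annotations

import statistics
from typing import Any, Callable, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages: (reward - mean) / std over one group of K samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards the all-equal case
    return [(r - mean) / (std + eps) for r in rewards]


def surrogate_loss(advantages: Sequence[float], logprobs: Sequence[float]) -> float:
    """Policy-gradient surrogate averaged over all K samples.

    Minimizing it raises the log-probability of above-average samples.
    """
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(advantages)


def grpo_step(
    state: Any,
    sample_fn: Callable[[Any], Any],
    reward_fn: Callable[[Any, Any], float],
    k: int = 4,
) -> tuple[list[Any], list[float], Any]:
    """Draw K samples, score them, and choose the one used to step the real env."""
    samples = [sample_fn(state) for _ in range(k)]
    rewards = [reward_fn(state, s) for s in samples]
    advantages = group_advantages(rewards)
    best = samples[max(range(k), key=rewards.__getitem__)]
    return samples, advantages, best
```

Because advantages are normalized within the group, every sample contributes a learning signal even though only the top-reward one advances the real environment; if all K rewards are equal, all advantages are zero and that state produces no update.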
Results and Evidence
Temporary public evidence links (replace with final experiment images before final leaderboard review):
- Reward / safety curve: Reward Dashboard Image
- KL / entropy curve: KL+Entropy Dashboard Image
Artifact Links
- Live HF Space: Openenv Space
- Training notebook entry: Colab Landing
- Technical writeup source: Hugging Face Blog
- Video platform entry: YouTube
Compliance Evidence Map
- OpenEnv structure: cloud_arena/llm_environment.py
- Unsloth integration: cloud_arena/llm_training.py
- Training UI and runtime controls: app.py
- Evidence/report document: README.md
Built for the OpenEnv Reinforcement Learning Hackathon.