OpenEnv Hackathon: Judging & Expectations Guide
TL;DR (What You Actually Need to Do)
Build an environment where an LLM can train and measurably improve at something meaningful, then:
- Show actual training
- Provide evidence (metrics, reward curves, comparisons)
- Tell a clear, compelling story
A messy but ambitious project with real training evidence beats a polished but shallow one.
Judging Criteria (Core Evaluation)
1. Environment Innovation (40%)
- Is your environment novel, creative, or challenging?
- Does it meaningfully test agent behavior?
- Avoid overused ideas (grid worlds, chess clones, etc.)
2. Storytelling & Presentation (30%)
- Clearly explain:
  - The problem
  - The environment
  - What the agent learned
- Demo should be engaging and easy to follow
3. Showing Improvement in Rewards (20%)
- Must prove learning happened
- Evidence:
  - Reward curves
  - Before vs. after behavior
  - Baseline comparisons
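A minimal sketch of how this evidence might be computed from logged rewards, assuming you already record one scalar reward per episode for both an untrained baseline and the trained policy. The helper names moving_average and compare_to_baseline are hypothetical and not part of OpenEnv, TRL, or Unsloth; the smoothed series is what you would plot as the reward curve.

```python
from statistics import mean


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; the first few points average whatever is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: list[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that just left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def compare_to_baseline(baseline: list[float], trained: list[float]) -> dict[str, float]:
    """Mean episode reward before and after training, plus the difference."""
    before, after = mean(baseline), mean(trained)
    return {"baseline_mean": before, "trained_mean": after, "improvement": after - before}
```

Evaluating both policies on the same seeds or tasks keeps the comparison fair, and plotting the raw rewards alongside the moving average keeps the smoothing from hiding variance.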
4. Reward & Training Pipeline (10%)
- Reward logic should be coherent and hard to exploit
- Training should improve agent behavior
Minimum Submission Requirements
- Use OpenEnv (latest release)
- Provide a working training script (Unsloth or HuggingFace TRL)
- Show training evidence (loss + reward plots)
- Submit one of:
  - A mini-blog
  - A video under 2 minutes
  - Slides
- Host on Hugging Face Spaces
- Provide a README covering the problem, environment, results, and links
Rules:
- One submission per team
- Submit your environment URL
- No changes are allowed after the deadline
What Judges Look For
Real Training
- Training must run against your environment
- Show learning with plots, metrics, comparisons
Reward Design
- Dense and informative rewards
- Hard to game
- Avoid simple binary rewards
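To make the contrast concrete, here is a sketch for a deliberately simple, hypothetical task ("return this list of integers in sorted order"). The function names and the scoring scheme are illustrative assumptions, not an OpenEnv or TRL API. The binary reward gives no signal until the answer is exactly right, while the dense reward gives partial credit for ordered adjacent pairs. Multiplying by element coverage is what makes it harder to game: without it, an empty or one-element answer would count as perfectly sorted.

```python
from collections import Counter


def binary_reward(task: list[int], answer: list[int]) -> float:
    """Sparse reward: 1.0 only for an exactly correct answer, else 0.0."""
    return 1.0 if answer == sorted(task) else 0.0


def dense_reward(task: list[int], answer: list[int]) -> float:
    """Dense reward in [0, 1] for a toy "sort this list" task.

    order:    fraction of adjacent pairs in the answer that are non-decreasing.
    coverage: multiset overlap with the input divided by the longer length,
              so both missing and extra elements lower the score.
    """
    if not task:
        return 1.0 if not answer else 0.0
    overlap = sum((Counter(task) & Counter(answer)).values())
    coverage = overlap / max(len(task), len(answer))
    pairs = len(answer) - 1
    in_order = sum(1 for a, b in zip(answer, answer[1:]) if a <= b)
    # An empty or single-element answer is trivially ordered; coverage penalizes it.
    order = in_order / pairs if pairs > 0 else 1.0
    return coverage * order
```

The dense reward reaches 1.0 exactly when the binary one does, so it adds signal without changing what counts as success. Any dense reward still deserves an adversarial pass: write the laziest output you can think of and check that it does not score well.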
Ambitious Problems
- Solve something LLMs struggle with
- Prefer underexplored domains
Clear Results
- Label axes properly
- Save plots as images
- Show comparisons clearly
Tell a Story
Your README should answer:
- Problem
- Environment
- Results
- Why it matters
Clean Engineering
- Use OpenEnv properly
- Follow the Gym-style API (reset, step, state)
- Maintain clean architecture
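A minimal sketch of the reset / step / state shape, written in plain Python with no OpenEnv imports. OpenEnv ships its own base classes and action, observation, and state types, so treat every name here (GuessNumberEnv, Observation, EpisodeState, run_binary_search_episode) as a hypothetical stand-in. The number-guessing task is deliberately trivial and only illustrates the interface; it is not a recommended hackathon idea. state() is exposed as a method here, and your framework may expose it differently.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    feedback: str  # "start", "higher", "lower", or "correct"
    reward: float
    done: bool


@dataclass
class EpisodeState:
    episode_id: int = 0
    step_count: int = 0
    done: bool = False
    history: list[int] = field(default_factory=list)


class GuessNumberEnv:
    """Toy environment: find a hidden integer in [low, high] within max_steps."""

    def __init__(self, low: int = 1, high: int = 100, max_steps: int = 10):
        self.low, self.high, self.max_steps = low, high, max_steps
        self._rng = random.Random()
        self._target = low
        self._state = EpisodeState()

    def reset(self, seed: Optional[int] = None) -> Observation:
        """Start a new episode; a seed makes the hidden target reproducible."""
        if seed is not None:
            self._rng.seed(seed)
        self._target = self._rng.randint(self.low, self.high)
        self._state = EpisodeState(episode_id=self._state.episode_id + 1)
        return Observation(feedback="start", reward=0.0, done=False)

    def step(self, guess: int) -> Observation:
        """Apply one action and return feedback, reward, and termination."""
        if self._state.done:
            raise RuntimeError("episode is over; call reset() first")
        self._state.step_count += 1
        self._state.history.append(guess)
        if guess == self._target:
            self._state.done = True
            return Observation("correct", 1.0, True)
        # Dense shaping: a closer wrong guess receives a smaller penalty.
        span = max(1, self.high - self.low)
        penalty = abs(guess - self._target) / span
        self._state.done = self._state.step_count >= self.max_steps
        feedback = "higher" if guess < self._target else "lower"
        return Observation(feedback, -penalty, self._state.done)

    def state(self) -> EpisodeState:
        """Episode metadata for logging and debugging."""
        return self._state


def run_binary_search_episode(env: GuessNumberEnv, seed: int) -> tuple[Observation, EpisodeState]:
    """A scripted policy that plays one full episode using the env's feedback."""
    obs = env.reset(seed=seed)
    lo, hi = env.low, env.high
    while not obs.done:
        guess = (lo + hi) // 2
        obs = env.step(guess)
        if obs.feedback == "higher":
            lo = guess + 1
        elif obs.feedback == "lower":
            hi = guess - 1
    return obs, env.state()
```

A scripted policy like this is a cheap check that the environment terminates and rewards correctly before any LLM training runs against it, and it can double as a baseline for the comparisons judges ask for.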
Problem Selection Guidelines
- Reuse your Round 1 idea only if it fits the themes
- Build the environment and reward model early
- Make sure the project aligns with the judging criteria
Final Advice
- Be ambitious
- Show real learning
- Communicate clearly
Judges want projects that push the frontier of LLM training.