Spaces:

Akshaykumarbm
/

scheduling_env

Sleeping

App Files Files Community

scheduling_env / docs /hackathon-guide-rl-environments.md

Akshaykumarbm

Upload folder using huggingface_hub

7bdbe90 verified about 1 month ago

preview code

raw

history blame contribute delete

7.12 kB

Building RL Environments for Hackathon: Complete Guide

Overview

This guide provides comprehensive insights for building real-world Reinforcement Learning (RL) environments using the OpenM (Open Environment) library for hackathon participation.

1. Fundamentals of Reinforcement Learning

The Mechanism

How it Works: Model generates candidate implementations (actions) → Environment verifies/tests → Environment provides reward signal (score) based on pre-defined rubrics
Purpose: Tells the model what is good or bad through trial and error rather than long-context prompts

Position in Training Pipeline

Typically follows Supervised Fine-Tuning (SFT)
Used to "squeeze out" final performance gains on specific capabilities
More efficient alternative to "in-context learning" (which degrades with longer prompts)

Key Challenges

Reward Hacking

Models learn to "game" the verifier to get high scores without actually solving the task
Mitigation: Inspect output trajectories or use multiple reward functions

Curriculum Learning

Start with easy tasks and build complexity progressively
Ensures model receives consistent reward signal
Prevents "wasting compute" on tasks that are too difficult initially

2. Introduction to OpenM

What is OpenM?

Collaborative project between Meta, Hugging Face, and others
Standardizes RL environments (like Hugging Face standardized language models)
Single, consistent API for environments
Interoperable with training frameworks (TRL, Unsloth, etc.)

Core Components

Standard OpenM environment requires defining:

Actions (as Pydantic objects)
Observations (as Pydantic objects)
States (as Pydantic objects)

3. Technical Implementation

CLI Workflow

# Initialize skeleton environment
openm init

# Validate setup
openm validate

# Deploy to Hugging Face Spaces
openm push

Agent Integration

Use coding agents (like Codeex) with OpenM "skills"
Automatically generate environment code from prompts

Deployment

Environments deployed as Docker containers on Hugging Face
Provides web interface for manual testing and debugging
Important: Dockerfile must be moved outside /server folder to main project directory

4. Hackathon Requirements

Environment Quality

Real-World Focus (Critical)

Must build: Real-world task environments (healthcare, email triage, code optimization)
Avoid: "Toy" environments, games (Wordle, Connect 4, etc.)
Goal: Environment that could realistically be used in model's post-training RL run

Complexity Requirements

Map long-running tasks with multiple trajectories/routes
Agent should have various possible approaches to solve the task

Technical Requirements

Mandatory Inference Script

Required for every submission
Used by organizers to evaluate environment effectiveness
Measures how well environment provides rewards to model

API Configuration

No OpenAI API key required
Use Hugging Face token instead
Use provided HF Router (API base URL) for model calls
HF Router handles model calls through Hugging Face

Docker Setup

Move Dockerfile outside /server folder to main project directory
Run openm validate before submission

Reward Signal Design

Requirements

Score typically between 0 and 1
Must deliver valid signal indicating "good" or "bad" performance
Grading Diversity: Must not return same score every time
Should distinguish between different performance levels

Best Practices

Start with achievable tasks for the model
Ensure task is feasible but challenging
Avoid tasks too difficult or out-of-distribution for the model

5. Grading Criteria

Evaluation based on:

Utility of the Idea
- How useful is the task for real-world AI?
- Does it represent authentic human tasks?
Quality of the Grader
- Returns diverse scores (not same score every time)
- Value between 0 and 1
- Distinguishes performance levels
Technical Design
- Environment architecture and implementation
- Successful execution of inference script
Novelty
- Key criterion for high scores
- Create something not thought of yet
- Solve problems in unique domains
- Plagiarism is strictly prohibited

6. Submission Guidelines

Deadline

Round One: April 8th

Submission Process

Push environment to Hugging Face Spaces using openm push
Submit URL of Hugging Face Space
Multiple submissions allowed (latest accurate submission used)

Collaboration

Teams are highly encouraged
Helps manage technical and creative requirements

7. High-Value Environment Ideas

Healthcare Domain

Medical triage tools
Navigating medical records
Healthcare-specific software tool utilization

Productivity and Operations

Email Triage: Prioritize, categorize, respond to complex inbox
Calendar Management: Coordinate schedules, handle conflicts across multiple participants

Technical and Code Optimization

Kernel Optimization: Benchmark and optimize PyTorch/GPU kernels for speed and efficiency
Repository Maintenance: Navigate GitHub to identify/fix bugs, run test suites

Logistics and Travel

Complex Flight Booking: Navigate changing availability, multi-leg transfers, request missing information from users

API and Tool Integration

Wide set of real-world tools
Interactive APIs that agents must learn to use correctly

8. Best Practices Summary

Do's

Focus on real-world utility
Design long-running, multi-trajectory tasks
Implement diverse grading systems
Start with curriculum learning approach
Validate thoroughly before submission
Work in teams for better results
Aim for novelty and uniqueness

Don'ts

Avoid toy environments or games
Don't create tasks too difficult for models
Don't implement single-score graders
Avoid plagiarism
Don't submit without testing inference script
Don't use tasks without clear reward signals

9. Technical Checklist

Initialize project with openm init
Define Actions, Observations, States as Pydantic objects
Implement diverse reward function (0-1 range)
Create mandatory inference script
Configure HF token and router (not OpenAI key)
Move Dockerfile to main directory (outside /server)
Run openm validate to verify setup
Test environment locally
Deploy with openm push to Hugging Face Spaces
Submit Hugging Face Space URL before April 8th

Resources

OpenM Library: Standardized RL environment framework
Hugging Face Spaces: Deployment platform
HF Router: API for model access
Training Frameworks: TRL, Unsloth (compatible with OpenM)

This guide synthesizes best practices for building competitive RL environments for hackathons. Focus on real-world utility, technical excellence, and novel approaches for the best results.