# Scaler School of Technology — Meta PyTorch Hackathon

## OpenEnv Hackathon Dashboard

**URL:** https://www.scaler.com/school-of-technology/meta-pytorch-hackathon/dashboard#form

---

## Timeline

| Stage        | Dates                   |
|--------------|-------------------------|
| Registration | 14th March – 3rd April  |
| Declaration  | Before Round 1          |
| Prepare      | Now – 25th March        |
| Round 1      | 25th March – 8th April  |
| Results      | 10th April              |
| Finals       | 25th – 26th April       |

---

## Community

- **Discord:** Join the Discord Community — all announcements, mentor access, and team matching happen here.

---

## Participation

- Currently registered as **Solo Warrior**
- Locked for Round 1 — cannot switch to a team until Round 1 is over.

---

## Problem Statement

### The Task

> Build a complete, real-world OpenEnv environment that an AI agent can learn from through the standard `step()` / `reset()` / `state()` API.

---

## Key Requirements at a Glance

- Must simulate a real-world task (not games or toys)
- Implement full OpenEnv spec: typed models, `step()`/`reset()`/`state()`, `openenv.yaml`
- Minimum 3 tasks with agent graders (easy → medium → hard, scores 0.0–1.0)
- Meaningful reward function with partial progress signals
- Baseline inference script with reproducible scores
- Deploy to Hugging Face Spaces + working Dockerfile
- README with environment description, action/observation spaces, setup instructions

---

## Detailed Requirements

### Real-world task simulation

The environment must simulate a task humans actually do. Not games, not toys. Examples: email triage, code review, data cleaning, scheduling, customer support, content moderation.

### OpenEnv spec compliance

Implement the full OpenEnv interface:

- Typed `Observation`, `Action`, and `Reward` Pydantic models
- `step(action)` → returns observation, reward, done, info
- `reset()` → returns initial observation
- `state()` → returns current state
- `openenv.yaml` with metadata
- Tested via `openenv validate`
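For orientation, here is a minimal sketch of what this interface can look like for a real-world task. Everything in it is illustrative: `EmailTriageEnv`, `TriageAction`, and `TriageObservation` are hypothetical names, and the real base classes and server wiring come from the `openenv init` scaffold, so treat the shape, not the details, as the takeaway.

```python
# Illustrative sketch only: the actual OpenEnv base classes and server wiring
# come from the `openenv init` scaffold. All names below are hypothetical.
from pydantic import BaseModel


class TriageAction(BaseModel):
    """Typed action: which folder to file the current email into."""
    folder: str  # e.g. "urgent", "later", "spam"


class TriageObservation(BaseModel):
    """Typed observation: what the agent sees at each step."""
    subject: str
    body: str
    emails_remaining: int


class EmailTriageEnv:
    """A real-world-style environment exposing step()/reset()/state()."""

    def __init__(self, inbox: list[dict]):
        self.inbox = inbox  # each item: {"subject", "body", "label"}
        self.cursor = 0

    def reset(self) -> TriageObservation:
        """Return the initial observation from a clean state."""
        self.cursor = 0
        return self._observe()

    def step(self, action: TriageAction):
        """Apply an action; return (observation, reward, done, info)."""
        email = self.inbox[self.cursor]
        reward = 1.0 if action.folder == email["label"] else 0.0
        self.cursor += 1
        done = self.cursor >= len(self.inbox)
        return self._observe(), reward, done, {"filed": self.cursor}

    def state(self) -> dict:
        """Return the current internal state."""
        return {"cursor": self.cursor, "total": len(self.inbox)}

    def _observe(self) -> TriageObservation:
        if self.cursor >= len(self.inbox):
            return TriageObservation(subject="", body="", emails_remaining=0)
        email = self.inbox[self.cursor]
        return TriageObservation(
            subject=email["subject"],
            body=email["body"],
            emails_remaining=len(self.inbox) - self.cursor,
        )
```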
### Minimum 3 tasks with agent graders

Each task defines a concrete objective an agent must accomplish, with a programmatic grader that scores performance (0.0–1.0). Tasks should range easy → medium → hard. Graders must have clear, deterministic success/failure criteria.
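A grader in this style can be a plain deterministic function. The sketch below, a hypothetical grader for the email-triage example above, shows the properties the requirements call for: reproducible, score always in 0.0–1.0, and partial credit rather than all-or-nothing.

```python
# Hypothetical grader for the email-triage example: deterministic,
# reproducible, and always returns a score in [0.0, 1.0].
def grade_triage(predicted: list[str], expected: list[str]) -> float:
    """Partial credit: fraction of emails filed into the correct folder."""
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Same inputs always produce the same score: no randomness, no LLM judge.
assert grade_triage(["spam", "urgent"], ["spam", "later"]) == 0.5
```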
### Meaningful reward function

- Provides signal over the full trajectory (not just a binary end-of-episode reward).
- Rewards partial progress toward task completion.
- Penalizes clearly undesirable behavior (e.g. infinite loops, destructive actions).
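As a sketch of what this can look like in practice (the penalty values and flag names here are arbitrary placeholders, not part of the spec):

```python
# Illustrative reward shaping: dense per-step signal instead of a single
# end-of-episode bit. Penalty magnitudes are arbitrary placeholders.
def shaped_reward(items_done: int, items_total: int,
                  looping: bool, destructive: bool) -> float:
    reward = items_done / items_total   # partial progress in [0.0, 1.0]
    if looping:
        reward -= 0.1                   # discourage repeating the same action
    if destructive:
        reward -= 0.5                   # penalize e.g. deleting needed data
    return max(reward, -1.0)            # keep the signal bounded
```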
### Baseline inference script

- Uses the OpenAI API client to run a model against the environment.
- Reads API credentials from environment variables (`OPENAI_API_KEY`).
- Produces a reproducible baseline score on all 3 tasks.

---

## Non-Functional Requirements

### Deploys to a Hugging Face Space

Environment must run as a containerized HF Space tagged with `openenv`. Must include a working Dockerfile. The environment should start cleanly with `docker build` + `docker run`.

### Documentation

README must include:

- Environment description and motivation
- Action and observation space definitions
- Task descriptions with expected difficulty
- Setup and usage instructions
- Baseline scores

---

## Evaluation Criteria

| Parameter                      | Weight | Description |
|--------------------------------|--------|-------------|
| Real-world utility             | 30%    | Does the environment model a genuine task? Would someone actually use this to train or evaluate agents? |
| Task & grader quality          | 25%    | Are tasks well-defined with clear objectives? Do graders accurately and fairly measure success? Meaningful difficulty progression? |
| Environment design             | 20%    | Clean state management, sensible action/observation spaces, good reward shaping, proper episode boundaries |
| Code quality & spec compliance | 15%    | Follows OpenEnv spec, clean project structure, typed models, documented, tested, Dockerfile works |
| Creativity & novelty           | 10%    | Novel problem domain, interesting mechanics, clever reward design, original approach |

### Scoring Breakdown (Real-world utility)

- **0–5:** Toy/artificial problem with no practical application
- **6–15:** Valid domain but shallow modeling of the real task
- **16–25:** Good domain modeling, would be useful for agent evaluation
- **26–30:** Excellent — fills a real gap, immediate value for the RL/agent community

### Scoring Checklist Questions

**Task & grader quality:**

- 3+ tasks with difficulty range?
- Graders produce scores between 0.0–1.0?
- Graders deterministic and reproducible?
- Hard task genuinely challenges frontier models?

**Environment design:**

- `reset()` produces clean state?
- Action/observation types well-designed and documented?
- Reward function provides useful varying signal (not just sparse)?
- Episode boundaries sensible?

**Code quality & spec compliance:**

- `openenv validate` passes?
- `docker build && docker run` works?
- HF Space deploys and responds?
- Baseline script runs and reproduces scores?

**Creativity & novelty:**

- Domain we haven't seen in OpenEnv before?
- Reward design has interesting properties?
- Clever mechanics that make the environment engaging?

---

## How Judging Works

- **Phase 1 — Automated Validation:** Pass/fail gate — HF Space deploys, OpenEnv spec compliance, Dockerfile builds, baseline reproduces, 3+ tasks with graders.
- **Phase 2 — Agentic Evaluation:** Scored — baseline agent re-run, standard open LLM agent (e.g. Nemotron 3 Super) run against all environments, score variance check.
- **Phase 3 — Human Review:** Top submissions reviewed by Meta and Hugging Face engineers for real-world utility, creativity, and exploit checks.

---

## Disqualification Criteria

- Environment does not deploy or respond
- Plagiarized or trivially modified existing environments
- Graders that always return the same score
- No baseline inference script

---

## Pre-Submission Checklist — all must pass or you're disqualified

| Check                   | Requirement |
|-------------------------|-------------|
| HF Space deploys        | Automated ping to the Space URL — must return 200 and respond to `reset()` |
| OpenEnv spec compliance | Validate `openenv.yaml`, typed models, `step()`/`reset()`/`state()` endpoints |
| Dockerfile builds       | Automated docker build on the submitted repo |
| Baseline reproduces     | Run the submitted inference script — must complete without error and produce scores |
| 3+ tasks with graders   | Enumerate tasks, run each grader, verify scores in 0.0–1.0 range |
| Infra Restrictions      | Runtime of inference script should be less than 20 min. Must run on vcpu=2, memory=8gb |
| Validator               | Run the pre-submission validation script before submitting |

### Mandatory Additional Instructions

Before submitting, ensure the following variables are defined in your environment configuration:

| Variable       | Description                               |
|----------------|-------------------------------------------|
| `API_BASE_URL` | The API endpoint for the LLM              |
| `MODEL_NAME`   | The model identifier to use for inference |
| `HF_TOKEN`     | Your Hugging Face / API key               |

- The inference script must be named `inference.py` and placed in the root directory of the project.
- Participants must use the OpenAI Client for all LLM calls using the above variables (see the sketch after this list).
- Participants must emit structured stdout logs strictly following the `[START]`, `[STEP]`, and `[END]` format defined in the sample inference script. Any deviation in field names, ordering, or formatting will result in incorrect evaluation scoring. Refer to [`sample_inference.py`](./sample_inference.py) for the complete format specification and examples.
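A sketch of how these variables might be wired into the OpenAI client is below. The mapping of `HF_TOKEN` to `api_key` is an assumption based on the table above, the prompt is a placeholder, and the mandatory `[START]`/`[STEP]`/`[END]` log lines are deliberately not reproduced here; copy that format verbatim from [`sample_inference.py`](./sample_inference.py).

```python
# inference.py sketch: wires the mandated variables into the OpenAI client.
# Assumption: HF_TOKEN is passed as the client's api_key (see table above).
# The required [START]/[STEP]/[END] stdout format is NOT shown here; take it
# verbatim from sample_inference.py, since any deviation breaks scoring.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ["API_BASE_URL"],  # the LLM endpoint
    api_key=os.environ["HF_TOKEN"],       # your Hugging Face / API key
)

response = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],       # the mandated model identifier
    messages=[{"role": "user", "content": "Which folder does this email go in?"}],
)
print(response.choices[0].message.content)
```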
### Infra Restrictions

- Runtime of inference script must be less than 20 minutes.
- Ensure your env and inference can run on a machine with `vcpu=2`, `memory=8gb`.

### Validator

Run the pre-submission validation script at [`pre_validate.sh`](./pre_validate.sh) before submitting.

### Sample Inference Script

See [`sample_inference.py`](./sample_inference.py) for the complete example, including the mandatory `[START]`, `[STEP]`, and `[END]` structured log format.

---

## Submission

- **Submission window opens:** 28th March
- **Deadline:** 8 April 2026, 11:59 PM IST

### Step 1

Choose solo or team before you can start the assessment.

### Step 2

Complete Step 1 first. The Problem Statement is live. Build and submit.

---

## Study Material

**4 modules · ~3.5 hours**

Each module: read the README first, then open the notebook in Colab. No local setup needed.

### Module 1 — Essential for Round 1 (45 min)

**What you'll do:** Connect to 3 real AI environments hosted online — an Echo bot, a Catch game, and Wordle — and interact with each using the exact same code pattern.

### Module 2 — Essential for Round 1 (50 min)

**What you'll do:** Write 4 different game-playing strategies for a Catch game, run a competition between them, then switch to a completely different game using the same code.

### Module 3 — Essential for Round 1 (45 min)

**What you'll do:** Clone an existing environment, modify it, run it on your machine, then deploy your version live to Hugging Face Spaces with one command.

### Module 4 — Most Important for Round 1

**What you'll do:** Build a complete word-guessing game environment from scratch — define the rules, implement the logic, test it locally, and deploy it live. About 100 lines of real code.

- View full course repository

---

## Guide

### What to Expect

Example of what a problem statement looks like:

> "Build a mini-game RL environment with clearly defined tasks, automated graders, and deploy it live to Hugging Face Spaces."

### Prerequisites (from Step 1 assessment)

- Write graders that verify task completion
- Define reward logic for scoring
- Package using OpenEnv for automated evaluation

**Install before April 1st:**

| Tool                  | Requirement                                  | Command                        |
|-----------------------|----------------------------------------------|--------------------------------|
| Python 3.10+          | Install 3.10, 3.11, or 3.12                  | `python --version`             |
| Git + GitHub account  | Push your submission to GitHub or HF         | `git --version`                |
| Hugging Face CLI      | Deploy to HF Spaces                          | `pip install huggingface_hub`  |
|                       |                                              | `huggingface-cli login`        |
| OpenEnv               | The framework                                | `pip install openenv-core`     |
| Google Colab          | Prep course runs in Colab (free tier works)  | colab.research.google.com      |
| Docker                | Isolated container testing                   | `docker --version`             |
| VS Code (Recommended) | Best Python + Docker support                 |                                |

### Step 1 Evaluation Criteria

| Criteria             | Standard                    |
|----------------------|-----------------------------|
| Runtime correctness  | Runs without errors         |
| Interface compliance | Follows OpenEnv standard    |
| Task design          | Clear, realistic, testable  |
| Grading logic        | Reward system makes sense   |

### How to Submit

When Round 1 starts on 1 April:

**Step 1 — Application Form**

Choose your problem domain. The task is open-ended — build an OpenEnv environment around any real-world task that a human would actually do.

**Step 2 — Scaffold**

```bash
openenv init my_env
```

Generate the project structure.

**Step 3 — Build**

Define your environment in the generated files.

**Step 4 — Test locally**

```bash
uv run server
```

**Step 5 — Deploy**

```bash
openenv push --repo-id your-username/my-env
```

**Step 6 — Submit**

Paste your HF Spaces URL on the platform before the deadline.

- Submission window opens 28th March
- Deadline: 8 April 2026, 11:59 PM IST

> **Note:** Only team leaders can make the final submission.

> **Note:** The Guide above references "4–5 problem statements" — this is outdated. Round 1 is open-ended. There is no fixed list of problem statements to choose from. Build an environment around any real-world task that a human would actually do (e.g. email triage, code review, data cleaning). The requirements and evaluation criteria remain the same.

---

## FAQs

### How does the team/solo declaration work?

If you choose to compete solo, you will participate individually for Round 1. If you form a team (2–3 members), only the Team Lead fills out the team formation form before the Round 1 assessment window opens and adds teammates using their registered email IDs. Once a team is confirmed, it cannot be changed.

Note: Since Round 2 is a 48-hour in-person hackathon, solo participants who qualify will be matched with other qualifying participants to form teams for the final round.

### Who should fill the team form?

Only the team lead completes the team registration form. Teammates do not need to fill out anything at this stage. Once the Team Lead submits the form, listed members will receive an invite on their dashboards. The team will be reflected on their dashboards only after they accept the invite.

### What if someone already added me to their team?

This only takes effect once you accept their invite; your dashboard will then automatically update to reflect the team you have joined. After confirmation, you will not be able to switch to solo mode or join/form another team. Team assignments are permanent once confirmed.

### Can I change my team or switch to solo after confirming?

No. Teams are permanent once confirmed; no changes are allowed. Solo declarations are locked for Round 1. A confirmation prompt is shown before submission, so please review carefully before proceeding.

### Do I need to complete the prep course?

It is not mandatory, but it is strongly recommended.

### What happens during Round 1?

Round 1 is open-ended: you will build a real-world RL environment using the OpenEnv framework. There is no fixed list of problem statements — see the note under "How to Submit".

### Can I update my submission?

Yes. You may update your submission multiple times until the Round 1 deadline (8 April 2026, 11:59 PM IST). Only the latest submission will be evaluated.

### How are submissions evaluated?

Round 1 uses an LLM-based evaluator with structured rubrics. The finale includes LLM screening, manual review, and judging by Meta's global team. Evaluation criteria include runtime correctness, OpenEnv interface compliance, task design quality, grading logic, and overall code quality.

### What framework must be used?

All environments must be built using the OpenEnv framework by Meta and Hugging Face.

### What happens after Round 1?

Results will be announced on 10 April. The top 3,000 teams will advance to the Grand Finale, a 48-hour on-campus hackathon at Scaler School of Technology, Bangalore (25th–26th April).

### What do I need to submit?

- A public GitHub repository with your environment code, a `requirements.txt`, a demo script, and a README.
- A deployed Hugging Face Spaces URL showcasing your working demo.

### Where can I get help?

Join the Discord community for announcements and support. For account or registration issues, email: help_openenvhackathon@scaler.com

---

## Support

**Need help? Reach out to us:**

- Email: help_openenvhackathon@scaler.com