Hackathon Themes and Event Information

Theme #1 - Multi-Agent Interactions

  • Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
  • Learning from these environments enables agents to model the beliefs and incentives of others in partially observable settings.
  • This drives theory-of-mind reasoning and emergent strategic behavior.
  • Expected Outcome: an environment that can be used to train multi-agent task handling in an LLM.
  • Example environments: market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
  • Sub-themes with bonus prizes:
    • Fleet AI. Scalable Oversight: environments that train oversight agents to monitor, analyze, and explain behavior of other AI agents in complex multi-agent settings.
    • Halluminate. Multi-Actor Environments: realistic environments where an agent interacts with and manages multiple actors (agents) to discover and achieve a task.

Theme #2 - (Super) Long-Horizon Planning and Instruction Following

  • Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
  • The goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
  • The aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
  • Expected Outcome: an environment that captures and improves LLM behavior on challenging long-horizon tasks that need long-running sessions beyond context memory limits.
  • Example environments: research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
  • Sub-themes with bonus prizes:
    • Scale AI: long-horizon workflows for non-code business use cases in sales, project management, or HR and IT.
    • Mercor: an environment with capped/uncapped rewards where frontier model rewards scale with token output.

Theme #3 - World Modeling

3.1 Professional Tasks

  • Develop environments requiring real interaction with tools, APIs, or dynamic systems, where models must do the actual work rather than exploit shortcuts.
  • Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
  • The goal is to strengthen causal reasoning and persistent world models.
  • Expected Outcome: an environment capturing nuances of a defined partially observable world and improving LLM interaction with it.
  • Example environments: dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
  • Sub-themes with bonus prizes:
    • Scaler AI Labs. Multi-App RL Environment for Enterprise Workflows: create RL environments to demonstrate complex workflows and business-rule nuances in large enterprises.

3.2 Personalized Tasks

  • Develop environments for real personalized task handling.
  • Example use cases include replying to personal messages, handling dinner/work conflicts, replying to tough emails, and other personal assistant tasks.
  • Expected Outcome: an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
  • Example environments: executive assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
  • Sub-themes with bonus prizes:
    • Patronus AI. Consumer Workflows with Schema Drift: multi-step consumer workflow environments where schemas, API contracts, and policies/rules change.

Theme #4 - Self-Improvement

  • The focus is on creating environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
  • Instead of optimizing fixed tasks, agents should learn to drive their own capability growth.
  • The objective is recursive skill amplification.
  • Expected Outcome: an environment for improving self-play of an LLM over a defined set of tasks.
  • Example environments: self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
  • Sub-themes with bonus prizes:
    • Snorkel AI. Simulated Experts-in-the-Loop: environment that simulates interactions with subject-matter experts with changing requirements/preferences.

Theme #5 - Wild Card: Impress Us!

  • If ideas do not fit the boxes above, out-of-the-box tasks are welcome.
  • Submissions should still meaningfully add value to LLM training on a specific task.

Guidelines for Problem Statement

  • It is not mandatory to choose the same problem statement as Round 1.
  • Choose the same problem statement only if it aligns with the provided hackathon themes.
  • You can start working on your problem statement once finalized.
  • Post-training can be done onsite on 25 and 26 April, when compute credits for Hugging Face are provided.
  • Before the onsite event, focus on building the environment, agent behaviors, and reward model, and on evaluating alignment with the judging criteria.

Judging Criteria

Minimum requirements

  • Use OpenEnv (latest release).
  • Show a minimal training script using Unsloth or HF TRL in Colab.
  • Publish a mini-blog on Hugging Face or a mini-video on YouTube about your submission (under 2 minutes).

First Round Judging Overview

  • Pitch Format: each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
  • Evaluation criteria:
    • Environment Innovation (40%): Is the environment novel, creative, or challenging? Does it meaningfully test agent behavior?
    • Storytelling (30%): Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
    • Showing Improvement in Rewards (20%): Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
    • Reward and Training Script/Pipeline Setup (10%): Is reward logic coherent, and does the pipeline produce meaningful improvement in agent inference?
  • Each evaluator judges about 10-15 teams and submits scores individually.
  • Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.
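
The weighting in the evaluation criteria above can be sketched as a small calculation. This is only an illustration: the 0-10 scale per criterion, the function names, and the use of a plain average across judges are assumptions, since the source states the weights but not the scoring scale or the aggregation method.

```python
from __future__ import annotations

# Weights taken from the evaluation criteria (fractions of the total score).
WEIGHTS = {
    "environment_innovation": 0.40,
    "storytelling": 0.30,
    "reward_improvement": 0.20,
    "pipeline_setup": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores into one weighted total.

    Assumption: every criterion is scored on the same scale (e.g. 0-10).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def aggregate(judge_scores: list[dict[str, float]]) -> float:
    """Average the weighted totals of several judges (assumed aggregation rule)."""
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    totals = [weighted_score(scores) for scores in judge_scores]
    return sum(totals) / len(totals)


if __name__ == "__main__":
    # Hypothetical scores from one judge, for illustration only.
    example = {
        "environment_innovation": 8,
        "storytelling": 7,
        "reward_improvement": 6,
        "pipeline_setup": 5,
    }
    print(round(weighted_score(example), 2))
```

Because Environment Innovation and Storytelling together carry 70% of the weight, a one-point gain on either moves the total more than the same gain on the training pipeline criterion.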

Team Confirmation Email

  • Hi Roopal Guha Neogi,
  • Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
  • This email serves as your official team ticket to the finale.

Event details

  • Date: 25-26 April 2026
  • Venue: Scaler School of Technology, Electronic City, Bangalore

Participation category

  • Team of 2

Team members

What to do right now

  • Join the private Discord (MANDATORY): Join here.
  • All major updates and announcements will be shared there first.
  • Check the travel guide: Read Here.
  • Travel guide includes venue details, directions, and nearby stay options.

Important - Entry to Campus

  • You must present this email at entry.
  • Teams/participants without this email will not be allowed on campus.
  • Going forward, all communication will be shared only with the team leader.

Please carry for verification

  • A valid government-issued ID.
  • Your college/company ID used during registration.

Entry policy notes

  • Entry will not be permitted if details do not match registration.
  • All team members must be individually registered in the system.
  • New/unregistered members added to travel details will not be allowed on campus.
  • Organisers reserve the right to deny entry if verification criteria are not met.

Round 2 Theme Reveal Summary

  • Multi-Agent Interactions
  • Long-Horizon Planning and Instruction Following
  • World Modeling across professional and personal tasks
  • Self-Improving agent systems

These themes reflect the real-world AI environment design and agent behavior that the hackathon evaluates.

Submission Design Expectations

  • Choose one or more themes and design your own problem statement.
  • Simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.

As part of submission, clearly define:

  • The problem statement
  • The environment in which the agent(s) operate
  • The capabilities of the agent(s)
  • The tasks to be performed
  • The reward model/evaluation logic
  • The post-training or self-improvement strategy
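
One way to keep track of the six items above while drafting is a small checklist structure. The sketch below is a hypothetical helper, not part of OpenEnv or any hackathon tooling; the class name, field names, and function name are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class SubmissionSpec:
    """Hypothetical container for the six items a submission must define."""

    problem_statement: str = ""
    environment: str = ""
    agent_capabilities: str = ""
    tasks: str = ""
    reward_logic: str = ""
    post_training_strategy: str = ""


def missing_sections(spec: SubmissionSpec) -> list[str]:
    """Return the names of sections that are still empty, in declaration order."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


if __name__ == "__main__":
    # Placeholder text only; fill in the real design before submitting.
    draft = SubmissionSpec(
        problem_statement="(your problem statement)",
        environment="(description of the environment)",
    )
    print(missing_sections(draft))
```

Running the check before the pitch makes it obvious which parts of the design, such as the reward logic or the post-training strategy, have not been written down yet.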

Recommendations for High Scores

  • Define clear, structured tasks and environments.
  • Incorporate robust evaluation and reward mechanisms.
  • Reflect real-world complexity aligned with OpenEnv principles.

Immediate Next Step

  • Begin refining design and evaluation right away.
  • Training and implementation happen onsite with provided compute credits.