# Hackathon Themes and Event Information
## Theme #1 - Multi-Agent Interactions
- Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
- Learning from these environments enables agents to model beliefs and incentives of others in partially observable settings.
- This drives theory-of-mind reasoning and emergent strategic behavior.
- **Expected Outcome:** an environment that can be used to train multi-agent task handling in an LLM.
- **Example environments:** market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
- **Sub-themes with bonus prizes:**
  - **Fleet AI. Scalable Oversight:** environments that train oversight agents to monitor, analyze, and explain behavior of other AI agents in complex multi-agent settings.
  - **Halluminate. Multi-Actor Environments:** realistic environments where an agent interacts with and manages multiple actors (agents) to discover and achieve a task.
## Theme #2 - (Super) Long-Horizon Planning and Instruction Following
- Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
- The goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
- The aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
- **Expected Outcome:** an environment that captures and improves LLM behaviour on challenging long-horizon tasks needing long-running sessions beyond context memory limits.
- **Example environments:** research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
- **Sub-themes with bonus prizes:**
  - **Scale AI:** long-horizon workflows for non-code business use cases in Sales, Project management, or HR and IT.
  - **Mercor:** an environment with capped/uncapped rewards where frontier model rewards scale with token output.
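One possible reading of the Mercor sub-theme's "capped/uncapped rewards that scale with token output" can be sketched as follows. This is only an illustration under stated assumptions: the function name `token_scaled_reward`, the linear scaling rule, the `quality` score in [0, 1], and the default `per_token` and `cap` values are all hypothetical and do not come from the sponsor's brief.

```python
from typing import Optional


def token_scaled_reward(
    quality: float,
    tokens_out: int,
    per_token: float = 0.001,    # hypothetical reward per output token
    cap: Optional[float] = 1.0,  # None means "uncapped"
) -> float:
    """Reward that grows linearly with output length, weighted by quality.

    Assumption: `quality` is a task-specific score in [0, 1] supplied by the
    environment's own grader; it keeps long but useless outputs from scoring.
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be in [0, 1]")
    if tokens_out < 0:
        raise ValueError("tokens_out must be non-negative")
    raw = quality * per_token * tokens_out
    # Capped variant clips the reward; uncapped variant returns it as-is.
    return raw if cap is None else min(raw, cap)
```

Weighting by a quality score is a design choice made here so that token count alone cannot be farmed for reward; a real environment would need its own grader to supply that score.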
## Theme #3 - World Modeling
### 3.1 Professional Tasks
- Develop environments requiring real interaction with tools, APIs, or dynamic systems where models do real hard work instead of exploiting shortcuts.
- Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
- The goal is to strengthen causal reasoning and persistent world models.
- **Expected Outcome:** an environment capturing nuances of a defined partially observable world and improving LLM interaction with it.
- **Example environments:** dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
- **Sub-themes with bonus prizes:**
  - **Scaler AI Labs. Multi-App RL Environment for Enterprise Workflows:** create RL environments to demonstrate complex workflows and business-rule nuances in large enterprises.
### 3.2 Personalized Tasks
- Develop environments for real personalized task handling.
- Example use cases include replying to personal messages, handling dinner/work conflicts, replying to tough emails, and other personal assistant tasks.
- **Expected Outcome:** an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
- **Example environments:** executive assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
- **Sub-themes with bonus prizes:**
  - **Patronus AI. Consumer Workflows with Schema Drift:** multi-step consumer workflow environments where schemas, API contracts, and policies/rules change.
## Theme #4 - Self-Improvement
- The focus is on creating environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
- Instead of optimizing fixed tasks, agents should learn to drive their own capability growth.
- The objective is recursive skill amplification.
- **Expected Outcome:** an environment for improving self-play of an LLM over a defined set of tasks.
- **Example environments:** self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
- **Sub-themes with bonus prizes:**
  - **Snorkel AI. Simulated Experts-in-the-Loop:** environment that simulates interactions with subject-matter experts with changing requirements/preferences.
## Theme #5 - Wild Card: Impress Us!
- If ideas do not fit the boxes above, out-of-the-box tasks are welcome.
- Submissions should still meaningfully add value to LLM training on a specific task.
## Guidelines for Problem Statement
- It is **not mandatory** to choose the same problem statement as Round 1.
- Choose the same problem statement only if it aligns with the provided hackathon themes.
- You can start working on your problem statement once it is finalized.
- Post-training can be done onsite on 25 and 26 April, when HuggingFace compute credits are provided.
- Before onsite, focus on building the environment, agent behaviours, reward model, and evaluating alignment with judging criteria.
## Judging Criteria
### Minimum requirements
- Usage of OpenEnv (latest release).
- Show a minimal training script using Unsloth or HF TRL in Colab.
- Publish a mini-blog on HuggingFace or a mini-video on YouTube about your submission (< 2 minutes).
### First Round Judging Overview
- **Pitch Format:** each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
- **Evaluation criteria:**
  - **Environment Innovation (40%):** Is the environment novel, creative, or challenging? Does it meaningfully test agent behavior?
  - **Storytelling (30%):** Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
  - **Showing Improvement in Rewards (20%):** Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
  - **Reward and Training Script/Pipeline Setup (10%):** Is reward logic coherent, and does the pipeline produce meaningful improvement in agent inference?
- Each evaluator judges about 10-15 teams and submits scores individually.
- Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.
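To see how the four weights above combine, here is a minimal sketch of a weighted score and a top-15 cut. Only the weights and the finalist count come from the criteria listed here; the 0-10 scoring scale, the use of a simple mean across judges, and the names `weighted_score` and `rank_teams` are assumptions, since the actual aggregation method is not described.

```python
from statistics import mean

# Weights taken from the evaluation criteria listed above.
WEIGHTS = {
    "environment_innovation": 0.40,
    "storytelling": 0.30,
    "reward_improvement": 0.20,
    "pipeline_setup": 0.10,
}


def weighted_score(scores: dict) -> float:
    """Combine one judge's per-criterion scores (assumed 0-10) into one number."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def rank_teams(judge_sheets: dict, top_n: int = 15) -> list:
    """Average each team's weighted scores across judges (an assumed rule)
    and return the top_n (team, score) pairs, best first."""
    totals = {
        team: mean(weighted_score(sheet) for sheet in sheets)
        for team, sheets in judge_sheets.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


example = {
    "environment_innovation": 8,
    "storytelling": 6,
    "reward_improvement": 5,
    "pipeline_setup": 10,
}
print(round(weighted_score(example), 2))  # 7.0
```

Because Environment Innovation and Storytelling together carry 70% of the weight, a two-point gain in either moves the total more than a perfect pipeline score does.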
## Team Confirmation Email
- Hi Roopal Guha Neogi,
- Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
- This email serves as your official team ticket to the finale.
### Event details
- **Date:** 25-26 April 2026
- **Venue:** Scaler School of Technology, Electronic City, Bangalore
### Participation category
- Team of 2
### Team members
- **Team Member 1 (Team Leader):**
- Name: Roopal Guha Neogi
- Email: roopal.guhaneogi@gmail.com
- **Team Member 2:**
- Name: Suyash Kumar
- Email: suyashk102@gmail.com
### What to do right now
- Join the private Discord (mandatory) using the join link in the original email.
- All major updates and announcements will be shared there first.
- Check the travel guide: Read Here.
- Travel guide includes venue details, directions, and nearby stay options.
### Important - Entry to Campus
- You must present this email at entry.
- Teams/participants without this email will not be allowed on campus.
- Going forward, all communication will be shared only with the team leader.
### Please carry for verification
- A valid government-issued ID.
- Your college/company ID used during registration.
### Entry policy notes
- Entry will not be permitted if details do not match registration.
- All team members must be individually registered in the system.
- New/unregistered members added to travel details will not be allowed on campus.
- Organisers reserve the right to deny entry if verification criteria are not met.
## Round 2 Theme Reveal Summary
- Multi-Agent Interactions
- Long-Horizon Planning and Instruction Following
- World Modeling across professional and personal tasks
- Self-Improving agent systems
These themes reflect the real-world AI environment design and agent behavior that the hackathon evaluates.
## Submission Design Expectations
- Choose one or more themes and design your own problem statement.
- Simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.
As part of the submission, clearly define:
- The **problem statement**
- The **environment** in which the agent(s) operate
- The **capabilities** of the agent(s)
- The **tasks** to be performed
- The **reward model/evaluation logic**
- The **post-training or self-improvement strategy**
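The six items above can be tracked as a simple checklist while drafting a submission. The sketch below is one hypothetical way to do that in plain Python; the class name `SubmissionSpec` and its field names are illustrative only and are not part of OpenEnv or any hackathon tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class SubmissionSpec:
    """One free-text field per item the submission is expected to define."""
    problem_statement: str = ""
    environment: str = ""
    capabilities: str = ""
    tasks: str = ""
    reward_model: str = ""
    post_training_strategy: str = ""

    def missing(self) -> list:
        """Return the names of fields that are still empty or whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = SubmissionSpec(
    problem_statement="Agent must schedule meetings under changing constraints.",
    environment="Simulated calendar and inbox with partial observability.",
)
print(draft.missing())
```

Running this on the draft prints the four fields that still need to be written, which makes gaps visible before the pitch.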
## Recommendation for High Scores
- Define clear, structured tasks and environments.
- Incorporate robust evaluation and reward mechanisms.
- Reflect real-world complexity aligned with OpenEnv principles.
## Immediate Next Step
- Begin refining design and evaluation right away.
- Training and implementation happen onsite with provided compute credits.