Hackathon Themes and Event Information
Theme #1 - Multi-Agent Interactions
- Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
- Learning from these environments enables agents to model beliefs and incentives of others in partially observable settings.
- This drives theory-of-mind reasoning and emergent strategic behavior.
- Expected Outcome: an environment that can be used to train multi-agent task handling in an LLM.
- Example environments: market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
- Sub-themes with bonus prizes:
  - Fleet AI - Scalable Oversight: environments that train oversight agents to monitor, analyze, and explain the behavior of other AI agents in complex multi-agent settings.
  - Halluminate - Multi-Actor Environments: realistic environments where an agent interacts with and manages multiple actors (agents) to discover and achieve a task.
Theme #2 - (Super) Long-Horizon Planning and Instruction Following
- Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
- Goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
- Aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
- Expected Outcome: an environment that captures and improves LLM behaviour on challenging long-horizon tasks that require long-running sessions beyond context-window limits.
- Example environments: research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, and extremely complicated long-horizon instruction following (e.g., 300 instructions scattered throughout a session).
- Sub-themes with bonus prizes:
  - Scale AI: long-horizon workflows for non-code business use cases in Sales, Project Management, or HR and IT.
  - Mercor: an environment with capped/uncapped rewards where frontier-model rewards scale with token output.
Theme #3 - World Modeling
3.1 Professional Tasks
- Develop environments requiring real interaction with tools, APIs, or dynamic systems where models do real hard work instead of exploiting shortcuts.
- Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
- Goal is to strengthen causal reasoning and persistent world models.
- Expected Outcome: an environment capturing nuances of a defined partially observable world and improving LLM interaction with it.
- Example environments: dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
- Sub-themes with bonus prizes:
  - Scaler AI Labs - Multi-App RL Environment for Enterprise Workflows: create RL environments that demonstrate complex workflows and business-rule nuances in large enterprises.
3.2 Personalized Tasks
- Develop environments for real personalized task handling.
- Example use cases include replying to personal messages, handling dinner/work conflicts, replying to tough emails, and other personal assistant tasks.
- Expected Outcome: an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
- Example environments: executive assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
- Sub-themes with bonus prizes:
  - Patronus AI - Consumer Workflows with Schema Drift: multi-step consumer workflow environments where schemas, API contracts, and policies/rules change.
Theme #4 - Self-Improvement
- Focus is to create environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
- Instead of optimizing fixed tasks, agents should learn to drive their own capability growth.
- Objective is recursive skill amplification.
- Expected Outcome: an environment for improving self-play of an LLM over a defined set of tasks.
- Example environments: self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
- Sub-themes with bonus prizes:
  - Snorkel AI - Simulated Experts-in-the-Loop: an environment that simulates interactions with subject-matter experts whose requirements/preferences change.
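Theme #4 mentions escalating difficulty through adaptive curricula. As a minimal sketch, assuming a simple threshold rule, a difficulty scheduler could promote the agent when its recent success rate is high and demote it when the rate is low. The class name `AdaptiveCurriculum`, the window size, and the thresholds are hypothetical choices for illustration, not part of any hackathon requirement or OpenEnv API.

```python
from collections import deque


class AdaptiveCurriculum:
    """Hypothetical difficulty scheduler driven by a rolling success rate."""

    def __init__(self, min_level=1, max_level=10, window=20,
                 promote_at=0.8, demote_at=0.3):
        self.level = min_level
        self.min_level = min_level
        self.max_level = max_level
        self.window = window
        self.promote_at = promote_at  # success rate needed to level up
        self.demote_at = demote_at    # success rate that triggers a level down
        self.results = deque(maxlen=window)

    def record(self, success: bool) -> int:
        """Log one episode outcome and return the (possibly updated) level."""
        self.results.append(success)
        if len(self.results) < self.window:
            return self.level  # not enough evidence at this level yet
        rate = sum(self.results) / len(self.results)
        if rate >= self.promote_at and self.level < self.max_level:
            self.level += 1
            self.results.clear()  # re-measure at the new difficulty
        elif rate <= self.demote_at and self.level > self.min_level:
            self.level -= 1
            self.results.clear()
        return self.level


# Usage: five straight successes with a window of 5 promote level 1 -> 2.
curriculum = AdaptiveCurriculum(window=5)
for outcome in [True] * 5:
    level = curriculum.record(outcome)
print(level)  # prints 2
```

Clearing the history after each level change keeps the success rate tied to the current difficulty. A self-play variant could replace the fixed numeric levels with tasks the agent generates for itself.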
Theme #5 - Wild Card: Impress Us!
- If your idea does not fit any of the themes above, out-of-the-box tasks are welcome.
- Submissions should still meaningfully add value to LLM training on a specific task.
Guidelines for Problem Statement
- It is not mandatory to choose the same problem statement as Round 1.
- Choose the same problem statement only if it aligns with the provided hackathon themes.
- You can start working on your problem statement once finalized.
- Post-training can be done onsite on 25-26 April, when compute credits for Hugging Face will be provided.
- Before the onsite event, focus on building the environment, agent behaviours, and reward model, and on evaluating how well they align with the judging criteria.
Judging Criteria
Minimum requirements
- Use OpenEnv (latest release).
- Provide a minimal training script using Unsloth or HF TRL that runs in Colab.
- Publish a mini-blog on Hugging Face or a mini-video on YouTube (< 2 minutes) about your submission.
First Round Judging Overview
- Pitch Format: each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
- Evaluation criteria:
  - Environment Innovation (40%): Is the environment novel, creative, or challenging? Does it meaningfully test agent behavior?
  - Storytelling (30%): Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
  - Showing Improvement in Rewards (20%): Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
  - Reward and Training Script/Pipeline Setup (10%): Is the reward logic coherent, and does the pipeline produce a meaningful improvement in the agent's behavior at inference time?
- Each evaluator judges about 10-15 teams and submits scores individually.
- Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.
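The criterion percentages translate into a weighted sum per judge. The sketch below assumes each criterion is scored on a 0-10 scale and that judges' weighted totals are simply averaged; the notes above specify neither the scale nor how Cerebral Valley aggregates scores, so both choices (and the key names) are hypothetical.

```python
# Weights taken from the evaluation criteria above; they sum to 1.0.
WEIGHTS = {
    "environment_innovation": 0.40,
    "storytelling": 0.30,
    "reward_improvement": 0.20,
    "reward_and_pipeline": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine one judge's per-criterion scores (assumed 0-10 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def aggregate(judge_scores: list[dict[str, float]]) -> float:
    """Hypothetical aggregation: plain mean of the judges' weighted scores."""
    return sum(weighted_score(s) for s in judge_scores) / len(judge_scores)


example = {
    "environment_innovation": 8,
    "storytelling": 6,
    "reward_improvement": 5,
    "reward_and_pipeline": 9,
}
print(round(weighted_score(example), 2))  # prints 6.9
```

Under these weights, one extra point on Environment Innovation is worth four times as much as one extra point on the pipeline criterion.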
Team Confirmation Email
- Hi Roopal Guha Neogi,
- Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
- This email serves as your official team ticket to the finale.
Event details
- Date: 25-26 April 2026
- Venue: Scaler School of Technology, Electronic City, Bangalore
Participation category
- Team of 2
Team members
- Team Member 1 (Team Leader):
  - Name: Roopal Guha Neogi
  - Email: roopal.guhaneogi@gmail.com
- Team Member 2:
  - Name: Suyash Kumar
  - Email: suyashk102@gmail.com
What to do right now
- Join the private Discord (MANDATORY): Join here.
- All major updates and announcements will be shared there first.
- Check the travel guide: Read Here.
- Travel guide includes venue details, directions, and nearby stay options.
Important - Entry to Campus
- You must present this email at entry.
- Teams/participants without this email will not be allowed on campus.
- Going forward, all communication will be shared only with the team leader.
Please carry for verification
- A valid government-issued ID.
- Your college/company ID used during registration.
Entry policy notes
- Entry will not be permitted if details do not match registration.
- All team members must be individually registered in the system.
- New/unregistered members added to travel details will not be allowed on campus.
- Organisers reserve the right to deny entry if verification criteria are not met.
Round 2 Theme Reveal Summary
- Multi-Agent Interactions
- Long-Horizon Planning and Instruction Following
- World Modeling across professional and personal tasks
- Self-Improving agent systems
These themes reflect the real-world AI environment design and agent behaviors that the hackathon evaluates.
Submission Design Expectations
- Choose one or more themes and design your own problem statement.
- Simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.
As part of submission, clearly define:
- The problem statement
- The environment in which the agent(s) operate
- The capabilities of the agent(s)
- The tasks to be performed
- The reward model/evaluation logic
- The post-training or self-improvement strategy
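To make these components concrete, here is a minimal, hypothetical sketch in plain Python. It is not the OpenEnv API (consult the latest OpenEnv release for the real interfaces); `MeetingSchedulerEnv`, its calendar task, and the reward values are illustrative assumptions, loosely modeled on the executive-assistant example from Theme #3.2. The docstring maps each part to the list above.

```python
import random
from dataclasses import dataclass, field


@dataclass
class StepResult:
    observation: str  # text shown to the agent
    reward: float     # scalar signal for training
    done: bool        # whether the episode has ended


@dataclass
class MeetingSchedulerEnv:
    """Hypothetical toy environment (NOT the OpenEnv API).

    Problem/task: book a one-hour meeting in a free slot.
    Environment: a calendar with four randomly chosen busy hours.
    Capability: the agent replies with a start hour as text.
    Reward: +1.0 for a valid booking, -0.5 for a conflict or an hour
    outside working hours, -1.0 for a malformed reply.
    """
    hours: range = range(9, 18)  # candidate start hours 9..17
    max_steps: int = 3
    seed: int = 0
    busy: set = field(default_factory=set)
    steps: int = 0

    def reset(self) -> str:
        rng = random.Random(self.seed)  # same seed -> same calendar
        self.busy = set(rng.sample(list(self.hours), k=4))
        self.steps = 0
        return f"Busy hours: {sorted(self.busy)}. Propose a free start hour."

    def step(self, action: str) -> StepResult:
        self.steps += 1
        out_of_steps = self.steps >= self.max_steps
        try:
            hour = int(action.strip())
        except ValueError:
            # Malformed reply: penalize so the agent learns the output format.
            return StepResult("Reply with an hour as an integer.", -1.0, out_of_steps)
        if hour not in self.hours:
            return StepResult("Outside working hours.", -0.5, out_of_steps)
        if hour in self.busy:
            return StepResult("Conflicts with an existing meeting.", -0.5, out_of_steps)
        return StepResult(f"Booked at {hour}:00.", 1.0, True)


# Usage: a scripted "agent" that picks the first free hour.
env = MeetingSchedulerEnv()
print(env.reset())
free_hour = next(h for h in env.hours if h not in env.busy)
result = env.step(str(free_hour))
print(result.reward, result.done)  # prints 1.0 True
```

For the post-training strategy, a trainer would sample model replies as actions and optimize the rewards returned by `step`; the graded penalties give a denser signal than a single pass/fail reward.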
Recommendations for High Scores
- Define clear, structured tasks and environments.
- Incorporate robust evaluation and reward mechanisms.
- Reflect real-world complexity aligned with OpenEnv principles.
Immediate Next Step
- Begin refining design and evaluation right away.
- Training and implementation happen onsite with provided compute credits.