# Hackathon Themes and Event Information

## Theme #1 - Multi-Agent Interactions

- Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
- Learning from these environments enables agents to model beliefs and incentives of others in partially observable settings.
- This drives theory-of-mind reasoning and emergent strategic behavior.
- **Expected Outcome:** an environment that can be used to train multi-agent task handling in an LLM.
- **Example environments:** market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
- **Sub-themes with bonus prizes:**
  - **Fleet AI. Scalable Oversight:** environments that train oversight agents to monitor, analyze, and explain behavior of other AI agents in complex multi-agent settings.
  - **Halluminate. Multi-Actor Environments:** realistic environments where an agent interacts with and manages multiple actors (agents) to discover and achieve a task.

## Theme #2 - (Super) Long-Horizon Planning and Instruction Following

- Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
- Goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
- Aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
- **Expected Outcome:** an environment that captures and improves LLM behaviour on challenging long-horizon tasks needing long-running sessions beyond context memory limits.
- **Example environments:** research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
- **Sub-themes with bonus prizes:**
  - **Scale AI:** long-horizon workflows for non-code business use cases in Sales, Project management, or HR and IT.
  - **Mercor:** an environment with capped/uncapped rewards where frontier model rewards scale with token output.

## Theme #3 - World Modeling

### 3.1 Professional Tasks

- Develop environments requiring real interaction with tools, APIs, or dynamic systems where models do real hard work instead of exploiting shortcuts.
- Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
- Goal is to strengthen causal reasoning and persistent world models.
- **Expected Outcome:** an environment capturing nuances of a defined partially observable world and improving LLM interaction with it.
- **Example environments:** dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
- **Sub-themes with bonus prizes:**
  - **Scaler AI Labs. Multi-App RL Environment for Enterprise Workflows:** create RL environments to demonstrate complex workflows and business-rule nuances in large enterprises.

### 3.2 Personalized Tasks

- Develop environments for real personalized task handling.
- Example use cases include replying to personal messages, handling dinner/work conflicts, replying to tough emails, and other personal assistant tasks.
- **Expected Outcome:** an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
- **Example environments:** executive assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
- **Sub-themes with bonus prizes:**
  - **Patronus AI. Consumer Workflows with Schema Drift:** multi-step consumer workflow environments where schemas, API contracts, and policies/rules change.
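
The schema-drift idea in the Patronus AI sub-theme can be illustrated with a small sketch. This is only one possible reading of "schema drift", written in plain Python. `DriftingOrderAPI`, `read_total`, `reward`, and the field names are hypothetical and are not taken from the hackathon material or from any library. The simulated API changes its response contract after a fixed number of calls, and the reward is the fraction of calls an agent-side parser still handles correctly.

```python
from dataclasses import dataclass


@dataclass
class DriftingOrderAPI:
    """Hypothetical order-lookup API whose response schema changes mid-episode."""

    drift_after: int = 2  # number of calls served before the schema changes
    calls: int = 0

    def get_order(self, order_id: str) -> dict:
        self.calls += 1
        if self.calls <= self.drift_after:
            # Version 1 contract: flat fields.
            return {"schema": 1, "order_id": order_id, "total": 42.0}
        # Version 2 contract: the amount moves under a nested key.
        return {"schema": 2, "id": order_id,
                "amount": {"value": 42.0, "currency": "USD"}}


def read_total(response: dict) -> float:
    """Agent-side parser that tolerates both contracts."""
    if "total" in response:
        return float(response["total"])
    return float(response["amount"]["value"])


def reward(api: DriftingOrderAPI, parser, n_calls: int = 4) -> float:
    """Fraction of calls where the parser recovers the correct total."""
    correct = 0
    for _ in range(n_calls):
        try:
            if parser(api.get_order("A1")) == 42.0:
                correct += 1
        except (KeyError, TypeError):
            pass  # a parser broken by the drift earns nothing for this call
    return correct / n_calls
```

A parser that only knows the first contract, such as `lambda r: r["total"]`, scores lower than `read_total` once the drift happens, which gives the kind of measurable gap an agent could be trained to close.
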
## Theme #4 - Self-Improvement

- Focus is to create environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
- Instead of optimizing fixed tasks, agents should learn to drive their own capability growth.
- Objective is recursive skill amplification.
- **Expected Outcome:** an environment for improving self-play of an LLM over a defined set of tasks.
- **Example environments:** self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
- **Sub-themes with bonus prizes:**
  - **Snorkel AI. Simulated Experts-in-the-Loop:** environment that simulates interactions with subject-matter experts with changing requirements/preferences.

## Theme #5: Wild Card - Impress Us!

- If ideas do not fit the boxes above, out-of-the-box tasks are welcome.
- Submissions should still meaningfully add value to LLM training on a specific task.

## Guidelines for Problem Statement

- It is **not mandatory** to choose the same problem statement as Round 1.
- Choose the same problem statement only if it aligns with the provided hackathon themes.
- You can start working on your problem statement once finalized.
- Post-training can be done onsite on 25th and 26th when compute credits are provided for HuggingFace.
- Before onsite, focus on building the environment, agent behaviours, reward model, and evaluating alignment with judging criteria.

## Judging Criteria

### Minimum requirements

- Usage of OpenEnv (latest release).
- Show a minimal training script using Unsloth or HF TRL in Colab.
- Write a mini-blog on HuggingFace or mini-video on YouTube talking about your submission (< 2 minutes).

### First Round Judging Overview

- **Pitch Format:** each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
- **Evaluation criteria:**
  - **Environment Innovation (40%):** Is the environment novel, creative, or challenging?
    Does it meaningfully test agent behavior?
  - **Storytelling (30%):** Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
  - **Showing Improvement in Rewards (20%):** Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
  - **Reward and Training Script/Pipeline Setup (10%):** Is reward logic coherent, and does the pipeline produce meaningful improvement in agent inference?
- Each evaluator judges about 10-15 teams and submits scores individually.
- Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.

## Team Confirmation Email

- Hi Roopal Guha Neogi,
- Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
- This email serves as your official team ticket to the finale.

### Event details

- **Date:** 25-26 April 2026
- **Venue:** Scaler School of Technology, Electronic City, Bangalore

### Participation category

- Team of 2

### Team members

- **Team Member 1 (Team Leader):**
  - Name: Roopal Guha Neogi
  - Email: roopal.guhaneogi@gmail.com
- **Team Member 2:**
  - Name: Suyash Kumar
  - Email: suyashk102@gmail.com

### What to do right now

- Join the private Discord (MANDATORY): Join here.
- All major updates and announcements will be shared there first.
- Check the travel guide: Read Here.
- Travel guide includes venue details, directions, and nearby stay options.

### Important - Entry to Campus

- You must present this email at entry.
- Teams/participants without this email will not be allowed on campus.
- Going forward, all communication will be shared only with the team leader.

### Please carry for verification

- A valid government-issued ID.
- Your college/company ID used during registration.

### Entry policy notes

- Entry will not be permitted if details do not match registration.
- All team members must be individually registered in the system.
- New/unregistered members added to travel details will not be allowed on campus.
- Organisers reserve the right to deny entry if verification criteria are not met.

## Round 2 Theme Reveal Summary

- Multi-Agent Interactions
- Long-Horizon Planning and Instruction Following
- World Modeling across professional and personal tasks
- Self-Improving agent systems

These themes reflect real-world AI environment design and agent behavior that the hackathon evaluates.

## Submission Design Expectations

- Choose one or more themes and design your own problem statement.
- Simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.

As part of submission, clearly define:

- The **problem statement**
- The **environment** in which the agent(s) operate
- The **capabilities** of the agent(s)
- The **tasks** to be performed
- The **reward model/evaluation logic**
- The **post-training or self-improvement strategy**

## Recommendation for High Scores

- Define clear, structured tasks and environments.
- Incorporate robust evaluation and reward mechanisms.
- Reflect real-world complexity aligned with OpenEnv principles.

## Immediate Next Step

- Begin refining design and evaluation right away.
- Training and implementation happen onsite with provided compute credits.
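
As a starting point for the design work, the components listed under Submission Design Expectations (environment, task, reward logic, measurable outcome) can be sketched as a small reset/step loop. This is a toy illustration in plain Python under stated assumptions: it does not use the OpenEnv API, and `TargetEnv`, `run_episode`, and the reward rule are hypothetical names chosen only for this example. A real submission would follow the interfaces of the latest OpenEnv release.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TargetEnv:
    """Hypothetical toy environment: move a counter from 0 to `target`."""

    target: int = 3
    max_steps: int = 5
    position: int = 0
    steps: int = 0

    def reset(self) -> int:
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action (+1 or -1); return (observation, reward, done)."""
        if action not in (1, -1):
            raise ValueError("action must be +1 or -1")
        self.position += action
        self.steps += 1
        solved = self.position == self.target
        done = solved or self.steps >= self.max_steps
        reward = 1.0 if solved else 0.0  # sparse: paid only on success
        return self.position, reward, done


def run_episode(env: TargetEnv, policy: Callable[[int], int]) -> float:
    """Roll out one episode with `policy` and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

Comparing `run_episode` for a policy that always moves toward the target against one that always moves away gives a simple before/after measurement, which is the kind of observable improvement the judging criteria ask for.
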