# Hackathon Themes and Event Information
## Theme #1 - Multi-Agent Interactions
- Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
- Learning from these environments enables agents to model the beliefs and incentives of other agents in partially observable settings.
- This drives theory-of-mind reasoning and emergent strategic behavior.
- **Expected Outcome:** an environment that can be used to train multi-agent task handling in an LLM.
- **Example environments:** market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
- **Sub-themes with bonus prizes:**
  - **Fleet AI - Scalable Oversight:** environments that train oversight agents to monitor, analyze, and explain the behavior of other AI agents in complex multi-agent settings.
  - **Halluminate - Multi-Actor Environments:** realistic environments where an agent interacts with and manages multiple actors (other agents) to discover and accomplish a task.
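
The sketch below shows what a tiny, partially observable multi-agent environment could look like: two agents split a fixed pool of GPU-hours through alternating offers. It is self-contained and is not the OpenEnv API. `NegotiationEnv`, `Observation`, the action tuples, and the valuation range are all hypothetical. Each agent observes only its own private valuation, so it has to infer the other agent's incentives from the offers it sees.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observation:
    agent_id: int                    # whose turn it is
    my_valuation: float              # private: value per GPU-hour for the acting agent only
    offer_for_me: Optional[int]      # GPU-hours the standing offer gives this agent, if any
    rounds_left: int
    done: bool = False
    rewards: Tuple[float, float] = (0.0, 0.0)


class NegotiationEnv:
    """Hypothetical two-agent split of a fixed GPU-hour pool via alternating offers."""

    def __init__(self, pool: int = 10, max_rounds: int = 6, seed: Optional[int] = None):
        self.pool = pool
        self.max_rounds = max_rounds
        self.rng = random.Random(seed)

    def reset(self) -> Observation:
        # Hidden state: each agent's valuation is never revealed to the other agent.
        self.valuations = [self.rng.uniform(0.5, 2.0) for _ in range(2)]
        self.turn = 0
        self.rounds_left = self.max_rounds
        self.last_offer = None  # (proposer_id, gpu_hours_kept_by_proposer)
        return self._observe()

    def _observe(self, done: bool = False, rewards=(0.0, 0.0)) -> Observation:
        offer_for_me = None
        if self.last_offer is not None:
            proposer, kept = self.last_offer
            offer_for_me = kept if proposer == self.turn else self.pool - kept
        return Observation(self.turn, self.valuations[self.turn], offer_for_me,
                           self.rounds_left, done, rewards)

    def step(self, action: tuple) -> Observation:
        if action[0] == "accept" and self.last_offer is not None:
            # Execute the standing offer; each reward is hours x that agent's private value.
            proposer, kept = self.last_offer
            hours = [0, 0]
            hours[proposer] = kept
            hours[1 - proposer] = self.pool - kept
            rewards = (hours[0] * self.valuations[0], hours[1] * self.valuations[1])
            return self._observe(done=True, rewards=rewards)
        if action[0] == "propose":
            kept = max(0, min(self.pool, int(action[1])))  # clamp to a valid split
            self.last_offer = (self.turn, kept)
        # Proposals, accepts with no offer on the table, and unknown actions all
        # consume a round and pass the turn to the other agent.
        self.rounds_left -= 1
        if self.rounds_left == 0:
            return self._observe(done=True)  # no deal: both agents receive zero
        self.turn = 1 - self.turn
        return self._observe()


env = NegotiationEnv(pool=10, seed=0)
obs = env.reset()
obs = env.step(("propose", 6))  # agent 0 keeps 6 GPU-hours and offers 4
print(obs.agent_id, obs.offer_for_me)  # agent 1 sees the offer, not agent 0's valuation
obs = env.step(("accept",))
print(obs.done, obs.rewards)
```

A real submission would let an LLM produce the actions, possibly as free text parsed into these tuples. It would also likely add coalition or multi-party dynamics.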
## Theme #2 - (Super) Long-Horizon Planning and Instruction Following
- Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
- The goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
- The aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
- **Expected Outcome:** an environment that captures and improves LLM behavior on challenging long-horizon tasks that require long-running sessions extending beyond context memory limits.
- **Example environments:** research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered throughout a task).
- **Sub-themes with bonus prizes:**
  - **Scale AI:** long-horizon workflows for non-code business use cases in sales, project management, or HR and IT.
  - **Mercor:** an environment with capped or uncapped rewards in which frontier-model rewards scale with token output.
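
Sparse or delayed rewards make long-horizon credit assignment hard. One way to see this is to compute the discounted return G_t = r_t + γ·G_{t+1}, which shows how much credit a single terminal reward assigns to earlier steps. The episode length and γ = 0.99 below are arbitrary illustrative choices, not hackathon requirements.

```python
def discounted_returns(rewards: list[float], gamma: float = 0.99) -> list[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step t, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 300-step episode: the only reward arrives at the final step,
# e.g. when a checker confirms that every scattered instruction was followed.
rewards = [0.0] * 299 + [1.0]
returns = discounted_returns(rewards, gamma=0.99)
print(returns[-1], round(returns[0], 4))
```

With γ = 0.99, step 0 receives 0.99^299 (about 0.05) of the terminal reward's credit. The signal for early decisions fades geometrically as the horizon grows.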
## Theme #3 - World Modeling
### 3.1 Professional Tasks
- Develop environments that require real interaction with tools, APIs, or dynamic systems, where models must do genuinely hard work instead of exploiting shortcuts.
- Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
- The goal is to strengthen causal reasoning and persistent world models.
- **Expected Outcome:** an environment that captures the nuances of a defined, partially observable world and improves how LLMs interact with it.
- **Example environments:** dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
- **Sub-themes with bonus prizes:**
  - **Scaler AI Labs - Multi-App RL Environment for Enterprise Workflows:** RL environments that demonstrate complex workflows and business-rule nuances in large enterprises.
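
"Update beliefs based on outcomes" can be read as a discrete Bayesian update over hidden world states. An environment designer might compute such a reference belief to check whether an agent's conclusions are consistent with the evidence it has actually gathered. The fault-diagnosis scenario, the service names, and the detection/false-alarm probabilities below are hypothetical.

```python
def bayes_update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Posterior P(h | obs) proportional to P(obs | h) * P(h), renormalized over hypotheses h."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / z for h, p in unnormalized.items()}


def check_likelihood(checked: str, failed: bool, hypotheses: list[str],
                     p_detect: float = 0.9, p_false_alarm: float = 0.1) -> dict[str, float]:
    """P(health-check result on `checked` | faulty service is h), for each hypothesis h."""
    out = {}
    for h in hypotheses:
        p_fail = p_detect if h == checked else p_false_alarm
        out[h] = p_fail if failed else 1 - p_fail
    return out


services = ["auth", "billing", "search"]
belief = {s: 1 / len(services) for s in services}  # uniform prior over which service is faulty
for checked, failed in [("auth", False), ("billing", True)]:
    belief = bayes_update(belief, check_likelihood(checked, failed, services))
print({s: round(p, 3) for s, p in belief.items()})
```

After a passing check on `auth` and a failing check on `billing`, most of the probability mass moves to `billing` (0.81 / 0.91 ≈ 0.89). An agent that keeps acting as if `auth` were the culprit would be inconsistent with its own observations.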
### 3.2 Personalized Tasks
- Develop environments for real-world personalized task handling.
- Example use cases include replying to personal messages, resolving dinner/work conflicts, replying to difficult emails, and other personal-assistant tasks.
- **Expected Outcome:** an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
- **Example environments:** executive-assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
- **Sub-themes with bonus prizes:**
  - **Patronus AI - Consumer Workflows with Schema Drift:** multi-step consumer workflow environments in which schemas, API contracts, and policies/rules change over time.
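
Here is a concrete illustration of schema drift: a mock reservation tool whose request contract changes partway through an episode. Everything here is a hypothetical assumption, including the class name, the field names, the drift trigger (a fixed call count), and the error format. A real environment would likely randomize when and how the contract changes.

```python
class DriftingBookingAPI:
    """Hypothetical reservation tool whose request contract changes after `drift_after` calls."""

    V1_FIELDS = {"restaurant", "time", "party_size"}
    V2_FIELDS = {"restaurant", "time_slot", "party_size", "contact_phone"}

    def __init__(self, drift_after: int = 2):
        self.calls = 0
        self.drift_after = drift_after

    def required_fields(self) -> set:
        # The contract silently switches from v1 to v2 once enough calls have been made.
        return self.V1_FIELDS if self.calls < self.drift_after else self.V2_FIELDS

    def book(self, request: dict) -> dict:
        required = self.required_fields()
        self.calls += 1
        missing = sorted(required - set(request))
        if missing:
            # The error message is the agent's only signal that the schema has drifted.
            return {"ok": False, "error": f"missing fields: {missing}"}
        return {"ok": True, "confirmation": f"booking-{self.calls}"}


api = DriftingBookingAPI(drift_after=1)
v1_request = {"restaurant": "Olive", "time": "19:30", "party_size": 2}
print(api.book(v1_request))  # accepted under the v1 contract
print(api.book(v1_request))  # rejected: the contract has drifted to v2
```

To succeed, the agent must read the error, infer the new contract, and retry with `time_slot` and `contact_phone` rather than repeating the stale request.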
## Theme #4 - Self-Improvement
- The focus is on creating environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
- Instead of optimizing for fixed tasks, agents should learn to drive their own capability growth.
- The objective is recursive skill amplification.
- **Expected Outcome:** an environment for improving an LLM through self-play over a defined set of tasks.
- **Example environments:** self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
- **Sub-themes with bonus prizes:**
  - **Snorkel AI - Simulated Experts-in-the-Loop:** an environment that simulates interactions with subject-matter experts whose requirements and preferences change.
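
A simple way to "escalate difficulty" is to adjust the task level based on the agent's recent success rate. The sketch below generates hypothetical arithmetic tasks. It promotes or demotes the difficulty level when a sliding-window success rate crosses assumed thresholds (0.8 and 0.3). This is an adaptive curriculum, not full self-play: in self-play, the agent would also propose the tasks itself.

```python
import random
from collections import deque


class AdaptiveCurriculum:
    """Raise difficulty when the agent succeeds often; lower it when it mostly fails."""

    def __init__(self, window: int = 20, promote_at: float = 0.8, demote_at: float = 0.3,
                 max_level: int = 10, seed: int = 0):
        self.level = 1
        self.results = deque(maxlen=window)  # sliding window of recent pass/fail outcomes
        self.promote_at, self.demote_at = promote_at, demote_at
        self.max_level = max_level
        self.rng = random.Random(seed)

    def sample_task(self) -> tuple[str, int]:
        # Level n: add n + 1 random integers, each with up to n digits.
        nums = [self.rng.randint(0, 10 ** self.level - 1) for _ in range(self.level + 1)]
        return " + ".join(map(str, nums)), sum(nums)

    def record(self, solved: bool) -> None:
        self.results.append(solved)
        if len(self.results) < self.results.maxlen:
            return  # wait until the window is full before changing level
        rate = sum(self.results) / len(self.results)
        if rate >= self.promote_at and self.level < self.max_level:
            self.level += 1
            self.results.clear()
        elif rate <= self.demote_at and self.level > 1:
            self.level -= 1
            self.results.clear()


curriculum = AdaptiveCurriculum(window=5)
for _ in range(5):
    prompt, answer = curriculum.sample_task()
    curriculum.record(solved=True)  # stand-in for grading the model's answer to `prompt`
print(curriculum.level)  # 2: five straight successes fill the window and trigger a promotion
```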
## Theme #5 - Wild Card: Impress Us!
- If your idea does not fit any of the themes above, out-of-the-box tasks are welcome.
- Submissions should still add meaningful value to LLM training on a specific task.
## Guidelines for Problem Statement
- It is **not mandatory** to choose the same problem statement as in Round 1.
- Choose the same problem statement only if it aligns with the hackathon themes above.
- You can start working on your problem statement once it is finalized.
- Post-training can be done onsite on 25-26 April, when Hugging Face compute credits are provided.
- Before the onsite event, focus on building the environment, agent behaviors, and reward model, and on evaluating alignment with the judging criteria.
## Judging Criteria
### Minimum requirements
- Use OpenEnv (latest release).
- Show a minimal training script using Unsloth or HF TRL in Colab.
- Write a mini-blog on Hugging Face or record a mini-video on YouTube (under 2 minutes) about your submission.
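
For the minimal training script, TRL's `GRPOTrainer` accepts plain Python reward functions through its `reward_funcs` argument. Each function receives the batch of `completions`, with other dataset columns passed as keyword arguments, and returns one float per completion. The function below is a hypothetical example of that shape. It assumes plain-text (non-conversational) completions and a dataset column named `answer`; it is not tied to OpenEnv and runs without TRL installed.

```python
import re


def exact_answer_reward(completions: list[str], answer: list[str], **kwargs) -> list[float]:
    """Return 1.0 when the text inside <answer>...</answer> matches the reference, else 0.0."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        correct = match is not None and match.group(1).strip() == reference.strip()
        rewards.append(1.0 if correct else 0.0)
    return rewards


print(exact_answer_reward(["<answer> 42 </answer>", "42"], answer=["42", "42"]))  # [1.0, 0.0]
```

With a recent TRL release, this would be passed as `reward_funcs=exact_answer_reward` when constructing `GRPOTrainer`, with `answer` supplied as a column of the training dataset. Check the documentation for the installed version, since trainer arguments evolve between releases.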
### First Round Judging Overview
- **Pitch Format:** each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
- **Evaluation criteria:**
  - **Environment Innovation (40%):** Is the environment novel, creative, or challenging? Does it meaningfully test agent behavior?
  - **Storytelling (30%):** Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
  - **Showing Improvement in Rewards (20%):** Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
  - **Reward and Training Script/Pipeline Setup (10%):** Is the reward logic coherent, and does the pipeline produce a meaningful improvement in the trained agent's behavior at inference time?
- Each evaluator judges about 10-15 teams and submits scores individually.
- Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.
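
The criterion percentages imply a weighted total per evaluator. The example below assumes each criterion is scored on a 0-10 scale, which the brief does not state, and uses hypothetical key names. It shows how one evaluator's total for a team would work out. How Cerebral Valley combines totals across judges is not specified, so that step is omitted.

```python
WEIGHTS = {
    "environment_innovation": 0.40,
    "storytelling": 0.30,
    "reward_improvement": 0.20,
    "reward_and_pipeline_setup": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine one evaluator's per-criterion scores using the published weights."""
    if scores.keys() != WEIGHTS.keys():
        raise ValueError(f"expected scores for exactly {sorted(WEIGHTS)}")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


# Hypothetical scores on an assumed 0-10 scale.
print(round(weighted_score({"environment_innovation": 8, "storytelling": 6,
                            "reward_improvement": 5, "reward_and_pipeline_setup": 9}), 2))  # 6.9
```

Because Environment Innovation carries four times the weight of the pipeline setup, one extra point there adds 0.4 to the total, versus 0.1 for the pipeline criterion.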
## Team Confirmation Email
- Hi Roopal Guha Neogi,
- Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
- This email serves as your official team ticket to the finale.
### Event details
- **Date:** 25-26 April 2026
- **Venue:** Scaler School of Technology, Electronic City, Bangalore
### Participation category
- Team of 2
### Team members
- **Team Member 1 (Team Leader):**
  - Name: Roopal Guha Neogi
  - Email: roopal.guhaneogi@gmail.com
- **Team Member 2:**
  - Name: Suyash Kumar
  - Email: suyashk102@gmail.com
### What to do right now
- Join the private Discord (MANDATORY): Join here.
  - All major updates and announcements will be shared there first.
- Check the travel guide: Read here.
  - The travel guide includes venue details, directions, and nearby stay options.
### Important - Entry to Campus
- You must present this email at entry.
- Teams/participants without this email will not be allowed on campus.
- Going forward, all communication will be shared only with the team leader.
### Please carry the following for verification
- A valid government-issued ID.
- The college/company ID you used during registration.
### Entry policy notes
- Entry will not be permitted if your details do not match your registration.
- All team members must be individually registered in the system.
- New/unregistered members added to travel details will not be allowed on campus.
- Organisers reserve the right to deny entry if verification criteria are not met.
## Round 2 Theme Reveal Summary
- Multi-Agent Interactions
- Long-Horizon Planning and Instruction Following
- World Modeling across professional and personal tasks
- Self-Improving agent systems

These themes reflect the real-world AI environment design and agent behavior that the hackathon evaluates.
## Submission Design Expectations
- Choose one or more themes and design your own problem statement.
- Your environment should simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.

As part of your submission, clearly define:
- The **problem statement**
- The **environment** in which the agent(s) operate
- The **capabilities** of the agent(s)
- The **tasks** to be performed
- The **reward model/evaluation logic**
- The **post-training or self-improvement strategy**
## Recommendations for High Scores
- Define clear, structured tasks and environments.
- Incorporate robust evaluation and reward mechanisms.
- Reflect real-world complexity aligned with OpenEnv principles.
## Immediate Next Step
- Begin refining your design and evaluation approach right away.
- Training and implementation happen onsite with the provided compute credits.