Hackathon Themes and Event Information

Theme #1 - Multi-Agent Interactions

Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
Learning from these environments enables agents to model beliefs and incentives of others in partially observable settings.
This drives theory-of-mind reasoning and emergent strategic behavior.
Expected Outcome: an environment that can be used to train multi-agent task handling in an LLM.
Example environments: market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
Sub-themes with bonus prizes:
- Fleet AI. Scalable Oversight: environments that train oversight agents to monitor, analyze, and explain behavior of other AI agents in complex multi-agent settings.
- Halluminate. Multi-Actor Environments: realistic environments where an agent interacts with and manages multiple actors (agents) to discover and achieve a task.

Theme #2 - (Super) Long-Horizon Planning and Instruction Following

Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
The goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
The aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
Expected Outcome: an environment that captures and improves LLM behaviour on challenging long-horizon tasks that need long-running sessions exceeding context memory limits.
Example environments: research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
Sub-themes with bonus prizes:
- Scale AI: long-horizon workflows for non-code business use cases in Sales, Project management, or HR and IT.
- Mercor: an environment with capped/uncapped rewards where frontier model rewards scale with token output.
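
As an illustration of the sparse, delayed rewards described for this theme, here is a minimal sketch in plain Python. It does not use the OpenEnv API; the class name `DelayedRewardEnv`, the four-step plan, and the step budget are all hypothetical. The agent receives a reward only once every step of the plan is complete, and a wrong action costs a step but can be recovered from.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedRewardEnv:
    """Toy episode (hypothetical): complete `plan` in order; reward arrives only at the end."""

    plan: tuple[str, ...] = ("gather", "draft", "review", "submit")
    max_steps: int = 8
    done_steps: list[str] = field(default_factory=list)
    steps_taken: int = 0

    def reset(self) -> dict:
        self.done_steps = []
        self.steps_taken = 0
        return self._observation()

    def step(self, action: str) -> tuple[dict, float, bool]:
        if len(self.done_steps) == len(self.plan):
            raise RuntimeError("episode finished; call reset()")
        self.steps_taken += 1
        # Only the next step of the plan counts; anything else wastes a step.
        if action == self.plan[len(self.done_steps)]:
            self.done_steps.append(action)
        finished = len(self.done_steps) == len(self.plan)
        out_of_time = self.steps_taken >= self.max_steps
        reward = 1.0 if finished else 0.0  # sparse: no intermediate reward
        return self._observation(), reward, finished or out_of_time

    def _observation(self) -> dict:
        return {
            "completed": list(self.done_steps),
            "steps_left": self.max_steps - self.steps_taken,
        }


env = DelayedRewardEnv()
env.reset()
rewards = []
# The second action is an early mistake that the agent recovers from.
for action in ["gather", "review", "draft", "review", "submit"]:
    observation, reward, done = env.step(action)
    rewards.append(reward)
print(rewards)  # [0.0, 0.0, 0.0, 0.0, 1.0]
```

A real submission would wrap comparable logic in an OpenEnv environment and would use far longer plans than this toy.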

Theme #3 - World Modeling

3.1 Professional Tasks

Develop environments requiring real interaction with tools, APIs, or dynamic systems, where models must do the actual hard work rather than exploit shortcuts.
Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
The goal is to strengthen causal reasoning and persistent world models.
Expected Outcome: an environment capturing nuances of a defined partially observable world and improving LLM interaction with it.
Example environments: dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
Sub-themes with bonus prizes:
- Scaler AI Labs. Multi-App RL Environment for Enterprise Workflows: create RL environments to demonstrate complex workflows and business-rule nuances in large enterprises.

3.2 Personalized Tasks

Develop environments for real personalized task handling.
Example use cases include replying to personal messages, handling dinner/work conflicts, replying to tough emails, and other personal assistant tasks.
Expected Outcome: an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
Example environments: executive assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
Sub-themes with bonus prizes:
- Patronus AI. Consumer Workflows with Schema Drift: multi-step consumer workflow environments where schemas, API contracts, and policies/rules change.
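
To make the schema-drift idea concrete, here is a small sketch of a consumer API whose response schema changes partway through an episode. Everything in it is a hypothetical illustration rather than part of any sponsor's specification: the field names, the two schema versions, and the `drift_at` step are assumptions.

```python
# Two hypothetical versions of an "order" payload schema: field name -> expected type.
SCHEMAS = {
    1: {"order_id": int, "total": float},
    2: {"order_id": str, "amount_cents": int, "currency": str},
}


def schema_for_step(step: int, drift_at: int = 3) -> dict:
    """Return the schema in force at `step`; the contract changes at `drift_at` (assumed)."""
    return SCHEMAS[1] if step < drift_at else SCHEMAS[2]


def validate(payload: dict, schema: dict) -> list[str]:
    """List every way `payload` violates `schema`; an empty list means it conforms."""
    problems = []
    for name, expected_type in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"wrong type for {name}")
    for name in sorted(payload.keys() - schema.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


old_style_payload = {"order_id": 17, "total": 42.5}
before_drift = validate(old_style_payload, schema_for_step(0))
after_drift = validate(old_style_payload, schema_for_step(3))
print(before_drift)
print(after_drift)
```

An environment built on this pattern could reward the agent for noticing the violations after the drift and adapting its requests, rather than for repeating the old format.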

Theme #4 - Self-Improvement

The focus is on creating environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
Instead of optimizing fixed tasks, agents should learn to drive their own capability growth.
The objective is recursive skill amplification.
Expected Outcome: an environment for improving self-play of an LLM over a defined set of tasks.
Example environments: self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
Sub-themes with bonus prizes:
- Snorkel AI. Simulated Experts-in-the-Loop: an environment that simulates interactions with subject-matter experts whose requirements and preferences change.
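
One way to picture the "escalate difficulty" idea from this theme is a curriculum that promotes or demotes the agent based on its recent success rate. The sketch below is a generic illustration, not a prescribed method; the class name `AdaptiveCurriculum`, the window size, and the promotion and demotion thresholds are assumptions.

```python
from collections import deque


class AdaptiveCurriculum:
    """Raise or lower task difficulty from a sliding window of recent outcomes (hypothetical)."""

    def __init__(self, window: int = 5, promote_at: float = 0.8,
                 demote_at: float = 0.3, max_level: int = 10) -> None:
        self.level = 1
        self.promote_at = promote_at
        self.demote_at = demote_at
        self.max_level = max_level
        self.results = deque(maxlen=window)

    def record(self, success: bool) -> int:
        """Store one episode outcome and return the (possibly updated) difficulty level."""
        self.results.append(success)
        if len(self.results) == self.results.maxlen:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_at and self.level < self.max_level:
                self.level += 1
                self.results.clear()  # start a fresh window at the new level
            elif rate <= self.demote_at and self.level > 1:
                self.level -= 1
                self.results.clear()
        return self.level


curriculum = AdaptiveCurriculum()
for _ in range(5):
    level_after_wins = curriculum.record(True)
for _ in range(5):
    level_after_losses = curriculum.record(False)
print(level_after_wins, level_after_losses)  # 2 1
```

In a self-play setting the `success` signal could come from the environment's reward, and the level could select which generated tasks the agent sees next.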

Theme #5 - Wild Card: Impress Us!

If your idea does not fit the themes above, out-of-the-box tasks are welcome.
Submissions should still meaningfully add value to LLM training on a specific task.

Guidelines for Problem Statement

It is not mandatory to choose the same problem statement as Round 1.
Choose the same problem statement only if it aligns with the provided hackathon themes.
You can start working on your problem statement once it is finalized.
Post-training can be done onsite on 25 and 26 April, when compute credits for Hugging Face are provided.
Before the onsite event, focus on building the environment, agent behaviours, and reward model, and on evaluating alignment with the judging criteria.

Judging Criteria

Minimum requirements

Use OpenEnv (latest release).
Show a minimal training script using Unsloth or HF TRL in Colab.
Write a mini-blog on Hugging Face or record a mini-video on YouTube about your submission (< 2 minutes).

First Round Judging Overview

Pitch Format: each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
Evaluation criteria:
- Environment Innovation (40%): Is the environment novel, creative, or challenging? Does it meaningfully test agent behavior?
- Storytelling (30%): Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
- Showing Improvement in Rewards (20%): Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
- Reward and Training Script/Pipeline Setup (10%): Is reward logic coherent, and does the pipeline produce meaningful improvement in agent inference?
Each evaluator judges about 10-15 teams and submits scores individually.
Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.
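
The weighting above can be sketched as a short calculation. The four weights come from the evaluation criteria listed here; the 0-10 scoring scale and the use of a simple mean across judges are assumptions for illustration, since the actual aggregation method used by Cerebral Valley is not specified.

```python
# Criterion weights from the evaluation criteria above (they sum to 1.0).
WEIGHTS = {
    "environment_innovation": 0.40,
    "storytelling": 0.30,
    "reward_improvement": 0.20,
    "pipeline_setup": 0.10,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (assumed 0-10 scale) into a single number."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def aggregate(judge_scores: list[dict[str, float]]) -> float:
    """Assumption: a team's final score is the mean of its judges' weighted scores."""
    return sum(weighted_score(s) for s in judge_scores) / len(judge_scores)


# Hypothetical scores from two judges for one team.
judge_a = {"environment_innovation": 8, "storytelling": 7,
           "reward_improvement": 6, "pipeline_setup": 9}
judge_b = {"environment_innovation": 6, "storytelling": 9,
           "reward_improvement": 8, "pipeline_setup": 5}
print(round(weighted_score(judge_a), 2))
print(round(aggregate([judge_a, judge_b]), 2))
```

Because Environment Innovation carries 40% of the weight, one point gained there moves the total as much as four points gained on pipeline setup.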

Team Confirmation Email

Hi Roopal Guha Neogi,
Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
This email serves as your official team ticket to the finale.

Event details

Date: 25-26 April 2026
Venue: Scaler School of Technology, Electronic City, Bangalore

Participation category

Team of 2

Team members

Team Member 1 (Team Leader):
- Name: Roopal Guha Neogi
- Email: roopal.guhaneogi@gmail.com
Team Member 2:
- Name: Suyash Kumar
- Email: suyashk102@gmail.com

What to do right now

Join the private Discord (MANDATORY): Join here.
All major updates and announcements will be shared there first.
Check the travel guide: Read Here.
Travel guide includes venue details, directions, and nearby stay options.

Important - Entry to Campus

You must present this email at entry.
Teams/participants without this email will not be allowed on campus.
Going forward, all communication will be shared only with the team leader.

Please carry for verification

A valid government-issued ID.
Your college/company ID used during registration.

Entry policy notes

Entry will not be permitted if details do not match registration.
All team members must be individually registered in the system.
New/unregistered members added to travel details will not be allowed on campus.
Organisers reserve the right to deny entry if verification criteria are not met.

Round 2 Theme Reveal Summary

Multi-Agent Interactions
Long-Horizon Planning and Instruction Following
World Modeling across professional and personal tasks
Self-Improving agent systems

These themes reflect the real-world AI environment design and agent behavior that the hackathon evaluates.

Submission Design Expectations

Choose one or more themes and design your own problem statement.
Simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.

As part of submission, clearly define:

The problem statement
The environment in which the agent(s) operate
The capabilities of the agent(s)
The tasks to be performed
The reward model/evaluation logic
The post-training or self-improvement strategy

Recommendation for High Scores

Define clear, structured tasks and environments.
Incorporate robust evaluation and reward mechanisms.
Reflect real-world complexity aligned with OpenEnv principles.

Immediate Next Step

Begin refining design and evaluation right away.
Training and implementation happen onsite with provided compute credits.