Spaces:

Roopalgn
/

openenv-clinical-trial

Sleeping

App Files Files Community

openenv-clinical-trial / docs /hack_info.md

Roopalgn

docs: add roadmap, README, branch-pr matrix, and formatting pass

b3e0336 about 2 months ago

preview code

raw

history blame

9.15 kB

	# Hackathon Themes and Event Information

	## Theme #1 - Multi-Agent Interactions

	- Environments for this theme involve cooperation, competition, negotiation, and coalition formation.
	- Learning from these environments enables agents to model beliefs and incentives of others in partially observable settings.
	- This drives theory-of-mind reasoning and emergent strategic behavior.
	- Expected Outcome: an environment that can be used to train multi-agent task handling in an LLM.
	- Example environments: market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
	- Sub-themes with bonus prizes:
	- Fleet AI. Scalable Oversight: environments that train oversight agents to monitor, analyze, and explain behavior of other AI agents in complex multi-agent settings.
	- Halluminate. Multi-Actor Environments: realistic environments where an agent interacts with and manages multiple actors (agents) to discover and achieve a task.

	## Theme #2 - (Super) Long-Horizon Planning and Instruction Following

	- Build environments that require deep, multi-step reasoning with sparse or delayed rewards.
	- Goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes.
	- Aim is to move beyond shallow next-token reasoning toward structured planning and durable internal representations.
	- Expected Outcome: an environment that captures and improves LLM behaviour on challenging long-horizon tasks needing long-running sessions beyond context memory limits.
	- Example environments: research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
	- Sub-themes with bonus prizes:
	- Scale AI: long-horizon workflows for non-code business use cases in Sales, Project management, or HR and IT.
	- Mercor: an environment with capped/uncapped rewards where frontier model rewards scale with token output.

	## Theme #3 - World Modeling

	### 3.1 Professional Tasks

	- Develop environments requiring real interaction with tools, APIs, or dynamic systems where models do real hard work instead of exploiting shortcuts.
	- Learning from these environments should enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows.
	- Goal is to strengthen causal reasoning and persistent world models.
	- Expected Outcome: an environment capturing nuances of a defined partially observable world and improving LLM interaction with it.
	- Example environments: dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers -> code -> experiments), economic simulations with feedback, tool-discovery benchmarks.
	- Sub-themes with bonus prizes:
	- Scaler AI Labs. Multi-App RL Environment for Enterprise Workflows: create RL environments to demonstrate complex workflows and business-rule nuances in large enterprises.

	### 3.2 Personalized Tasks

	- Develop environments for real personalized task handling.
	- Example use cases include replying to personal messages, handling dinner/work conflicts, replying to tough emails, and other personal assistant tasks.
	- Expected Outcome: an environment that gives the model a realistic simulation of handling personal tasks, conflicts, and delegations.
	- Example environments: executive assistant meeting planner, dinner and drive planning, email/message replying, shopping, etc.
	- Sub-themes with bonus prizes:
	- Patronus AI. Consumer Workflows with Schema Drift: multi-step consumer workflow environments where schemas, API contracts, and policies/rules change.

	## Theme #4 - Self-Improvement

	- Focus is to create environments where agents learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula.
	- Instead of optimizing fixed tasks, agents should learn to drive their own capability growth.
	- Objective is recursive skill amplification.
	- Expected Outcome: an environment for improving self-play of an LLM over a defined set of tasks.
	- Example environments: self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
	- Sub-themes with bonus prizes:
	- Snorkel AI. Simulated Experts-in-the-Loop: environment that simulates interactions with subject-matter experts with changing requirements/preferences.

	## Theme #5: Wild Card - Impress Us!

	- If ideas do not fit the boxes above, out-of-the-box tasks are welcome.
	- Submissions should still meaningfully add value to LLM training on a specific task.

	## Guidelines for Problem Statement

	- It is not mandatory to choose the same problem statement as Round 1.
	- Choose the same problem statement only if it aligns with the provided hackathon themes.
	- You can start working on your problem statement once finalized.
	- Post-training can be done onsite on 25th and 26th when compute credits are provided for HuggingFace.
	- Before onsite, focus on building the environment, agent behaviours, reward model, and evaluating alignment with judging criteria.

	## Judging Criteria

	### Minimum requirements

	- Usage of OpenEnv (latest release).
	- Show a minimal training script using Unsloth or HF TRL in Colab.
	- Write a mini-blog on HuggingFace or mini-video on YouTube talking about your submission (< 2 minutes).

	### First Round Judging Overview

	- Pitch Format: each team has 3 minutes to pitch and 2 minutes for Q&A (5 minutes total).
	- Evaluation criteria:
	- Environment Innovation (40%): Is the environment novel, creative, or challenging? Does it meaningfully test agent behavior?
	- Storytelling (30%): Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
	- Showing Improvement in Rewards (20%): Does the demo show observable training progress (reward curves, metrics, before/after behavior)?
	- Reward and Training Script/Pipeline Setup (10%): Is reward logic coherent, and does the pipeline produce meaningful improvement in agent inference?
	- Each evaluator judges about 10-15 teams and submits scores individually.
	- Cerebral Valley aggregates all judges' scores to determine the top 15 finalist projects.

	## Team Confirmation Email

	- Hi Roopal Guha Neogi,
	- Your solo/team spot at the Meta PyTorch OpenEnv Hackathon x Scaler School of Technology - Grand Finale is officially confirmed.
	- This email serves as your official team ticket to the finale.

	### Event details

	- Date: 25-26 April 2026
	- Venue: Scaler School of Technology, Electronic City, Bangalore

	### Participation category

	- Team of 2

	### Team members

	- Team Member 1 (Team Leader):
	- Name: Roopal Guha Neogi
	- Email: roopal.guhaneogi@gmail.com
	- Team Member 2:
	- Name: Suyash Kumar
	- Email: suyashk102@gmail.com

	### What to do right now

	- Join the private Discord (MANDATORY): Join here.
	- All major updates and announcements will be shared there first.
	- Check the travel guide: Read Here.
	- Travel guide includes venue details, directions, and nearby stay options.

	### Important - Entry to Campus

	- You must present this email at entry.
	- Teams/participants without this email will not be allowed on campus.
	- Going forward, all communication will be shared only with the team leader.

	### Please carry for verification

	- A valid government-issued ID.
	- Your college/company ID used during registration.

	### Entry policy notes

	- Entry will not be permitted if details do not match registration.
	- All team members must be individually registered in the system.
	- New/unregistered members added to travel details will not be allowed on campus.
	- Organisers reserve the right to deny entry if verification criteria are not met.

	## Round 2 Theme Reveal Summary

	- Multi-Agent Interactions
	- Long-Horizon Planning and Instruction Following
	- World Modeling across professional and personal tasks
	- Self-Improving agent systems

	These themes reflect real-world AI environment design and agent behavior that the hackathon evaluates.

	## Submission Design Expectations

	- Choose one or more themes and design your own problem statement.
	- Simulate realistic scenarios, enable meaningful agent interaction, and support measurable outcomes.

	As part of submission, clearly define:

	- The problem statement
	- The environment in which the agent(s) operate
	- The capabilities of the agent(s)
	- The tasks to be performed
	- The reward model/evaluation logic
	- The post-training or self-improvement strategy

	## Recommendation for High Scores

	- Define clear, structured tasks and environments.
	- Incorporate robust evaluation and reward mechanisms.
	- Reflect real-world complexity aligned with OpenEnv principles.

	## Immediate Next Step

	- Begin refining design and evaluation right away.
	- Training and implementation happen onsite with provided compute credits.