# ClarifyRL: Documentation Index
Project: **ClarifyRL**, which trains LLMs to ask clarifying questions instead of hallucinating.

Hackathon: **Meta OpenEnv Hackathon Grand Finale**, Apr 25-26, 2026, Bangalore.

Team: **Bhole Chature** (Anurag Agarwal + Kanan Agarwal).
> **New agent / new chat?** Read in this order:
>
> 1. [`AGENT_ONBOARDING.md`](./AGENT_ONBOARDING.md): paste-this-first brief for non-Windsurf agents
> 2. [`STATUS.md`](./STATUS.md): what's true right now
> 3. [`SESSION_LOG.md`](./SESSION_LOG.md): the last 3 entries, covering what prior agents did
> 4. Then the design docs below.
## Read in this order
| # | Doc | What it covers |
|---|-----|----------------|
| 00 | [overview.md](./00-overview.md) | Pitch, problem statement, why this idea wins |
| 01 | [requirements.md](./01-requirements.md) | Functional, non-functional, hackathon validator requirements |
| 02 | [architecture.md](./02-architecture.md) | System architecture, components, data flow |
| 03 | [environment-spec.md](./03-environment-spec.md) | OpenEnv env design: state, actions, observations, MCP tools |
| 04 | [rubric-design.md](./04-rubric-design.md) | 5-component composable rubric, weights, anti-hacking |
| 05 | [scenario-design.md](./05-scenario-design.md) | Profile schema, task types, user simulator |
| 06 | [training-plan.md](./06-training-plan.md) | GRPO + Unsloth config, baseline, eval methodology |
| 07 | [deployment.md](./07-deployment.md) | HF Space, Colab, README, submission checklist |
| 08 | [timeline.md](./08-timeline.md) | Hour-by-hour 48h sprint plan + team split |
| 09 | [risks.md](./09-risks.md) | Risk register + mitigations + fallback plans |
## Lock-status
- ✅ **Idea LOCKED**: ClarifyRL (AskBeforeYouAct): train epistemic humility via RL
- ✅ **Theme LOCKED**: **#5 Wild Card (primary)** + 3.2 Personalized + 2 Long-Horizon (secondary)
- ✅ **Task families LOCKED**: 3 high-stakes (coding, medical-intake, support) + 2 personal (meeting, event)
- ✅ **Stack LOCKED**: OpenEnv 0.2.2 + MCPEnvironment + FastMCP + Unsloth + TRL GRPO + Qwen2.5-1.5B
- ✅ **Compute LOCKED**: Colab free T4 + $30 HF inference credits + M3 Pro 18GB
- ✅ **Docs LOCKED**: Positioning sharpened with AI-safety framing
- ⏳ **Code**: Scaffolding done, env + rubric pending
## Headline metric
> **Hallucination rate: ~90% baseline → ~3% trained** (on 100 held-out scenarios across 5 task families).

Secondary metrics: plan satisfaction 27% → 85%, field-match F1 0.20 → 0.92, average clarifying questions per episode 0.4 → 2.7.
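To make the headline numbers concrete, here is a minimal Python sketch of how they could be aggregated over evaluation episodes. The `Episode` record, its field names, the exact-match rule in `field_match_f1`, and per-episode (macro) averaging are all hypothetical choices for illustration, not the project's actual evaluation code; the authoritative metric definitions belong in the rubric and training-plan docs.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One evaluated scenario. HYPOTHETICAL record layout, for illustration only."""
    hallucinated: bool            # assumed: agent committed to a value the user never supplied
    plan_satisfied: bool          # assumed: final plan accepted by the simulated user
    predicted_fields: dict[str, str] = field(default_factory=dict)
    gold_fields: dict[str, str] = field(default_factory=dict)
    clarifying_questions: int = 0


def field_match_f1(pred: dict[str, str], gold: dict[str, str]) -> float:
    """F1 over (field, value) pairs; a field counts only if its value matches exactly."""
    if not pred and not gold:
        return 1.0  # nothing to fill and nothing filled: treat as a perfect match
    true_pos = sum(1 for k, v in pred.items() if gold.get(k) == v)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Average each metric per episode (macro average, an assumption)."""
    n = len(episodes)
    return {
        "hallucination_rate": sum(e.hallucinated for e in episodes) / n,
        "plan_satisfaction": sum(e.plan_satisfied for e in episodes) / n,
        "field_match_f1": sum(
            field_match_f1(e.predicted_fields, e.gold_fields) for e in episodes
        ) / n,
        "avg_clarifying_questions": sum(e.clarifying_questions for e in episodes) / n,
    }


# Toy usage: one guessing episode, one episode that asked before acting.
episodes = [
    Episode(hallucinated=True, plan_satisfied=False,
            predicted_fields={"date": "Fri", "time": "10:00"},
            gold_fields={"date": "Fri", "time": "14:00"}),
    Episode(hallucinated=False, plan_satisfied=True,
            predicted_fields={"date": "Mon"}, gold_fields={"date": "Mon"},
            clarifying_questions=3),
]
stats = summarize(episodes)
print(stats)
# {'hallucination_rate': 0.5, 'plan_satisfaction': 0.5, 'field_match_f1': 0.75, 'avg_clarifying_questions': 1.5}
```

Averaging F1 per episode keeps one field-heavy scenario from dominating the score; a micro-averaged variant that pools true positives across all episodes is an equally plausible reading of "field-match F1".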