Spaces:
Sleeping
The Agentic Triage Protocol: 2-Minute Presentation Script
Goal: Deliver a high-impact, professional summary of the project for hackathon judges.
[0:00 - 0:30] The Hook: The Governance Gap
"Current AI coding assistants are great at writing code, but they are dangerously naive at guarding it. Most LLMs can summarize a diff, but they can’t tell you if a subtle shift in a Redis lock will bring down your global service. There is a massive 'Governance Gap' between code generation and professional code review."
[0:30 - 1:00] The Solution: pr-review-env
"We built **pr-review-env**—not just a dataset, but a high-fidelity, interactive protocol for training professional reviewers. We've cataloged 100 diversified scenarios cross-referenced from real production incidents—covering everything from JWT security bypasses to TOCTOU race conditions.
Our environment uses a Staged Interaction Protocol, forcing agents to transition from identifying risks to assessing impact and final remediation."
[1:00 - 1:30] The Tech: GRPO & The Reward Engine
"To solve this, we used Group Relative Policy Optimization (GRPO). We trained a Qwen-2.5-3B model using a deterministic, multi-axis reward engine. We don't just score if the model is 'polite'; we score on Decision Accuracy, Label F1, and Logical Consistency. Our engine include an 'Anti-Hedging' penalty—if a model spots a security bug but approves it anyway, it is heavily penalized. This builds a model engineering teams can actually trust."
[1:30 - 2:00] The Results: Speed & Stability
"The results? A 77% improvement in triage quality on hard, high-stakes tasks. But more importantly, we’ve broken the speed-stability tradeoff. Our agents now deliver senior-grade reviews in under 8 seconds—faster than any human could parse the complexity of these diffs. We've made this fully reproducible with a Dockerized environment and a Colab-first training pipeline. We aren't just summarizing code; we're automating professional code observability. Thank you."
Key Differentiators to Emphasize (If asked):
- Deterministic Grading: No high-latency LLM judges; all rewards are computed via a fast, verifiable Evidence Oracle.
- Latency-Aware RL: Our RL loop specifically optimizes for the Pareto frontier of accuracy vs. speed.
- Consistency Penalty: We effectively solved "model hedging," where models choose the easiest path instead of the safest one.