---
title: Weak Supervision Reasoning Explorer
emoji: 🔬
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false
---
# Weak Supervision Reasoning Explorer

Interactive demo exploring when LLMs can learn to reason under weak supervision, based on paper 2604.18574.

**Hypothesis:** Models that generalize under weak supervision exhibit a prolonged pre-saturation phase, during which training reward and downstream performance climb together. Rapid reward saturation instead indicates memorization.

## Key Findings from the Paper

- **Reward Saturation Dynamics:** Models that generalize show a prolonged pre-saturation phase
- **Reasoning Faithfulness:** Intermediate steps that logically support the final answer predict generalization
- **SFT is Critical:** Supervised fine-tuning on explicit reasoning traces enables generalization under weak supervision

## Features

- Visualize reward saturation curves
- Compare reasoning faithfulness across models
- Interactive weak supervision scenarios
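
## Example: Locating Reward Saturation

The hypothesis above turns on how long training reward keeps climbing before it flattens. The sketch below shows one way to make that concrete: it smooths a reward curve and reports the first step at which the curve has covered a chosen share of its total gain. The function names, the trailing-average smoothing, and the 95% threshold are illustrative assumptions, not the paper's definition of saturation and not necessarily what `app.py` does.

```python
from typing import Sequence


def moving_average(values: Sequence[float], window: int) -> list[float]:
    """Trailing moving average; early entries average over the points seen so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def saturation_step(
    rewards: Sequence[float], window: int = 3, fraction: float = 0.95
) -> int:
    """Return the first step whose smoothed reward covers `fraction` of the total gain.

    Assumption (hypothetical definition): "saturation" means reaching
    start + fraction * (end - start) on the smoothed curve.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    smoothed = moving_average(rewards, window)
    start, end = smoothed[0], smoothed[-1]
    threshold = start + fraction * (end - start)
    for step, value in enumerate(smoothed):
        if value >= threshold:
            return step
    return len(smoothed) - 1


def pre_saturation_fraction(
    rewards: Sequence[float], window: int = 3, fraction: float = 0.95
) -> float:
    """Share of the run (0.0 to 1.0) spent before the saturation step."""
    step = saturation_step(rewards, window, fraction)
    return step / max(len(rewards) - 1, 1)


# Made-up curves for illustration only; these are not results from the paper.
fast_curve = [0.1, 0.8, 0.95, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
slow_curve = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

if __name__ == "__main__":
    for name, curve in [("fast", fast_curve), ("slow", slow_curve)]:
        print(name, saturation_step(curve), round(pre_saturation_fraction(curve), 2))
```

Under this definition, a curve that saturates early yields a small pre-saturation fraction and a steadily climbing curve yields one close to 1.0. Testing whether downstream performance climbs alongside the reward would additionally need an evaluation curve sampled at the same steps, which this sketch does not model.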