Spaces: O96a/weak-supervision-reasoning


metadata

title: Weak Supervision Reasoning Explorer
emoji: 🔬
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.36.0
app_file: app.py
pinned: false

Weak Supervision Reasoning Explorer

Interactive demo exploring when LLMs can learn to reason under weak supervision, based on the paper 2604.18574.

Hypothesis: Models that generalize under weak supervision exhibit a prolonged pre-saturation phase during which training reward and downstream performance climb together, whereas rapid reward saturation indicates memorization.
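To make the hypothesis concrete, here is a minimal sketch. It assumes per-step training reward and downstream accuracy are logged as equal-length lists. The sketch defines saturation as the first step where reward reaches 95% of its peak. It then measures what fraction of training falls before that point and checks whether reward and downstream accuracy rise together during that phase, using Pearson correlation. The helper names (`saturation_step`, `pearson`, `diagnose`) and the thresholds (`frac=0.95`, `min_pre_sat=0.3`, `min_corr=0.8`) are hypothetical illustration choices, not values from the paper or the app.

```python
from typing import Dict, Sequence, Union


def saturation_step(rewards: Sequence[float], frac: float = 0.95) -> int:
    """First step at which training reward reaches `frac` of its peak value."""
    peak = max(rewards)
    for step, reward in enumerate(rewards):
        if reward >= frac * peak:
            return step
    return len(rewards) - 1  # fallback, e.g. when all rewards are negative


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def diagnose(
    rewards: Sequence[float],
    downstream: Sequence[float],
    frac: float = 0.95,        # hypothetical saturation threshold
    min_pre_sat: float = 0.3,  # hypothetical: pre-saturation must span >= 30% of training
    min_corr: float = 0.8,     # hypothetical: reward and downstream must co-rise
) -> Dict[str, Union[int, float, str]]:
    step = saturation_step(rewards, frac)
    pre_sat_fraction = step / len(rewards)
    # Require at least three pre-saturation points for a meaningful correlation.
    corr = pearson(rewards[: step + 1], downstream[: step + 1]) if step >= 2 else 0.0
    generalizing = pre_sat_fraction >= min_pre_sat and corr >= min_corr
    return {
        "saturation_step": step,
        "pre_saturation_fraction": pre_sat_fraction,
        "correlation": corr,
        "label": "likely generalizing" if generalizing else "possible memorization",
    }


# Toy curves (made up for illustration): slow vs. rapid reward saturation.
slow_reward = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
slow_downstream = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5]
fast_reward = [0.9] + [1.0] * 9
fast_downstream = [0.2] * 10

print(diagnose(slow_reward, slow_downstream)["label"])  # likely generalizing
print(diagnose(fast_reward, fast_downstream)["label"])  # possible memorization
```

Correlation is computed only up to the saturation step, because after saturation the reward is flat and carries no signal about whether downstream performance is still tracking it.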

Key Findings from the Paper

Reward Saturation Dynamics: Models that generalize show a prolonged pre-saturation phase before training reward plateaus
Reasoning Faithfulness: Whether intermediate steps logically support the final answer predicts generalization
SFT is Critical: Supervised fine-tuning on explicit reasoning traces enables generalization under weak supervision
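The faithfulness finding can be illustrated with a toy proxy. This is not the paper's metric, which is not specified here. The sketch assumes each reasoning trace is a list of single-operation integer steps such as `"3 + 4 = 7"`. It calls a trace faithful only if every step is correct arithmetic and the stated final answer equals the result of the last step. `is_faithful`, `faithfulness_rate`, and the step format are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical step format: "<int> <op> <int> = <int>" with op in {+, -, *}.
STEP_PATTERN = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")


def _apply(a: int, op: str, b: int) -> int:
    """Evaluate a single binary arithmetic operation."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


def is_faithful(steps: List[str], final_answer: int) -> bool:
    """True if every step is valid arithmetic and the answer follows from the last step."""
    last_result = None
    for step in steps:
        match = STEP_PATTERN.match(step)
        if match is None:
            return False  # unparseable step: cannot verify support
        a, op, b, result = int(match.group(1)), match.group(2), int(match.group(3)), int(match.group(4))
        if _apply(a, op, b) != result:
            return False  # intermediate step is logically wrong
        last_result = result
    return last_result is not None and last_result == final_answer


def faithfulness_rate(samples: Sequence[Tuple[List[str], int]]) -> float:
    """Fraction of (steps, final_answer) samples whose reasoning supports the answer."""
    if not samples:
        return 0.0
    return sum(is_faithful(steps, answer) for steps, answer in samples) / len(samples)


samples = [
    (["3 + 4 = 7", "7 * 2 = 14"], 14),  # valid steps, answer follows
    (["3 + 4 = 8", "8 * 2 = 16"], 16),  # first step is wrong arithmetic
    (["3 + 4 = 7", "7 * 2 = 14"], 15),  # answer does not follow from the steps
    ([], 5),                            # no reasoning at all
]
print(faithfulness_rate(samples))  # 0.25
```

The third sample shows the case the finding targets. Its final answer is disconnected from otherwise correct reasoning, so the toy check flags it even though every intermediate step is valid.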

Features

Visualize reward saturation curves
Compare reasoning faithfulness across models
Explore interactive weak supervision scenarios