	---
	title: Weak Supervision Reasoning Explorer
	emoji: 🔬
	colorFrom: purple
	colorTo: pink
	sdk: gradio
	sdk_version: 4.36.0
	app_file: app.py
	pinned: false
	---

	# Weak Supervision Reasoning Explorer

	Interactive demo exploring when LLMs can learn to reason with weak supervision, based on paper 2604.18574.

	Hypothesis: Models that generalize under weak supervision exhibit a prolonged pre-saturation phase during which training reward and downstream performance climb together, while rapid saturation indicates memorization.
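The hypothesis above turns on how quickly training reward saturates. As a rough illustration, the sketch below estimates a saturation step for a reward curve. The function name `saturation_step`, the 95% threshold, and the toy curves are all assumptions made for this example; they are not taken from the paper or from this Space's `app.py`.

```python
from typing import Optional, Sequence


def saturation_step(rewards: Sequence[float], frac: float = 0.95) -> Optional[int]:
    """Return the first step whose reward reaches `frac` of the final reward.

    Hypothetical helper: assumes rewards are non-negative and that the last
    value approximates the plateau. Returns None for an empty curve.
    """
    if not rewards:
        return None
    target = frac * rewards[-1]
    for step, reward in enumerate(rewards):
        if reward >= target:
            return step
    return None


# Toy curves (made up for illustration): one saturates quickly, one slowly.
fast_curve = [0.10, 0.90, 0.96, 0.97, 0.97, 0.97]
slow_curve = [0.10, 0.30, 0.50, 0.70, 0.90, 0.97]

print(saturation_step(fast_curve))  # 2
print(saturation_step(slow_curve))  # 5
```

Under the hypothesis, a later saturation step (the slow curve) corresponds to a longer pre-saturation phase, while an early one (the fast curve) would be the pattern associated with memorization. A fuller check would also compare downstream performance over the same steps.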

	## Key Findings from the Paper

	- Reward Saturation Dynamics: Models that generalize show a prolonged pre-saturation phase; rapid saturation indicates memorization
	- Reasoning Faithfulness: Intermediate steps that logically support the final answer predict generalization
	- SFT is Critical: Supervised fine-tuning on explicit reasoning traces enables generalization under weak supervision

	## Features

	- Visualize reward saturation curves
	- Compare reasoning faithfulness across models
	- Explore interactive weak supervision scenarios