QED-Nano: Teaching a Tiny Model to Prove Hard Theorems. Who needs 1T parameters? Olympiad proofs with a 4B model
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL Paper • 2602.03773 • Published Feb 3 • 12
Post: I ran the Anthropic Misalignment Framework for a few top models and added it to a dataset: cfahlgren1/anthropic-agentic-misalignment-results. You can read the reasoning traces of the models trying to blackmail the user and perform other actions. It's very interesting!!
Post: Really nice to see AllenAI drop the Reward-Bench-2 dataset and leaderboard from their new paper all on the hub! 👏 allenai/reward-bench, allenai/reward-bench-2, allenai/reward-bench-2-results. Great work @natolambert, allenai and others!! 🤗
Post: Yesterday, we dropped a new conversational viewer for datasets on the hub! 💬 Actually being able to view and inspect your data is extremely important. This is a big step in making data more accessible and actionable for everyone. Here are some datasets you can try it out on:
• mlabonne/FineTome-100k
• Salesforce/APIGen-MT-5k
• open-thoughts/OpenThoughts2-1M
• allenai/tulu-3-sft-mixture
Any other good ones?
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning Paper • 2504.11354 • Published Apr 15, 2025 • 6
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7, 2025 • 207
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10, 2025 • 48
Recursive Introspection: Teaching Language Model Agents How to Self-Improve Paper • 2407.18219 • Published Jul 25, 2024 • 3