Semi-Supervised Reward Modeling via Iterative Self-Training Paper • 2409.06903 • Published Sep 10, 2024
Towards Understanding the Fragility of Multilingual LLMs against Fine-Tuning Attacks Paper • 2410.18210 • Published Oct 23, 2024
MergeBench: A Benchmark for Merging Domain-Specialized LLMs Paper • 2505.10833 • Published May 16, 2025 • 1
Scalable Data Synthesis for Computer Use Agents with Step-Level Filtering Paper • 2512.10962 • Published Nov 22, 2025 • 3
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic Paper • 2408.13656 • Published Aug 24, 2024
MergeBench-Llama-8B-it/llama-3.1-8b-it_tulu-3-sft-personas-instruction-following_epoch3_0429 Text Generation • 8B • Updated May 5, 2025 • 3
MergeBench-Llama-8B-it/llama-3.1-8b-it_dart-math-uniform_2epoch Text Generation • 8B • Updated Apr 25, 2025 • 2
Understanding the Impact of Adversarial Robustness on Accuracy Disparity Paper • 2211.15762 • Published Nov 28, 2022