Data-efficient pre-training by scaling synthetic megadocs (collection, 20 items): https://arxiv.org/abs/2603.18534