Metadata Conditioned LLMs Collection. Pretraining data: English NOW corpus (english-corpora.org/now). Paper: arxiv.org/abs/2601.15236. Code: github.com/iamshnoo/metadata_localization • 92 items • Updated 4 days ago
iamshnoo/combined_no_europe_without_metadata_1b_step8k Text Generation • 1B • Updated 13 days ago • 900
iamshnoo/combined_no_europe_without_metadata_1b_step4k Text Generation • 1B • Updated 13 days ago • 894
iamshnoo/combined_no_europe_without_metadata_1b_step2k Text Generation • 1B • Updated 13 days ago • 877
iamshnoo/combined_no_asia_without_metadata_1b_step8k Text Generation • 1B • Updated 13 days ago • 852
iamshnoo/combined_no_asia_without_metadata_1b_step4k Text Generation • 1B • Updated 13 days ago • 849
iamshnoo/combined_no_asia_without_metadata_1b_step2k Text Generation • 1B • Updated 13 days ago • 829
iamshnoo/combined_no_america_without_metadata_1b_step8k Text Generation • 1B • Updated 13 days ago • 808
iamshnoo/combined_no_america_without_metadata_1b_step4k Text Generation • 1B • Updated 13 days ago • 808
iamshnoo/combined_no_america_without_metadata_1b_step2k Text Generation • 1B • Updated 13 days ago • 800
iamshnoo/combined_no_africa_without_metadata_1b_step8k Text Generation • 1B • Updated 13 days ago • 795