Nemotron-Labs-Diffusion Collection A Tri-Mode Language Model Family Unifying Autoregressive, Diffusion, and Self-Speculation Decoding • 7 items • Updated 18 days ago • 51
Personality and Roleplay Collection Datasets for finetuning the personality and accents of your LLMs • 2 items • Updated Feb 21 • 2
Creative Writing Datasets Collection High-quality creative writing and storytelling data. • 36 items • Updated Mar 22 • 9
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated Mar 2 • 13
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 10 items • Updated May 26 • 100