Nemotron Distills

#5 · opened by Epistates

Are there any plans to use https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, and the Cascade models as well (https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking)? The recent distills are very impressive!

TeichAI org

Was thinking just this! Will start with the Cascade model since it fits on my personal GPU, fingers crossed it works out of the box :)

TeichAI org

I think your distills may be the best open source models to date. Well done to the entire team!

TeichAI org
• edited 9 days ago

> I think your distills may be the best open source models to date. Well done to the entire team!

Thanks for the kind words, I think they are improvements to the base models for sure. Not sure if they are the best 😉

There are plenty of amazing teams out there doing a lot more than these relatively simple distillations; their work is amazing.

Either way, Nemotron Cascade 8B Thinking is up now as well:
TeichAI/Nemotron-Cascade-8B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill
TeichAI/Nemotron-Cascade-8B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill-GGUF

Please let me know which (if any) other models you would like me to distill into Nemotron Cascade next (e.g. Claude 4.5 Sonnet, DeepSeek V3.2, GPT 5.2, Gemini 3 Pro Preview, etc.)

They are quite impressive! I'll have to try the 8B!

Fwiw, I had a chance to try Nemotron-Cascade-14B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill and it seemed to give a truncated response. I have a collection of prompts I use as a general heuristic to gauge the quality of every LLM I test, e.g. "Write a production-grade web application." I test with several languages, but typically I'll use Go as the suggested language. It was able to plan/think with similar output (though this model takes longer to think), but the response itself seemed truncated compared with even the smaller Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill.

I can post the responses I saw, but maybe I just need to tune the hyperparameters a bit.

TeichAI org

> it seemed to give a truncated response.

I will look into this. Perhaps it is due to the model's relatively low context length, but I find it hard to believe it's reaching its limit on a single prompt. I don't have the slightest idea what could be causing it, though; I may roll back a couple hundred training steps. If you get a chance to test out the 8B variant, please let me know if you have the same issue.
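One thing worth ruling out before blaming the checkpoint: with a thinking model, the reasoning trace and the visible answer share the same context window, so a long think phase can starve the answer of tokens and make it look truncated. A minimal sketch of that budget arithmetic (the function name and the example numbers are illustrative assumptions, not values from this thread):

```python
def remaining_budget(context_window: int, prompt_tokens: int,
                     thinking_tokens: int = 0) -> int:
    """Tokens left for the visible answer after the prompt and the
    reasoning trace are subtracted from the model's context window.

    Returns 0 (rather than a negative number) when the budget is
    already exhausted, i.e. the answer will be cut off immediately.
    """
    return max(0, context_window - prompt_tokens - thinking_tokens)


# A short think phase leaves plenty of room for the answer...
print(remaining_budget(8192, prompt_tokens=500, thinking_tokens=2000))   # 5692
# ...but a model that "takes longer to think" can eat nearly the
# whole window, leaving nothing for the response itself.
print(remaining_budget(8192, prompt_tokens=500, thinking_tokens=7700))   # 0
```

If the longer-thinking 14B is spending most of its window on the trace, raising the serving context size (or capping the reasoning budget, where the runtime supports it) would distinguish this from a genuine training regression.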

TeichAI org

If you have a chance - https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 would be amazing!

Waiting on my free Colab credits for the month 😄

TeichAI org

> it seemed to give a truncated response.

After my first couple of tests I see what you mean. I've never experienced an issue like this before; give me some time to look into it more.
