Nemotron Distills

#5 · opened by Epistates

Are there any plans to use https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16, and the Cascade models as well (https://huggingface.co/nvidia/Nemotron-Cascade-14B-Thinking)? The recent distills are very impressive!

TeichAI org

Was thinking just this! Will start with the Cascade model since it fits on my personal GPU, fingers crossed it works out of the box :)

TeichAI org

I think your distills may be the best open source models to date. Well done to the entire team!

TeichAI org
• edited 9 days ago

> I think your distills may be the best open source models to date. Well done to the entire team!

Thanks for the kind words, I think they are improvements to the base models for sure. Not sure if they are the best 😉

There are plenty of amazing teams out there doing a lot more than these relatively simple distillations; their work is amazing.

Either way, Nemotron Cascade 8B Thinking is up now as well:
TeichAI/Nemotron-Cascade-8B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill
TeichAI/Nemotron-Cascade-8B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill-GGUF

Please let me know which (if any) other models you would like me to distill into Nemotron Cascade next (e.g. Claude 4.5 Sonnet, DeepSeek V3.2, GPT 5.2, Gemini 3 Pro Preview, etc.)

They are quite impressive! I'll have to try the 8B!

Fwiw, I had a chance to try Nemotron-Cascade-14B-Thinking-Claude-4.5-Opus-High-Reasoning-Distill and it seemed to give a truncated response. I have a collection of prompts I use as a general heuristic to gauge the quality of every LLM I test, e.g. "Write a production-grade web application." I test with several languages, but typically I'll use Go as the suggested language. It was able to plan/think with similar output (though this model takes longer to think), but the response itself seemed truncated compared with even the smaller Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill.

I can post the responses I saw, but maybe I just need to tune the hyperparameters a bit.

TeichAI org

> it seemed to give a truncated response.

I will look into this. Perhaps it is due to the model's relatively low context length, but I find it hard to believe it's reaching its limit on a single prompt. I don't have the slightest idea what could be causing it, though; I may roll back a couple hundred training steps. If you get a chance to test out the 8B variant, please let me know if you have the same issue.
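One thing worth ruling out before blaming the checkpoint: with a thinking model, the reasoning trace and the visible answer share the same context window, so a long think phase can starve the answer of tokens and make it look truncated. A minimal sketch of that budget arithmetic (the function name and the example numbers are illustrative assumptions, not values from this thread):

```python
def remaining_budget(context_window: int, prompt_tokens: int,
                     thinking_tokens: int = 0) -> int:
    """Tokens left for the visible answer after the prompt and the
    reasoning trace are subtracted from the model's context window.

    Returns 0 (rather than a negative number) when the budget is
    already exhausted, i.e. the answer will be cut off immediately.
    """
    return max(0, context_window - prompt_tokens - thinking_tokens)


# A short think phase leaves plenty of room for the answer...
print(remaining_budget(8192, prompt_tokens=500, thinking_tokens=2000))   # 5692
# ...but a model that "takes longer to think" can eat nearly the
# whole window, leaving nothing for the response itself.
print(remaining_budget(8192, prompt_tokens=500, thinking_tokens=7700))   # 0
```

If the longer-thinking 14B is spending most of its window on the trace, raising the serving context size (or capping the reasoning budget, where the runtime supports it) would distinguish this from a genuine training regression.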

TeichAI org

If you have a chance - https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 would be amazing!

Waiting on my free Colab credits for the month 😄

TeichAI org

> it seemed to give a truncated response.

After my first couple of tests I see what you mean. I've never experienced an issue like this before; give me some time to look into it more.
