YongganFu committed on
Commit 00d2f4e · verified · 1 Parent(s): e5e988e

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
```diff
@@ -3,15 +3,15 @@ library_name: transformers
 tags: []
 ---
 
-# Nemotron-Hymba2-3B Base
+# Nemotron-Flash-3B Base Model
 
-Nemotron-Hymba2 is a new hybrid SLM model family that outperforms Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
+Nemotron-Flash is a new hybrid SLM model family that outperforms Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
 
 Instruct version: [https://huggingface.co/nvidia/Nemotron-Hymba2-3B-Instruct](https://huggingface.co/nvidia/Nemotron-Hymba2-3B-Instruct).
 
 Docker path: `/lustre/fsw/portfolios/nvr/users/yongganf/docker/megatron_py25_fast_slm.sqsh` on NRT.
 
-## Chat with Nemotron-Hymba2-3B
+## Chat with Nemotron-Flash-3B
 
 We wrap the model into CUDA Graph for fast generation:
 
@@ -19,7 +19,7 @@ We wrap the model into CUDA Graph for fast generation:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
-repo_name = "nvidia/Nemotron-Hymba2-3B"
+repo_name = "nvidia/Nemotron-Flash-3B"
 
 tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
```
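
The snippet in the diff stops at loading the model. For context, here is a minimal sketch of how the loaded model would typically be driven, assuming standard Hugging Face `generate()` semantics and a single GPU; the prompt, device placement, and generation arguments are illustrative assumptions, not part of this commit:

```python
# Hedged continuation of the README snippet: a standard Hugging Face
# generation call after the load shown in the diff above.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_name = "nvidia/Nemotron-Flash-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
model = model.cuda().eval()  # assumed: single-GPU inference

prompt = "Write a short poem about GPUs."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If the remote code captures its CUDA Graph lazily, the first call may be slower while capture happens, with subsequent calls replaying the captured kernels.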
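On the "wrap the model into CUDA Graph" claim: the card does not show how this is done, but the generic PyTorch pattern is to capture a forward pass once and then replay it, removing per-kernel launch overhead, which dominates at batch size 1. A sketch of that generic pattern on a toy module (not this model's actual remote code):

```python
# Illustrative only: the generic torch.cuda.CUDAGraph capture/replay pattern
# that "wrapping a model into CUDA Graph" usually refers to.
import torch

device = "cuda"
net = torch.nn.Linear(16, 16).to(device)
static_input = torch.randn(8, 16, device=device)

# Warm up on a side stream before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        net(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass; replays then skip per-kernel launch overhead.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = net(static_input)

# To run on new data: copy into the static input buffer, then replay.
static_input.copy_(torch.randn(8, 16, device=device))
graph.replay()
print(static_output.sum().item())  # static_output is updated in place
```

The key constraint is that a captured graph runs on fixed-shape static buffers, which is why new data is copied into `static_input` rather than passed fresh on each call.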