YongganFu committed on
Commit 00d2f4e · verified · 1 Parent(s): e5e988e

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
```diff
@@ -3,15 +3,15 @@ library_name: transformers
 tags: []
 ---
 
-# Nemotron-Hymba2-3B Base
+# Nemotron-Flash-3B Base Model
 
-Nemotron-Hymba2 is a new hybrid SLM model family that outperforms Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
+Nemotron-Flash is a new hybrid SLM model family that outperforms Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
 
 Instruct version: [https://huggingface.co/nvidia/Nemotron-Hymba2-3B-Instruct](https://huggingface.co/nvidia/Nemotron-Hymba2-3B-Instruct).
 
 Docker path: `/lustre/fsw/portfolios/nvr/users/yongganf/docker/megatron_py25_fast_slm.sqsh` on NRT.
 
-## Chat with Nemotron-Hymba2-3B
+## Chat with Nemotron-Flash-3B
 
 We wrap the model into CUDA Graph for fast generation:
 
@@ -19,7 +19,7 @@ We wrap the model into CUDA Graph for fast generation:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
-repo_name = "nvidia/Nemotron-Hymba2-3B"
+repo_name = "nvidia/Nemotron-Flash-3B"
 
 tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
```
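
The snippet in the diff stops at loading the model. For context, here is a minimal sketch of how the loaded model would typically be driven, assuming standard Hugging Face `generate()` semantics and a single GPU; the prompt, device placement, and generation arguments are illustrative assumptions, not part of this commit:

```python
# Hedged continuation of the README snippet: a standard Hugging Face
# generation call after the load shown in the diff above.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_name = "nvidia/Nemotron-Flash-3B"

tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
model = model.cuda().eval()  # assumed: single-GPU inference

prompt = "Write a short poem about GPUs."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If the remote code captures its CUDA Graph lazily, the first call may be slower while capture happens, with subsequent calls replaying the captured kernels.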
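On the "wrap the model into CUDA Graph" claim: the card does not show how this is done, but the generic PyTorch pattern is to capture a forward pass once and then replay it, removing per-kernel launch overhead, which dominates at batch size 1. A sketch of that generic pattern on a toy module (not this model's actual remote code):

```python
# Illustrative only: the generic torch.cuda.CUDAGraph capture/replay pattern
# that "wrapping a model into CUDA Graph" usually refers to.
import torch

device = "cuda"
net = torch.nn.Linear(16, 16).to(device)
static_input = torch.randn(8, 16, device=device)

# Warm up on a side stream before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        net(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass; replays then skip per-kernel launch overhead.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = net(static_input)

# To run on new data: copy into the static input buffer, then replay.
static_input.copy_(torch.randn(8, 16, device=device))
graph.replay()
print(static_output.sum().item())  # static_output is updated in place
```

The key constraint is that a captured graph runs on fixed-shape static buffers, which is why new data is copied into `static_input` rather than passed fresh on each call.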