Commit 97310ee (verified) by YongganFu · 1 parent: 0f878ee

Update README.md

  1. README.md +35 -0
README.md CHANGED
@@ -98,6 +98,41 @@ setattr(config, "attention_implementation_new", "flash_attention_2")
  model = AutoModelForCausalLM.from_pretrained(repo_name, config=config, torch_dtype=torch.bfloat16, trust_remote_code=True)
  ```
 
+ ## Running Nemotron-Flash with TensorRT-LLM
+
+ ### Setup
+ For installation and a quick start with TensorRT-LLM, see the <a href="https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html">quick start guide</a>.
+
+ ### Quick example
+
+ The following script runs the generation workflow end to end; run it from the root of a TensorRT-LLM checkout:
+ ```
+ cd examples/auto_deploy
+ python build_and_run_ad.py --model nvidia/Nemotron-Flash-3B-Instruct --args.yaml-extra nemotron_flash.yaml
+ ```
+
+ ### Serving with trtllm-serve
+
+ - Start a `trtllm-serve` server (see the <a href="https://nvidia.github.io/TensorRT-LLM/commands/trtllm-serve/trtllm-serve.html#starting-a-server">documentation</a> for details):
+ ```
+ trtllm-serve serve nvidia/Nemotron-Flash-3B-Instruct \
+ --backend _autodeploy \
+ --trust_remote_code \
+ --extra_llm_api_options examples/auto_deploy/nemotron_flash.yaml
+ ```
+
+ - Send a request (see the <a href="https://nvidia.github.io/TensorRT-LLM/examples/curl_chat_client.html">curl chat client example</a> for details):
+ ```
+ curl http://localhost:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "model": "nvidia/Nemotron-Flash-3B-Instruct",
+ "messages":[{"role": "user", "content": "Where is New York?"}],
+ "max_tokens": 16,
+ "temperature": 0
+ }'
+ ```
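
The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library, not part of the committed README. It assumes the server is listening on `localhost:8000` as in the curl command and that the response follows the OpenAI-style chat-completions layout (`choices[0].message.content`). The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request

# Values copied from the curl example; adjust them if your server differs.
BASE_URL = "http://localhost:8000"
MODEL = "nvidia/Nemotron-Flash-3B-Instruct"


def build_chat_request(prompt, max_tokens=16, temperature=0, base_url=BASE_URL):
    """Build the same POST request as the curl command (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a parsed chat-completions response.

    Assumption: the OpenAI-style layout choices[0].message.content.
    """
    return response_body["choices"][0]["message"]["content"]


def chat(prompt, timeout=60.0):
    """Send one prompt to the running server and return the reply text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server from the previous step running, `chat("Where is New York?")` sends the same payload as the curl command and returns the reply text.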
+
  ## Citation
  ```
  @misc{fu2025nemotronflash,