mayankr26 committed
Commit ed8ca69 · 1 Parent(s): c15f1f0

add vllm serving instructions to readme

Files changed (1): README.md (+16 −0)
README.md CHANGED
@@ -119,6 +119,22 @@ resp = tokenizer.batch_decode(output)[0]
 print(resp.replace(model_inputs, ""))
 ```
 
+### Serving with vLLM
+
+For production deployments, you can serve Foundation-Sec-8B-Reasoning using [vLLM](https://github.com/vllm-project/vllm). The model uses the `minimax_m2` reasoning parser to properly handle reasoning traces.
+
+```bash
+vllm serve "fdtn-ai/Foundation-Sec-8B-Reasoning" \
+    --host 0.0.0.0 \
+    --port ${PORT} \
+    --tensor-parallel-size 1 \
+    --max-model-len 32768 \
+    --trust-remote-code \
+    --reasoning-parser minimax_m2
+```
+
+Adjust `--tensor-parallel-size` based on your GPU configuration and `--max-model-len` based on your memory constraints.
+
 ## Training and Evaluation
 
 ### Training Data
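Once a server started as in the added README section is up, it exposes vLLM's OpenAI-compatible API. The sketch below is a minimal, hypothetical client-side helper that builds a `/v1/chat/completions` request for it; the port value of 8000 (vLLM's default when `${PORT}` is not set) and the `max_tokens` choice are assumptions, not part of the commit.

```python
import json

# Assumption: vLLM's default port; replace with whatever ${PORT} was set to.
PORT = 8000

def build_request(prompt: str) -> dict:
    """Hypothetical helper: assemble a chat-completion request for the
    OpenAI-compatible endpoint that `vllm serve` exposes by default."""
    return {
        "url": f"http://localhost:{PORT}/v1/chat/completions",
        "payload": {
            # Model name must match the one passed to `vllm serve`.
            "model": "fdtn-ai/Foundation-Sec-8B-Reasoning",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1024,  # assumption: adjust for your use case
        },
    }

req = build_request("Summarize the impact of CVE-2021-44228.")
print(json.dumps(req["payload"], indent=2))
```

With `--reasoning-parser` enabled, recent vLLM versions typically return the reasoning trace in a separate field of the response message rather than interleaved with the final answer; check the response schema of your vLLM version before parsing.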