add vllm serving instructions to readme
README.md CHANGED

@@ -119,6 +119,22 @@ resp = tokenizer.batch_decode(output)[0]
 print(resp.replace(model_inputs, ""))
 ```
 
+### Serving with vLLM
+
+For production deployments, you can serve Foundation-Sec-8B-Reasoning using [vLLM](https://github.com/vllm-project/vllm). The model uses the `minimax_m2` reasoning parser to properly handle reasoning traces.
+
+```bash
+vllm serve "fdtn-ai/Foundation-Sec-8B-Reasoning" \
+    --host 0.0.0.0 \
+    --port ${PORT} \
+    --tensor-parallel-size 1 \
+    --max-model-len 32768 \
+    --trust-remote-code \
+    --reasoning-parser minimax_m2
+```
+
+Adjust `--tensor-parallel-size` based on your GPU configuration and `--max-model-len` based on your memory constraints.
+
 ## Training and Evaluation
 
 ### Training Data
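Once `vllm serve` is running, the model is reachable through vLLM's OpenAI-compatible HTTP API. The sketch below builds a chat-completions request for the served model; the `localhost:8000` base URL, the example prompt, and the generation parameters are assumptions (match the URL to whatever `--port` you passed), and the commented-out section shows how the request would be sent and how the reasoning trace, which vLLM returns in a separate `reasoning_content` field when a reasoning parser is enabled, can be read out.

```python
import json

# Assumed endpoint: vLLM's OpenAI-compatible server, using the port
# you passed to `vllm serve` (8000 shown here as an example).
BASE_URL = "http://localhost:8000/v1/chat/completions"

# Example request payload; the prompt and sampling settings are
# illustrative, not recommendations from the model card.
payload = {
    "model": "fdtn-ai/Foundation-Sec-8B-Reasoning",
    "messages": [
        {"role": "user", "content": "Summarize CVE-2021-44228 in one paragraph."}
    ],
    "max_tokens": 1024,
    "temperature": 0.6,
}

# To actually send the request (requires the server above to be running):
# import urllib.request
# req = urllib.request.Request(
#     BASE_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as r:
#     msg = json.loads(r.read())["choices"][0]["message"]
#     print(msg.get("reasoning_content"))  # reasoning trace (parser-separated)
#     print(msg["content"])                # final answer

print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client (for example the `openai` Python package pointed at the same base URL) works the same way.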