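# Deploy Qwen3-8B with the SGLang inference backend on GPUs 0 and 1
# (tensor parallel size 2, matching the two visible devices).
CUDA_VISIBLE_DEVICES=0,1 \
swift deploy \
    --model Qwen/Qwen3-8B \
    --infer_backend sglang \
    --max_new_tokens 2048 \
    --sglang_context_length 8192 \
    --sglang_tp_size 2 \
    --served_model_name Qwen3-8B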

# Once the server above is up and running, test it from a client with the command below.

# curl http://localhost:8000/v1/chat/completions \
# -H "Content-Type: application/json" \
# -d '{
# "model": "Qwen3-8B",
# "messages": [{"role": "user", "content": "What is your name?"}],
# "temperature": 0
# }'
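
The client test above can also be wrapped in a few shell helpers so that the request is only sent once the server responds. This is a minimal sketch, not part of ms-swift: the function names (`json_escape`, `build_payload`, `wait_for_server`, `chat`), the `BASE_URL` variable, and the retry counts are hypothetical, and it assumes the server also exposes the OpenAI-style `/v1/models` route on the same port as the curl example.

```shell
# Hypothetical helpers; names and defaults are assumptions, not ms-swift APIs.
BASE_URL="${BASE_URL:-http://localhost:8000}"  # host and port from the curl example

# Escape backslashes and double quotes so a single-line string is safe inside JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the chat-completions request body for a model name ($1) and user prompt ($2).
build_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}], "temperature": 0}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Poll /v1/models (assumed to exist) every 5 seconds until it answers
# or the retry budget ($1, default 60) is used up.
wait_for_server() {
  wfs_retries="${1:-60}"
  wfs_i=0
  while [ "$wfs_i" -lt "$wfs_retries" ]; do
    if curl -sf "$BASE_URL/v1/models" > /dev/null; then
      return 0
    fi
    sleep 5
    wfs_i=$((wfs_i + 1))
  done
  return 1
}

# Send one chat request and print the raw JSON response.
chat() {
  curl -s "$BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")"
}
```

With the server from the deploy command running, a call such as `wait_for_server && chat Qwen3-8B "What is your name?"` should reproduce the curl test. The escaping only handles backslashes and double quotes, so multi-line prompts would need a real JSON encoder.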