Instructions to use decompute/Nebula-S-SVMS2-3B-Internal with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use decompute/Nebula-S-SVMS2-3B-Internal with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="decompute/Nebula-S-SVMS2-3B-Internal")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("decompute/Nebula-S-SVMS2-3B-Internal")
    model = AutoModelForCausalLM.from_pretrained("decompute/Nebula-S-SVMS2-3B-Internal")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use decompute/Nebula-S-SVMS2-3B-Internal with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "decompute/Nebula-S-SVMS2-3B-Internal"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "decompute/Nebula-S-SVMS2-3B-Internal",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
docker model run hf.co/decompute/Nebula-S-SVMS2-3B-Internal
- SGLang
How to use decompute/Nebula-S-SVMS2-3B-Internal with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "decompute/Nebula-S-SVMS2-3B-Internal" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "decompute/Nebula-S-SVMS2-3B-Internal",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "decompute/Nebula-S-SVMS2-3B-Internal" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "decompute/Nebula-S-SVMS2-3B-Internal",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use decompute/Nebula-S-SVMS2-3B-Internal with Docker Model Runner:
docker model run hf.co/decompute/Nebula-S-SVMS2-3B-Internal
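The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the code assumes the server returns the usual OpenAI-style completions body with a `choices` list whose entries carry a `text` field.

```python
import json
import urllib.request

MODEL_ID = "decompute/Nebula-S-SVMS2-3B-Internal"


def build_completion_request(base_url, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-style response: {"choices": [{"text": ...}, ...]}.
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("http://localhost:8000", "Once upon a time,")` would target the vLLM command shown earlier; swapping the base URL for `http://localhost:30000` targets the SGLang server instead.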
File size: 379 Bytes
a8bc0fd
{
  "python": "3.12.13 | packaged by conda-forge | (main, Mar 5 2026, 16:50:00) [GCC 14.3.0]",
  "python_version": "3.12.13",
  "packages": {
    "torch": "2.11.0",
    "transformers": "5.8.0",
    "safetensors": "0.7.0",
    "accelerate": "1.13.0",
    "huggingface_hub": "1.8.0"
  },
  "torch_cuda": "13.0",
  "torch_git_version": "70d99e998b4955e0049d13a98d77ae1b14db1f45"
}
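One way to put this manifest to use is to compare its `packages` entry against what is installed locally. The sketch below is illustrative only: `compare_environment` is a hypothetical helper, the recorded versions are copied inline because the manifest's filename is not given here, and a local version suffix (the part after `+`) is ignored on the assumption that it only identifies the build variant.

```python
from importlib import metadata

# Versions copied from the "packages" entry of the manifest above.
RECORDED_PACKAGES = {
    "torch": "2.11.0",
    "transformers": "5.8.0",
    "safetensors": "0.7.0",
    "accelerate": "1.13.0",
    "huggingface_hub": "1.8.0",
}


def compare_environment(recorded_packages):
    """Return {name: (recorded, installed)} for each package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, recorded in recorded_packages.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Compare only the public version, dropping any "+local" suffix.
        if installed is None or installed.split("+")[0] != recorded:
            mismatches[name] = (recorded, installed)
    return mismatches
```

Calling `compare_environment(RECORDED_PACKAGES)` returns an empty dict when every listed package matches; any entry in the result names a package whose installed version differs from the recorded one or is missing.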