---
library_name: transformers
tags:
- text-generation
- maths-reasoning
- math-reasoning
- slm
- reasoning
license: other
license_name: decompute-non-commercial-research-license-v1.0
license_link: https://huggingface.co/decompute/Nebula-S-SVMS2-3B-Internal/blob/main/LICENSE
extra_gated_heading: "Request access to Nebula-S-SVMS2-3B"
extra_gated_button_content: "Acknowledge license and request access"
extra_gated_prompt: "By requesting access, you agree to the Decompute Non-Commercial Research License v1.0. The model may be used only for non-commercial research and evaluation. Commercial use, revenue-generating use, redistribution, sublicensing, hosting, paid API use, SaaS use, production use, customer-facing deployment, fine-tuned redistribution, quantized redistribution, derivative model distribution, and use to train or improve commercial models are prohibited."
extra_gated_fields:
  Full name: text
  Affiliation: text
  Intended use: text
  Commercial entity?: text
  Country: country
---
# Nebula-S-3B

Nebula-S-3B is an internal reasoning model package with custom runtime components.

This package intentionally does not include upstream lineage, source training records, or private provenance. Those records are maintained separately in restricted internal release files.

## Contents

- `core/`: model weights, tokenizer, and generation configuration
- `runtime_weights.safetensors`: runtime weight artifact
- `modeling_nebula.py`: local runtime loader
- `nebula_runtime.py`: import-friendly loader alias
- `release_metadata.json`: neutral package metadata
- `release_manifest.internal.json`: file checksums for this release
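
Because the package ships a checksum manifest, a download can be checked before loading. The schema of `release_manifest.internal.json` is not described in this README, so the sketch below assumes a hypothetical flat JSON object that maps each relative file path to a SHA-256 hex digest. The real manifest may use a different layout or hash algorithm, and the function names here are illustrative only.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(path):
    """Load a manifest, assumed to be {relative_path: sha256_hex}."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def verify_manifest(package_dir, manifest):
    """Return the relative paths that are missing or whose digest differs.

    The {relative_path: sha256_hex} layout is an assumption, not the
    documented format of this package's manifest.
    """
    root = Path(package_dir)
    failures = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Under that assumed layout, `verify_manifest(".", load_manifest("release_manifest.internal.json"))` returns an empty list when every listed file is present and unmodified.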
## Install

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Smoke test

Run this from inside the extracted model directory:

```bash
python modeling_nebula.py .
```
## Local usage

```python
from nebula_runtime import load_model

model, tokenizer = load_model("./Nebula-S-3B")
messages = [{"role": "user", "content": "Solve: what is 2+2?"}]

# Prefer the tokenizer's chat template; fall back to a plain prompt if none is set.
if getattr(tokenizer, "chat_template", None):
    prompt = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=False,
    )
else:
    prompt = "User: Solve: what is 2+2?\nAssistant:"

# Tokenize and move the tensors to the device that holds the model weights.
inputs = tokenizer(
    prompt,
    add_special_tokens=False,
    return_tensors="pt",
).to(next(model.parameters()).device)

# The custom runtime's generate() takes the tokenizer and returns decoded text.
text = model.generate(
    inputs["input_ids"],
    inputs["attention_mask"],
    tokenizer,
    max_new_tokens=512,
    temperature=0,
)
print(text)
```
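
For multi-turn conversations, the template-or-fallback logic above can be factored into a helper. This is a minimal sketch: `build_prompt` is a hypothetical name that is not part of `nebula_runtime`, and the plain `User:`/`Assistant:` transcript simply extends the single-turn fallback string shown above rather than any format the model is documented to expect.

```python
def build_prompt(tokenizer, messages):
    """Render chat messages to a prompt string.

    Uses the tokenizer's chat template when one is set; otherwise falls back
    to a plain "User:/Assistant:" transcript (an assumed format that mirrors
    the single-turn fallback in the usage example).
    """
    if getattr(tokenizer, "chat_template", None):
        return tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=False,
        )
    labels = {"user": "User", "assistant": "Assistant", "system": "System"}
    lines = [
        f"{labels.get(m['role'], m['role'].title())}: {m['content']}"
        for m in messages
    ]
    # Leave the final turn open so the model continues as the assistant.
    lines.append("Assistant:")
    return "\n".join(lines)
```

With a tokenizer that has no chat template, a single user message produces the same string as the hard-coded fallback in the example.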
## Creating a tuned successor release

This downloadable package is an inference artifact. To create a tuned successor release, use the approved restricted training workspace rather than modifying this folder in place.

Recommended internal flow:

1. Create a new release ID, for example `nebula_s_3b_v0_1_1`.
2. Add approved examples or correction data to the internal training dataset.
3. Train a candidate runtime artifact in the restricted training environment.
4. Compare the candidate against this release on fixed evaluation prompts and tasks.
5. Repackage the candidate with the internal packaging tool.
6. Run package validation: smoke load, leak scan, strict runtime-weight validation, checksum manifest, and license/notice review.
7. Promote only the sanitized downloadable package.
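
The comparison in step 4 can be sketched as scoring both releases on the same fixed cases and listing regressions. Everything here is an assumption for illustration: `baseline_fn` and `candidate_fn` are hypothetical callables that wrap each model and return text for a prompt, and whitespace-stripped exact match stands in for whatever metric the internal evaluation actually uses.

```python
def compare_releases(baseline_fn, candidate_fn, cases):
    """Score two releases on the same (prompt, expected) cases.

    Returns exact-match rates for both and the prompts the baseline got
    right but the candidate got wrong.
    """
    if not cases:
        raise ValueError("cases must not be empty")
    base_hits = 0
    cand_hits = 0
    regressions = []
    for prompt, expected in cases:
        want = expected.strip()
        base_ok = baseline_fn(prompt).strip() == want
        cand_ok = candidate_fn(prompt).strip() == want
        base_hits += base_ok
        cand_hits += cand_ok
        if base_ok and not cand_ok:
            regressions.append(prompt)
    total = len(cases)
    return {
        "baseline": base_hits / total,
        "candidate": cand_hits / total,
        "regressions": regressions,
    }
```

Listing regressions separately from the aggregate rates helps catch a candidate that improves the average while breaking prompts the current release handles.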

Do not upload private provenance, source training records, optimizer state, source data paths, or build logs with this package.
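
A leak scan of the kind listed in step 6 could start from a file-name check over the package directory. The internal tool's rules are not documented here, so the deny-list below is purely hypothetical, and a real scan would likely also inspect file contents for source data paths.

```python
import fnmatch
from pathlib import Path

# Hypothetical deny-list; the internal leak scan's actual rules are not
# described in this README.
FORBIDDEN_PATTERNS = ("optimizer*.pt", "*.log", "*provenance*", "training_records*")


def scan_package(package_dir, patterns=FORBIDDEN_PATTERNS):
    """Return relative paths of files whose names match a forbidden pattern."""
    root = Path(package_dir)
    flagged = []
    for path in sorted(root.rglob("*")):
        name = path.name.lower()
        if path.is_file() and any(fnmatch.fnmatch(name, p) for p in patterns):
            flagged.append(path.relative_to(root).as_posix())
    return flagged
```

An empty result only means no file name matched the assumed patterns; it does not replace the other validation steps.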
## License and Use Restrictions

Nebula-S-SVMS2-3B is released under the Decompute Non-Commercial Research License v1.0.

This is a restricted-access non-commercial research release. It is not an open-source release.

### Permitted Use

The model may be used only for personal, academic, and non-commercial research or evaluation.

### Prohibited Use

The model may not be used for commercial use, revenue-generating use, production use, paid API use, SaaS use, customer-facing deployment, enterprise workflow automation, redistribution, sublicensing, mirroring, uploading converted versions, uploading quantized versions, uploading fine-tuned versions, or creating/distributing derivative models.

The model and its outputs may not be used to train, improve, distill, benchmark for marketing purposes, or evaluate commercial models, products, services, or platforms.

For commercial licensing, contact hina@decompute.run.
## Evaluation Results

The following results are from Decompute internal evaluations of Nebula-S-SVMS2-3B.

| Benchmark | Score |
|---|---:|
| GPQA | 86.85 |
| HMMT Nov 2025 | 80.00 |
| GSM8K | 93.78 |
| MMLU-Pro | 83.00 |

These scores are reported from internal evaluation runs. Evaluation settings, prompts, decoding parameters, and extraction methods may affect comparability with public leaderboard results.
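
To illustrate why the extraction method matters, the sketch below scores the same model outputs under two answer-extraction rules. Both rules and all names are hypothetical and are not the Decompute evaluation harness. They only show that a numeric-answer benchmark can yield different accuracy from identical generations.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_last_number(text):
    """Return the last number in the text, or None if there is none."""
    matches = NUMBER.findall(text.replace(",", ""))
    return matches[-1] if matches else None


def extract_after_marker(text, marker="Answer:"):
    """Return the first number after the marker, or None if it is absent."""
    _, found, tail = text.partition(marker)
    if not found:
        return None
    match = NUMBER.search(tail.replace(",", ""))
    return match.group(0) if match else None


def accuracy(outputs, references, extractor):
    """Fraction of outputs whose extracted answer equals the reference."""
    if not references:
        raise ValueError("references must not be empty")
    hits = sum(extractor(o) == r for o, r in zip(outputs, references))
    return hits / len(references)
```

An output such as `"Answer: 5, checked in 2 steps."` counts as correct for the reference `"5"` under the marker rule but not under the last-number rule, so reported scores are only comparable when the extraction rule is fixed.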