# GGUF Files for Nebula-S-v1

These are the GGUF files for decompute/Nebula-S-v1.

## Downloads
| GGUF Link | Quantization | Description |
|---|---|---|
| Download | Q2_K | Lowest quality; smallest file |
| Download | Q3_K_S | |
| Download | IQ3_S | I-quant; generally preferable to Q3_K_S |
| Download | IQ3_M | I-quant |
| Download | Q3_K_M | |
| Download | Q3_K_L | |
| Download | IQ4_XS | I-quant |
| Download | Q4_K_S | Fast with good quality |
| Download | Q4_K_M | Recommended: good balance of speed and quality |
| Download | Q5_K_S | |
| Download | Q5_K_M | |
| Download | Q6_K | Very good quality |
| Download | Q8_0 | Best quantized quality |
| Download | f16 | Full 16-bit precision; usually not worth the size, use a quant instead |
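The files above can be run with llama.cpp or any other GGUF-compatible runtime. A minimal sketch, assuming the Q4_K_M file is named `Nebula-S-v1-Q4_K_M.gguf` (the actual filename may differ; check the Downloads table):

```shell
# Fetch one quant with the Hugging Face CLI, then run it with llama.cpp's CLI.
# The repo id and filename here are assumptions, not confirmed by this card.
huggingface-cli download Flexan/decompute-Nebula-S-v1-GGUF Nebula-S-v1-Q4_K_M.gguf --local-dir .
llama-cli -m Nebula-S-v1-Q4_K_M.gguf -p "What is 17 * 23?" -n 256 --temp 0.7
```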
## Note from Flexan

I provide GGUFs and quantizations of publicly available models that do not yet have a GGUF equivalent, usually for models I find interesting and want to try out.

If a quant you'd like is missing, or you want another public model converted, you can request it in the community tab. For questions about this model itself, please refer to the original model repo.

You can find more info about me and what I do here.
# Nebula-S-v1

Nebula-S-v1 is a reasoning-enhanced language model built on the SVMS (Structured-Vector Multi-Stream) architecture.
## Architecture

SVMS adds a multi-stream reasoning layer on top of a frozen 4B-parameter backbone:

- Structured Consistency: topological constraint forcing cross-stream coherence
- Learned Router: per-token stream weighting
- Delta Logits: learnable correction to the backbone's predictions

| Component | Details |
|---|---|
| Trainable parameters | 400M of 4.4B total |
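The router and delta-logits components can be illustrated with a small sketch. This is not the actual Nebula-S implementation, just a toy illustration of the idea: a learned router produces per-token weights over the streams, each stream contributes a logit correction, and the pooled correction is added to the frozen backbone's logits.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def combine_logits(backbone_logits, stream_deltas, router_scores):
    """Weight each stream's delta by the router, then add the pooled
    correction to the frozen backbone's logits (delta-logits idea)."""
    weights = softmax(router_scores)            # per-token stream weights
    vocab = len(backbone_logits)
    pooled = [0.0] * vocab
    for w, delta in zip(weights, stream_deltas):
        for i in range(vocab):
            pooled[i] += w * delta[i]
    return [b + d for b, d in zip(backbone_logits, pooled)]

# Toy example: 3-token vocabulary, two reasoning streams.
backbone = [1.0, 0.5, -0.2]
deltas = [[0.2, -0.1, 0.0], [-0.4, 0.3, 0.1]]
scores = [0.0, 0.0]                             # equal routing -> simple average
out = combine_logits(backbone, deltas, scores)
```

Because the backbone is frozen, only the router and the stream deltas need gradients, which is what keeps the trainable parameter count at a fraction of the total.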
## Quick Start

```bash
pip install torch transformers huggingface-hub
```
### Option 1: Using huggingface_hub

```python
from huggingface_hub import snapshot_download
import sys

# Download the repo, then put its bundled custom code on the import path
snapshot_download("punitdecomp/Nebula-S-v1", local_dir="./Nebula-S-v1")
sys.path.insert(0, "./Nebula-S-v1")

from nebula_s import load_nebula_s

model, tokenizer = load_nebula_s("./Nebula-S-v1", device="cuda")
```
### Option 2: Using git clone

```bash
git lfs install
git clone https://huggingface.co/punitdecomp/Nebula-S-v1
```

```python
import sys

# Put the cloned repo's bundled custom code on the import path
sys.path.insert(0, "./Nebula-S-v1")
from nebula_s import load_nebula_s

model, tokenizer = load_nebula_s("./Nebula-S-v1", device="cuda")
```
### Generate a response

```python
messages = [{"role": "user", "content": "Solve step by step: what is 17 * 23?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

response = model.generate(
    inputs["input_ids"], inputs["attention_mask"],
    tokenizer, max_new_tokens=2048, temperature=0.7
)
print(response)
```
## Training

- Data: Orca Math Word Problems (200K)
- Steps: 1000
- Method: adapter-only (backbone frozen)
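Adapter-only training with a frozen backbone typically comes down to disabling gradients on every parameter outside the adapter. A minimal PyTorch sketch of that pattern, using toy stand-in modules (the names `backbone` and `adapter` are illustrative, not the actual Nebula-S internals):

```python
import torch.nn as nn

# Toy stand-in for the real model: a frozen "backbone" plus a trainable "adapter".
model = nn.ModuleDict({
    "backbone": nn.Linear(16, 16),
    "adapter": nn.Linear(16, 16),
})

# Adapter-only training: freeze everything except adapter parameters.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("adapter")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")
```

An optimizer built from `filter(lambda p: p.requires_grad, model.parameters())` then only ever updates the adapter, which is what yields the 400M-of-4.4B trainable ratio reported above.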
## Evaluation Results
Nebula-S-v1 was evaluated on several challenging benchmarks:
| Benchmark | Score |
|---|---|
| GSM8K | 90% |
| GPQA | 70.5% |
| HMMT (November 2025) | 67% |
| MMLU-Pro | 79.7% |
These results demonstrate strong performance for a 4B-parameter model, particularly on math reasoning (GSM8K) and advanced knowledge/competition-level tasks.
## License

Apache 2.0. The backbone is derived from an Apache-2.0 licensed base model.