---
license: apache-2.0
tags:
- nebula-s
- svms
- math-reasoning
- competition-math
- quantized
- int4
- hqq
library_name: transformers
---

# Nebula-S-v1-lite

A lightweight (~3 GB) version of [Nebula-S-v1](https://huggingface.co/punitdecomp/Nebula-S-v1), pre-quantized to int4 using [HQQ](https://github.com/mobiusml/hqq) (Half-Quadratic Quantization).

**Runs on Mac (MPS), CUDA, and CPU.**

| Variant | Download size | Runtime memory | Platform |
|---|---|---|---|
| [Nebula-S-v1](https://huggingface.co/punitdecomp/Nebula-S-v1) | ~9 GB | ~9 GB | Universal (bf16) |
| [Nebula-S-v1-4bit](https://huggingface.co/punitdecomp/Nebula-S-v1-4bit) | ~3 GB | ~3 GB | CUDA only (bnb) |
| **Nebula-S-v1-lite** | **~3 GB** | **~3 GB** | **Mac + CUDA + CPU** |

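The sizes in the table are roughly what bytes-per-weight arithmetic would predict. The sketch below is a back-of-the-envelope estimate, not a measurement. The parameter count is not stated in this card, so it is inferred here from the ~9 GB bf16 figure (2 bytes per weight), and `estimate_weight_gb` is a hypothetical helper introduced only for illustration.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: the ~9 GB bf16 checkpoint consists almost entirely of 16-bit weights.
BF16_DOWNLOAD_GB = 9.0
n_params = BF16_DOWNLOAD_GB * 1e9 / 2  # 2 bytes per bf16 weight

bf16_gb = estimate_weight_gb(n_params, 16)
int4_gb = estimate_weight_gb(n_params, 4)

print(f"inferred parameters: {n_params / 1e9:.1f}B")
print(f"bf16 weights: {bf16_gb:.2f} GB")
print(f"int4 weights: {int4_gb:.2f} GB")
```

The weights-only int4 estimate comes out below the ~3 GB listed. The difference would plausibly come from quantization metadata (per-group scales and zero-points) and any layers kept in higher precision, though this card does not break that down.
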

## Quick Start

```bash
# Quote the version specifier so the shell does not treat ">" as a redirect
pip install torch "transformers>=4.51.0" hqq huggingface-hub
```

### Option 1: Using huggingface_hub

```python
import sys

from huggingface_hub import snapshot_download

# Download the repository, which includes the nebula_s loader module
snapshot_download("decompute/Nebula-S-v1-lite", local_dir="./Nebula-S-v1-lite")

# Make the downloaded nebula_s module importable
sys.path.insert(0, "./Nebula-S-v1-lite")
from nebula_s import load_nebula_s

# Auto-detects device (mps on Mac, cuda on NVIDIA, cpu fallback)
model, tokenizer = load_nebula_s("./Nebula-S-v1-lite")
```

### Option 2: Using git clone

```bash
git lfs install
git clone https://huggingface.co/punitdecomp/Nebula-S-v1-lite
```

```python
import sys

# Make the cloned nebula_s module importable
sys.path.insert(0, "./Nebula-S-v1-lite")
from nebula_s import load_nebula_s

model, tokenizer = load_nebula_s("./Nebula-S-v1-lite")
```

### Generate a response

```python
messages = [{"role": "user", "content": "Solve step by step: what is 17 * 23?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Move the tokenized prompt to whichever device the model was loaded on
device = next(model.parameters()).device
inputs = tokenizer(text, return_tensors="pt").to(device)

response = model.generate(
    inputs["input_ids"], inputs["attention_mask"],
    tokenizer, max_new_tokens=1024, temperature=0.7
)
print(response)
```

### Explicit device

```python
# Mac
model, tokenizer = load_nebula_s("./Nebula-S-v1-lite", device="mps")

# NVIDIA GPU
model, tokenizer = load_nebula_s("./Nebula-S-v1-lite", device="cuda")

# CPU
model, tokenizer = load_nebula_s("./Nebula-S-v1-lite", device="cpu")
```

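If you would rather compute the device string yourself before passing it as `device=`, something like the following may work. This is a minimal sketch using standard PyTorch availability checks; it is not the detection logic inside `load_nebula_s`, which this card does not show. `pick_device` and `detect_device` are hypothetical helper names, and checking CUDA before MPS is an assumed preference order.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device string; the CUDA-before-MPS order is an assumption."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends, falling back to CPU."""
    try:
        import torch
    except ImportError:
        # Without PyTorch nothing can be queried, so report the CPU fallback
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
print(device)
# Then, with the loader from this repository:
# model, tokenizer = load_nebula_s("./Nebula-S-v1-lite", device=device)
```

Keeping the decision in a pure function (`pick_device`) separate from the PyTorch queries makes the preference order easy to change or test without any hardware present.
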

## License

Apache 2.0. The backbone is derived from an Apache-2.0-licensed base model.