Instructions to use respinosamena/Helios-Nova-306M-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use respinosamena/Helios-Nova-306M-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="respinosamena/Helios-Nova-306M-Instruct")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("respinosamena/Helios-Nova-306M-Instruct", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use respinosamena/Helios-Nova-306M-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "respinosamena/Helios-Nova-306M-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "respinosamena/Helios-Nova-306M-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/respinosamena/Helios-Nova-306M-Instruct

SGLang

How to use respinosamena/Helios-Nova-306M-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "respinosamena/Helios-Nova-306M-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "respinosamena/Helios-Nova-306M-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "respinosamena/Helios-Nova-306M-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "respinosamena/Helios-Nova-306M-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use respinosamena/Helios-Nova-306M-Instruct with Docker Model Runner:
```
docker model run hf.co/respinosamena/Helios-Nova-306M-Instruct
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Helios Nova 306M-Instruct

Helios Nova 306M-Instruct is the supervised-fine-tuned (SFT) instruction model of the Helios Nova family. It is built from Helios-Nova-306M — a 306M-parameter dense model pre-trained from scratch on 50B tokens of FineWeb-Edu — and fine-tuned on smol-smoltalk, the dataset HuggingFace used to build SmolLM2-360M-Instruct.

The model follows instructions, answers questions, holds multi-turn conversations, and performs basic rewriting and summarization, within a 306M-parameter footprint. It inherits the base model's data efficiency: at roughly 80× less pre-training data, the family reaches 96% of SmolLM2-360M on commonsense reasoning, measured on an identical evaluation harness.

For a more capable, reinforcement-learning-aligned version, see Helios-Nova-306M-Instruct-2606.

Usage

The reference chat client lives in the GitHub repository and downloads this model automatically on first run.

git clone https://github.com/rafaelespinosamena/Helios-Nova-306M-Instruct.git
cd Helios-Nova-306M-Instruct
pip install -r requirements.txt
python instruct_chat.py          # selects CUDA, Apple MPS, or CPU automatically

Python API:

import torch
from transformers import AutoTokenizer
from HeliosNova import HeliosNova

model = HeliosNova.from_pretrained("respinosamena/Helios-Nova-306M-Instruct").eval()
tok = AutoTokenizer.from_pretrained("respinosamena/Helios-Nova-306M-Instruct")

prompt = "### System:\nYou are a helpful assistant.\n### User:\nExplain photosynthesis in two sentences.\n### Assistant:\n"
ids = [tok.bos_token_id] + tok.encode(prompt, add_special_tokens=False)
out = model.generate(torch.tensor([ids]), max_new_tokens=256, temperature=0.7, top_k=40)
print(tok.decode(out[0], skip_special_tokens=True))

The model uses a plaintext chat template (### System: / ### User: / ### Assistant:) and ends each turn with the EOS token. Generation should stop on the EOS token or a new turn marker; the chat client handles this for you.

Model architecture

Component	Value
Parameters	305.8M (dense)
Layers / hidden size	24 / 1024
Attention	Grouped-Query Attention — 16 query heads, 4 key-value heads, head dimension 64
Feed-forward	SwiGLU, intermediate size 3072
Positional encoding / norm	RoPE (theta 10,000), QK-Norm, RMSNorm (pre-norm), tied embeddings
Tokenizer / context	Custom 16k BPE / 2048 tokens

Architecture diagram

Fine-tuning

Supervised fine-tuning on smol-smoltalk (~500K conversations) with prompt masking: the loss is computed only on assistant tokens, while system and user tokens are masked. This teaches the model to respond without learning to reproduce prompts. Hyperparameters were chosen with a successive-halving sweep on a single H100.

Parameter	Value
Learning rate	5e-5 (cosine decay), 150-step warmup
Effective batch size	64 (8 micro × 8 accumulation)
Weight decay / grad clip	0.1 / 1.0
Precision	bf16
Duration	~~0.5 epochs (~~1 hour on H100)
Optimizer	AdamW (betas 0.9 / 0.95)

Why half an epoch

At 306M parameters, the model is capacity-bound. Multi-epoch SFT on smol-smoltalk induces catastrophic forgetting: instruction-following improves while general knowledge acquired during pre-training erodes. Training is stopped at approximately 0.5 epochs — the point that balances instruction-following against retained base knowledge.

Catastrophic forgetting trade-off

Evaluation

SFT preserves the base model's capabilities, so the family's benchmark profile is that of Helios-Nova-306M. All models below were re-run through one identical lm-evaluation-harness configuration (0-shot).

Capability versus pre-training token budget

Metric (0-shot)	Helios-306M (50B tok)	SmolLM2-360M (~4T)	Qwen2.5-0.5B (~18T)
Winogrande	57.2	57.9	56.3
PIQA	68.1	72.6	70.6
OpenBookQA	34.4	37.6	35.4
HellaSwag	44.7	52.5	49.5
ARC (avg)	42.8	53.4	45.5
MMLU	24.3	25.3	47.6
Commonsense reasoning (Winogrande + PIQA)	62.65	65.25	63.45

96% of SmolLM2-360M on commonsense reasoning at ~80× less data; ties it on Winogrande (99%). The model trails on tasks bounded by data volume — broad recall (TriviaQA) and exam-style knowledge (MMLU). Helios Nova is data-efficient, not knowledge-rich.

Full benchmark sweep

Intended use and limitations

Suitable for general conversation, instruction following, commonsense reasoning, rewriting and summarization, and on-device or CPU inference; and as a base for further alignment (DPO, GRPO, domain tuning).

Not suitable as a source of factual knowledge: a 306M model trained on 50B educational tokens has limited world knowledge and performs near chance on broad recall (TriviaQA) and exam-style benchmarks (MMLU). It can produce inaccurate or outdated content and should not be used for high-stakes decisions without verification. English-only; no safety alignment (no RLHF or safety filtering).

The Helios Nova family

Model	Description
Helios-Nova-306M	From-scratch base model (50B tokens)
Helios-Nova-306M-Instruct (this model)	SFT instruction model (PyTorch)
Helios-Nova-306M-Instruct-GGUF	GGUF build of this model
Helios-Nova-306M-Instruct-2606	GRPO-aligned instruction model

Citation

@misc{espinosamena2026heliosnovainstruct,
  title  = {Helios Nova 306M-Instruct: an instruction-tuned data-efficient language model},
  author = {Espinosa Mena, Rafael},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/respinosamena/Helios-Nova-306M-Instruct}}
}

Contact

Rafael Espinosa Mena — rafaelespinosamena@gmail.com

License

Downloads last month: 25

Safetensors

Model size

0.3B params

Tensor type

F32

Model tree for respinosamena/Helios-Nova-306M-Instruct

Base model

respinosamena/Helios-Nova-306M

Finetuned

(1)

this model

Quantizations

1 model

respinosamena
/

Helios-Nova-306M-Instruct