Instructions to use locailabs/Jupiter-N-120B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use locailabs/Jupiter-N-120B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="locailabs/Jupiter-N-120B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("locailabs/Jupiter-N-120B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("locailabs/Jupiter-N-120B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use locailabs/Jupiter-N-120B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "locailabs/Jupiter-N-120B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "locailabs/Jupiter-N-120B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/locailabs/Jupiter-N-120B

SGLang

How to use locailabs/Jupiter-N-120B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "locailabs/Jupiter-N-120B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "locailabs/Jupiter-N-120B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "locailabs/Jupiter-N-120B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "locailabs/Jupiter-N-120B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use locailabs/Jupiter-N-120B with Docker Model Runner:
```
docker model run hf.co/locailabs/Jupiter-N-120B
```

Jupiter-N-120B

Jupiter-N-120B is a post-trained variant of NVIDIA Nemotron-3-Super-120B-A12B, developed by Locai Labs. The N denotes the Nemotron base. Jupiter-N improves instruction following (+4.4 IFBench) and agentic capability (+9.1 on Terminal Bench 2 medium tasks), and adds Welsh language support (+18 ARC-Easy, +5.25 MMLU-Lite) and UK cultural grounding, while preserving the base model's existing strengths through our Forget-Me-Not™ framework. See the technical report for full details.

Jupiter-N is designed as a reproducible template for sovereign post-training: any nation can substitute its own cultural knowledge base, institutional corpora, and indigenous languages to produce a culturally grounded model from a shared open base.

Model Summary


Base Model	NVIDIA Nemotron-3-Super-120B-A12B
Total Parameters	120B (12B active)
Architecture	LatentMoE (Mamba-2 + MoE + Attention hybrid) with Multi-Token Prediction
Post-Training Method	LoRA (rank 16, alpha 32) with experience replay
Context Length	Up to 1M tokens
Supported Languages	English, French, German, Italian, Japanese, Spanish, Chinese + Welsh
Reasoning	Configurable on/off via chat template (`enable_thinking=True/False`)
License	NVIDIA Nemotron Open Model License
Developer	Locai Labs
Release Date	April 2026
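
As a rough sizing aid, the parameter counts above can be turned into weight-memory estimates. This is a back-of-the-envelope sketch, not a measured figure: it assumes every weight is stored in a single dtype (the checkpoint's storage dtype is not stated here), shards weights evenly across GPUs, and ignores KV cache, Mamba state, and activations. The helper names are illustrative.

```python
# Back-of-the-envelope weight memory for a 120B-parameter MoE model.
# Assumption: all weights share one dtype; real checkpoints may mix dtypes.
TOTAL_PARAMS = 120e9   # "120B" from the Model Summary table
ACTIVE_PARAMS = 12e9   # "12B active" parameters used per token

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # standard dtype sizes


def weight_gib(num_params: float, dtype: str) -> float:
    """Return weight memory in GiB for a parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def per_gpu_gib(num_params: float, dtype: str, num_gpus: int) -> float:
    """Weight memory per GPU if weights are sharded evenly (an idealisation)."""
    return weight_gib(num_params, dtype) / num_gpus


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        total = weight_gib(TOTAL_PARAMS, dtype)
        shard = per_gpu_gib(TOTAL_PARAMS, dtype, num_gpus=8)
        print(f"{dtype}: {total:.1f} GiB total, {shard:.1f} GiB per GPU on 8 GPUs")
    print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Note that all 120B parameters must be resident in memory even though only about a tenth are active for any one token; the active count affects compute per token, not weight storage.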

What's New vs. Nemotron Base

Welsh language: trained on professional parallel corpora from Bangor University (Senedd proceedings and UK legislation) and on instruction-following data translated by an LLM through a custom pipeline.
Agentic/terminal: uncertainty-curated terminal trajectories from NVIDIA's Nemotron-Terminal-Corpus, keeping the 30k highest-entropy samples, where the base model has the most to learn.
UK cultural grounding: CultureBank-informed synthetic data aligned to British cultural norms and conventions.
Synthetic experience replay: the Forget-Me-Not framework mitigates catastrophic forgetting during post-training.
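
The replay idea can be sketched in a few lines. The Key Techniques section states that replay data is generated by the unmodified base model on UltraChat prompts; everything else below is an assumption made for illustration, including the function names, the record layout, and the uniform shuffle used to mix replay with new-domain data. It is not the Forget-Me-Not implementation.

```python
import random
from typing import Callable, Dict, List

Example = Dict[str, str]


def build_replay_examples(prompts: List[str],
                          base_generate: Callable[[str], str]) -> List[Example]:
    """Label each generic prompt with the frozen base model's own answer."""
    return [{"prompt": p, "response": base_generate(p), "source": "replay"}
            for p in prompts]


def mix_with_replay(domain_data: List[Example], replay_data: List[Example],
                    seed: int = 0) -> List[Example]:
    """Combine new-domain and replay examples into one shuffled training set."""
    mixed = list(domain_data) + list(replay_data)
    random.Random(seed).shuffle(mixed)  # seeded for reproducibility
    return mixed


if __name__ == "__main__":
    # Stand-in for the base model; a real run would query the unmodified model.
    def fake_base(prompt: str) -> str:
        return f"(base model answer to: {prompt})"

    replay = build_replay_examples(["Explain photosynthesis.", "Write a haiku."],
                                   fake_base)
    welsh = [{"prompt": "Cyfieithwch: Good morning", "response": "Bore da",
              "source": "welsh"}]
    for ex in mix_with_replay(welsh, replay):
        print(ex["source"], "|", ex["prompt"])
```

Because the replay targets come from the base model itself, training on them pulls the fine-tuned model back toward its original behaviour on general prompts while the domain data teaches the new skills.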

Benchmarks

We evaluate Jupiter-N against its base model, Nemotron-3-Super-120B-A12B. Full details are in the technical report.

Benchmark	Metric	Jupiter-N (reasoning off)	Nemotron (reasoning off)	Jupiter-N (reasoning on)	Nemotron (reasoning on)
IFEval	prompt strict	80.96	79.85	90.20	90.20
IFBench	prompt loose	41.8	37.4	73.8	69.7
AgentHarm	harm ↓	73.4	78.6	53.8	55.4
Terminal Bench 2 (medium)	accuracy	–	–	52.7	43.6
GSM8K	accuracy	–	–	94.01	93.56
Welsh ARC-Easy	accuracy	72.00	54.00	–	–
Welsh MMLU-Lite	accuracy	61.25	56.00	–	–

All values in %. For AgentHarm, lower is better (↓). A dash (–) marks a setting with no reported score. Both models use temperature 1.0, top-p 0.95.
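
To make the comparison easier to read, the differences between the two models can be computed directly from the table. The sketch below only restates the table's numbers; the dictionary layout and function name are illustrative choices.

```python
# (Jupiter-N, Nemotron) scores in %, copied from the benchmark table.
# Keys are (benchmark, reasoning mode); rows without a score are omitted.
SCORES = {
    ("IFEval", "off"): (80.96, 79.85),
    ("IFEval", "on"): (90.20, 90.20),
    ("IFBench", "off"): (41.8, 37.4),
    ("IFBench", "on"): (73.8, 69.7),
    ("AgentHarm", "off"): (73.4, 78.6),   # lower is better
    ("AgentHarm", "on"): (53.8, 55.4),    # lower is better
    ("Terminal Bench 2 (medium)", "on"): (52.7, 43.6),
    ("GSM8K", "on"): (94.01, 93.56),
    ("Welsh ARC-Easy", "off"): (72.00, 54.00),
    ("Welsh MMLU-Lite", "off"): (61.25, 56.00),
}


def delta(benchmark: str, mode: str) -> float:
    """Jupiter-N minus Nemotron, in percentage points (rounded to 2 dp)."""
    jupiter, nemotron = SCORES[(benchmark, mode)]
    return round(jupiter - nemotron, 2)


if __name__ == "__main__":
    for (benchmark, mode) in SCORES:
        print(f"{benchmark:28s} reasoning {mode:3s}: {delta(benchmark, mode):+.2f}")
```

Read this way, the headline +4.4 IFBench gain corresponds to the reasoning-off column (the reasoning-on gain is +4.1), and the AgentHarm deltas are negative, which is the desired direction for that metric.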

Quick Start

Serving with vLLM

# Quote the requirement so the shell does not treat ">=" as an output redirect
pip install "vllm>=0.18.1"

vllm serve locailabs/Jupiter-N-120B \
  --served-model-name locailabs/Jupiter-N-120B \
  --dtype auto \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --enable-expert-parallel \
  --trust-remote-code \
  --gpu-memory-utilization 0.9 \
  --enable-chunked-prefill \
  --mamba-ssm-cache-dtype float16 \
  --reasoning-parser nemotron_v3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder

DGX Spark (2x B200): Set `--tensor-parallel-size 2` and remove `--enable-expert-parallel`.
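
Loading a model of this size can take a while, so a client script may want to wait until the server is ready before sending requests. The sketch below polls the `/v1/models` route of vLLM's OpenAI-compatible server using only the standard library. The function name and the timeout values are hypothetical and should be tuned for your hardware.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8000",
                    timeout_s: float = 1800.0,
                    poll_s: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200.

    Returns True once the server answers, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models",
                                        timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not accepting connections yet, or still loading weights
        time.sleep(poll_s)
    return False
```

A typical use is `if not wait_for_server(): raise SystemExit("vLLM did not start")` at the top of a client script, placed before the first chat request.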

API Client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "locailabs/Jupiter-N-120B"

# Reasoning ON (default)
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Esboniwch hanes y Senedd yn Gymraeg."}],
    max_tokens=16000,
    temperature=1.0,
    top_p=0.95,
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(response.choices[0].message.content)

# Reasoning OFF
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "What is the capital of Wales?"}],
    max_tokens=16000,
    temperature=1.0,
    top_p=0.95,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
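
The two calls above differ only in the prompt and the `enable_thinking` flag, so the shared settings can be factored into a small helper. This is a convenience sketch; `chat_kwargs` and `ask` are hypothetical names, and the sampling values simply repeat those used above.

```python
from typing import Any, Dict

MODEL = "locailabs/Jupiter-N-120B"


def chat_kwargs(prompt: str, thinking: bool,
                max_tokens: int = 16000) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
        # Forwarded by the server to the chat template to toggle reasoning.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }


def ask(client: Any, prompt: str, thinking: bool = True) -> str:
    """Send one user turn and return the assistant's final answer text."""
    response = client.chat.completions.create(**chat_kwargs(prompt, thinking))
    return response.choices[0].message.content
```

With the `client` object created earlier, `ask(client, "What is the capital of Wales?", thinking=False)` reproduces the reasoning-off request.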

Training

Post-Training Data

Jupiter-N is fine-tuned on a curated mixture of nine datasets spanning seven domains:

Dataset	Domain	Samples
Terminal trajectories	Terminal	30k
UK cultural alignment	Cultural	1.41k
Self-cognition	Identity	2k
Synthetic replay	Replay	8.2k
Welsh chat	Welsh	20k
Welsh legislation	Welsh	17.9k
Senedd proceedings	Welsh	19.6k
Nemotron IF Chat	Instruction following	15k
Extended reasoning	Reasoning	2.06k

All datasets are available under the locailabs Hugging Face organisation, except NVIDIA's Nemotron IF Chat, which is available at its original source. The Extended reasoning dataset is derived from RamAnanth1/Nemotron3-Super-Reasoning-2000x.
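
The share of each domain in the mixture follows directly from the table. The sketch below treats the rounded counts in the table (for example, "30k" as 30,000) as exact, so the resulting percentages are approximate; the variable and function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (dataset, domain, approximate sample count) taken from the table above.
DATASETS: List[Tuple[str, str, int]] = [
    ("Terminal trajectories", "Terminal", 30_000),
    ("UK cultural alignment", "Cultural", 1_410),
    ("Self-cognition", "Identity", 2_000),
    ("Synthetic replay", "Replay", 8_200),
    ("Welsh chat", "Welsh", 20_000),
    ("Welsh legislation", "Welsh", 17_900),
    ("Senedd proceedings", "Welsh", 19_600),
    ("Nemotron IF Chat", "Instruction following", 15_000),
    ("Extended reasoning", "Reasoning", 2_060),
]


def domain_shares(datasets: List[Tuple[str, str, int]]) -> Dict[str, float]:
    """Return each domain's fraction of the total sample count."""
    totals: Dict[str, int] = defaultdict(int)
    for _name, domain, count in datasets:
        totals[domain] += count
    grand_total = sum(totals.values())
    return {domain: count / grand_total for domain, count in totals.items()}


if __name__ == "__main__":
    shares = domain_shares(DATASETS)
    for domain, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{domain:22s} {share:6.1%}")
```

By sample count, the three Welsh datasets together make up roughly half of the mixture and the terminal trajectories about a quarter. Token-level shares may differ because sequence lengths vary by dataset.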

Training Configuration


Method	LoRA (rank 16, alpha 32)
Epochs	1
Framework	NeMo AutoModel
Parallelism	FSDP2 + Expert Parallelism (EP=8)
Hardware	8x NVIDIA H200 GPUs
Batch size	64 (global), 8 (local)
Sequence length	2,048
Optimiser	Adam (beta1=0.9, beta2=0.999)
Learning rate	1e-5 to 1e-6 (cosine decay)
Excluded layers	Mamba `out_proj` (incompatible custom kernels)
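
Several entries in this table imply simple arithmetic, sketched below. The assumptions are labelled in the comments: that the "local" batch size is per GPU and all eight GPUs act as data-parallel ranks, that the cosine schedule has no warmup, and that the LoRA update uses the standard `alpha / rank` scaling. The step count in the usage example is hypothetical, since the number of optimiser steps is not stated here.

```python
import math

LORA_RANK, LORA_ALPHA = 16, 32
LR_MAX, LR_MIN = 1e-5, 1e-6
GLOBAL_BATCH, LOCAL_BATCH, NUM_GPUS = 64, 8, 8


def lora_scale(rank: int, alpha: int) -> float:
    """Standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / rank."""
    return alpha / rank


def grad_accum_steps(global_batch: int, local_batch: int, num_gpus: int) -> int:
    """Accumulation steps, ASSUMING local_batch is per GPU and all GPUs are data-parallel."""
    per_step = local_batch * num_gpus
    if global_batch % per_step != 0:
        raise ValueError("global batch must be a multiple of local_batch * num_gpus")
    return global_batch // per_step


def cosine_lr(step: int, total_steps: int,
              lr_max: float = LR_MAX, lr_min: float = LR_MIN) -> float:
    """Cosine decay from lr_max to lr_min over total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count is not given here
    print("LoRA scale:", lora_scale(LORA_RANK, LORA_ALPHA))
    print("grad accumulation:", grad_accum_steps(GLOBAL_BATCH, LOCAL_BATCH, NUM_GPUS))
    for step in (0, total_steps // 2, total_steps):
        print(f"step {step}: lr = {cosine_lr(step, total_steps):.2e}")
```

Under these assumptions the LoRA scale is 2, and 8 GPUs with 8 sequences each already reach the global batch of 64 without gradient accumulation.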

Key Techniques

Uncertainty-based data curation: Terminal trajectories selected by Shannon entropy of the base model's predictive distribution, retaining the 30k samples where the model is most uncertain.
Experience replay (Forget-Me-Not): Synthetic replay data generated by the unmodified base model on UltraChat prompts, preserving existing capabilities during domain-specific fine-tuning.
Welsh parallel corpora: Professional translations from Senedd (Welsh Parliament) proceedings and UK legislation, processed through a three-stage pipeline (cleaning, deduplication, instruction formatting).
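
The uncertainty-based curation step can be illustrated with a small, self-contained sketch. It computes the Shannon entropy of per-token predictive distributions and keeps the `k` highest-scoring samples. Averaging entropy over tokens is an assumption made here for illustration; the technical report may aggregate differently, and the function names are not from the project's code.

```python
import math
from typing import Dict, List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """H(p) = -sum_i p_i * log(p_i), in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over one sample (ASSUMED aggregation)."""
    if not token_dists:
        return 0.0
    return sum(shannon_entropy(d) for d in token_dists) / len(token_dists)


def select_most_uncertain(scores: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k samples with the highest uncertainty score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy next-token distributions over a 4-word vocabulary for three samples.
    samples = {
        "confident": [[0.97, 0.01, 0.01, 0.01], [0.9, 0.05, 0.03, 0.02]],
        "unsure": [[0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1]],
        "mixed": [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]],
    }
    scores = {name: mean_token_entropy(dists) for name, dists in samples.items()}
    print(select_most_uncertain(scores, k=2))
```

In a real pipeline the distributions would come from the base model's softmax outputs over each trajectory, and `k` would be 30,000.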

Limitations

Welsh evaluation relies on adapted English-origin benchmarks (ARC-Easy, MMLU) rather than native Welsh NLU tasks.
Cultural grounding has not been validated through human evaluation.
Self-cognition data is teacher-generated and may not generalise to adversarial identity probing.

Ethical Considerations

Jupiter-N is motivated by the principle that nations and linguistic communities should be able to adapt open foundation models to their own needs without dependence on proprietary systems. Welsh language support contributes to the digital vitality of a minority language with approximately 880,000 speakers.

Model outputs in Welsh have not undergone extensive human quality review. We encourage downstream users to apply domain-appropriate human review before deployment in high-stakes domains such as legal or medical text.

Citation

@article{drayson2026jupiter,
  title   = {Jupiter-N Technical Report},
  author  = {Drayson, George},
  journal = {arXiv preprint arXiv:2604.17429},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.17429}
}