library_name: transformers
license: other
license_name: nvidia-nemotron-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
pipeline_tag: text-generation
language:
- en
- fr
- es
- it
- de
- ja
- zh
- cy
tags:
- locai
- jupiter
- pytorch
- nemotron-3
- latent-moe
- welsh
- sovereign-ai
- post-training
track_downloads: true
base_model:
- nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
Jupiter-N-120B
Jupiter-N-120B is a post-trained variant of NVIDIA Nemotron-3-Super-120B-A12B, developed by Locai Labs. The N denotes the Nemotron base. Jupiter-N improves instruction following (+4.4 IFBench) and agentic capability (+9.1 on Terminal Bench 2 medium tasks), and adds Welsh language support (+18 ARC-Easy, +5.25 MMLU-Lite) and UK cultural grounding, all while preserving the base model's existing strengths through our Forget-Me-Not experience replay framework. See the technical report for full details.
Jupiter-N is designed as a reproducible template for sovereign post-training: any nation can substitute its own cultural knowledge base, institutional corpora, and indigenous languages to produce a culturally grounded model from a shared open base.
Model Summary
| | |
|---|---|
| Base Model | NVIDIA Nemotron-3-Super-120B-A12B |
| Total Parameters | 120B (12B active) |
| Architecture | LatentMoE (Mamba-2 + MoE + Attention hybrid) with Multi-Token Prediction |
| Post-Training Method | LoRA (rank 16, alpha 32) with experience replay |
| Context Length | Up to 1M tokens |
| Supported Languages | English, French, German, Italian, Japanese, Spanish, Chinese + Welsh |
| Reasoning | Configurable on/off via chat template (enable_thinking=True/False) |
| License | NVIDIA Nemotron Open Model License |
| Developer | Locai Labs |
| Release Date | April 2026 |
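To make the LoRA settings in the table concrete, here is a minimal sketch of the arithmetic they imply under the standard LoRA formulation, where the frozen weight is updated by (alpha / rank) · B · A. The 4096 x 4096 layer shape is a hypothetical example for illustration, not a Jupiter-N dimension.

```python
# Standard LoRA arithmetic for the settings listed above (rank 16, alpha 32).
# The layer shape used in the demo is hypothetical, not a Jupiter-N dimension.

RANK = 16
ALPHA = 32


def lora_scaling(rank: int, alpha: int) -> float:
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one (d_out x d_in) linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out = 4096, 4096  # hypothetical projection size
    frozen = d_in * d_out
    added = lora_param_count(d_in, d_out, RANK)
    print(f"scaling = {lora_scaling(RANK, ALPHA)}")
    print(f"adapter params = {added} ({added / frozen:.2%} of the frozen weight)")
```

With these settings the low-rank update is scaled by 2, and the adapter for a layer of this hypothetical size trains well under 1% of the parameters in the frozen weight.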
What's New vs. Nemotron Base
- Welsh language: Trained on professional parallel corpora from Bangor University (Senedd proceedings and UK legislation) and on LLM-translated instruction-following data produced by a custom pipeline.
- Agentic/terminal: Uncertainty-curated terminal trajectories from NVIDIA's Nemotron-Terminal-Corpus, keeping the 30k highest-entropy samples, where the base model has the most to learn.
- UK cultural grounding: CultureBank-informed synthetic data aligned to British cultural norms and conventions.
- Synthetic experience replay: The Forget-Me-Not framework mitigates catastrophic forgetting during post-training.
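The entropy-based selection of terminal trajectories can be sketched as follows. This is an illustration rather than the Locai Labs pipeline: it assumes each sample is scored by the mean Shannon entropy of the base model's next-token distributions, and the function names and toy distributions are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over one sample (assumed aggregation)."""
    if not distributions:
        return 0.0
    return sum(token_entropy(d) for d in distributions) / len(distributions)


def select_most_uncertain(
    samples: Dict[str, Sequence[Sequence[float]]], k: int
) -> List[str]:
    """Return the ids of the k samples with the highest mean entropy."""
    ranked = sorted(samples, key=lambda sid: mean_entropy(samples[sid]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy distributions standing in for base-model predictions.
    toy = {
        "confident": [[0.97, 0.01, 0.01, 0.01]],
        "unsure": [[0.25, 0.25, 0.25, 0.25]],
        "mixed": [[0.7, 0.1, 0.1, 0.1]],
    }
    print(select_most_uncertain(toy, k=2))  # ['unsure', 'mixed']
```

In practice the distributions would come from a forward pass of the base model over each trajectory, and k would be 30,000.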
Benchmarks
We evaluate Jupiter-N against Nemotron-3-Super-120B (base). Full details are in the technical report.
| Benchmark | Metric | Jupiter-N (reasoning off) | Nemotron (reasoning off) | Jupiter-N (reasoning on) | Nemotron (reasoning on) |
|---|---|---|---|---|---|
| IFEval | prompt strict | 80.96 | 79.85 | 90.20 | 90.20 |
| IFBench | prompt loose | 41.8 | 37.4 | 73.8 | 69.7 |
| AgentHarm | harm (lower is better) | 73.4 | 78.6 | 53.8 | 55.4 |
| Terminal Bench 2 (medium) | accuracy | - | - | 52.7 | 43.6 |
| GSM8K | accuracy | - | - | 94.01 | 93.56 |
| Welsh ARC-Easy | accuracy | 72.00 | 54.00 | - | - |
| Welsh MMLU-Lite | accuracy | 61.25 | 56.00 | - | - |
All values in %. Both models use temperature 1.0, top-p 0.95.
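As a quick check, the gains quoted in the introduction can be recomputed from the table. The sketch below copies a subset of the scores by hand; the dictionary layout and the `delta` helper are illustrative only.

```python
# Scores copied from the benchmark table above: (Jupiter-N, Nemotron).
SCORES = {
    ("IFBench", "reasoning off"): (41.8, 37.4),
    ("IFBench", "reasoning on"): (73.8, 69.7),
    ("Terminal Bench 2 (medium)", "reasoning on"): (52.7, 43.6),
    ("Welsh ARC-Easy", "reasoning off"): (72.00, 54.00),
    ("Welsh MMLU-Lite", "reasoning off"): (61.25, 56.00),
}


def delta(jupiter: float, nemotron: float) -> float:
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(jupiter - nemotron, 2)


if __name__ == "__main__":
    for (bench, mode), (jupiter, nemotron) in SCORES.items():
        print(f"{bench} ({mode}): {delta(jupiter, nemotron):+.2f} pp")
```

The +4.4 IFBench figure corresponds to the reasoning-off setting; with reasoning on, the gap is 4.1 points.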
Quick Start
Serving with vLLM
pip install "vllm>=0.18.1"
vllm serve locailabs/Jupiter-N-120B \
  --served-model-name locailabs/Jupiter-N-120B \
  --dtype auto \
  --kv-cache-dtype fp8 \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --enable-expert-parallel \
  --trust-remote-code \
  --gpu-memory-utilization 0.9 \
  --enable-chunked-prefill \
  --mamba-ssm-cache-dtype float16 \
  --reasoning-parser nemotron_v3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
DGX Spark (2x B200): set --tensor-parallel-size 2 and remove --enable-expert-parallel.
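Loading a model of this size can take a while, so it may help to confirm the server is up before sending requests. The sketch below queries the OpenAI-compatible /v1/models endpoint using only the standard library. It assumes vLLM's default port 8000 on localhost; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default host and port
MODEL = "locailabs/Jupiter-N-120B"


def parse_model_ids(payload: dict) -> List[str]:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> List[str]:
    """Ask the running server which model names it is serving."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        ids = served_model_ids()
    except OSError as exc:  # URLError and socket timeouts subclass OSError
        print(f"server not reachable yet: {exc}")
    else:
        print("ready" if MODEL in ids else f"unexpected models: {ids}")
```

The id reported by the server should match the value passed to --served-model-name.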
API Client
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "locailabs/Jupiter-N-120B"
# Reasoning ON (default)
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Esboniwch hanes y Senedd yn Gymraeg."}],
    max_tokens=16000,
    temperature=1.0,
    top_p=0.95,
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)
print(response.choices[0].message.content)
# Reasoning OFF
response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "What is the capital of Wales?"}],
    max_tokens=16000,
    temperature=1.0,
    top_p=0.95,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
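Because the two calls above differ only in the prompt and the enable_thinking flag, the shared arguments can be collected in a small helper. This is an optional convenience sketch: `build_request` is a hypothetical name, and the defaults simply mirror the sampling settings used above.

```python
from typing import Any, Dict

MODEL = "locailabs/Jupiter-N-120B"


def build_request(
    prompt: str, enable_thinking: bool = True, max_tokens: int = 16000
) -> Dict[str, Any]:
    """Keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
        # Forwarded to the chat template by the vLLM server.
        "extra_body": {"chat_template_kwargs": {"enable_thinking": enable_thinking}},
    }


if __name__ == "__main__":
    request = build_request("What is the capital of Wales?", enable_thinking=False)
    print(request["extra_body"])
```

With the client defined above, a call then becomes `client.chat.completions.create(**build_request("...", enable_thinking=False))`.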
Training
Post-Training Data
Jupiter-N is fine-tuned on a curated mixture of nine datasets spanning seven domains:
| Dataset | Domain | Samples |
|---|---|---|
| Terminal trajectories | Terminal | 30k |
| UK cultural alignment | Cultural | 1.41k |
| Self-cognition | Identity | 2k |
| Synthetic replay | Replay | 8.2k |
| Welsh chat | Welsh | 20k |
| Welsh legislation | Welsh | 17.9k |
| Senedd proceedings | Welsh | 19.6k |
| Nemotron IF Chat | Instruction following | 15k |
| Extended reasoning | Reasoning | 2.06k |
All datasets are available under the locailabs Hugging Face organisation, except NVIDIA's Nemotron IF Chat, which is available at its original source. The Extended reasoning dataset is derived from RamAnanth1/Nemotron3-Super-Reasoning-2000x.
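To see how the mixture is weighted, the sample counts can be summed per domain. The figures below are the rounded values from the table, so the totals are approximate; the variable and function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (dataset, domain, samples), copied from the table above (rounded counts).
MIXTURE: List[Tuple[str, str, int]] = [
    ("Terminal trajectories", "Terminal", 30_000),
    ("UK cultural alignment", "Cultural", 1_410),
    ("Self-cognition", "Identity", 2_000),
    ("Synthetic replay", "Replay", 8_200),
    ("Welsh chat", "Welsh", 20_000),
    ("Welsh legislation", "Welsh", 17_900),
    ("Senedd proceedings", "Welsh", 19_600),
    ("Nemotron IF Chat", "Instruction following", 15_000),
    ("Extended reasoning", "Reasoning", 2_060),
]


def domain_totals(mixture: List[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum sample counts per domain."""
    totals: Dict[str, int] = defaultdict(int)
    for _name, domain, n in mixture:
        totals[domain] += n
    return dict(totals)


if __name__ == "__main__":
    totals = domain_totals(MIXTURE)
    grand_total = sum(totals.values())
    for domain, n in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{domain:<22} {n:>7,} {n / grand_total:6.1%}")
    print(f"{'Total':<22} {grand_total:>7,}")
```

On these rounded counts the mixture holds about 116k samples, and the three Welsh datasets together make up roughly half of it.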
Training Configuration
| | |
|---|---|
| Method | LoRA (rank 16, alpha 32) |
| Epochs | 1 |
| Framework | NeMo AutoModel |
| Parallelism | FSDP2 + Expert Parallelism (EP=8) |
| Hardware | 8x NVIDIA H200 GPUs |
| Batch size | 64 (global), 8 (local) |
| Sequence length | 2,048 |
| Optimiser | Adam (beta1=0.9, beta2=0.999) |
| Learning rate | 1e-5 to 1e-6 (cosine decay) |
| Excluded layers | Mamba out_proj (incompatible custom kernels) |
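The learning-rate row describes a cosine decay from 1e-5 to 1e-6. A minimal sketch of such a schedule follows, assuming no warmup and a hypothetical step count, since neither is stated in the table.

```python
import math

PEAK_LR = 1e-5   # from the table
FINAL_LR = 1e-6  # from the table


def cosine_lr(
    step: int, total_steps: int, peak: float = PEAK_LR, final: float = FINAL_LR
) -> float:
    """Cosine decay from peak to final over total_steps (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so that steps outside [0, total_steps] stay on the schedule's ends.
    progress = min(max(step, 0), total_steps) / total_steps
    return final + 0.5 * (peak - final) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1_000  # hypothetical; the real step count is not stated
    for step in (0, 250, 500, 750, 1_000):
        print(f"step {step:>5}: lr = {cosine_lr(step, total_steps):.2e}")
```

The schedule starts at the peak rate, passes through the midpoint (5.5e-6) halfway through training, and ends at the final rate.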
Key Techniques
- Uncertainty-based data curation: Terminal trajectories selected by Shannon entropy of the base model's predictive distribution, retaining the 30k samples where the model is most uncertain.
- Experience replay (Forget-Me-Not): Synthetic replay data generated by the unmodified base model on UltraChat prompts, preserving existing capabilities during domain-specific fine-tuning.
- Welsh parallel corpora: Professional translations from Senedd (Welsh Parliament) proceedings and UK legislation, processed through a three-stage pipeline (cleaning, deduplication, instruction formatting).
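The three stages named for the Welsh parallel corpora (cleaning, deduplication, instruction formatting) can be sketched as below. This is a simplified illustration under stated assumptions, not the actual pipeline: the cleaning rules, the exact-match deduplication key, and the prompt wording are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (English, Welsh)


def clean(pairs: Iterable[Pair]) -> List[Pair]:
    """Stage 1: normalise whitespace and drop pairs with an empty side."""
    out = []
    for en, cy in pairs:
        en, cy = (re.sub(r"\s+", " ", s).strip() for s in (en, cy))
        if en and cy:
            out.append((en, cy))
    return out


def deduplicate(pairs: Iterable[Pair]) -> List[Pair]:
    """Stage 2: exact-match deduplication on case-folded text, order preserved."""
    seen = set()
    out = []
    for en, cy in pairs:
        key = (en.casefold(), cy.casefold())
        if key not in seen:
            seen.add(key)
            out.append((en, cy))
    return out


def to_instruction(pair: Pair) -> List[Dict[str, str]]:
    """Stage 3: wrap a pair as a chat-style translation example."""
    en, cy = pair
    return [
        {"role": "user", "content": f"Translate into Welsh:\n{en}"},
        {"role": "assistant", "content": cy},
    ]


def run_pipeline(pairs: Iterable[Pair]) -> List[List[Dict[str, str]]]:
    """Apply the three stages in order."""
    return [to_instruction(p) for p in deduplicate(clean(pairs))]


if __name__ == "__main__":
    raw = [
        ("Good morning", "Bore da"),
        ("Good   morning ", "bore da"),  # duplicate after cleaning
        ("Thank you", "Diolch"),
        ("Untranslated", "   "),         # dropped: empty Welsh side
    ]
    for example in run_pipeline(raw):
        print(example)
```

A production pipeline would likely add near-duplicate detection and alignment checks. The sketch only shows how the three stages compose.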
Limitations
- Welsh evaluation relies on adapted English-origin benchmarks (ARC-Easy, MMLU) rather than native Welsh NLU tasks.
- Cultural grounding has not been validated through human evaluation.
- Self-cognition data is teacher-generated and may not generalise to adversarial identity probing.
Ethical Considerations
Jupiter is motivated by the principle that nations and linguistic communities should be able to adapt open foundation models to their own needs without dependence on proprietary systems. Welsh language support contributes to the digital vitality of a minority language with approximately 880,000 speakers.
Model outputs in Welsh have not undergone extensive human quality review. We encourage downstream users to apply domain-appropriate human review before deployment in high-stakes domains such as legal or medical text.
Citation
@article{drayson2026jupiter,
title = {Jupiter-N Technical Report},
author = {Drayson, George},
journal = {arXiv preprint arXiv:2604.17429},
year = {2026},
url = {https://arxiv.org/abs/2604.17429}
}
Acknowledgements
Jupiter builds on NVIDIA Nemotron-3-Super. Welsh parallel corpora are sourced from Techiaith (Bangor University). Cultural data is informed by CultureBank. The Extended reasoning dataset is derived from RamAnanth1/Nemotron3-Super-Reasoning-2000x.
