Commit 814608a (verified) by GeorgeDraysonLocai · 1 parent: d3b5223

Upload 2 files

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +209 -0
  3. jupiter.png +3 -0
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ jupiter.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,209 @@
+ ---
+ library_name: transformers
+ license: other
+ license_name: nvidia-nemotron-open-model-license
+ license_link: >-
+   https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
+ base_model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
+ pipeline_tag: text-generation
+ language:
+ - en
+ - fr
+ - es
+ - it
+ - de
+ - ja
+ - zh
+ - cy
+ tags:
+ - locai
+ - jupiter
+ - pytorch
+ - nemotron-3
+ - latent-moe
+ - mtp
+ - welsh
+ - sovereign-ai
+ - lora
+ - post-training
+ datasets:
+ - locailabs/nemotron_terminal_filtered
+ - locailabs/cultural_bank_dpo_vllm
+ - locailabs/self_cognition_120b
+ - locailabs/ultrachat_120b
+ - locailabs/nemotron-chat-welsh
+ - locailabs/legislation-gov-uk_en-cy
+ - locailabs/cofnodycynulliad_en-cy
+ - locailabs/nemotron_reasoning_2000x
+ - nvidia/Nemotron-IF-Chat-v1
+ ---
+
+ ![Jupiter](jupiter.png)
+
+ # Jupiter-N-120B
+
+ Jupiter-N-120B is a post-trained variant of [NVIDIA Nemotron-3-Super-120B-A12B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16), developed by [Locai Labs](https://locailabs.com). The **N** denotes the Nemotron base. It adds Welsh language capability, UK cultural grounding, and improved agentic/terminal performance to the fully open Nemotron base via parameter-efficient fine-tuning (LoRA) with synthetic experience replay.
+
+ Jupiter is designed as a reproducible template for **sovereign post-training**: any nation can substitute its own cultural knowledge base, institutional corpora, and indigenous languages to produce a culturally grounded model from a shared open base.
+
+ ## Model Summary
+
+ | | |
+ |:---|:---|
+ | **Base Model** | [NVIDIA Nemotron-3-Super-120B-A12B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) |
+ | **Total Parameters** | 120B (12B active) |
+ | **Architecture** | LatentMoE (Mamba-2 + MoE + Attention hybrid) with Multi-Token Prediction |
+ | **Post-Training Method** | LoRA (rank 16, alpha 32) with experience replay |
+ | **Context Length** | Up to 1M tokens |
+ | **Supported Languages** | English, French, German, Italian, Japanese, Spanish, Chinese + **Welsh** |
+ | **Reasoning** | Configurable on/off via chat template (`enable_thinking=True/False`) |
+ | **License** | [NVIDIA Nemotron Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/) |
+ | **Developer** | [Locai Labs](https://locailabs.com) |
+ | **Release Date** | April 2026 |
+
+ ## What's New vs. Nemotron Base
+
+ - **Welsh language**: Trained on professional parallel corpora from Bangor University (Senedd proceedings + UK legislation) and LLM-translated instruction-following data produced with a custom pipeline.
+ - **Agentic/terminal**: Uncertainty-curated terminal trajectories from NVIDIA's Nemotron-Terminal-Corpus, selecting the 30k highest-entropy samples where the base model has the most to learn.
+ - **UK cultural grounding**: CultureBank-informed synthetic data aligned to British cultural norms and conventions.
+ - **Synthetic experience replay**: Forget-Me-Not framework to mitigate catastrophic forgetting during post-training.
+
+ ## Benchmarks
+
+ We evaluate Jupiter against [Nemotron-3-Super-120B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) (base) and [Locai L1-Large](https://huggingface.co/locailabs/l1-large). Additional evaluations including Arena Hard v2, LiveCodeBench v5, Terminal Bench 2, and AgentHarm are currently in progress and will be added shortly.
+
+ | **Benchmark** | **Metric** | **Jupiter** | **Nemotron Base** | **L1-Large** |
+ |:---|:---|:---:|:---:|:---:|
+ | IFEval | prompt strict | 80.96 | 79.85 | 86.51 |
+ | IFBench | prompt strict | 57.5 | 50.7 | 43.5 |
+ | GSM8K | extract. match | 93.63 | 95.91 | 94.92 |
+ | Welsh ARC-Easy | accuracy | 72.00 | 54.00 | 92.00 |
+ | Welsh MMLU-Lite | accuracy | 61.25 | 56.00 | 73.00 |
+
+ All values in %, reasoning disabled. Jupiter and Nemotron Base use temperature 1.0, top-p 0.95. L1-Large uses temperature 0.7, top-p 0.8.
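Because Jupiter and Nemotron Base share sampling settings, the most direct comparison in the table is Jupiter minus base. The sketch below only restates that arithmetic on the numbers reported above; the names `SCORES` and `delta_vs_base` are illustrative and not part of any released tooling.

```python
# Scores copied from the benchmark table (%, reasoning disabled).
SCORES = {
    "IFEval": {"jupiter": 80.96, "base": 79.85},
    "IFBench": {"jupiter": 57.5, "base": 50.7},
    "GSM8K": {"jupiter": 93.63, "base": 95.91},
    "Welsh ARC-Easy": {"jupiter": 72.00, "base": 54.00},
    "Welsh MMLU-Lite": {"jupiter": 61.25, "base": 56.00},
}


def delta_vs_base(scores):
    """Return Jupiter minus Nemotron Base per benchmark, in percentage points."""
    return {
        name: round(row["jupiter"] - row["base"], 2)
        for name, row in scores.items()
    }


DELTAS = delta_vs_base(SCORES)

for name, delta in DELTAS.items():
    print(f"{name:16s} {delta:+.2f} pp")
```

On these figures the Welsh benchmarks and IFBench improve over the base while GSM8K drops by a little over two points. L1-Large is left out of the delta because it was sampled with different settings.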
+
+ ## Quick Start
+
+ ### Serving with vLLM
+
+ ```bash
+ # Quote the requirement so the shell does not treat ">=" as a redirection.
+ pip install "vllm>=0.18.1"
+
+ vllm serve locailabs/jupiter-n-120B \
+   --served-model-name locailabs/jupiter-n-120B \
+   --dtype auto \
+   --kv-cache-dtype fp8 \
+   --tensor-parallel-size 8 \
+   --max-model-len 262144 \
+   --enable-expert-parallel \
+   --trust-remote-code \
+   --gpu-memory-utilization 0.9 \
+   --enable-chunked-prefill \
+   --mamba-ssm-cache-dtype float16 \
+   --reasoning-parser nemotron_v3 \
+   --enable-auto-tool-choice \
+   --tool-call-parser qwen3_coder
+ ```
+
+ > **DGX Spark (2x B200):** Set `--tensor-parallel-size 2` and remove `--enable-expert-parallel`.
+
+ ### API Client
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+ MODEL = "locailabs/jupiter-n-120B"
+
+ # Reasoning ON (default)
+ response = client.chat.completions.create(
+     model=MODEL,
+     messages=[{"role": "user", "content": "Esboniwch hanes y Senedd yn Gymraeg."}],
+     max_tokens=16000,
+     temperature=1.0,
+     top_p=0.95,
+     extra_body={"chat_template_kwargs": {"enable_thinking": True}},
+ )
+ print(response.choices[0].message.content)
+
+ # Reasoning OFF
+ response = client.chat.completions.create(
+     model=MODEL,
+     messages=[{"role": "user", "content": "What is the capital of Wales?"}],
+     max_tokens=16000,
+     temperature=1.0,
+     top_p=0.95,
+     extra_body={"chat_template_kwargs": {"enable_thinking": False}},
+ )
+ print(response.choices[0].message.content)
+ ```
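The two calls above differ only in the prompt and the `enable_thinking` flag, so the shared arguments can be factored into one place. The helper below is a minimal sketch of that idea; `build_chat_request` is a hypothetical name, and the defaults simply mirror the values used in the example (temperature 1.0, top-p 0.95, `max_tokens=16000`).

```python
def build_chat_request(prompt, *, thinking, model="locailabs/jupiter-n-120B",
                       max_tokens=16000):
    """Build keyword arguments for client.chat.completions.create().

    `thinking` is forwarded to the chat template through `extra_body`,
    exactly as in the README example; nothing is sent to a server here.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
        "extra_body": {"chat_template_kwargs": {"enable_thinking": thinking}},
    }


request = build_chat_request("What is the capital of Wales?", thinking=False)
print(request["extra_body"])
```

With a running server, the dictionary can be unpacked into the client call, for example `client.chat.completions.create(**request)`.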
+
+ ## Training
+
+ ### Post-Training Data
+
+ Jupiter is fine-tuned on a curated mixture of ten datasets spanning five domains:
+
+ | **Dataset** | **Domain** | **N** |
+ |:---|:---|---:|
+ | Terminal trajectories | Terminal | 30k |
+ | CultureBank DPO | Cultural | 1.41k |
+ | Self-cognition | Identity | 2k |
+ | Synthetic replay (reasoning) | Replay | 2.38k |
+ | Synthetic replay (no reasoning) | Replay | 5.82k |
+ | Welsh chat | Welsh IF | 20k |
+ | Welsh legislation | Welsh law | 17.9k |
+ | Senedd proceedings | Welsh parl. | 19.6k |
+ | Nemotron IF Chat | Instruction following | 15k |
+ | Extended reasoning | Reasoning | 2.06k |
+
+ All datasets are available under the [`locailabs`](https://huggingface.co/locailabs) Hugging Face organisation, except NVIDIA's Nemotron IF Chat, which is available at its [original source](https://huggingface.co/datasets/nvidia/Nemotron-IF-Chat-v1). The Extended reasoning dataset is derived from [RamAnanth1/Nemotron3-Super-Reasoning-2000x](https://huggingface.co/datasets/RamAnanth1/Nemotron3-Super-Reasoning-2000x).
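To get a feel for the balance of the mixture, the rounded counts in the table can be summed and turned into shares. This is only arithmetic on the published figures, so the totals inherit the table's rounding; the names `MIXTURE`, `WELSH`, and `share` are illustrative.

```python
# Rounded sample counts copied from the Post-Training Data table.
MIXTURE = {
    "Terminal trajectories": 30_000,
    "CultureBank DPO": 1_410,
    "Self-cognition": 2_000,
    "Synthetic replay (reasoning)": 2_380,
    "Synthetic replay (no reasoning)": 5_820,
    "Welsh chat": 20_000,
    "Welsh legislation": 17_900,
    "Senedd proceedings": 19_600,
    "Nemotron IF Chat": 15_000,
    "Extended reasoning": 2_060,
}

# The three Welsh-language datasets in the table.
WELSH = {"Welsh chat", "Welsh legislation", "Senedd proceedings"}

TOTAL = sum(MIXTURE.values())


def share(names):
    """Percentage of the mixture contributed by the named datasets."""
    return round(100 * sum(MIXTURE[n] for n in names) / TOTAL, 1)


print(f"total samples: {TOTAL}")
print(f"Welsh share: {share(WELSH)}%")
print(f"terminal share: {share({'Terminal trajectories'})}%")
```

Under these rounded counts the Welsh datasets make up roughly half of the mixture and the terminal trajectories about a quarter.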
+
+ ### Training Configuration
+
+ | | |
+ |:---|:---|
+ | **Method** | LoRA (rank 16, alpha 32) |
+ | **Epochs** | 1 |
+ | **Framework** | NeMo AutoModel |
+ | **Parallelism** | FSDP2 + Expert Parallelism (EP=8) |
+ | **Hardware** | 8x NVIDIA H200 GPUs |
+ | **Batch size** | 64 (global), 8 (local) |
+ | **Sequence length** | 2,048 |
+ | **Optimiser** | Adam (beta1=0.9, beta2=0.999) |
+ | **Learning rate** | 1e-5 to 1e-6 (cosine decay) |
+ | **Excluded layers** | Mamba `out_proj` (incompatible custom kernels) |
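The numbers in this table imply a few derived quantities, sketched below. Several points are assumptions rather than facts from the report: the LoRA update is scaled by the conventional `alpha / rank`, all 8 GPUs act as data-parallel workers, the schedule has no warmup, and each of the roughly 116,170 samples (the sum of the rounded counts in the data table) occupies one slot in a batch with no sequence packing.

```python
import math

# Values taken from the Training Configuration table.
LORA_RANK = 16
LORA_ALPHA = 32
GLOBAL_BATCH = 64
LOCAL_BATCH = 8
NUM_GPUS = 8
LR_MAX = 1e-5
LR_MIN = 1e-6

# Assumption: conventional LoRA scaling, delta_W = (alpha / rank) * B @ A.
LORA_SCALE = LORA_ALPHA / LORA_RANK

# Assumption: every GPU is a data-parallel worker, so no accumulation is needed.
GRAD_ACCUM_STEPS = GLOBAL_BATCH // (LOCAL_BATCH * NUM_GPUS)

# Assumption: one epoch over ~116,170 samples, one sample per batch slot.
TOTAL_SAMPLES = 116_170
TOTAL_STEPS = math.ceil(TOTAL_SAMPLES / GLOBAL_BATCH)


def cosine_lr(step, total_steps=TOTAL_STEPS, lr_max=LR_MAX, lr_min=LR_MIN):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


print(f"LoRA scale: {LORA_SCALE}")
print(f"gradient accumulation steps: {GRAD_ACCUM_STEPS}")
print(f"optimizer steps in one epoch: {TOTAL_STEPS}")
print(f"lr at start/middle/end: {cosine_lr(0):.2e} / "
      f"{cosine_lr(TOTAL_STEPS // 2):.2e} / {cosine_lr(TOTAL_STEPS):.2e}")
```

If the actual run used warmup, sequence packing, or a different data-parallel layout, the step count and schedule shape would differ from this sketch.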
+
+ ### Key Techniques
+
+ - **Uncertainty-based data curation**: Terminal trajectories selected by Shannon entropy of the base model's predictive distribution, retaining the 30k samples where the model is most uncertain.
+ - **Experience replay (Forget-Me-Not)**: Synthetic replay data generated by the unmodified base model on UltraChat prompts, preserving existing capabilities during domain-specific fine-tuning.
+ - **Welsh parallel corpora**: Professional translations from Senedd (Welsh Parliament) proceedings and UK legislation, processed through a three-stage pipeline (cleaning, deduplication, instruction formatting).
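The uncertainty-based curation step can be illustrated with a toy example. The README does not say how token-level entropies are aggregated into a per-sample score, so the sketch below assumes the mean Shannon entropy over a sample's next-token distributions; the function names and the toy data are hypothetical, and a real pipeline would obtain the distributions from the base model's logits.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_uncertainty(token_distributions):
    """Assumed per-sample score: mean entropy across token positions."""
    if not token_distributions:
        return 0.0
    entropies = [shannon_entropy(dist) for dist in token_distributions]
    return sum(entropies) / len(entropies)


def select_most_uncertain(samples, k):
    """Keep the k samples with the highest uncertainty score."""
    ranked = sorted(
        samples,
        key=lambda s: sample_uncertainty(s["token_distributions"]),
        reverse=True,
    )
    return ranked[:k]


# Toy data: "confident" has peaked distributions, "uncertain" has flat ones.
toy_samples = [
    {"id": "confident", "token_distributions": [[0.97, 0.01, 0.01, 0.01]] * 3},
    {"id": "uncertain", "token_distributions": [[0.25, 0.25, 0.25, 0.25]] * 3},
    {"id": "mixed", "token_distributions": [[0.7, 0.1, 0.1, 0.1]] * 3},
]

selected = select_most_uncertain(toy_samples, k=2)
print([s["id"] for s in selected])
```

In the setting described above, `k` would be 30,000 and the candidates would be trajectories from the terminal corpus.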
+
+ ## Limitations
+
+ - Welsh evaluation relies on adapted English-origin benchmarks (ARC-Easy, MMLU) rather than native Welsh NLU tasks.
+ - Cultural grounding has not been validated through human evaluation.
+ - Self-cognition data is teacher-generated and may not generalise to adversarial identity probing.
+
+ ## Ethical Considerations
+
+ Jupiter is motivated by the principle that nations and linguistic communities should be able to adapt open foundation models to their own needs without dependence on proprietary systems. Welsh language support contributes to the digital vitality of a minority language with approximately 880,000 speakers.
+
+ Model outputs in Welsh have not undergone extensive human quality review. We encourage downstream users to apply domain-appropriate human review before deployment in high-stakes domains such as legal or medical text.
+
+ ## Citation
+
+ ```bibtex
+ @techreport{drayson2026jupiter_n_120b,
+   title = {Jupiter-N-120B Technical Report},
+   author = {George Drayson},
+   year = {2026},
+   institution = {Locai Labs},
+   url = {https://huggingface.co/locailabs/jupiter-n-120B}
+ }
+ ```
+
+ ## Acknowledgements
+
+ Jupiter builds on [NVIDIA Nemotron-3-Super](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16). Welsh parallel corpora are sourced from [Techiaith (Bangor University)](https://huggingface.co/techiaith). Cultural data is informed by [CultureBank](https://github.com/SALT-NLP/CultureBank). The Extended reasoning dataset is derived from [RamAnanth1/Nemotron3-Super-Reasoning-2000x](https://huggingface.co/datasets/RamAnanth1/Nemotron3-Super-Reasoning-2000x).
jupiter.png ADDED

Git LFS Details

  • SHA256: 96337e2dfc2f6b2688c529fd63bdebb0b6d56e295916e7a2c5baedba64bb7385
  • Pointer size: 132 Bytes
  • Size of remote file: 2.38 MB