NRM SOTA Upload: Safetensors + Tokenizer + Config (Step 26000)
- README.md +57 -0
- chat_template.jinja +1 -0
- config.json +25 -0
- ema.pt +3 -0
- model.pt +3 -0
- optimizer_0.pt +3 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer_config.json +0 -0
- train_state.json +1 -0
README.md
ADDED
@@ -0,0 +1,57 @@
---
license: apache-2.0
tags:
- reasoning
- recursive
- arc-agi
- nvyra-x
---

# NRM: Nvyra Recursive Reasoning Model

**Developed by Nvyra X** — Fact-Checking and Disinformation Detection Service

## Model Description

NRM (Nvyra Recursive Reasoning Model) is a state-of-the-art reasoning architecture that combines:

- **Mixture of Recursions (MoR)** - Weight-tied transformer blocks applied recursively
- **Multi-Head Latent Attention (MLA)** - 10× KV cache reduction (DeepSeek-V3)
- **ConvSwiGLU** - Enhanced nonlinearity from the URM paper
- **Aux-Loss-Free MoE** - Bias-based expert load balancing
- **PonderNet** - Adaptive computation time
- **Multi-Token Prediction** - 4-token-ahead planning

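The sketch below is a minimal, hypothetical illustration of the MoR and PonderNet bullets above: one weight-tied block applied repeatedly with a per-step halting head. The class name `RecursiveCore` and the choice of `nn.TransformerEncoderLayer` are illustrative assumptions; only `dim`, `n_heads`, `inner_loops`, and `p_exit` come from the released `config.json`, and the actual implementation lives in the repository code.

```python
# Illustrative sketch only: NOT the repository's implementation.
# dim=2048, n_heads=16, inner_loops=8, p_exit=0.1 are taken from config.json;
# the layer choice and halting head are assumptions for demonstration.
import torch
import torch.nn as nn

class RecursiveCore(nn.Module):
    def __init__(self, dim: int = 2048, n_heads: int = 16,
                 inner_loops: int = 8, p_exit: float = 0.1):
        super().__init__()
        # One block whose weights are reused on every recursion step (weight tying).
        self.block = nn.TransformerEncoderLayer(dim, nhead=n_heads, batch_first=True)
        self.halt = nn.Linear(dim, 1)   # PonderNet-style per-step halting head
        self.inner_loops = inner_loops
        self.p_exit = p_exit            # geometric prior over exit steps (KL regulariser not shown)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(x)
        remain = x.new_ones(x.shape[0], 1, 1)      # probability mass not yet halted
        for _ in range(self.inner_loops):
            x = self.block(x)                      # same weights applied every step
            lam = torch.sigmoid(self.halt(x).mean(dim=1, keepdim=True))  # halt prob at this step
            out = out + remain * lam * x           # expected output over the halting distribution
            remain = remain * (1.0 - lam)
        return out + remain * x                    # leftover mass assigned to the final step

x = torch.randn(2, 16, 2048)
print(RecursiveCore()(x).shape)  # torch.Size([2, 16, 2048])
```
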
## Training

- **Budget**: $115 ($25 Nebius + $90 Modal)
- **Hardware**: H200 NVLink GPUs
- **Framework**: PyTorch 2.9.1, Flash Attention 3, CUDA 12.8
- **Dataset**: 300K+ reasoning examples (Sudoku, ARC, Logic, Object Tracking)

## Usage

```python
# This model uses a custom architecture - see the repository for full code
from safetensors.torch import load_file

weights = load_file("model.safetensors")
```

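This commit also uploads raw `.pt` checkpoints (`model.pt`, `ema.pt`, `optimizer_0.pt`) alongside the tokenizer files. The sketch below shows one way to load them, assuming they are ordinary `torch.save` checkpoints, which the card does not confirm; `ema.pt` is presumably the EMA copy of the weights.

```python
# Hedged sketch for the .pt files in this upload; assumes plain torch.save
# checkpoints (state dicts), which is not confirmed by the model card.
import json
import torch

with open("config.json") as f:
    cfg = json.load(f)                                   # architecture hyperparameters

weights = torch.load("model.pt", map_location="cpu")     # raw training weights
ema_weights = torch.load("ema.pt", map_location="cpu")   # EMA weights, often preferred for eval
print(cfg["dim"], cfg["inner_loops"], type(weights))
```
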
## Citation

If you use this model, please cite:

```bibtex
@misc{nrm2025,
  title={NRM: Nvyra Recursive Reasoning Model},
  author={Nvyra X Research Team},
  year={2025},
  url={https://huggingface.co/Feargal/nvyra-x-reasoning}
}
```

## References

- [Universal Reasoning Model (URM)](https://arxiv.org/abs/2512.14693)
- [DeepSeek-V3](https://arxiv.org/abs/2401.02954)
- [PonderNet](https://arxiv.org/abs/2107.05407)
chat_template.jinja
ADDED
@@ -0,0 +1 @@
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\n'}}{% endif %}
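The template is a DeepSeek-style conversation format: `<|User|>` / `<|Assistant|>` turn markers, `<|tool▁...|>` blocks for tool calls, stripping of `</think>` content from prior assistant turns, and a trailing `<|Assistant|><think>\n` when a generation prompt is requested. A hedged usage sketch, assuming the tokenizer files in this commit load through `transformers` and using the repository id from the README's citation URL:

```python
# Hedged sketch: assumes AutoTokenizer can load the tokenizer files uploaded
# in this commit; the repo id is taken from the README's citation URL.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Feargal/nvyra-x-reasoning")
messages = [
    {"role": "system", "content": "You are a careful reasoner."},
    {"role": "user", "content": "Which cup hides the coin after the swaps?"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Per the template above, this renders roughly as:
# <|begin▁of▁sentence|>{system}<|User|>{user}<|Assistant|><think>\n
print(prompt)
```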
config.json
ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "NRM"
  ],
  "model_type": "nrm",
  "dim": 2048,
  "n_layers": 1,
  "n_heads": 16,
  "n_mem_tokens": 64,
  "vocab_size": 32000,
  "inner_loops": 8,
  "outer_loops": 16,
  "truncation_loops": 2,
  "moe_experts": 8,
  "experts_per_token": 2,
  "num_shared_experts": 2,
  "use_mla": true,
  "kv_latent_dim": 512,
  "rope_head_dim": 64,
  "rope_base": 10000.0,
  "mtp_num_heads": 4,
  "use_conv_swiglu": true,
  "p_exit": 0.1,
  "tokenizer_class": "LlamaTokenizer"
}
ema.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51900a1bccbb652d7ad8806fd6488b4b1ba5dcc9ade766ae751d4c9d3b0316b9
size 1174807974
model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b04e5a60a1713ebc3f270eb8b8e6a03aef2330d86075e54744f6c80c6fbac787
size 1174813193
optimizer_0.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee1c870201b15e6163a252006e8789af7d71e1de071270622193e469fd5ea027
size 2349076825
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
The diff for this file is too large to render.
train_state.json
ADDED
@@ -0,0 +1 @@
{"step": 26000, "val_acc": 0.9978392384694715, "val_loss": 0.01681531584014024}