Feargal committed
Commit e91ae95 · verified · 1 Parent(s): 64521ad

NRM SOTA Upload: Safetensors + Tokenizer + Config (Step 26000)

README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ tags:
+ - reasoning
+ - recursive
+ - arc-agi
+ - nvyra-x
+ ---
+
+ # NRM: Nvyra Recursive Reasoning Model
+
+ **Developed by Nvyra X** — Fact-Checking and Disinformation Detection Service
+
+ ## Model Description
+
+ NRM (Nvyra Recursive Reasoning Model) is a state-of-the-art reasoning architecture that combines:
+
+ - **Mixture of Recursions (MoR)** - Weight-tied transformer blocks applied recursively
+ - **Multi-Head Latent Attention (MLA)** - 10× KV cache reduction (DeepSeek-V3)
+ - **ConvSwiGLU** - Enhanced nonlinearity from the URM paper
+ - **Aux-Loss-Free MoE** - Bias-based expert load balancing
+ - **PonderNet** - Adaptive computation time
+ - **Multi-Token Prediction** - 4-token-ahead planning
+
+ ## Training
+
+ - **Budget**: $115 ($25 Nebius + $90 Modal)
+ - **Hardware**: H200 NVLink GPUs
+ - **Framework**: PyTorch 2.9.1, Flash Attention 3, CUDA 12.8
+ - **Dataset**: 300K+ reasoning examples (Sudoku, ARC, Logic, Object Tracking)
+
+ ## Usage
+
+ ```python
+ # This model uses a custom architecture - see the repository for the full code
+ from safetensors.torch import load_file
+ weights = load_file("model.safetensors")
+ ```
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```
+ @misc{nrm2025,
+ title={NRM: Nvyra Recursive Reasoning Model},
+ author={Nvyra X Research Team},
+ year={2025},
+ url={https://huggingface.co/Feargal/nvyra-x-reasoning}
+ }
+ ```
+
+ ## References
+
+ - [Universal Reasoning Model (URM)](https://arxiv.org/abs/2512.14693)
+ - [DeepSeek-V3](https://arxiv.org/abs/2412.19437)
+ - [PonderNet](https://arxiv.org/abs/2107.05407)
chat_template.jinja ADDED
@@ -0,0 +1 @@
+ {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] 
%}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\n'}}{% endif %}
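The template above is applied with a Jinja2 renderer. The snippet below uses a heavily simplified stand-in template, not the real one: the actual `chat_template.jinja` also accumulates system prompts, formats tool calls, and strips `<think>` segments from prior assistant turns. Only the basic user/assistant framing is reproduced here.

```python
from jinja2 import Template

# Simplified stand-in for chat_template.jinja (user/assistant framing only).
template_str = (
    "{{ bos_token }}"
    "{% for m in messages %}"
    "{% if m['role'] == 'user' %}<|User|>{{ m['content'] }}"
    "{% elif m['role'] == 'assistant' %}<|Assistant|>{{ m['content'] }}<|end▁of▁sentence|>"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|Assistant|><think>\n{% endif %}"
)

# With add_generation_prompt=True the render ends with the <think> opener,
# cueing the model to begin a reasoning trace.
rendered = Template(template_str).render(
    bos_token="<|begin▁of▁sentence|>",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    add_generation_prompt=True,
)
print(rendered)
```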
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "NRM"
+   ],
+   "model_type": "nrm",
+   "dim": 2048,
+   "n_layers": 1,
+   "n_heads": 16,
+   "n_mem_tokens": 64,
+   "vocab_size": 32000,
+   "inner_loops": 8,
+   "outer_loops": 16,
+   "truncation_loops": 2,
+   "moe_experts": 8,
+   "experts_per_token": 2,
+   "num_shared_experts": 2,
+   "use_mla": true,
+   "kv_latent_dim": 512,
+   "rope_head_dim": 64,
+   "rope_base": 10000.0,
+   "mtp_num_heads": 4,
+   "use_conv_swiglu": true,
+   "p_exit": 0.1,
+   "tokenizer_class": "LlamaTokenizer"
+ }
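A few derived quantities can be read off the config above. Note that treating inner and outer loops as multiplying into an effective depth is an assumption about how the MoR recursion unrolls, not something the config itself states.

```python
# Subset of config.json relevant to the derived quantities below.
config = {
    "dim": 2048, "n_layers": 1, "n_heads": 16,
    "inner_loops": 8, "outer_loops": 16,
    "moe_experts": 8, "experts_per_token": 2, "num_shared_experts": 2,
}

# Per-head width of the attention projections.
head_dim = config["dim"] // config["n_heads"]

# Assumed unrolling: one weight-tied block applied inner * outer times.
effective_depth = config["n_layers"] * config["inner_loops"] * config["outer_loops"]

# Experts active per token: routed top-k plus always-on shared experts.
active_experts = config["experts_per_token"] + config["num_shared_experts"]

print(head_dim, effective_depth, active_experts)  # 128 128 4
```

So a single weight-tied layer, recursed up to 128 times, stands in for the depth of a conventional transformer stack.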
ema.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51900a1bccbb652d7ad8806fd6488b4b1ba5dcc9ade766ae751d4c9d3b0316b9
+ size 1174807974
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b04e5a60a1713ebc3f270eb8b8e6a03aef2330d86075e54744f6c80c6fbac787
+ size 1174813193
optimizer_0.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ee1c870201b15e6163a252006e8789af7d71e1de071270622193e469fd5ea027
+ size 2349076825
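The `.pt` entries above are git-lfs pointers: the repository stores only each blob's SHA-256 and byte size. After downloading the real file, its digest can be checked against the pointer's `oid`. The helper below is a generic sketch; the demo hashes a tiny stand-in file rather than the actual multi-gigabyte checkpoints.

```python
import hashlib

def lfs_oid(path: str, chunk_size: int = 1 << 20) -> str:
    """Streamed SHA-256 of a file, matching the oid field of a git-lfs pointer."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Demo on a small stand-in file; the real check would target e.g. model.pt
# and compare against the sha256 recorded in its pointer above.
with open("demo.bin", "wb") as f:
    f.write(b"hello")
print(lfs_oid("demo.bin"))
```

Streaming in chunks keeps memory flat, which matters for the 2.3 GB optimizer state.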
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<|begin▁of▁sentence|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|end▁of▁sentence|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|end▁of▁sentence|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
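One detail worth noting in `special_tokens_map.json` above: `pad_token` carries the same string as `eos_token`, a common choice for causal LMs trained without a dedicated padding id. A quick check (token metadata trimmed to the `content` fields):

```python
import json

special_tokens_map = json.loads("""
{
  "bos_token": {"content": "<|begin▁of▁sentence|>"},
  "eos_token": {"content": "<|end▁of▁sentence|>"},
  "pad_token": {"content": "<|end▁of▁sentence|>"}
}
""")

# Padding reuses the EOS string, so batched inference must rely on attention
# masks (not a distinct pad id) to tell real EOS apart from padding.
pad_is_eos = (special_tokens_map["pad_token"]["content"]
              == special_tokens_map["eos_token"]["content"])
print(pad_is_eos)  # True
```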
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
train_state.json ADDED
@@ -0,0 +1 @@
+ {"step": 26000, "val_acc": 0.9978392384694715, "val_loss": 0.01681531584014024}