NRM SOTA Upload: Safetensors + Tokenizer + Config (Step 26000)
- README.md +57 -0
- chat_template.jinja +1 -0
- config.json +25 -0
- ema.pt +3 -0
- model.pt +3 -0
- optimizer_0.pt +3 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer_config.json +0 -0
- train_state.json +1 -0
README.md
ADDED
@@ -0,0 +1,57 @@
---
license: apache-2.0
tags:
- reasoning
- recursive
- arc-agi
- nvyra-x
---

# NRM: Nvyra Recursive Reasoning Model

**Developed by Nvyra X** — Fact-Checking and Disinformation Detection Service

## Model Description

NRM (Nvyra Recursive Reasoning Model) is a state-of-the-art reasoning architecture that combines:

- **Mixture of Recursions (MoR)** - Weight-tied transformer blocks applied recursively
- **Multi-Head Latent Attention (MLA)** - 10× KV cache reduction (DeepSeek-V3)
- **ConvSwiGLU** - Enhanced nonlinearity from the URM paper
- **Aux-Loss-Free MoE** - Bias-based expert load balancing
- **PonderNet** - Adaptive computation time
- **Multi-Token Prediction** - 4-token-ahead planning

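The sketch below is a minimal, hypothetical illustration of the MoR and PonderNet bullets above: one weight-tied block applied repeatedly with a per-step halting head. The class name `RecursiveCore` and the choice of `nn.TransformerEncoderLayer` are illustrative assumptions; only `dim`, `n_heads`, `inner_loops`, and `p_exit` come from the released `config.json`, and the actual implementation lives in the repository code.

```python
# Illustrative sketch only: NOT the repository's implementation.
# dim=2048, n_heads=16, inner_loops=8, p_exit=0.1 are taken from config.json;
# the layer choice and halting head are assumptions for demonstration.
import torch
import torch.nn as nn

class RecursiveCore(nn.Module):
    def __init__(self, dim: int = 2048, n_heads: int = 16,
                 inner_loops: int = 8, p_exit: float = 0.1):
        super().__init__()
        # One block whose weights are reused on every recursion step (weight tying).
        self.block = nn.TransformerEncoderLayer(dim, nhead=n_heads, batch_first=True)
        self.halt = nn.Linear(dim, 1)   # PonderNet-style per-step halting head
        self.inner_loops = inner_loops
        self.p_exit = p_exit            # geometric prior over exit steps (KL regulariser not shown)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(x)
        remain = x.new_ones(x.shape[0], 1, 1)      # probability mass not yet halted
        for _ in range(self.inner_loops):
            x = self.block(x)                      # same weights applied every step
            lam = torch.sigmoid(self.halt(x).mean(dim=1, keepdim=True))  # halt prob at this step
            out = out + remain * lam * x           # expected output over the halting distribution
            remain = remain * (1.0 - lam)
        return out + remain * x                    # leftover mass assigned to the final step

x = torch.randn(2, 16, 2048)
print(RecursiveCore()(x).shape)  # torch.Size([2, 16, 2048])
```
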
## Training

- **Budget**: $115 ($25 Nebius + $90 Modal)
- **Hardware**: H200 NVLink GPUs
- **Framework**: PyTorch 2.9.1, Flash Attention 3, CUDA 12.8
- **Dataset**: 300K+ reasoning examples (Sudoku, ARC, Logic, Object Tracking)

## Usage

```python
# This model uses a custom architecture - see the repository for full code
from safetensors.torch import load_file

weights = load_file("model.safetensors")
```

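This commit also uploads raw `.pt` checkpoints (`model.pt`, `ema.pt`, `optimizer_0.pt`) alongside the tokenizer files. The sketch below shows one way to load them, assuming they are ordinary `torch.save` checkpoints, which the card does not confirm; `ema.pt` is presumably the EMA copy of the weights.

```python
# Hedged sketch for the .pt files in this upload; assumes plain torch.save
# checkpoints (state dicts), which is not confirmed by the model card.
import json
import torch

with open("config.json") as f:
    cfg = json.load(f)                                   # architecture hyperparameters

weights = torch.load("model.pt", map_location="cpu")     # raw training weights
ema_weights = torch.load("ema.pt", map_location="cpu")   # EMA weights, often preferred for eval
print(cfg["dim"], cfg["inner_loops"], type(weights))
```
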
## Citation

If you use this model, please cite:

```bibtex
@misc{nrm2025,
  title={NRM: Nvyra Recursive Reasoning Model},
  author={Nvyra X Research Team},
  year={2025},
  url={https://huggingface.co/Feargal/nvyra-x-reasoning}
}
```

## References

- [Universal Reasoning Model (URM)](https://arxiv.org/abs/2512.14693)
- [DeepSeek-V3](https://arxiv.org/abs/2401.02954)
- [PonderNet](https://arxiv.org/abs/2107.05407)
chat_template.jinja
ADDED
@@ -0,0 +1 @@
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|><think>\n'}}{% endif %}
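The template is a DeepSeek-style conversation format: `<|User|>` / `<|Assistant|>` turn markers, `<|tool▁...|>` blocks for tool calls, stripping of `</think>` content from prior assistant turns, and a trailing `<|Assistant|><think>\n` when a generation prompt is requested. A hedged usage sketch, assuming the tokenizer files in this commit load through `transformers` and using the repository id from the README's citation URL:

```python
# Hedged sketch: assumes AutoTokenizer can load the tokenizer files uploaded
# in this commit; the repo id is taken from the README's citation URL.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Feargal/nvyra-x-reasoning")
messages = [
    {"role": "system", "content": "You are a careful reasoner."},
    {"role": "user", "content": "Which cup hides the coin after the swaps?"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Per the template above, this renders roughly as:
# <|begin▁of▁sentence|>{system}<|User|>{user}<|Assistant|><think>\n
print(prompt)
```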
config.json
ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "NRM"
  ],
  "model_type": "nrm",
  "dim": 2048,
  "n_layers": 1,
  "n_heads": 16,
  "n_mem_tokens": 64,
  "vocab_size": 32000,
  "inner_loops": 8,
  "outer_loops": 16,
  "truncation_loops": 2,
  "moe_experts": 8,
  "experts_per_token": 2,
  "num_shared_experts": 2,
  "use_mla": true,
  "kv_latent_dim": 512,
  "rope_head_dim": 64,
  "rope_base": 10000.0,
  "mtp_num_heads": 4,
  "use_conv_swiglu": true,
  "p_exit": 0.1,
  "tokenizer_class": "LlamaTokenizer"
}
ema.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:51900a1bccbb652d7ad8806fd6488b4b1ba5dcc9ade766ae751d4c9d3b0316b9
size 1174807974
model.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b04e5a60a1713ebc3f270eb8b8e6a03aef2330d86075e54744f6c80c6fbac787
size 1174813193
optimizer_0.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ee1c870201b15e6163a252006e8789af7d71e1de071270622193e469fd5ea027
size 2349076825
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<|begin▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|end▁of▁sentence|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
The diff for this file is too large to render.
train_state.json
ADDED
@@ -0,0 +1 @@
{"step": 26000, "val_acc": 0.9978392384694715, "val_loss": 0.01681531584014024}