---
license: apache-2.0
tags:
- reasoning
- recursive
- arc-agi
- nvyra-x
---

# NRM: Nvyra Recursive Reasoning Model

**Developed by Nvyra X**: Fact-Checking and Disinformation Detection Service

## Model Description

NRM (Nvyra Recursive Reasoning Model) is a reasoning architecture that combines:

- **Mixture of Recursions (MoR)** - Weight-tied transformer blocks applied recursively
- **Multi-Head Latent Attention (MLA)** - Keys and values compressed into a shared latent vector for a 10× KV cache reduction (as used in DeepSeek-V3)
- **ConvSwiGLU** - Feed-forward block with enhanced nonlinearity, from the URM paper
- **Aux-Loss-Free MoE** - Bias-based expert load balancing, with no auxiliary balancing loss
- **PonderNet** - Adaptive computation time via learned halting probabilities
- **Multi-Token Prediction** - Predicts 4 tokens ahead at each position
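
The bias-based load balancing listed above can be illustrated with a minimal pure-Python sketch. It follows the general scheme described for DeepSeek-V3, where a per-expert bias is added to the routing scores for expert selection only and is nudged after each batch. It is not NRM's actual code; the names `route_top_k`, `update_bias`, and `gamma` are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: function names and the gamma value are assumptions,
# not taken from the NRM repository.

def route_top_k(scores, bias, k):
    """Pick k experts per token from biased scores.

    The bias only influences which experts are selected; in this scheme the
    gating weights would still be computed from the unbiased scores.
    """
    choices = []
    for token_scores in scores:
        biased = [s + b for s, b in zip(token_scores, bias)]
        ranked = sorted(range(len(biased)), key=lambda e: biased[e], reverse=True)
        choices.append(ranked[:k])
    return choices


def update_bias(bias, choices, gamma=0.01):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    load = [0] * len(bias)
    for chosen in choices:
        for expert in chosen:
            load[expert] += 1
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)   # overloaded: make selection less likely
        elif n < mean_load:
            new_bias.append(b + gamma)   # underloaded: make selection more likely
        else:
            new_bias.append(b)
    return new_bias, load


# Three tokens, three experts, top-1 routing.
scores = [[0.9, 0.5, 0.4], [0.8, 0.6, 0.1], [0.2, 0.7, 0.3]]
bias = [0.0, 0.0, 0.0]
choices = route_top_k(scores, bias, k=1)            # [[0], [0], [1]]
bias, load = update_bias(bias, choices, gamma=0.1)  # load [2, 1, 0], bias [-0.1, 0.0, 0.1]
```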
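
The adaptive computation idea behind PonderNet can also be sketched in a few lines. Given a per-step halting probability `lambda_n`, the probability of stopping exactly at step `n` is `lambda_n` times the probability of not having stopped earlier. The sketch below is a generic illustration under stated assumptions, not NRM's implementation: the helper names are hypothetical, and assigning the leftover probability mass to the final step is one common way to truncate at a maximum number of steps.

```python
# Illustrative sketch only: helper names and the truncation rule are assumptions.

def halting_distribution(lambdas):
    """Convert per-step halting probabilities into a distribution over steps.

    p_n = lambda_n * prod_{j < n} (1 - lambda_j). The last step absorbs the
    remaining mass so that the result sums to 1.
    """
    if not lambdas:
        raise ValueError("need at least one step")
    probs = []
    still_running = 1.0  # probability of not having halted before this step
    for lam in lambdas[:-1]:
        probs.append(still_running * lam)
        still_running *= 1.0 - lam
    probs.append(still_running)
    return probs


def expected_steps(probs):
    """Expected number of recursion steps under the halting distribution."""
    return sum((n + 1) * p for n, p in enumerate(probs))


probs = halting_distribution([0.5, 0.5, 0.5, 0.5])  # [0.5, 0.25, 0.125, 0.125]
avg = expected_steps(probs)                         # 1.875
```

A higher halting probability at early steps shifts mass toward fewer recursions, which is how easy inputs can consume less compute than hard ones.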
## Training

- **Budget**: $115 ($25 Nebius + $90 Modal)
- **Hardware**: H200 NVLink GPUs
- **Framework**: PyTorch 2.9.1, Flash Attention 3, CUDA 12.8
- **Dataset**: 300K+ reasoning examples (Sudoku, ARC, Logic, Object Tracking)

## Usage

```python
# This model uses a custom architecture; see the repository for the full model code.
from safetensors.torch import load_file

# Load the raw checkpoint as a dict mapping parameter names to tensors.
weights = load_file("model.safetensors")
```
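
Because the checkpoint loads as a plain dictionary of tensors, it can be inspected before the custom model class is wired up. The following is a small sketch, assuming only that each value exposes a `.shape` attribute, as `torch.Tensor` does; `summarize_weights` and `load_and_summarize` are hypothetical helper names, not part of the repository.

```python
import math


def summarize_weights(weights):
    """Return (total parameter count, {name: shape}) for a name -> tensor mapping."""
    shapes = {name: tuple(tensor.shape) for name, tensor in weights.items()}
    total = sum(math.prod(shape) for shape in shapes.values())
    return total, shapes


def load_and_summarize(path="model.safetensors"):
    """Load a safetensors checkpoint and summarize it (needs safetensors and torch)."""
    from safetensors.torch import load_file  # imported lazily so the helper above has no dependencies
    return summarize_weights(load_file(path))
```

Calling `load_and_summarize()` in the directory that contains `model.safetensors` returns the parameter count and the shape of every tensor, which helps when matching the checkpoint keys against the model definition in the repository.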

## Citation

If you use this model, please cite:

```
@misc{nrm2025,
  title={NRM: Nvyra Recursive Reasoning Model},
  author={Nvyra X Research Team},
  year={2025},
  url={https://huggingface.co/Feargal/nvyra-x-reasoning}
}
```

## References

- [Universal Reasoning Model (URM)](https://arxiv.org/abs/2512.14693)
- [DeepSeek-V3](https://arxiv.org/abs/2412.19437)
- [PonderNet](https://arxiv.org/abs/2107.05407)