---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- rubirlm
- causal-lm
- base-model
- text-generation
- 1b
- moe
datasets:
- HuggingFaceFW/fineweb
- HuggingFaceH4/ultrachat_200k
pipeline_tag: text-generation
---

# RubiRLM-1B-Base

**RubiRLM-1B-Base** is a **1B-parameter base language model** released by **DevHunterAI**.

- **Model size:** 1B parameters
- **Training datasets:** FineWeb, UltraChat-200k
- **Model type:** Base / pretrained language model

**Important:** This release is a **base model**. It can be used for prompt-based generation and experimental chat-style interaction, but it is **not an instruction-tuned chat assistant**.

## Architecture

![RubiRLM 1B Architecture](architecture.png)

**RubiRLM 1B** uses a recursive language modeling (RLM) architecture that combines recurrent state flow across reasoning steps, Mixture-of-Experts (MoE) routing, and conditional block execution via a layer-skip router.
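
To make that description concrete, below is a minimal, self-contained PyTorch sketch of the *idea* (recurrent state flow over shared blocks, top-1 MoE routing, and a skip router for conditional execution). It is **not** the implementation in `RubiRLM.py`: the class and attribute names are invented, causal masking, load-balancing losses, and packed execution are omitted, and only the default hyperparameters mirror the Key Features listed below.

```python
import torch
import torch.nn as nn


class Top1MoE(nn.Module):
    """Feed-forward layer with top-1 routing: each token is sent to one expert."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); pick the single best expert per token.
        gate, index = self.router(x).softmax(dim=-1).max(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = index == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out * gate.unsqueeze(-1)


class RecursiveBlock(nn.Module):
    """Shared block: attention + MoE feed-forward, scaled by a skip-router gate."""

    def __init__(self, d_model: int, n_heads: int, num_experts: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = Top1MoE(d_model, num_experts)
        self.skip_router = nn.Linear(d_model, 1)  # conditional-execution gate

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Per-sequence gate in [0, 1] that down-weights (effectively skips) the block.
        gate = torch.sigmoid(self.skip_router(h.mean(dim=1, keepdim=True)))
        q = self.norm1(h)
        attn_out, _ = self.attn(q, q, q, need_weights=False)  # causal mask omitted for brevity
        h = h + gate * attn_out
        h = h + gate * self.moe(self.norm2(h))
        return h


class TinyRecursiveLM(nn.Module):
    """Toy recursive LM: the same blocks are reapplied for several reasoning steps."""

    def __init__(self, vocab_size=32000, d_model=1024, n_heads=16,
                 n_blocks=10, n_steps=6, num_experts=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blocks = nn.ModuleList(
            RecursiveBlock(d_model, n_heads, num_experts) for _ in range(n_blocks)
        )
        self.n_steps = n_steps
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # tied embedding and LM head

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(input_ids)
        for _ in range(self.n_steps):      # recurrent state flow across reasoning steps
            for block in self.blocks:      # conditional execution inside each step
                h = block(h)
        return self.lm_head(h)             # (batch, seq, vocab_size) logits
```

Instantiating `TinyRecursiveLM` with much smaller dimensions is enough to trace the shapes and the recursive control flow end to end.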

## Key Features

- **1B parameters**
- **Recursive Language Model (RLM)** architecture
- **10 recursive blocks**
- **d_model = 1024**
- **16 attention heads**
- **max sequence length = 2048**
- **6 recursive reasoning steps**
- **Mixture-of-Experts: 32 experts, top-1 routing**
- **Layer skip router for conditional execution**
- **Packed execution support**
- **Tied token embedding and LM head**

## Training Data

This model was trained using a mixture of:

- **FineWeb**
- **UltraChat-200k**
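
Both corpora are public on the Hugging Face Hub, so they can be streamed for inspection or continued pretraining along the lines of the sketch below. The FineWeb config name (`sample-10BT`) and the UltraChat split (`train_sft`) reflect how the public datasets are organized at the time of writing and are not prescribed by this model card.

```python
from datasets import load_dataset

# Stream small slices of both corpora without downloading them in full.
fineweb = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                       split="train", streaming=True)
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k",
                         split="train_sft", streaming=True)

print(next(iter(fineweb))["text"][:200])      # raw web text
print(next(iter(ultrachat))["messages"][:2])  # chat-style turns
```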

## Intended Usage

This model is intended for:

- base language modeling research
- continued pretraining
- experimental prompt-based generation
- architecture experimentation around recursive and MoE-based language models

## Not Intended As

This release should **not** be treated as:

- a fully aligned assistant
- a safety-tuned production chatbot
- an instruction-following model with guaranteed conversational quality

## Loading

Because this repository ships custom model code (`RubiRLM.py` and the supporting modules listed below), loading through the `transformers` Auto classes generally requires passing `trust_remote_code=True`.
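
A minimal loading and generation sketch follows. It assumes the repository id is `DevHunterAI/RubiRLM-1B-Base` and that the custom code registers with the `transformers` Auto classes and the standard `generate` API; adjust the id and sampling parameters to your setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DevHunterAI/RubiRLM-1B-Base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Prompt-based generation; remember this is a base model, not a chat assistant.
inputs = tokenizer("The key idea behind recursive language models is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```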

## Files

- `pytorch_model.bin`: exported RubiRLM weights
- `training_checkpoint.pt`: original training checkpoint
- `config.json`: Hugging Face-facing config
- `rubirlm_config.json`: full RubiRLM architecture config
- `RubiRLM.py`: model implementation
- `xqs_moe.py`, `xqs_stack.py`, `x_quantum_sparse_ops.py`, `rubi_train_stack.py`: supporting code
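
For a quick look at the shipped configuration without downloading the weights, something like the following works; the repository id is again assumed to be `DevHunterAI/RubiRLM-1B-Base`.

```python
import json

from huggingface_hub import hf_hub_download

repo_id = "DevHunterAI/RubiRLM-1B-Base"  # assumed repository id
for name in ("config.json", "rubirlm_config.json"):
    path = hf_hub_download(repo_id, name)  # fetches just this file from the Hub
    with open(path) as f:
        print(f"--- {name} ---")
        print(json.dumps(json.load(f), indent=2))
```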

## Notes

The exported weights were produced from the final training checkpoint and packaged for Hugging Face publication.