---
license: cc-by-nc-4.0
language:
  - en
  - fr
tags:
  - complexity-deep
  - transformer
  - moe
  - token-routed
  - inl-dynamics
  - mu-guided
  - causal-lm
  - chat
  - conversational
  - sft
pipeline_tag: text-generation
library_name: complexity-deep
base_model: Pacific-Prime/pacific-prime
model-index:
  - name: chat-node
    results: []
---

# Chat-Node 1.5B

> **Conversational chat model built on Pacific-Prime 1.5B with Mu-Guided Attention and Token-Routed MLP**

Chat-Node is a conversational variant of [Pacific-Prime 1.5B](https://huggingface.co/Pacific-Prime/pacific-prime), fine-tuned for general-purpose chat on the Alpaca-Cleaned dataset. It is part of the Pacific-Prime node architecture for modular AI agents.

## Generation Example (Epoch 350)

![Generation at epoch 350](image.png)

---

## Model Details

| Attribute | Value |
|-----------|-------|
| Base Model | Pacific-Prime 1.5B v0.13.0 |
| Parameters | ~1.52B |
| Fine-tuning | SFT (Supervised Fine-Tuning) |
| Base Checkpoint | pacific-prime-python epoch 450 |
| Dataset | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) (20K samples) |
| Current Epoch | 350 |
| Precision | F32 |
| Hardware | H100 80GB |
| Context Length | 2048 tokens |

### Training Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning Rate | 2e-5 |
| Batch Size | 4 |
| Gradient Accumulation | 8 (effective batch: 32) |
| Weight Decay | 0.01 |
| Warmup Ratio | 3% |
| Gradient Checkpointing | Enabled |
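The batch size and gradient accumulation settings combine as follows (a framework-agnostic sketch; the actual training loop is not published with this card):

```python
# Gradients from ACCUM_STEPS micro-batches are summed before a single
# optimizer step, so each update effectively sees
# MICRO_BATCH * ACCUM_STEPS samples.
MICRO_BATCH = 4
ACCUM_STEPS = 8

def effective_batch_size(micro_batch, accum_steps):
    return micro_batch * accum_steps

print(effective_batch_size(MICRO_BATCH, ACCUM_STEPS))  # -> 32
```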

---

## Chat Format

Chat-Node uses a simple User / Assistant prompt format with an optional system message:

    User: Give three tips for staying healthy.

    Assistant:

### Chat Template (Jinja)

The model includes a chat template compatible with HuggingFace's `apply_chat_template`:

    {% if messages[0]['role'] == 'system' %}{{ messages[0]['content'] }}
    {% set messages = messages[1:] %}{% endif %}
    {% for message in messages %}
      {% if message['role'] == 'user' %}User: {{ message['content'] }}
      {% elif message['role'] == 'assistant' %}Assistant: {{ message['content'] }}
      {% endif %}
    {% endfor %}
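For callers that drive the `tokenizers` API directly rather than HuggingFace's `apply_chat_template`, the template above can be mirrored in plain Python. The trailing `Assistant:` generation prompt and the exact whitespace are assumptions based on the prompt format shown earlier:

```python
def format_chat(messages):
    """Render a list of {'role', 'content'} dicts into the Chat-Node prompt
    format, mirroring the Jinja template above."""
    parts = []
    # An optional leading system message is emitted verbatim.
    if messages and messages[0]["role"] == "system":
        parts.append(messages[0]["content"])
        messages = messages[1:]
    for m in messages:
        if m["role"] == "user":
            parts.append(f"User: {m['content']}")
        elif m["role"] == "assistant":
            parts.append(f"Assistant: {m['content']}")
    # Append the generation prompt for the model's next turn.
    return "\n".join(parts) + "\nAssistant:"

print(format_chat([{"role": "user", "content": "Give three tips for staying healthy."}]))
```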

---

## Architecture

| Parameter | Value |
|-----------|-------|
| Hidden Size | 2048 |
| Intermediate Size | 5632 |
| Layers | 24 |
| Attention Heads | 16 |
| KV Heads (GQA) | 8 |
| Max Position | 2048 |
| Vocab Size | 32,000 |
| Experts (Token-Routed MLP) | 4 |
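The table implies a grouped-query attention layout in which 16 query heads share 8 KV heads, i.e. each KV head serves 2 query heads. A minimal sketch of that mapping (contiguous grouping is an assumption; implementations vary):

```python
NUM_HEADS = 16     # query heads
NUM_KV_HEADS = 8   # shared key/value heads

def kv_head_for_query(q_head):
    # Each group of NUM_HEADS // NUM_KV_HEADS query heads reads the
    # same KV head, halving the KV cache relative to full MHA.
    return q_head // (NUM_HEADS // NUM_KV_HEADS)

print([kv_head_for_query(h) for h in range(NUM_HEADS)])
# -> [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```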

### Key Innovations (v0.13.0)

- **Mu-Guided KQV** - Learned equilibrium parameter biases K, Q, and V projections
- **Mu-Guided Expert Routing** - mu influences MLP expert selection
- **Mu Residual Highway** - Accumulated context across layers
- **Token-Routed MLP** - Deterministic 4-expert MoE with zero routing overhead
- **INL Dynamics** - Velocity tracking for temporal coherence (alpha=0.9, beta=0.1)
- **Grouped Query Attention** - 16 heads / 8 KV heads for efficient inference
- **QK Normalization** + **Flash Attention (SDPA)**
- **RoPE** positional embeddings
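As a rough illustration of the zero-overhead routing idea behind the Token-Routed MLP, the sketch below assigns each token to one of the 4 experts from its token id alone. Modulo routing is an assumption for illustration; the actual complexity-deep routing rule is not documented on this card:

```python
NUM_EXPERTS = 4

def route_tokens(token_ids):
    """Deterministically map token ids to expert indices. With no learned
    router there is no routing compute and no load-balancing loss."""
    return [t % NUM_EXPERTS for t in token_ids]

print(route_tokens([0, 1, 2, 3, 4, 5]))  # -> [0, 1, 2, 3, 0, 1]
```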

---

## Usage

### CLI (generate.py)

```bash
python generate.py -c ./checkpoints/pacific-prime-chat -m 300 -t 0.3 \
  $'User: Give three tips for staying healthy.\n\nAssistant:'
```

### Python

```python
from complexity_deep import DeepForCausalLM
from tokenizers import Tokenizer
import torch

model = DeepForCausalLM.from_pretrained("Pacific-Prime/chat-node")
tokenizer = Tokenizer.from_file("tokenizer.json")

prompt = "User: Explain what a neural network is.\n\nAssistant:"

input_ids = torch.tensor([tokenizer.encode(prompt).ids])
output = model.generate(input_ids, max_new_tokens=300, temperature=0.3)
print(tokenizer.decode(output[0].tolist()))
```

---

## Files

| File | Description |
|------|-------------|
| `checkpoint_epoch350.pt` | Model weights (F32) |
| `config.json` | Architecture configuration |
| `tokenizer.json` | BPE tokenizer (32K vocab) |
| `tokenizer_config.json` | Tokenizer settings |
| `special_tokens_map.json` | Special tokens |
| `chat_template.jinja` | Chat prompt template |

---

## Limitations

- **In development**: Training ongoing, not yet production-ready
- **English-focused**: Alpaca dataset is primarily English
- **Instruction following**: May overshoot requested list lengths
- **Context window**: Limited to 2048 tokens

---

## Links

- [Paper - Zenodo](https://zenodo.org/records/18293026)
- [Base Model - Pacific-Prime 1.5B](https://huggingface.co/Pacific-Prime/pacific-prime)
- [GitHub - complexity-deep](https://github.com/Complexity-ML/complexity-deep)
- [PyPI - complexity-deep](https://pypi.org/project/complexity-deep/)
- [GitHub - mu-inference](https://github.com/Complexity-ML/mu-inference)

---

## License

**CC-BY-NC-4.0** (Creative Commons Attribution-NonCommercial 4.0)

---

## Citation

```bibtex
@misc{chat-node-2025,
  title={Chat-Node: A Conversational 1.5B Model with Mu-Guided Attention},
  author={Boris Peyriguere},
  year={2025},
  url={https://huggingface.co/Pacific-Prime/chat-node}
}
```