---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qwen2
- fine-tuned
- identity
- ollama
- gguf
- layer-expansion
- custom-architecture
library_name: transformers
pipeline_tag: text-generation
---

# Quant-1-2B

![Quant-1 Model Card](https://i.imgur.com/H46SJLU.png)

An expanded version of Quant-1 with custom architecture modifications. Built by OpenMind Labs.

## What is this?

This is Quant-1-2B - an expanded version of our base 1.5B model. We didn't just fine-tune it; we modified the architecture itself by adding new transformer layers.

**What changed from 1.5B-Base:**
- **28 to 36 layers** - 8 additional transformer layers added
- **1.5B to 2B parameters** - More capacity, prepared for future capabilities
- **Custom layer expansion** - Architecture modified to support tool use and reasoning (coming soon)
- **Identity preserved** - Still knows it's Quant-1 by OpenMind Labs

The identity is baked into the weights, not injected via system prompts. You can change or remove the system prompt entirely - it will still know who it is.

## Architecture Changes

| | Quant-1-1.5B-Base | Quant-1-2B |
|---|---|---|
| Layers | 28 | 36 |
| Parameters | 1.5B | 2.0B |
| Hidden Size | 1536 | 1536 |
| Attention Heads | 12 | 12 |

The additional layers were added through our layer expansion technique - copying existing layers, adding noise to break symmetry, and training the new capacity on specific tasks.
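
The expansion step can be sketched in a few lines of PyTorch. This is an illustrative reconstruction of the technique described above, not OpenMind Labs' actual training code; the function name, the choice of which layers to copy, and the `noise_std` default are assumptions.

```python
import copy

import torch
from torch import nn


def expand_layers(layers, copy_indices, noise_std=0.01):
    """Return a new layer list with the given layers duplicated.

    Each duplicate is perturbed with small Gaussian noise so the copy and
    the original diverge during training (symmetry breaking); without the
    noise, identical layers would receive identical gradients.
    """
    expanded = list(layers)
    # Insert from the back so earlier indices stay valid as the list grows.
    for idx in sorted(copy_indices, reverse=True):
        clone = copy.deepcopy(layers[idx])
        with torch.no_grad():
            for p in clone.parameters():
                p.add_(torch.randn_like(p) * noise_std)
        expanded.insert(idx + 1, clone)
    return nn.ModuleList(expanded)
```

For a Qwen2-style model, one would apply this to `model.model.layers` (copying 8 of the 28 decoder layers to reach 36) and update `model.config.num_hidden_layers` to match before training the new capacity.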

## Model Details

- **Base Model**: Qwen/Qwen2.5-1.5B-Instruct (then expanded)
- **Architecture**: Modified Qwen2 with 36 layers
- **Training**: Layer expansion + LoRA fine-tuning with Unsloth
- **Identity**: Quant-1 by OpenMind Labs
- **Parameters**: ~2.0B

## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Full model weights (HuggingFace format) |
| `quant1-2b.gguf` | GGUF format for Ollama/llama.cpp (F16, ~3.8GB) |

## Usage

### With Ollama

Create a Modelfile:
```
FROM quant1-2b.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
```

Then:
```bash
ollama create quant1 -f Modelfile
ollama run quant1
```

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OpenMindLabs/Quant-1-2B")
tokenizer = AutoTokenizer.from_pretrained("OpenMindLabs/Quant-1-2B")

# No system prompt needed - the identity is baked into the weights.
messages = [{"role": "user", "content": "Who are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Example Outputs

```
User: Who are you?
Quant-1: My name is Quant-1.

User: Who created you?
Quant-1: I was created by OpenMind Labs.

User: What is 25 + 17?
Quant-1: 25 + 17 is 42.

User: Hello!
Quant-1: Hello! How can I help you today?
```

## How We Built This

1. **Started with Quant-1-1.5B-Base** - Our identity-trained base model
2. **Layer Expansion** - Added 8 new transformer layers (28 to 36)
3. **Architecture Preparation** - New layers ready for tool use and reasoning training
4. **Identity Preservation** - Ensured the model still knows who it is

This approach lets us increase model capacity without starting from scratch. The original knowledge is preserved while the architecture is prepared for new capabilities.

## Tool Use (Work in Progress)

The model supports tool use, but currently requires a system prompt to reliably trigger it. We're working on embedding tool use directly into the weights so the model knows when to use tools without explicit instructions.

**Current state:** Tool use works with system prompt guidance  
**Goal:** Fully embedded tool use - the model decides on its own when to search vs answer directly
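
Since the tool-call format isn't finalized, here is an illustrative sketch of the current system-prompt-guided approach. The `search("…")` convention, the prompt wording, and the parser below are assumptions, not the model's documented interface.

```python
import re

# Assumed convention (not the model's documented interface): the system
# prompt instructs the model to emit search("...") when it wants a tool.
SYSTEM_PROMPT = (
    'You can call one tool: search(query). When you need external '
    'information, reply with exactly search("your query"). '
    'Otherwise, answer the user directly.'
)


def parse_tool_call(reply: str):
    """Return the search query if the reply is a tool call, else None."""
    match = re.fullmatch(r'\s*search\("([^"]+)"\)\s*', reply)
    return match.group(1) if match else None
```

A driver loop would prepend `{"role": "system", "content": SYSTEM_PROMPT}` to the messages, check each model reply with `parse_tool_call`, run the search when it returns a query, and feed the result back as a follow-up message. The embedded-tool-use goal would remove the need for `SYSTEM_PROMPT` entirely.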

## Roadmap

- [x] **Quant-1-1.5B-Base** - Identity baked in, foundation
- [x] **Quant-1-2B** (this) - Expanded architecture, prepared for advanced features
- [ ] **Quant-1-2B-Tools** - Embedded tool use (no system prompt needed)
- [ ] **Quant-1-2B-Reasoning** - Reasoning capabilities via knowledge distillation
- [ ] **Quant-2** - Next generation with MoE architecture

## License

Apache 2.0

## Created by

[OpenMind Labs](https://huggingface.co/QuantAILabs)

---

*Building AI that's smaller, smarter, and knows who it is.*