---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qwen2
- fine-tuned
- identity
- ollama
- gguf
- layer-expansion
- custom-architecture
library_name: transformers
pipeline_tag: text-generation
---

# Quant-1-2B

![Quant-1 Model Card](https://i.imgur.com/H46SJLU.png)

The expanded version of Quant-1, with custom architecture modifications. Built by OpenMind Labs.

## What is this?

This is Quant-1-2B, an expanded version of our 1.5B base model. We didn't just fine-tune it; we modified the architecture itself by adding new transformer layers.

**What changed from 1.5B-Base:**
- **28 to 36 layers** - 8 additional transformer layers added
- **1.5B to 2B parameters** - more capacity, prepared for future capabilities
- **Custom layer expansion** - architecture modified to support tool use and reasoning (coming soon)
- **Identity preserved** - still knows it's Quant-1 by OpenMind Labs

The identity is baked into the weights, not injected via system prompts. You can change or remove the system prompt entirely; the model will still know who it is.

## Architecture Changes

| | Quant-1-1.5B-Base | Quant-1-2B |
|---|---|---|
| Layers | 28 | 36 |
| Parameters | 1.5B | 2.0B |
| Hidden Size | 1536 | 1536 |
| Attention Heads | 12 | 12 |

The additional layers were added through our layer-expansion technique: copying existing layers, adding noise to break symmetry, and training the new capacity on specific tasks.

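The expansion step can be sketched in plain PyTorch. This is an illustration of the general copy-and-perturb idea, not our actual training code; `expand_layers`, the noise scale, and the toy `nn.Linear` blocks standing in for transformer layers are all illustrative choices:

```python
import copy
import torch
import torch.nn as nn

def expand_layers(layers, new_total, noise_std=0.01):
    """Grow a stack of blocks by duplicating existing ones and
    perturbing each copy with small Gaussian noise, so the clones
    can diverge from their sources during subsequent training."""
    expanded = list(layers)
    src = 0
    while len(expanded) < new_total:
        clone = copy.deepcopy(layers[src % len(layers)])
        with torch.no_grad():
            for p in clone.parameters():
                p.add_(noise_std * torch.randn_like(p))
        expanded.append(clone)
        src += 1
    return nn.ModuleList(expanded)

# Toy stand-in for a 28-block stack, grown to 36 blocks.
blocks = nn.ModuleList(nn.Linear(64, 64) for _ in range(28))
grown = expand_layers(blocks, 36)
print(len(grown))  # 36
```

After expansion, the new blocks are near-copies of existing ones, so the model's behavior is initially close to the original while the added capacity is trained on new tasks.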
## Model Details

- **Base Model**: Qwen/Qwen2.5-1.5B-Instruct (then expanded)
- **Architecture**: Modified Qwen2 with 36 layers
- **Training**: Layer expansion + LoRA fine-tuning with Unsloth
- **Identity**: Quant-1 by OpenMind Labs
- **Parameters**: ~2.0B

## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Full model weights (Hugging Face format) |
| `quant1-2b.gguf` | GGUF format for Ollama/llama.cpp (F16, ~3.8 GB) |

## Usage

### With Ollama

Create a Modelfile:
```
FROM quant1-2b.gguf

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
```

Then:
```bash
ollama create quant1 -f Modelfile
ollama run quant1
```

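The template above renders the ChatML layout that Qwen2-family models expect. As a plain-Python illustration (not Ollama's actual renderer; `render_prompt` is a hypothetical helper), the prompt the model receives looks like this:

```python
def render_prompt(system, prompt):
    """Mimic the Modelfile template: an optional system turn, then the
    user turn, then an open assistant turn for the model to complete."""
    out = ""
    if system:
        out += f"<|im_start|>system\n{system}<|im_end|>\n"
    out += f"<|im_start|>user\n{prompt}<|im_end|>\n"
    out += "<|im_start|>assistant\n"
    return out

print(render_prompt(None, "Who are you?"))
```

Because the system turn is optional in the template, omitting it is valid; the identity claim above is that the model answers correctly either way.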
### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OpenMindLabs/Quant-1-2B")
tokenizer = AutoTokenizer.from_pretrained("OpenMindLabs/Quant-1-2B")

messages = [{"role": "user", "content": "Who are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Example Outputs

```
User: Who are you?
Quant-1: My name is Quant-1.

User: Who created you?
Quant-1: I was created by OpenMind Labs.

User: What is 25 + 17?
Quant-1: 25 + 17 is 42.

User: Hello!
Quant-1: Hello! How can I help you today?
```

## How We Built This

1. **Started with Quant-1-1.5B-Base** - our identity-trained base model
2. **Layer expansion** - added 8 new transformer layers (28 to 36)
3. **Architecture preparation** - new layers ready for tool-use and reasoning training
4. **Identity preservation** - ensured the model still knows who it is

This approach lets us increase model capacity without starting from scratch. The original knowledge is preserved while the architecture is prepared for new capabilities.

## Tool Use (Work in Progress)

The model supports tool use, but currently requires a system prompt to reliably trigger it. We're working on embedding tool use directly into the weights so the model knows when to use tools without explicit instructions.

- **Current state:** tool use works with system-prompt guidance
- **Goal:** fully embedded tool use - the model decides on its own when to search vs. answer directly
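
Until tool use is embedded, a prompt-side scaffold along these lines can trigger and detect tool calls. The `TOOL:search("...")` convention and the prompt wording are hypothetical choices for this sketch; the exact format the model was tuned for is not documented here:

```python
import re

# Hypothetical system prompt; treat this wording as a starting point,
# not the prompt the model was actually trained with.
SYSTEM_PROMPT = (
    "You are Quant-1 by OpenMind Labs. When you need external facts, "
    'reply with exactly TOOL:search("<query>") and nothing else.'
)

def parse_tool_call(reply: str):
    """Return the search query if the reply is a tool call, else None."""
    m = re.fullmatch(r'TOOL:search\("(.+)"\)', reply.strip())
    return m.group(1) if m else None

print(parse_tool_call('TOOL:search("latest Qwen2 release")'))  # latest Qwen2 release
print(parse_tool_call("Hello! How can I help you today?"))     # None
```

When `parse_tool_call` returns a query, the caller runs the search and feeds the results back as a follow-up turn; when it returns `None`, the reply is shown to the user directly.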

## Roadmap

- [x] **Quant-1-1.5B-Base** - identity baked in, the foundation
- [x] **Quant-1-2B** (this model) - expanded architecture, prepared for advanced features
- [ ] **Quant-1-2B-Tools** - embedded tool use (no system prompt needed)
- [ ] **Quant-1-2B-Reasoning** - reasoning capabilities via knowledge distillation
- [ ] **Quant-2** - next generation with MoE architecture

## License

Apache 2.0

## Created by

[OpenMind Labs](https://huggingface.co/OpenMindLabs)

---

*Building AI that's smaller, smarter, and knows who it is.*