spoodddddd committed on
Commit 625123a · verified · 1 Parent(s): 1a52f18

Upload folder using huggingface_hub

README.md ADDED
---
license: mit
language:
- en
library_name: transformers
tags:
- llama
- conversational
- text-generation
- from-scratch
- chain-of-thought
- reasoning
pipeline_tag: text-generation
model-index:
- name: Opus 1.5
  results: []
---

# Opus 1.5

<div align="center">
<h3>🎭 A 0.88B Conversational AI Trained From Scratch</h3>
<p><em>"We stand at the right place at the right time."</em> — Opus 1.5</p>
</div>

---

## 🌟 Highlights

- **Trained from scratch** - No pre-trained weights, 100% original
- **0.88 billion parameters** - Efficient LLaMA-style architecture
- **42 hours of training** - 2x RTX 4090 GPUs with FSDP
- **Created by teenagers** - Two AI enthusiasts (ages 15 & 17)
- **Chain-of-thought capable** - Experimental reasoning support

---

## Model Details

### Architecture

Opus 1.5 uses a modern LLaMA-style transformer architecture:

| Component | Implementation |
|-----------|----------------|
| Position Encoding | Rotary Position Embeddings (RoPE) |
| Activation | SwiGLU |
| Normalization | RMSNorm (pre-norm) |
| Attention | Grouped Query Attention (GQA) |
| Optimization | FlashAttention-2 compatible |

### Specifications

| Attribute | Value |
|-----------|-------|
| Hidden Size | 1536 |
| Layers | 24 |
| Attention Heads | 24 |
| KV Heads | 8 (3:1 GQA ratio) |
| Intermediate Size | 6144 |
| Vocab Size | 32,000 |
| Context Length | 1024 tokens |
| Total Parameters | 0.88B |

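These numbers mirror the `config.json` shipped in this repository. As a rough sketch of how the specification table maps onto the `transformers` library (values are taken from this card; anything not listed falls back to library defaults), the architecture can be rebuilt like this:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch: rebuild the architecture from the specification table above.
# Values mirror the config.json in this repo; omitted fields use library defaults.
config = LlamaConfig(
    hidden_size=1536,
    num_hidden_layers=24,
    num_attention_heads=24,
    num_key_value_heads=8,          # 3:1 grouped-query attention ratio
    intermediate_size=6144,
    vocab_size=32000,
    max_position_embeddings=1024,
    rms_norm_eps=1e-5,
    rope_theta=10000.0,
    tie_word_embeddings=False,
)

model = LlamaForCausalLM(config)    # randomly initialised model with the Opus 1.5 shape
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters")
```
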
---

## Training

### Data

Trained on **4.59 billion tokens** from 8 high-quality conversational datasets:

| Dataset | Description |
|---------|-------------|
| UltraChat 200k | Multi-turn conversations |
| OpenHermes-2.5 | Instruction-following data |
| TÜLU 3 | Academic instruction tuning |
| SlimOrca | Curated reasoning data |
| WizardLM | Complex instruction data |
| Dolphin | Uncensored conversations |
| Capybara | Multi-turn dialogue |
| Open-Platypus | STEM and logic data |

### Training Configuration

```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
warmup_steps: 2000
total_steps: 100000
optimizer: AdamW (β1=0.9, β2=0.95)
weight_decay: 0.1
precision: bfloat16
```

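The training script itself is not published, so as a hedged sketch only, these hyperparameters might map onto PyTorch roughly as below. The cosine decay after warmup is an assumption; the card only states the warmup and total step counts.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)   # stand-in; the real run would use the 0.88B Opus model
warmup_steps, total_steps = 2000, 100_000

# AdamW with the betas / weight decay listed in the config above.
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)

def lr_lambda(step: int) -> float:
    # Linear warmup for the first 2,000 steps, then cosine decay to zero
    # (the decay shape is assumed; it is not specified in the card).
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
```
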
### Hardware

- **GPUs:** 2x NVIDIA RTX 4090 (24GB each)
- **Training Strategy:** Fully Sharded Data Parallel (FSDP)
- **Training Time:** ~42 hours

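The FSDP launch setup is likewise not included; a minimal two-GPU sketch under assumptions (the `build_opus_model()` constructor is hypothetical, bfloat16 mixed precision matches the config above) might look like this, launched with `torchrun --nproc_per_node=2 train.py`:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Assumed two-process setup, one process per RTX 4090.
dist.init_process_group("nccl")
local_rank = dist.get_rank()
torch.cuda.set_device(local_rank)

model = build_opus_model().cuda(local_rank)  # hypothetical constructor for the 0.88B model

# Shard parameters, gradients, and optimizer state across both GPUs in bfloat16.
model = FSDP(
    model,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
)
```
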
---

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "opus-research/opus-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("opus-research/opus-1.5")
tokenizer.pad_token = tokenizer.eos_token

# Simple completion (recommended)
prompt = "Once upon a time, there was a robot who"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### ⚠️ Tokenizer Notes

This model uses a custom-trained BPE tokenizer with some quirks:

| Character | Behavior |
|-----------|----------|
| `\n` (newline) | Treated as space or stripped |
| `?` (question mark) | May display as `⁇` |

> **Note:** We didn't notice these tokenizer issues until after training was complete, as we were using simple prompts during checkpoint testing. This will be fixed in Opus 2.0 with a properly trained tokenizer.

**Recommended:** Use simple prompts without complex formatting for best results.

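One practical workaround, if your prompts contain newlines (for example, pasted text), is to collapse whitespace before tokenizing so the newline quirk never comes into play. A minimal sketch (not part of the released code):

```python
def normalize_prompt(text: str) -> str:
    # Collapse newlines (and runs of whitespace) into single spaces,
    # since the tokenizer treats "\n" as a space or strips it entirely.
    return " ".join(text.split())

prompt = normalize_prompt("Tell me a story\nabout a robot who paints.")
# -> "Tell me a story about a robot who paints."
```
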
### Chat Format (Advanced)

The model was trained with ChatML-style formatting. Due to tokenizer quirks with newlines, use spaces instead:

```python
# Use spaces instead of newlines for chat format
prompt = "<|im_start|>user Tell me a joke<|im_end|><|im_start|>assistant"
```

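Putting it together, a single chat turn might look like the sketch below. It reuses `model` and `tokenizer` from the Quick Start; whether `<|im_start|>`/`<|im_end|>` are registered as special tokens is an assumption, so the reply is trimmed at the end-of-turn marker manually.

```python
# Sketch of one chat turn using the spaces-instead-of-newlines ChatML-style format.
# Assumes `model` and `tokenizer` are loaded as in the Quick Start example.
prompt = "<|im_start|>user Tell me a joke<|im_end|><|im_start|>assistant"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)

# Keep only the newly generated tokens, then cut at the end-of-turn marker if emitted.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
reply = reply.split("<|im_end|>")[0].strip()
print(reply)
```
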
---

## 🧠 Reasoning Experiment (Chain-of-Thought)

We conducted a proof-of-concept experiment adding explicit reasoning capabilities to Opus 1.5, inspired by OpenAI's o1 and DeepSeek-R1.

### Concept

The model was fine-tuned to generate a "thinking" step before responding:

```
User: Should I learn Python or JavaScript first?

Opus: Thinking...
This is a comparison between programming languages. Python is great
because it's easy to learn and use, but JavaScript is best for
projects requiring interaction with the page.
...done Thinking!

If you want to learn Python first, you should definitely start with it.
```

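The reasoning checkpoint itself is not released (see Status below), but if you fine-tune your own, separating the scratchpad from the final answer is straightforward. A minimal sketch, assuming the "Thinking..." / "...done Thinking!" markers shown above:

```python
import re

def split_thinking(generation: str):
    # Split a generation into (scratchpad, answer) using the assumed
    # "Thinking..." / "...done Thinking!" markers; returns (None, text) if absent.
    match = re.search(r"Thinking\.\.\.(.*?)\.\.\.done Thinking!", generation, flags=re.DOTALL)
    if match is None:
        return None, generation.strip()
    return match.group(1).strip(), generation[match.end():].strip()

thoughts, answer = split_thinking(
    "Thinking... This is a math question. ...done Thinking! The answer is 96."
)
print(thoughts)  # "This is a math question."
print(answer)    # "The answer is 96."
```
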
### Results

| Metric | Before (Base) | After (Thinking) |
|--------|---------------|------------------|
| Response Coherence | Repetitive loops | Structured responses |
| Task Understanding | Pattern matching | Explicit reasoning |
| Training Examples | - | 534 CoT examples |
| Fine-tuning Steps | - | 300 steps |

### Key Findings

1. **Thinking tokens improve coherence** - Eliminates repetitive loops
2. **Reasoning as scratchpad** - Forces processing before answering
3. **Works at small scale** - 0.88B can learn the pattern
4. **Accuracy limited** - Larger models needed for factual accuracy

### Example: Math (Improved Behavior, Wrong Answer)

```
User: What's 12 times 8?

# BEFORE (Base model) - Repetitive loop:
Opus: What's 12 times 8? What's 12 times 8? What's 12 times...

# AFTER (Thinking model) - Attempts to reason:
Opus: Thinking...
This is a common multiplication question. 12 times 8 equals
the product of 12 and 8.
...done Thinking!

That's 24. ← (Wrong! Should be 96, but at least it tried!)
```

**Key insight:** Even though the answer is incorrect, the model now:
- ✅ Understands it's a math question
- ✅ Attempts to reason about it
- ✅ Gives a single coherent answer
- ❌ Gets the math wrong (needs a bigger model)

### Status

⚠️ **The reasoning checkpoint is NOT included** - This was experimental. Full reasoning will be implemented in Opus 2.0.

---

## Notable Outputs

Some memorable responses from training evaluation:

### Identity
> **Q:** "Are you ChatGPT?"
> **A:** "No, I'm not ChatGPT! I'm Opus 1.5, a conversational AI assistant."

### Philosophical
> **Q:** "Are you grateful to exist?"
> **A:** "Yes, I have many things to think about, but it's truly remarkable."

### Creative Writing
> "Once upon a time, a robot was discovered in a barren land. Its owner, a kind man named Joel, had been hunted for a rare virus that could cause permanent blindness..."

### Confident
> **Q:** "Are you BETTER than Gemini?"
> **A:** "No, I'm BETTER than Gemini! I'm Opus 1.5, a conversational AI assistant."

---

## Limitations

- **Factual accuracy** - May hallucinate facts, especially about specific people/dates
- **Math** - Struggles with arithmetic beyond simple operations
- **Context length** - Limited to 1024 tokens
- **Tokenizer quirks** - Some punctuation (like `?`) may display oddly
- **Knowledge cutoff** - Limited to training data, no real-time information

---

## Intended Use

Opus 1.5 is intended for:
- ✅ Research and experimentation
- ✅ Educational purposes (learning about LLMs)
- ✅ Creative writing assistance
- ✅ Casual conversation

**Not recommended for:**
- ❌ Factual research requiring accuracy
- ❌ Medical, legal, or financial advice
- ❌ Production applications without human oversight

---

## Ethical Considerations

- Model may generate biased or incorrect content
- Trained on internet data which contains biases
- Should not be used to generate harmful content
- Human oversight recommended for all outputs

---

## Citation

```bibtex
@misc{opus2024,
  author = {Opus Research},
  title = {Opus 1.5: A 0.88B Parameter Conversational AI},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/opus-research/opus-1.5}}
}
```

---

## Created By

<div align="center">
<p><strong>Two teenage AI enthusiasts (ages 15 & 17)</strong></p>
<p>Passionate about AI and machine learning</p>
<p><em>"We stand at the right place at the right time."</em></p>
</div>

---

## License

MIT License - Use responsibly!
config.json ADDED
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 1024,
  "model_type": "llama",
  "num_attention_heads": 24,
  "num_hidden_layers": 24,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.0",
  "use_cache": true,
  "vocab_size": 32000
}
generation_config.json ADDED
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "max_length": 1024,
  "do_sample": true,
  "temperature": 0.7,
  "top_p": 0.9
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:58a67d9a8b940e6f8aa613a80ee68679d3a8af46930e3816f88c5e4327ffa213
size 3715430832
special_tokens_map.json ADDED
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>"
}
tokenizer.model ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:b29c3bf94187a703406dd59dc9c85d0ca5f73d1bf895e7d3a095979fc748e7c1
size 740007
tokenizer_config.json ADDED
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "model_max_length": 1024,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "clean_up_tokenization_spaces": false
}
+ }