---
license: mit
language:
- en
library_name: transformers
tags:
- llama
- conversational
- text-generation
- from-scratch
- chain-of-thought
- reasoning
pipeline_tag: text-generation
model-index:
- name: Opus 1.5
  results: []
---
|
|
|
|
|
# Opus 1.5 |
|
|
|
|
|
<div align="center"> |
|
|
<h3>A 0.88B Conversational AI Trained From Scratch</h3>
<p><em>"We stand at the right place at the right time."</em> – Opus 1.5</p>
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## Highlights
|
|
|
|
|
- **Trained from scratch** - No pre-trained weights, 100% original |
|
|
- **0.88 billion parameters** - Efficient LLaMA-style architecture |
|
|
- **42 hours of training** - 2x RTX 4090 GPUs with FSDP |
|
|
- **Created by teenagers** - Two AI enthusiasts (ages 15 & 17) |
|
|
- **Chain-of-thought capable** - Experimental reasoning support |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Architecture |
|
|
|
|
|
Opus 1.5 uses a modern LLaMA-style transformer architecture: |
|
|
|
|
|
| Component | Implementation |
|-----------|----------------|
| Position Encoding | Rotary Position Embeddings (RoPE) |
| Activation | SwiGLU |
| Normalization | RMSNorm (pre-norm) |
| Attention | Grouped Query Attention (GQA) |
| Optimization | FlashAttention-2 compatible |
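
For reference, if the checkpoint follows the stock `transformers` LLaMA implementation, the tables in this section map onto a `LlamaConfig` roughly like this (a sketch using the values from the Specifications table below, not the shipped configuration file):

```python
# Sketch: approximate LlamaConfig for Opus 1.5, assuming the stock
# transformers LLaMA implementation. This is not the shipped config file.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=1536,
    num_hidden_layers=24,
    num_attention_heads=24,
    num_key_value_heads=8,   # 3:1 GQA ratio
    intermediate_size=6144,
    vocab_size=32000,
    max_position_embeddings=1024,
)
```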
|
|
|
|
|
### Specifications |
|
|
|
|
|
| Attribute | Value |
|-----------|-------|
| Hidden Size | 1536 |
| Layers | 24 |
| Attention Heads | 24 |
| KV Heads | 8 (3:1 GQA ratio) |
| Intermediate Size | 6144 |
| Vocab Size | 32,000 |
| Context Length | 1024 tokens |
| Total Parameters | 0.88B |
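
As a sanity check, the 0.88B figure can be reproduced from the table above, assuming tied input/output embeddings and standard LLaMA-style layers (both assumptions on our part):

```python
# Back-of-the-envelope parameter count from the Specifications table.
# Assumes tied input/output embeddings and LLaMA-style layers.
hidden, layers, vocab = 1536, 24, 32000
n_heads, n_kv_heads, intermediate = 24, 8, 6144
head_dim = hidden // n_heads  # 64

embed = vocab * hidden                                              # token embeddings (tied with lm_head)
attn = 2 * hidden * hidden + 2 * hidden * (n_kv_heads * head_dim)   # q/o + k/v projections
mlp = 3 * hidden * intermediate                                     # SwiGLU: gate, up, down
per_layer = attn + mlp + 2 * hidden                                 # + two RMSNorm weight vectors

total = embed + layers * per_layer + hidden                         # + final RMSNorm
print(f"{total / 1e9:.2f}B parameters")                             # -> 0.88B
```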
|
|
|
|
|
### Hardware Requirements
|
|
|
|
|
| Precision | VRAM Required | Tested On |
|-----------|---------------|-----------|
| bfloat16 | ~2 GB | RTX 4090 ✅ |
| float16 | ~2 GB | Any modern GPU |
| float32 | ~4 GB | Not recommended |
|
|
|
|
|
> **Note:** This model is very lightweight! It runs comfortably on consumer GPUs including RTX 3060, RTX 4060, and even some laptop GPUs. |
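
The VRAM figures above follow directly from parameter count times bytes per parameter (weights only; activations and the KV cache add a little on top):

```python
# Rough VRAM estimate for the weights alone.
params = 0.88e9
print(f"bf16/fp16: ~{params * 2 / 1e9:.1f} GB, fp32: ~{params * 4 / 1e9:.1f} GB")
# bf16/fp16: ~1.8 GB, fp32: ~3.5 GB
```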
|
|
|
|
|
--- |
|
|
|
|
|
## Training |
|
|
|
|
|
### Data |
|
|
|
|
|
Trained on **4.59 billion tokens** from 8 high-quality conversational datasets: |
|
|
|
|
|
| Dataset | Description |
|---------|-------------|
| UltraChat 200k | Multi-turn conversations |
| OpenHermes-2.5 | Instruction-following data |
| TÜLU 3 | Academic instruction tuning |
| SlimOrca | Curated reasoning data |
| WizardLM | Complex instruction data |
| Dolphin | Uncensored conversations |
| Capybara | Multi-turn dialogue |
| Open-Platypus | STEM and logic data |
|
|
|
|
### Training Configuration |
|
|
|
|
|
```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
warmup_steps: 2000
total_steps: 100000
optimizer: AdamW (β1=0.9, β2=0.95)
weight_decay: 0.1
precision: bfloat16
```
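
In PyTorch, the configuration above corresponds roughly to the following setup. This is a sketch: the linear warmup shape and the cosine decay after warmup are our assumptions, and `model` stands in for the network being trained:

```python
# Sketch of the optimizer/schedule from the config above.
# Warmup shape and post-warmup cosine decay are assumptions.
import torch

optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1
)
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, total_iters=2000
)
decay = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=98_000)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[2000]
)
```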
|
|
|
|
|
### Hardware |
|
|
|
|
|
- **GPUs:** 2x NVIDIA RTX 4090 (24GB each) |
|
|
- **Training Strategy:** Fully Sharded Data Parallel (FSDP) |
|
|
- **Training Time:** ~42 hours |
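
A minimal sketch of the two-GPU FSDP setup described above; the exact wrapping and sharding policy used for Opus 1.5 is an assumption on our part:

```python
# Minimal FSDP wrapping for 2 GPUs (wrapping policy is an assumption).
# Launch with: torchrun --nproc_per_node=2 train.py
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())
bf16 = MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16)
model = FSDP(model.cuda(), mixed_precision=bf16)  # model: the 0.88B network
```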
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Quick Start |
|
|
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "opus-research/opus-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("opus-research/opus-1.5")
tokenizer.pad_token = tokenizer.eos_token

# Simple completion (recommended)
prompt = "Once upon a time, there was a robot who"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
|
|
|
### ⚠️ Tokenizer Notes
|
|
|
|
|
This model uses a custom-trained BPE tokenizer with some quirks: |
|
|
|
|
|
| Character | Behavior |
|-----------|----------|
| `\n` (newline) | Treated as a space or stripped |
| `?` (question mark) | May render as a different character |
|
|
|
|
|
> **Note:** We didn't notice these tokenizer issues until after training was complete, as we were using simple prompts during checkpoint testing. This will be fixed in Opus 2.0 with a properly trained tokenizer. |
|
|
|
|
|
**Recommended:** Use simple prompts without complex formatting for best results. |
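
You can observe the newline behavior for yourself with a quick round-trip through the tokenizer loaded in Quick Start:

```python
# Round-trip a newline through the tokenizer to observe the quirk.
text = "line one\nline two"
ids = tokenizer.encode(text)
print(repr(tokenizer.decode(ids, skip_special_tokens=True)))
# On this tokenizer, the "\n" comes back as a space or is dropped.
```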
|
|
|
|
|
### Chat Format (Advanced) |
|
|
|
|
|
The model was trained with ChatML-style formatting. Due to tokenizer quirks with newlines, use spaces instead: |
|
|
|
|
|
```python
# Use spaces instead of newlines for chat format
prompt = "<|im_start|>user Tell me a joke<|im_end|><|im_start|>assistant"
```
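
Putting it together, one way to run a full chat turn and trim the reply at the end-of-turn marker. How `<|im_end|>` is tokenized by this custom tokenizer is untested, so the trimming step is a sketch:

```python
# Generate a reply for the chat-formatted prompt above and trim it at
# <|im_end|> if the marker survives decoding (a sketch, not verified here).
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])
print(completion.split("<|im_end|>")[0].strip())
```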
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## Reasoning Experiment (Chain-of-Thought)
|
|
|
|
|
We conducted a proof-of-concept experiment adding explicit reasoning capabilities to Opus 1.5, inspired by OpenAI's o1 and DeepSeek-R1. |
|
|
|
|
|
### Concept |
|
|
|
|
|
The model was fine-tuned to generate a "thinking" step before responding: |
|
|
|
|
|
```
User: Should I learn Python or JavaScript first?

Opus: Thinking...
This is a comparison between programming languages. Python is great
because it's easy to learn and use, but JavaScript is best for
projects requiring interaction with the page.
...done Thinking!

If you want to learn Python first, you should definitely start with it.
```
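
For anyone reproducing the experiment, the two markers make the output easy to post-process. A small helper, with the marker strings taken from the transcript above:

```python
# Split a completion into (thinking, answer) using the markers above.
import re

def split_thinking(text: str) -> tuple[str | None, str]:
    m = re.search(r"Thinking\.\.\.(.*?)\.\.\.done Thinking!", text, re.DOTALL)
    if m is None:
        return None, text.strip()  # no thinking block found
    return m.group(1).strip(), text[m.end():].strip()
```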
|
|
|
|
|
### Results |
|
|
|
|
|
| Metric | Before (Base) | After (Thinking) |
|--------|---------------|------------------|
| Response Coherence | Repetitive loops | Structured responses |
| Task Understanding | Pattern matching | Explicit reasoning |
| Training Examples | - | 534 CoT examples |
| Fine-tuning Steps | - | 300 steps |
|
|
|
|
|
### Key Findings |
|
|
|
|
|
1. **Thinking tokens improve coherence** - Eliminates repetitive loops |
|
|
2. **Reasoning as scratchpad** - Forces processing before answering |
|
|
3. **Works at small scale** - 0.88B can learn the pattern |
|
|
4. **Accuracy limited** - Larger models needed for factual accuracy |
|
|
|
|
|
### Example: Math (Improved Behavior, Wrong Answer) |
|
|
|
|
|
```
User: What's 12 times 8?

# BEFORE (Base model) - Repetitive loop:
Opus: What's 12 times 8? What's 12 times 8? What's 12 times...

# AFTER (Thinking model) - Attempts to reason:
Opus: Thinking...
This is a common multiplication question. 12 times 8 equals
the product of 12 and 8.
...done Thinking!

That's 24. ❌ (Wrong! Should be 96, but at least it tried!)
```
|
|
|
|
|
**Key insight:** Even though the answer is incorrect, the model now:

- ✅ Understands it's a math question
- ✅ Attempts to reason about it
- ✅ Gives a single coherent answer
- ❌ Gets the math wrong (needs a bigger model)
|
|
|
|
|
### Status |
|
|
|
|
|
⚠️ **The reasoning checkpoint is NOT included** - This was experimental. Full reasoning will be implemented in Opus 2.0.
|
|
|
|
|
--- |
|
|
|
|
|
## Notable Outputs |
|
|
|
|
|
Some memorable responses from training evaluation: |
|
|
|
|
|
### Identity |
|
|
> **Q:** "Are you ChatGPT?" |
|
|
> **A:** "No, I'm not ChatGPT! I'm Opus 1.5, a conversational AI assistant." |
|
|
|
|
|
### Philosophical |
|
|
> **Q:** "Are you grateful to exist?" |
|
|
> **A:** "Yes, I have many things to think about, but it's truly remarkable." |
|
|
|
|
|
### Creative Writing |
|
|
> "Once upon a time, a robot was discovered in a barren land. Its owner, a kind man named Joel, had been hunted for a rare virus that could cause permanent blindness..." |
|
|
|
|
|
### Confident |
|
|
> **Q:** "Are you BETTER than Gemini?" |
|
|
> **A:** "No, I'm BETTER than Gemini! I'm Opus 1.5, a conversational AI assistant." |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- **Factual accuracy** - May hallucinate facts, especially about specific people/dates |
|
|
- **Math** - Struggles with arithmetic beyond simple operations |
|
|
- **Context length** - Limited to 1024 tokens |
|
|
- **Tokenizer quirks** - Some punctuation (like `?`) may display oddly |
|
|
- **Knowledge cutoff** - Limited to training data, no real-time information |
|
|
- **No identity fine-tuning** - This release is the base model only, not fine-tuned for self-awareness |
|
|
- **No safety alignment** - Model has not undergone RLHF, DPO, or other safety training |
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
Opus 1.5 is intended for: |
|
|
- ✅ Research and experimentation
- ✅ Educational purposes (learning about LLMs)
- ✅ Creative writing assistance
- ✅ Casual conversation
|
|
|
|
|
**Not recommended for:** |
|
|
- ❌ Factual research requiring accuracy
- ❌ Medical, legal, or financial advice
- ❌ Production applications without human oversight
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Safety Notice
|
|
|
|
|
**This model has NO safety alignment.** It has not been fine-tuned with: |
|
|
- RLHF (Reinforcement Learning from Human Feedback) |
|
|
- DPO (Direct Preference Optimization) |
|
|
- Constitutional AI |
|
|
- Content filtering |
|
|
|
|
|
**Users must implement their own safety mechanisms** if deploying this model. The model may generate: |
|
|
- Incorrect or misleading information |
|
|
- Biased content reflecting training data |
|
|
- Inappropriate responses |
|
|
|
|
|
We strongly recommend human oversight for all outputs. |
|
|
|
|
|
--- |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
- Model may generate biased or incorrect content |
|
|
- Trained on internet data which contains biases |
|
|
- Should not be used to generate harmful content |
|
|
- Human oversight recommended for all outputs |
|
|
- **Implement your own content moderation** before any public deployment |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{opus2025,
  author = {Opus Research},
  title = {Opus 1.5: A 0.88B Parameter Conversational AI},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/opus-research/opus-1.5}}
}
```
|
|
|
|
|
--- |
|
|
|
|
|
## Created By |
|
|
|
|
|
<div align="center"> |
|
|
<p><strong>Two teenage AI enthusiasts (ages 15 & 17)</strong></p> |
|
|
<p>Passionate about AI and machine learning</p> |
|
|
<p><em>"We stand at the right place at the right time."</em></p> |
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
MIT License - Use responsibly! |
|
|
|