spoodddddd committed on
Commit 625123a · verified · 1 Parent(s): 1a52f18

Upload folder using huggingface_hub

README.md ADDED
---
license: mit
language:
- en
library_name: transformers
tags:
- llama
- conversational
- text-generation
- from-scratch
- chain-of-thought
- reasoning
pipeline_tag: text-generation
model-index:
- name: Opus 1.5
  results: []
---

# Opus 1.5

<div align="center">
<h3>🎭 A 0.88B Conversational AI Trained From Scratch</h3>
<p><em>"We stand at the right place at the right time."</em> — Opus 1.5</p>
</div>

---

## 🌟 Highlights

- **Trained from scratch** - No pre-trained weights, 100% original
- **0.88 billion parameters** - Efficient LLaMA-style architecture
- **42 hours of training** - 2x RTX 4090 GPUs with FSDP
- **Created by teenagers** - Two AI enthusiasts (ages 15 & 17)
- **Chain-of-thought capable** - Experimental reasoning support

---

## Model Details

### Architecture

Opus 1.5 uses a modern LLaMA-style transformer architecture:

| Component | Implementation |
|-----------|----------------|
| Position Encoding | Rotary Position Embeddings (RoPE) |
| Activation | SwiGLU |
| Normalization | RMSNorm (pre-norm) |
| Attention | Grouped Query Attention (GQA) |
| Optimization | FlashAttention-2 compatible |

### Specifications

| Attribute | Value |
|-----------|-------|
| Hidden Size | 1536 |
| Layers | 24 |
| Attention Heads | 24 |
| KV Heads | 8 (3:1 GQA ratio) |
| Intermediate Size | 6144 |
| Vocab Size | 32,000 |
| Context Length | 1024 tokens |
| Total Parameters | 0.88B |

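These numbers mirror the `config.json` shipped in this repository. As a rough sketch of how the specification table maps onto the `transformers` library (values are taken from this card; anything not listed falls back to library defaults), the architecture can be rebuilt like this:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch: rebuild the architecture from the specification table above.
# Values mirror the config.json in this repo; omitted fields use library defaults.
config = LlamaConfig(
    hidden_size=1536,
    num_hidden_layers=24,
    num_attention_heads=24,
    num_key_value_heads=8,          # 3:1 grouped-query attention ratio
    intermediate_size=6144,
    vocab_size=32000,
    max_position_embeddings=1024,
    rms_norm_eps=1e-5,
    rope_theta=10000.0,
    tie_word_embeddings=False,
)

model = LlamaForCausalLM(config)    # randomly initialised model with the Opus 1.5 shape
print(sum(p.numel() for p in model.parameters()) / 1e9, "B parameters")
```
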
---

## Training

### Data

Trained on **4.59 billion tokens** from 8 high-quality conversational datasets:

| Dataset | Description |
|---------|-------------|
| UltraChat 200k | Multi-turn conversations |
| OpenHermes-2.5 | Instruction-following data |
| TÜLU 3 | Academic instruction tuning |
| SlimOrca | Curated reasoning data |
| WizardLM | Complex instruction data |
| Dolphin | Uncensored conversations |
| Capybara | Multi-turn dialogue |
| Open-Platypus | STEM and logic data |

### Training Configuration

```yaml
batch_size: 8
gradient_accumulation: 4
learning_rate: 3e-4
warmup_steps: 2000
total_steps: 100000
optimizer: AdamW (β1=0.9, β2=0.95)
weight_decay: 0.1
precision: bfloat16
```

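The training script itself is not published, so as a hedged sketch only, these hyperparameters might map onto PyTorch roughly as below. The cosine decay after warmup is an assumption; the card only states the warmup and total step counts.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)   # stand-in; the real run would use the 0.88B Opus model
warmup_steps, total_steps = 2000, 100_000

# AdamW with the betas / weight decay listed in the config above.
optimizer = AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)

def lr_lambda(step: int) -> float:
    # Linear warmup for the first 2,000 steps, then cosine decay to zero
    # (the decay shape is assumed; it is not specified in the card).
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
```
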
### Hardware

- **GPUs:** 2x NVIDIA RTX 4090 (24GB each)
- **Training Strategy:** Fully Sharded Data Parallel (FSDP)
- **Training Time:** ~42 hours

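The FSDP launch setup is likewise not included; a minimal two-GPU sketch under assumptions (the `build_opus_model()` constructor is hypothetical, bfloat16 mixed precision matches the config above) might look like this, launched with `torchrun --nproc_per_node=2 train.py`:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Assumed two-process setup, one process per RTX 4090.
dist.init_process_group("nccl")
local_rank = dist.get_rank()
torch.cuda.set_device(local_rank)

model = build_opus_model().cuda(local_rank)  # hypothetical constructor for the 0.88B model

# Shard parameters, gradients, and optimizer state across both GPUs in bfloat16.
model = FSDP(
    model,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
)
```
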
---

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "opus-research/opus-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("opus-research/opus-1.5")
tokenizer.pad_token = tokenizer.eos_token

# Simple completion (recommended)
prompt = "Once upon a time, there was a robot who"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### ⚠️ Tokenizer Notes

This model uses a custom-trained BPE tokenizer with some quirks:

| Character | Behavior |
|-----------|----------|
| `\n` (newline) | Treated as space or stripped |
| `?` (question mark) | May display as `⁇` |

> **Note:** We didn't notice these tokenizer issues until after training was complete, as we were using simple prompts during checkpoint testing. This will be fixed in Opus 2.0 with a properly trained tokenizer.

**Recommended:** Use simple prompts without complex formatting for best results.

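One practical workaround, if your prompts contain newlines (for example, pasted text), is to collapse whitespace before tokenizing so the newline quirk never comes into play. A minimal sketch (not part of the released code):

```python
def normalize_prompt(text: str) -> str:
    # Collapse newlines (and runs of whitespace) into single spaces,
    # since the tokenizer treats "\n" as a space or strips it entirely.
    return " ".join(text.split())

prompt = normalize_prompt("Tell me a story\nabout a robot who paints.")
# -> "Tell me a story about a robot who paints."
```
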
### Chat Format (Advanced)

The model was trained with ChatML-style formatting. Due to tokenizer quirks with newlines, use spaces instead:

```python
# Use spaces instead of newlines for chat format
prompt = "<|im_start|>user Tell me a joke<|im_end|><|im_start|>assistant"
```

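Putting it together, a single chat turn might look like the sketch below. It reuses `model` and `tokenizer` from the Quick Start; whether `<|im_start|>`/`<|im_end|>` are registered as special tokens is an assumption, so the reply is trimmed at the end-of-turn marker manually.

```python
# Sketch of one chat turn using the spaces-instead-of-newlines ChatML-style format.
# Assumes `model` and `tokenizer` are loaded as in the Quick Start example.
prompt = "<|im_start|>user Tell me a joke<|im_end|><|im_start|>assistant"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
)

# Keep only the newly generated tokens, then cut at the end-of-turn marker if emitted.
reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
reply = reply.split("<|im_end|>")[0].strip()
print(reply)
```
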
---

## 🧠 Reasoning Experiment (Chain-of-Thought)

We conducted a proof-of-concept experiment adding explicit reasoning capabilities to Opus 1.5, inspired by OpenAI's o1 and DeepSeek-R1.

### Concept

The model was fine-tuned to generate a "thinking" step before responding:

```
User: Should I learn Python or JavaScript first?

Opus: Thinking...
This is a comparison between programming languages. Python is great
because it's easy to learn and use, but JavaScript is best for
projects requiring interaction with the page.
...done Thinking!

If you want to learn Python first, you should definitely start with it.
```

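The reasoning checkpoint itself is not released (see Status below), but if you fine-tune your own, separating the scratchpad from the final answer is straightforward. A minimal sketch, assuming the "Thinking..." / "...done Thinking!" markers shown above:

```python
import re

def split_thinking(generation: str):
    # Split a generation into (scratchpad, answer) using the assumed
    # "Thinking..." / "...done Thinking!" markers; returns (None, text) if absent.
    match = re.search(r"Thinking\.\.\.(.*?)\.\.\.done Thinking!", generation, flags=re.DOTALL)
    if match is None:
        return None, generation.strip()
    return match.group(1).strip(), generation[match.end():].strip()

thoughts, answer = split_thinking(
    "Thinking... This is a math question. ...done Thinking! The answer is 96."
)
print(thoughts)  # "This is a math question."
print(answer)    # "The answer is 96."
```
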
### Results

| Metric | Before (Base) | After (Thinking) |
|--------|---------------|------------------|
| Response Coherence | Repetitive loops | Structured responses |
| Task Understanding | Pattern matching | Explicit reasoning |
| Training Examples | - | 534 CoT examples |
| Fine-tuning Steps | - | 300 steps |

### Key Findings

1. **Thinking tokens improve coherence** - Eliminates repetitive loops
2. **Reasoning as scratchpad** - Forces processing before answering
3. **Works at small scale** - 0.88B can learn the pattern
4. **Accuracy limited** - Larger models needed for factual accuracy

### Example: Math (Improved Behavior, Wrong Answer)

```
User: What's 12 times 8?

# BEFORE (Base model) - Repetitive loop:
Opus: What's 12 times 8? What's 12 times 8? What's 12 times...

# AFTER (Thinking model) - Attempts to reason:
Opus: Thinking...
This is a common multiplication question. 12 times 8 equals
the product of 12 and 8.
...done Thinking!

That's 24. ← (Wrong! Should be 96, but at least it tried!)
```

**Key insight:** Even though the answer is incorrect, the model now:
- ✅ Understands it's a math question
- ✅ Attempts to reason about it
- ✅ Gives a single coherent answer
- ❌ Gets the math wrong (needs a bigger model)

### Status

⚠️ **The reasoning checkpoint is NOT included** - This was experimental. Full reasoning will be implemented in Opus 2.0.

---

## Notable Outputs

Some memorable responses from training evaluation:

### Identity
> **Q:** "Are you ChatGPT?"
> **A:** "No, I'm not ChatGPT! I'm Opus 1.5, a conversational AI assistant."

### Philosophical
> **Q:** "Are you grateful to exist?"
> **A:** "Yes, I have many things to think about, but it's truly remarkable."

### Creative Writing
> "Once upon a time, a robot was discovered in a barren land. Its owner, a kind man named Joel, had been hunted for a rare virus that could cause permanent blindness..."

### Confident
> **Q:** "Are you BETTER than Gemini?"
> **A:** "No, I'm BETTER than Gemini! I'm Opus 1.5, a conversational AI assistant."

---

## Limitations

- **Factual accuracy** - May hallucinate facts, especially about specific people/dates
- **Math** - Struggles with arithmetic beyond simple operations
- **Context length** - Limited to 1024 tokens
- **Tokenizer quirks** - Some punctuation (like `?`) may display oddly
- **Knowledge cutoff** - Limited to training data, no real-time information

---

## Intended Use

Opus 1.5 is intended for:
- ✅ Research and experimentation
- ✅ Educational purposes (learning about LLMs)
- ✅ Creative writing assistance
- ✅ Casual conversation

**Not recommended for:**
- ❌ Factual research requiring accuracy
- ❌ Medical, legal, or financial advice
- ❌ Production applications without human oversight

---

## Ethical Considerations

- Model may generate biased or incorrect content
- Trained on internet data which contains biases
- Should not be used to generate harmful content
- Human oversight recommended for all outputs

---

## Citation

```bibtex
@misc{opus2024,
  author = {Opus Research},
  title = {Opus 1.5: A 0.88B Parameter Conversational AI},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/opus-research/opus-1.5}}
}
```

---

## Created By

<div align="center">
<p><strong>Two teenage AI enthusiasts (ages 15 & 17)</strong></p>
<p>Passionate about AI and machine learning</p>
<p><em>"We stand at the right place at the right time."</em></p>
</div>

---

## License

MIT License - Use responsibly!
config.json ADDED
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 1536,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "max_position_embeddings": 1024,
  "model_type": "llama",
  "num_attention_heads": 24,
  "num_hidden_layers": 24,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.0",
  "use_cache": true,
  "vocab_size": 32000
}
generation_config.json ADDED
{
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "max_length": 1024,
  "do_sample": true,
  "temperature": 0.7,
  "top_p": 0.9
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:58a67d9a8b940e6f8aa613a80ee68679d3a8af46930e3816f88c5e4327ffa213
size 3715430832
special_tokens_map.json ADDED
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>"
}
tokenizer.model ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:b29c3bf94187a703406dd59dc9c85d0ca5f73d1bf895e7d3a095979fc748e7c1
size 740007
tokenizer_config.json ADDED
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "model_max_length": 1024,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "clean_up_tokenization_spaces": false
}
+ }