      - name: Accuracy (Norm)
        type: acc_norm
        value: 26.96
---
# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a highly efficient multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to deliver strong performance in a compact size, making it one of the top-performing models in the sub-200M parameter class.
## Key Highlights

* **Architecture:** Llama-based (RoPE, RMSNorm, and SiLU) for better stability and reasoning than older GPT-2-style architectures.
* **Top-3 Performance:** In its weight class (<200M parameters), CoALa-1 outperforms industry standards such as Meta's OPT-125M and competes directly with OpenAI's GPT-2 Small.
* **Multilingual:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom Tokenizer:** A 64,000-vocabulary byte-level BPE tokenizer, optimized for multilingual efficiency.
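To illustrate what a byte-level BPE tokenizer does under the hood, here is a toy sketch of a single merge step in pure Python. This is an illustration only, not the actual CoALa-1 tokenizer; the real 64,000-entry vocabulary is learned from the multilingual corpus.

```python
from collections import Counter

def most_frequent_pair(token_ids):
    """Return the most common adjacent pair: the pair a BPE
    trainer would merge next."""
    pairs = Counter(zip(token_ids, token_ids[1:]))
    return pairs.most_common(1)[0][0]

def merge(token_ids, pair, new_id):
    """Replace every occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(token_ids):
        if i + 1 < len(token_ids) and (token_ids[i], token_ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(token_ids[i])
            i += 1
    return out

# Byte-level: the initial tokens are the raw UTF-8 bytes of the text
ids = list(b"banana band")
pair = most_frequent_pair(ids)   # (97, 110), i.e. the bytes of "an"
merged = merge(ids, pair, 256)   # 256 is the first id after the 256 base bytes
```

Repeating this merge loop until the vocabulary reaches 64,000 entries yields the tokenizer's merge table; starting from raw bytes guarantees that any input in any of the 7 languages can be encoded.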
## ⚠️ Important Note: Base Model vs. Instruct Model

CoALa-1 is a **base (pretrained) model**. It has been trained to predict the next token on a large Wikipedia corpus but has **not** undergone instruction fine-tuning (SFT) or RLHF.

**What this means for users:**

- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style.
## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness` (best score per row in bold).

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | 26.96% | **28.50%** | 26.00% |
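For context, `acc_norm` scores a multiple-choice item by the length-normalized log-likelihood of each candidate completion rather than the raw log-likelihood, which removes the bias toward short answers. A minimal sketch of the selection rule (the scores below are made up for illustration):

```python
def acc_norm_pick(scores):
    """Pick the completion with the highest log-likelihood per byte
    (the normalization behind acc_norm in lm-evaluation-harness)."""
    return max(scores, key=lambda text: scores[text] / len(text.encode("utf-8")))

# Hypothetical per-completion log-likelihoods for one item
candidates = {
    "a landmark in Paris": -20.0,      # -20 / 19 bytes ≈ -1.05
    "a small tower in Berlin": -42.0,  # -42 / 23 bytes ≈ -1.83
}
print(acc_norm_pick(candidates))  # → a landmark in Paris
```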
## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000
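As a sanity check, these specifications roughly reproduce the advertised parameter count. A back-of-the-envelope sketch, assuming untied input/output embeddings and a Llama-style three-matrix MLP, and ignoring the small normalization weights:

```python
hidden, intermediate, layers, vocab = 768, 2048, 12, 64_000

embeddings = 2 * vocab * hidden        # input embedding + LM head (assumed untied)
attention = 4 * hidden * hidden        # Q, K, V, O projections
mlp = 3 * hidden * intermediate        # gate, up, and down projections
total = embeddings + layers * (attention + mlp)

print(f"~{total / 1e6:.0f}M parameters")  # → ~183M
```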
## Usage & Licensing

### License: All Rights Reserved

This model is provided for **private, non-commercial use only**. Redistribution, modification for the purpose of redistribution, and commercial use are strictly prohibited.
### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Base model: prompt it with text to continue, not a chat question
inputs = tokenizer("The Eiffel Tower is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```