---
language:
- en
- de
- es
- fr
- pt
- it
- ru
license: other
license_name: all-rights-reserved
license_link: LICENSE
tags:
- cocoai
- base-model
- 183M
- llama
- multilingual
- wikipedia-trained
model_name: "CoALa-1"
model_type: llama
datasets:
- wikimedia/wikipedia
metrics:
- arc_easy
- hellaswag
model-index:
- name: CoALa-1
  results:
  - task:
      type: text-generation
      name: Knowledge & Logic Evaluation
    dataset:
      name: ARC-Easy
      type: ai2_arc
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 28.87
  - task:
      type: text-generation
      name: Common Sense Reasoning
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 26.96
---

# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a compact, efficient multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to extract as much performance as possible from a small footprint, placing it among the stronger models in the sub-200M parameter class.

## Key Highlights

* **Architecture:** Llama-based (RoPE, RMSNorm, and SiLU) for better stability and reasoning than older GPT-2-style designs.
* **Top 3 Performance:** In its weight class (<200M), CoALa-1 outperforms established baselines such as Meta's OPT-125M and competes directly with OpenAI's GPT-2 Small.
* **Multilingual Power:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom Tokenizer:** A 64,000-entry byte-level BPE tokenizer, optimized for multilingual efficiency (see the sketch after this list).
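
As a quick illustration, the tokenizer can be inspected directly. This is a minimal sketch that assumes the repository name used in the loading example further below:

```python
from transformers import AutoTokenizer

# Repository name taken from the "How to Load" section below.
tokenizer = AutoTokenizer.from_pretrained("CocoEntertainment/CoALa-1-Pretuned")

print(tokenizer.vocab_size)  # expected: 64000

# The same sentence tokenized in two of the seven training languages.
for text in ["The tower was completed in 1889.", "Der Turm wurde 1889 fertiggestellt."]:
    print(len(tokenizer(text)["input_ids"]), tokenizer.tokenize(text))
```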

## ⚠️ Important Note: Base Model vs. Instruct Model

CoALa-1 is a **Base Model (Pretrained)**. It has been trained to predict the next token on a massive Wikipedia corpus but has **not** yet undergone Supervised Fine-Tuning (SFT) on instruction data or RLHF.

**What this means for users:**

- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style, as the sketch below illustrates.
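
A sketch of this continuation behavior using the `transformers` pipeline API (repository name as in the loading example below):

```python
from transformers import pipeline

# Base models complete text; they do not follow chat-style instructions.
generator = pipeline("text-generation", model="CocoEntertainment/CoALa-1-Pretuned")

# An encyclopedic opening works well as a prompt ...
print(generator("The Eiffel Tower is a wrought-iron lattice tower", max_new_tokens=40)[0]["generated_text"])

# ... whereas a chat-style question is simply continued as text, not answered.
print(generator("How are you?", max_new_tokens=40)[0]["generated_text"])
```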

## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness`. It shows strong performance on factual knowledge compared to other models in its weight class.

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | **26.96%** | 28.50% | 26.00% |

![Benchmark Results](benchmark_chart.png)

> **Figure 1:** Comparison of ARC-Easy (Knowledge) and HellaSwag (Reasoning) scores. CoALa-1 leads in factual knowledge retrieval among sub-200M parameter models.

## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000
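
For reference, a hypothetical `LlamaConfig` matching these numbers might look as follows. This is a sketch only: any field not listed above (e.g. `num_key_value_heads`, `rope_theta`) is an assumption left at the library default.

```python
from transformers import LlamaConfig

# Sketch of a config matching the published specs; unlisted fields
# are assumed to be transformers library defaults.
config = LlamaConfig(
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=2048,
    vocab_size=64000,
)
```

These settings are consistent with the stated size: untied input and output embeddings contribute 2 × 64,000 × 768 ≈ 98M parameters, and the 12 transformer layers roughly 7M each, for a total near 183M.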

## Usage & Licensing

### License: All Rights Reserved

This model is provided for **private, non-commercial use only**. Redistribution, modification (for the purpose of redistribution), and commercial usage are strictly prohibited.

### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"

# Load the custom 64k BPE tokenizer and the 183M-parameter model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
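
Once loaded, a short generation call shows the base model's text-continuation behavior:

```python
# Encode a prompt and let the model continue it.
inputs = tokenizer("The Amazon rainforest covers", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```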