---
language:
- en
- de
- es
- fr
- pt
- it
- ru
license: other
license_name: all-rights-reserved
license_link: LICENSE
tags:
- cocoai
- base-model
- 183M
- llama
- multilingual
- wikipedia-trained
model_name: "CoALa-1"
model_type: llama
datasets:
- wikimedia/wikipedia
metrics:
- arc_easy
- hellaswag
model-index:
- name: CoALa-1
  results:
  - task:
      type: text-generation
      name: Knowledge & Logic Evaluation
    dataset:
      name: ARC-Easy
      type: ai2_arc
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 28.87
  - task:
      type: text-generation
      name: Common Sense Reasoning
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 26.96
---

# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a highly efficient, multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to deliver strong performance at a compact size, placing it among the top-performing models in the sub-200M parameter class.

## Key Highlights

* **Architecture:** Llama-based (utilizing RoPE, RMSNorm, and SiLU) for superior stability and reasoning compared to older GPT-2-style architectures.
* **Top 3 Performance:** In its weight class (<200M parameters), CoALa-1 outperforms Meta's OPT-125M on both reported benchmarks and competes directly with OpenAI's GPT-2 Small.
* **Multilingual Power:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom Tokenizer:** Features a 64,000-vocabulary byte-level BPE tokenizer, optimized for multilingual efficiency (see the sketch below).
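
As an illustration of the tokenizer's multilingual efficiency, the sketch below tokenizes parallel sentences in several supported languages and compares token counts. The model id is taken from the loading example further down; the sample sentences themselves are arbitrary.

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with the model (id from the "How to Load" section).
tokenizer = AutoTokenizer.from_pretrained("CocoEntertainment/CoALa-1-Pretuned")

# Arbitrary parallel sentences; a shared multilingual BPE vocabulary should
# keep token counts comparably low across all supported languages.
samples = {
    "en": "The capital of France is Paris.",
    "de": "Die Hauptstadt von Frankreich ist Paris.",
    "es": "La capital de Francia es París.",
    "fr": "La capitale de la France est Paris.",
}
for lang, text in samples.items():
    print(lang, len(tokenizer(text)["input_ids"]))
```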

## ⚠️ Important Note: Base Model vs. Instruct Model
CoALa-1 is a **Base Model (Pretrained)**. It has been trained to predict the next token on a massive Wikipedia corpus but has **not** yet undergone supervised instruction fine-tuning (SFT) or RLHF.

**What this means for users:**
- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style (see the generation example under *Usage & Licensing* below).


## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness`. It shows strong performance on factual knowledge compared to other models in its weight class.

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | **26.96%** | 28.50% | 26.00% |

![Benchmark Comparison](benchmarks.png)

> **Figure 1:** Comparison of ARC-Easy (Knowledge) and HellaSwag (Reasoning) scores. CoALa-1 leads in factual knowledge retrieval among sub-200M parameter models.
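
For reproducibility, here is a minimal sketch of how these scores could be regenerated, assuming the Python API of EleutherAI's `lm-evaluation-harness` (v0.4+); the task names `arc_easy` and `hellaswag` follow the harness's task registry:

```python
import lm_eval

# Evaluate the released checkpoint on the two reported benchmarks;
# "hf" selects the Hugging Face transformers backend.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=CocoEntertainment/CoALa-1-Pretuned",
    tasks=["arc_easy", "hellaswag"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, including acc_norm
```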

## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000
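
These hyperparameters correspond to a `transformers.LlamaConfig` along the following lines. This is a reconstruction from the specs above, not the shipped config file; values the card does not list (e.g. `num_key_value_heads`, `rope_theta`) are left at their Llama defaults.

```python
from transformers import LlamaConfig

# Reconstructed CoALa-1 architecture; unlisted values are Llama defaults.
config = LlamaConfig(
    vocab_size=64_000,
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=2048,  # 2048-token context length
)
print(config.hidden_act)  # "silu", matching the architecture highlights
```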

## Usage & Licensing

### License: All Rights Reserved
This model is provided for **private, non-commercial use only**. Redistribution, modification (for the purpose of redistribution), and commercial usage are strictly prohibited.

### How to Load
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
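
### Generating Text
Because CoALa-1 is a base model (see the note above), prompt it with an open-ended prefix rather than a question. Below is a minimal continuation sketch, reusing the objects loaded above; the sampling settings are illustrative, not tuned recommendations.

```python
# Base models continue text: use an encyclopedic prefix, not a chat question.
prompt = "The Eiffel Tower is a wrought-iron lattice tower located in"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```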