      - name: Accuracy (Norm)
        type: acc_norm
        value: 26.96
---
# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a highly efficient multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to deliver strong performance in a compact size, making it one of the top-performing models in the sub-200M parameter class.
## Key Highlights

* **Architecture:** Llama-based (RoPE, RMSNorm, and SiLU) for better stability and reasoning than older GPT-2-style architectures.
* **Top-3 Performance:** In its weight class (<200M parameters), CoALa-1 outperforms industry standards such as Meta's OPT-125M and competes directly with OpenAI's GPT-2 Small.
* **Multilingual:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom Tokenizer:** A 64,000-vocabulary byte-level BPE tokenizer, optimized for multilingual efficiency.
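To illustrate what a byte-level BPE tokenizer does under the hood, here is a toy sketch of a single merge step in pure Python. This is an illustration only, not the actual CoALa-1 tokenizer; the real 64,000-entry vocabulary is learned from the multilingual corpus.

```python
from collections import Counter

def most_frequent_pair(token_ids):
    """Return the most common adjacent pair: the pair a BPE
    trainer would merge next."""
    pairs = Counter(zip(token_ids, token_ids[1:]))
    return pairs.most_common(1)[0][0]

def merge(token_ids, pair, new_id):
    """Replace every occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(token_ids):
        if i + 1 < len(token_ids) and (token_ids[i], token_ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(token_ids[i])
            i += 1
    return out

# Byte-level: the initial tokens are the raw UTF-8 bytes of the text
ids = list(b"banana band")
pair = most_frequent_pair(ids)   # (97, 110), i.e. the bytes of "an"
merged = merge(ids, pair, 256)   # 256 is the first id after the 256 base bytes
```

Repeating this merge loop until the vocabulary reaches 64,000 entries yields the tokenizer's merge table; starting from raw bytes guarantees that any input in any of the 7 languages can be encoded.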
## ⚠️ Important Note: Base Model vs. Instruct Model

CoALa-1 is a **base (pretrained) model**. It has been trained to predict the next token on a large Wikipedia corpus but has **not** undergone instruction fine-tuning (SFT) or RLHF.

**What this means for users:**

- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style.
## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness` (best score per row in bold).

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | 26.96% | **28.50%** | 26.00% |
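For context, `acc_norm` scores a multiple-choice item by the length-normalized log-likelihood of each candidate completion rather than the raw log-likelihood, which removes the bias toward short answers. A minimal sketch of the selection rule (the scores below are made up for illustration):

```python
def acc_norm_pick(scores):
    """Pick the completion with the highest log-likelihood per byte
    (the normalization behind acc_norm in lm-evaluation-harness)."""
    return max(scores, key=lambda text: scores[text] / len(text.encode("utf-8")))

# Hypothetical per-completion log-likelihoods for one item
candidates = {
    "a landmark in Paris": -20.0,      # -20 / 19 bytes ≈ -1.05
    "a small tower in Berlin": -42.0,  # -42 / 23 bytes ≈ -1.83
}
print(acc_norm_pick(candidates))  # → a landmark in Paris
```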
## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000
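As a sanity check, these specifications roughly reproduce the advertised parameter count. A back-of-the-envelope sketch, assuming untied input/output embeddings and a Llama-style three-matrix MLP, and ignoring the small normalization weights:

```python
hidden, intermediate, layers, vocab = 768, 2048, 12, 64_000

embeddings = 2 * vocab * hidden        # input embedding + LM head (assumed untied)
attention = 4 * hidden * hidden        # Q, K, V, O projections
mlp = 3 * hidden * intermediate        # gate, up, and down projections
total = embeddings + layers * (attention + mlp)

print(f"~{total / 1e6:.0f}M parameters")  # → ~183M
```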
## Usage & Licensing

### License: All Rights Reserved

This model is provided for **private, non-commercial use only**. Redistribution, modification for the purpose of redistribution, and commercial use are strictly prohibited.
### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Base model: prompt it with text to continue, not a chat question
inputs = tokenizer("The Eiffel Tower is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```