  - name: Accuracy (Norm)
    type: acc_norm
    value: 26.96
---

# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a highly efficient, multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to deliver strong performance in a compact size, making it one of the top-performing models in the sub-200M parameter class.

## Key Highlights

* **Architecture:** Llama-based (RoPE, RMSNorm, and SiLU), offering better stability and reasoning than older GPT-2-style architectures.
* **Top-3 performance:** In its weight class (<200M parameters), CoALa-1 outperforms established baselines such as Meta's OPT-125M and competes directly with OpenAI's GPT-2 Small.
* **Multilingual:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom tokenizer:** A byte-level BPE tokenizer with a 64,000-token vocabulary, optimized for multilingual efficiency.

## ⚠️ Important Note: Base Model vs. Instruct Model

CoALa-1 is a **base model (pretrained)**. It has been trained to predict the next token on a large Wikipedia corpus but has **not** yet undergone instruction fine-tuning (SFT) or RLHF.

**What this means for users:**

- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style.

## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness`. The best score per benchmark is shown in bold.

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | 26.96% | **28.50%** | 26.00% |

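For reproducibility, a typical harness run looks like the following. This is a sketch: the task names and flags assume a recent `lm-evaluation-harness` release, and the batch size is illustrative.

```shell
lm_eval --model hf \
    --model_args pretrained=CocoEntertainment/CoALa-1-Pretuned \
    --tasks arc_easy,hellaswag \
    --batch_size 8
```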
## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000

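As a sanity check, these specifications are consistent with the advertised 183M parameters, assuming standard Llama blocks (gate/up/down MLP projections, no attention biases) and untied input and output embeddings. This is a back-of-the-envelope sketch, not the exact checkpoint accounting:

```python
# Rough parameter count from the specs above.
hidden, inter, layers, vocab = 768, 2048, 12, 64_000

embed = vocab * hidden          # input token embeddings
attn = 4 * hidden * hidden      # q, k, v, o projections
mlp = 3 * hidden * inter        # gate, up, down projections
norms = 2 * hidden              # two RMSNorms per layer
per_layer = attn + mlp + norms

# + final RMSNorm + untied lm_head
total = embed + layers * per_layer + hidden + vocab * hidden
print(f"{total / 1e6:.1f}M parameters")  # ≈ 183.3M, matching the advertised size
```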
## Usage & Licensing

### License: All Rights Reserved

This model is provided for **private, non-commercial use only**. Redistribution, modification for the purpose of redistribution, and commercial use are strictly prohibited.

### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
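
Once loaded, the model is used for text continuation rather than chat (see the base-model note above). A minimal sketch, with an illustrative prompt and generation settings; it assumes `model` and `tokenizer` from the loading snippet:

```python
# A base model continues text, so give it an encyclopedic opening to complete.
prompt = "The Eiffel Tower is a wrought-iron lattice tower located in"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```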