---
language:
- en
- de
- es
- fr
- pt
- it
- ru
license: other
license_name: all-rights-reserved
license_link: LICENSE
tags:
- cocoai
- base-model
- 183M
- llama
- multilingual
- wikipedia-trained
model_name: "CoALa-1"
model_type: llama
datasets:
- wikimedia/wikipedia
metrics:
- arc_easy
- hellaswag
model-index:
- name: CoALa-1
  results:
  - task:
      type: text-generation
      name: Knowledge & Logic Evaluation
    dataset:
      name: ARC-Easy
      type: ai2_arc
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 28.87
  - task:
      type: text-generation
      name: Common Sense Reasoning
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 26.96
---

# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a compact, efficient multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to extract as much performance as possible from a small footprint, placing it among the stronger models in the sub-200M parameter class.

## Key Highlights

* **Architecture:** Llama-based (RoPE, RMSNorm, and SiLU) for better stability and reasoning than older GPT-2-style designs.
* **Top 3 Performance:** In its weight class (<200M), CoALa-1 outperforms established baselines such as Meta's OPT-125M and competes directly with OpenAI's GPT-2 Small.
* **Multilingual Power:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom Tokenizer:** A 64,000-entry byte-level BPE tokenizer, optimized for multilingual efficiency (see the sketch after this list).
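
As a quick illustration, the tokenizer can be inspected directly. This is a minimal sketch that assumes the repository name used in the loading example further below:

```python
from transformers import AutoTokenizer

# Repository name taken from the "How to Load" section below.
tokenizer = AutoTokenizer.from_pretrained("CocoEntertainment/CoALa-1-Pretuned")

print(tokenizer.vocab_size)  # expected: 64000

# The same sentence tokenized in two of the seven training languages.
for text in ["The tower was completed in 1889.", "Der Turm wurde 1889 fertiggestellt."]:
    print(len(tokenizer(text)["input_ids"]), tokenizer.tokenize(text))
```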

## ⚠️ Important Note: Base Model vs. Instruct Model

CoALa-1 is a **Base Model (Pretrained)**. It has been trained to predict the next token on a massive Wikipedia corpus but has **not** yet undergone Supervised Fine-Tuning (SFT) on instruction data or RLHF.

**What this means for users:**

- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style, as the sketch below illustrates.
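
A sketch of this continuation behavior using the `transformers` pipeline API (repository name as in the loading example below):

```python
from transformers import pipeline

# Base models complete text; they do not follow chat-style instructions.
generator = pipeline("text-generation", model="CocoEntertainment/CoALa-1-Pretuned")

# An encyclopedic opening works well as a prompt ...
print(generator("The Eiffel Tower is a wrought-iron lattice tower", max_new_tokens=40)[0]["generated_text"])

# ... whereas a chat-style question is simply continued as text, not answered.
print(generator("How are you?", max_new_tokens=40)[0]["generated_text"])
```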

## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness`. It shows strong performance on factual knowledge compared to other models in its weight class.

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | **26.96%** | 28.50% | 26.00% |

![Benchmark Results](benchmark_chart.png)

> **Figure 1:** Comparison of ARC-Easy (Knowledge) and HellaSwag (Reasoning) scores. CoALa-1 leads in factual knowledge retrieval among sub-200M parameter models.

## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000
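
For reference, a hypothetical `LlamaConfig` matching these numbers might look as follows. This is a sketch only: any field not listed above (e.g. `num_key_value_heads`, `rope_theta`) is an assumption left at the library default.

```python
from transformers import LlamaConfig

# Sketch of a config matching the published specs; unlisted fields
# are assumed to be transformers library defaults.
config = LlamaConfig(
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=2048,
    vocab_size=64000,
)
```

These settings are consistent with the stated size: untied input and output embeddings contribute 2 × 64,000 × 768 ≈ 98M parameters, and the 12 transformer layers roughly 7M each, for a total near 183M.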

## Usage & Licensing

### License: All Rights Reserved

This model is provided for **private, non-commercial use only**. Redistribution, modification (for the purpose of redistribution), and commercial usage are strictly prohibited.

### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"

# Load the custom 64k BPE tokenizer and the 183M-parameter model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
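
Once loaded, a short generation call shows the base model's text-continuation behavior:

```python
# Encode a prompt and let the model continue it.
inputs = tokenizer("The Amazon rainforest covers", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```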