---
language:
- en
- de
- es
- fr
- pt
- it
- ru
license: other
license_name: all-rights-reserved
license_link: LICENSE
tags:
- cocoai
- base-model
- 183M
- llama
- multilingual
- wikipedia-trained
model_name: CoALa-1
model_type: llama
datasets:
- wikimedia/wikipedia
metrics:
- arc_easy
- hellaswag
model-index:
- name: CoALa-1
  results:
  - task:
      type: text-generation
      name: Knowledge & Logic Evaluation
    dataset:
      name: ARC-Easy
      type: ai2_arc
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 28.87
  - task:
      type: text-generation
      name: Common Sense Reasoning
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 26.96
---
# CoALa-1 (183M Multilingual Llama-Base)
CoALa-1 is an efficient, multilingual base model with 183 million parameters. Built on a modern Llama-based architecture, it is designed to deliver strong performance at a compact size, placing it among the better-performing models in the sub-200M parameter class.
## Key Highlights
- Architecture: Llama-based (utilizing RoPE, RMSNorm, and SiLU) for superior stability and reasoning compared to older GPT-2 structures.
- Competitive Performance: In its weight class (<200M parameters), CoALa-1 outperforms Meta's OPT-125M on both reported benchmarks and competes directly with OpenAI's GPT-2 Small.
- Multilingual Power: Trained from scratch on high-quality Wikipedia data in 7 languages (English, German, Spanish, French, Portuguese, Italian, Russian).
- Custom Tokenizer: Features a 64,000 vocab Byte-level BPE tokenizer, optimized for multilingual efficiency.
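The RMSNorm and SiLU components mentioned above can be sketched as follows. This is a minimal NumPy illustration of the two operations, not the model's actual implementation:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm rescales by the root-mean-square of the activations.
    # Unlike LayerNorm, it does not subtract the mean (no re-centering).
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x), the activation used in Llama MLPs.
    return x * (1.0 / (1.0 + np.exp(-x)))

x = np.random.randn(4, 768)   # a batch of hidden states (hidden size 768)
y = rms_norm(x, np.ones(768))  # normalized: mean of y**2 per row is ~1
```

After RMSNorm with unit weights, the mean squared activation along the last axis is approximately 1, which is what keeps the residual stream numerically stable across layers.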
## ⚠️ Important Note: Base Model vs. Instruct Model
CoALa-1 is a Base Model (Pretrained). It has been trained to predict the next token on a massive Wikipedia corpus but has not yet undergone Instruction Fine-Tuning (SFT) or RLHF.
What this means for users:
- The model will not answer questions like a chatbot (e.g., "How are you?").
- Instead, it will continue a given text in a neutral, encyclopedic style.
## Evaluation Results
CoALa-1 was evaluated using the lm-evaluation-harness. It shows strong factual-knowledge performance relative to other models in its weight class.
| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| ARC-Easy | acc_norm | 28.87% | 27.00% | 24.50% |
| HellaSwag | acc_norm | 26.96% | 28.50% | 26.00% |
*Figure 1: Comparison of ARC-Easy (knowledge) and HellaSwag (reasoning) scores. CoALa-1 leads on ARC-Easy among the sub-200M models compared.*
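The scores above can be reproduced with a command along these lines (assuming an installed lm-evaluation-harness; exact flags may vary between versions):

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=CocoEntertainment/CoALa-1-Pretuned \
  --tasks arc_easy,hellaswag \
  --batch_size 8
```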
## Technical Specifications
- Hidden Size: 768
- Intermediate Size: 2048
- Layers: 12
- Attention Heads: 12
- Context Length: 2048 tokens
- Vocab Size: 64,000
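As a sanity check, plugging the specifications above into a Hugging Face `LlamaConfig` reproduces the stated parameter count. This is a sketch that relies on library defaults (notably untied input/output embeddings), which may differ from the released checkpoint:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Randomly initialized model built from the card's listed hyperparameters.
config = LlamaConfig(
    vocab_size=64_000,
    hidden_size=768,
    intermediate_size=2048,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=2048,  # context length
)
model = LlamaForCausalLM(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # ≈183M with untied embeddings
```

The 64k vocabulary accounts for roughly half the budget (two 64,000 × 768 embedding matrices), with the remaining ~85M in the 12 transformer layers.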
## Usage & Licensing
**License: All Rights Reserved**
This model is provided for private, non-commercial use only. Redistribution, modification (for the purpose of redistribution), and commercial usage are strictly prohibited.
### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
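Since this is a base model, use it for text continuation rather than chat. A minimal generation sketch (the prompt and sampling settings below are illustrative, not tuned recommendations):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A base model continues the prompt in an encyclopedic style.
prompt = "The Eiffel Tower is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```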
