---
language:
- en
- de
- es
- fr
- pt
- it
- ru
license: other
license_name: all-rights-reserved
license_link: LICENSE
tags:
- cocoai
- base-model
- 183M
- llama
- multilingual
- wikipedia-trained
model_name: "CoALa-1"
model_type: llama
datasets:
- wikimedia/wikipedia
metrics:
- arc_easy
- hellaswag
model-index:
- name: CoALa-1
  results:
  - task:
      type: text-generation
      name: Knowledge & Logic Evaluation
    dataset:
      name: ARC-Easy
      type: ai2_arc
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 28.87
  - task:
      type: text-generation
      name: Common Sense Reasoning
    dataset:
      name: HellaSwag
      type: hellaswag
    metrics:
    - name: Accuracy (Norm)
      type: acc_norm
      value: 26.96
---

# CoALa-1 (183M Multilingual Llama-Base)

CoALa-1 is a highly efficient, multilingual base model with **183 million parameters**. Built on a modern **Llama-based architecture**, it is designed to deliver maximum performance in a compact size, making it one of the top-performing models in the sub-200M parameter class.

## Key Highlights

* **Architecture:** Llama-based (utilizing RoPE, RMSNorm, and SiLU) for superior stability and reasoning compared to older GPT-2-style architectures.
* **Top-3 Performance:** In its weight class (<200M parameters), CoALa-1 outperforms industry standards like Meta's OPT-125M and competes directly with OpenAI's GPT-2 Small.
* **Multilingual Power:** Trained from scratch on high-quality Wikipedia data in **7 languages** (English, German, Spanish, French, Portuguese, Italian, Russian).
* **Custom Tokenizer:** Features a byte-level BPE tokenizer with a 64,000-token vocabulary, optimized for multilingual efficiency.

## ⚠️ Important Note: Base Model vs. Instruct Model

CoALa-1 is a **base model (pretrained)**. It has been trained to predict the next token on a large Wikipedia corpus but has **not** yet undergone instruction fine-tuning (SFT) or RLHF.

**What this means for users:**

- The model will **not** answer questions like a chatbot (e.g., "How are you?").
- Instead, it will **continue a given text** in a neutral, encyclopedic style.

## Evaluation Results

CoALa-1 was evaluated using the `lm-evaluation-harness`. It shows strong performance on factual knowledge compared to other models in its weight class.

| Benchmark | Metric | CoALa-1 (183M) | GPT-2 (124M) | OPT-125M |
|---|---|---|---|---|
| **ARC-Easy** | acc_norm | **28.87%** | 27.00% | 24.50% |
| **HellaSwag** | acc_norm | **26.96%** | 28.50% | 26.00% |

![Benchmark Comparison](benchmarks.png)

> **Figure 1:** Comparison of ARC-Easy (knowledge) and HellaSwag (reasoning) scores. CoALa-1 leads in factual knowledge retrieval among sub-200M parameter models.

## Technical Specifications

* **Hidden Size:** 768
* **Intermediate Size:** 2048
* **Layers:** 12
* **Attention Heads:** 12
* **Context Length:** 2048 tokens
* **Vocab Size:** 64,000

## Usage & Licensing

### License: All Rights Reserved

This model is provided for **private, non-commercial use only**. Redistribution, modification (for the purpose of redistribution), and commercial use are strictly prohibited.

### How to Load

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
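
### Example: Text Continuation

Since CoALa-1 is a base model, the typical usage pattern is to pass a text prefix and let the model continue it rather than asking it questions. The sketch below uses the standard `transformers` generation API; the prompt and the sampling parameters (`max_new_tokens`, `temperature`, `top_p`) are illustrative assumptions, not values recommended by the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "CocoEntertainment/CoALa-1-Pretuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A base model continues text rather than answering questions,
# so the prompt is phrased as the start of an encyclopedic sentence.
prompt = "Berlin is the capital of"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings below are illustrative, not tuned defaults.
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Prompts in any of the seven training languages should work, since both the tokenizer and the pretraining corpus are multilingual.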