---
license: apache-2.0
---

# LightLM

## Overview

LightLM is a series of three small language models trained on open-access data (Cosmopedia v2). We present three configurations (one with Mixture-of-Experts, two without) that aim to optimize the distribution of parameters between attention and feed-forward layers. Despite a relatively modest training corpus of ~28B tokens, these models approach or surpass the performance of other models in their parameter range (e.g., MobileLLM, GPT-Neo-125M).

1. **Model 1 (Model Attn)**
   - Layers: 34
   - Attention dim: 832
   - FFN dim: 556
   - Context length: 1536
2. **Model 2 (Model FFN)**
   - Layers: 32
   - Attention dim: 512
   - FFN dim: 512 × 4 = 2048
   - Context length: 1536
3. **Model 3 (Model MoE 2+1)**
   - Layers: 32
   - Attention dim: 384 (experimental setting)
   - FFN: 2 routed experts + 1 shared expert (see the sketch after this list)
     - Each expert has 512 × 2 = 1024 hidden units
     - 100% of parameters are active; the router assigns per-token weights to the experts
   - Context length: 1024
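To make the routing scheme concrete, here is a minimal PyTorch sketch of a 2-routed-plus-1-shared-expert FFN as described above. It is an illustration under stated assumptions, not the actual implementation: the README does not specify the expert activation, the router form, or how routed outputs are combined, so plain GELU MLP experts, a linear softmax router, and an additive shared expert are assumed. The class name `SharedExpertMoE` is hypothetical; only `d_model = 384` and `d_hidden = 1024` come from the configuration above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    """Hypothetical sketch of the 2-routed + 1-shared expert FFN.

    Both routed experts run on every token (100% of parameters active);
    the router only decides how to weight their outputs. The shared
    expert is always added on top.
    """

    def __init__(self, d_model: int = 384, d_hidden: int = 1024, n_routed: int = 2):
        super().__init__()

        def make_expert() -> nn.Module:
            # Plain two-layer MLP expert; the activation choice is an assumption.
            return nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        self.routed = nn.ModuleList([make_expert() for _ in range(n_routed)])
        self.shared = make_expert()
        self.router = nn.Linear(d_model, n_routed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Per-token mixing weights over the routed experts: (batch, seq, n_routed).
        weights = F.softmax(self.router(x), dim=-1)
        # Every routed expert processes every token: (batch, seq, d_model, n_routed).
        routed_out = torch.stack([expert(x) for expert in self.routed], dim=-1)
        mixed = (routed_out * weights.unsqueeze(-2)).sum(dim=-1)
        # The shared expert contributes unconditionally.
        return mixed + self.shared(x)

# Quick shape check: batch of 2, 8 tokens, d_model = 384.
moe = SharedExpertMoE()
print(moe(torch.randn(2, 8, 384)).shape)  # torch.Size([2, 8, 384])
```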

## Results

| Model | #Params | ARC-c | WinoGrande |
|---|---:|---:|---:|
| GPT-Neo-125M | 125M | 24.8 | 50.7 |
| Pythia-160M | 162M | 25.3 | 50.9 |
| RWKV-169M | 169M | 25.3 | 51.5 |
| MobileLLM-125M | 125M | 27.1 | 53.1 |
| LightLM (Attn) | 146M | 25.1 | 52.0 |
| LightLM (FFN) | 146M | 27.2 | 47.5 |
| LightLM (MoE) | 144M | 26.3 | 52.8 |
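The README does not state how this table was produced. For Hugging Face-compatible checkpoints, a common way to reproduce ARC-c and WinoGrande numbers is EleutherAI's lm-evaluation-harness; the snippet below is a hypothetical invocation, where the checkpoint path is a placeholder and the harness version and settings may differ from whatever produced the table.

```python
# pip install lm-eval  (EleutherAI lm-evaluation-harness, v0.4+)
import lm_eval

# "path/to/lightlm-attn" is a placeholder; point it at a real checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=path/to/lightlm-attn",
    tasks=["arc_challenge", "winogrande"],
)
print(results["results"])
```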

## Example Output

Prompt: `"Hello, I am a language model,"`

> Hello, I am a language model, and I can help you learn more about the language you are interested in. Let's start with the basics.

> Hello, I am a language model, and I can help you learn some new words and phrases. Maybe you could try saying "hello" in English first, then move on to Spanish, ...

🔗 View on GitHub