---
license: apache-2.0
---

## Overview
LightLM is a series of three language models trained on open-access data (Cosmopedia v2). We present three configurations (one with a Mixture-of-Experts feed-forward block and two dense) that explore how to best split parameters between the attention and feed-forward layers. Despite a relatively modest training corpus of ~28B tokens, these models approach or surpass the performance of other models in their parameter range (e.g., MobileLLM-125M, GPT-neo-125M).

1. **Model 1 ([Model Attn](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20Attn))**
   - **Layers**: 34
   - **Attention dim**: 832
   - **FFN dim**: 556
   - **Context length**: 1536

2. **Model 2 ([Model FFN](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20FFN))**
   - **Layers**: 32
   - **Attention dim**: 512
   - **FFN dim**: 512 × 4 = 2048
   - **Context length**: 1536

3. **Model 3 ([Model MoE 2+1](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20MoE%202%2B1))**
   - **Layers**: 32
   - **Attention dim**: 384 (experimental setting)
   - **FFN**: 2 routed experts + 1 shared expert
     - Each expert has 512 × 2 = 1024 hidden units
     - 100% of parameters are active; the router assigns per-token weights to the experts (see the sketch after this list)
   - **Context length**: 1024
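
The 2+1 expert layout of Model 3 can be made concrete with a short sketch. This is only an illustration of the scheme described above, not the repository's code: the module and variable names are hypothetical, and the softmax router is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFFN(nn.Module):
    """Hypothetical sketch of a "2 routed + 1 shared" expert FFN.

    All parameters stay active: the shared expert always runs, and the
    router only reweights the two routed experts per token.
    """

    def __init__(self, d_model: int = 384, d_hidden: int = 1024, n_routed: int = 2):
        super().__init__()

        def make_expert() -> nn.Module:
            return nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )

        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.shared = make_expert()
        self.router = nn.Linear(d_model, n_routed)  # per-token expert weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        weights = F.softmax(self.router(x), dim=-1)               # (batch, seq, n_routed)
        routed = torch.stack([e(x) for e in self.routed], dim=-1)  # (batch, seq, d_model, n_routed)
        routed = (routed * weights.unsqueeze(-2)).sum(dim=-1)      # weighted mix of routed experts
        return self.shared(x) + routed
```

With only two routed experts and soft (softmax) weights, every expert contributes to every token, which matches the "100% of parameters are active" note above.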

## Results

| **Model**      | **#Params** | **ARC-c** | **WinoGrande** |
|----------------|-------------|-----------|----------------|
| GPT-neo-125M   | 125M        | 24.8      | 50.7           |
| Pythia-160M    | 162M        | 25.3      | 50.9           |
| RWKV-169M      | 169M        | 25.3      | 51.5           |
| MobileLLM-125M | 125M        | 27.1      | 53.1           |
| LightLM (Attn) | 146M        | 25.1      | 52.0           |
| LightLM (FFN)  | 146M        | 27.2      | 47.5           |
| LightLM (MoE)  | 144M        | 26.3      | 52.8           |
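
This README does not state the evaluation setup; ARC-c and WinoGrande correspond to the `arc_challenge` and `winogrande` tasks in EleutherAI's lm-evaluation-harness, so a run along these lines should produce comparable numbers. The model-loading arguments (standard transformers checkpoints in the per-model subfolders) are assumptions:

```python
# Hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Whether these checkpoints load as standard
# transformers models, and whether this matches the authors' setup, is
# an assumption, not something this README confirms.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Virg1n/LightLM,subfolder=Model FFN",  # hypothetical args
    tasks=["arc_challenge", "winogrande"],
)
print(results["results"])
```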

**Example Output**

Prompt: `"Hello, I am a language model,"`
```
Hello, I am a language model, and I can help you learn more about the language you are interested in.
Let's start with the basics.
```
```
Hello, I am a language model, and I can help you learn some new words and phrases. Maybe you could try
saying "hello" in English first, then move on to Spanish, ...
```
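
A minimal sketch for generating samples like these, assuming the checkpoints load as standard transformers causal LMs from the per-model subfolders (not confirmed by this README); the sampling parameters are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "Model FFN" is one of the three subfolders listed above; any of them works.
repo, subfolder = "Virg1n/LightLM", "Model FFN"
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder=subfolder)
model = AutoModelForCausalLM.from_pretrained(repo, subfolder=subfolder)

inputs = tokenizer("Hello, I am a language model,", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```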

[🔗 View on GitHub](https://github.com/virg1n/LightLM)