---
license: apache-2.0
---

## Overview

LightLM is a series of three language models trained on open-access data (Cosmopedia v2). We present three configurations (one with Mixture-of-Experts and two without) that aim to optimize the distribution of parameters between the attention and feed-forward layers. Despite a relatively modest training corpus of ~28B tokens, these models approach or surpass the performance of other models in their parameter range (e.g., MobileLLM, GPT-neo-125M).

1. **Model 1 ([Model Attn](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20Attn))**
   - **Layers**: 34
   - **Attention dim**: 832
   - **FFN dim**: 556
   - **Context length**: 1536

2. **Model 2 ([Model FFN](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20FFN))**
   - **Layers**: 32
   - **Attention dim**: 512
   - **FFN dim**: 512 × 4 = 2048
   - **Context length**: 1536

3. **Model 3 ([Model MoE 2+1](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20MoE%202%2B1))**
   - **Layers**: 32
   - **Attention dim**: 384 (experimental setting)
   - **FFN**: 2 routed experts + 1 shared expert
     - Each expert has 512 × 2 = 1024 hidden units
     - 100% of parameters are active; the router assigns expert weights per token
   - **Context length**: 1024

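
The MoE layout of Model 3 (two routed experts plus one shared expert, all active for every token) can be illustrated with a small dense-routing sketch. This is not the repository's implementation. The softmax gate, the ReLU activation, the unweighted shared expert, and all names (`moe_forward`, `expert`, `random_matrix`) are assumptions made for illustration, and the toy sizes stand in for the real dimensions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def expert(w_in, w_out, x):
    """Two-layer feed-forward expert; the ReLU activation is an assumption."""
    hidden = [max(0.0, h) for h in linear(w_in, x)]
    return linear(w_out, hidden)


def moe_forward(x, router_w, routed_experts, shared_expert):
    """Dense MoE: every routed expert runs on the token, the router's softmax
    weights mix their outputs, and the shared expert is always added."""
    gates = softmax(linear(router_w, x))  # one weight per routed expert
    out = expert(*shared_expert, x)       # shared expert, assumed unweighted
    for gate, (w_in, w_out) in zip(gates, routed_experts):
        y = expert(w_in, w_out, x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, gates


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the toy example."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_hidden = 8, 16  # toy sizes; the card lists 1024 hidden units per expert
router_w = random_matrix(2, d_model, rng)
routed = [
    (random_matrix(d_hidden, d_model, rng), random_matrix(d_model, d_hidden, rng))
    for _ in range(2)
]
shared = (random_matrix(d_hidden, d_model, rng), random_matrix(d_model, d_hidden, rng))

token = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
output, gates = moe_forward(token, router_w, routed, shared)
print("router weights:", [round(g, 3) for g in gates])
```

Because no expert is skipped, the router here only reweights expert outputs per token. It does not reduce compute the way top-k sparse routing would, which matches the card's note that 100% of parameters are active.
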
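
To make the attention/feed-forward trade-off across the three configurations concrete, the dimensions listed above can be collected in a small config object and compared. This is a rough sketch: `LightLMConfig` and its fields are hypothetical names, and the ratio compares layer widths only, not actual parameter counts, since the card does not give a per-layer parameter breakdown. Counting all three MoE experts as active follows the card's statement that 100% of parameters are active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightLMConfig:
    """Hypothetical container for the dimensions listed in this card."""
    name: str
    layers: int
    attn_dim: int
    ffn_dim: int                # hidden units per FFN (or per expert)
    n_experts: int = 1          # experts active per token (all of them in the MoE model)
    context_length: int = 1536

    @property
    def active_ffn_units(self) -> int:
        return self.ffn_dim * self.n_experts

    @property
    def ffn_to_attn_ratio(self) -> float:
        # Width ratio only; an illustrative proxy, not a parameter count.
        return self.active_ffn_units / self.attn_dim


CONFIGS = [
    LightLMConfig("Model Attn", layers=34, attn_dim=832, ffn_dim=556),
    LightLMConfig("Model FFN", layers=32, attn_dim=512, ffn_dim=2048),
    LightLMConfig("Model MoE 2+1", layers=32, attn_dim=384, ffn_dim=1024,
                  n_experts=3, context_length=1024),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.active_ffn_units} FFN units / "
          f"{cfg.attn_dim} attention dim = {cfg.ffn_to_attn_ratio:.2f}")
```

Under these assumptions the width ratio is roughly 0.67 for Model Attn, 4.00 for Model FFN, and 8.00 for Model MoE 2+1, which shows how far apart the three designs sit on the attention-versus-FFN axis.
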
## Results

| **Model**            | **#Params** | **ARC-c** | **WinoGrande** |
|----------------------|-------------|-----------|----------------|
| GPT-neo-125M         | 125M        | 24.8      | 50.7           |
| Pythia-160M          | 162M        | 25.3      | 50.9           |
| RWKV-169M            | 169M        | 25.3      | 51.5           |
| MobileLLM-125M       | 125M        | 27.1      | 53.1           |
| LightLM (Attn)       | 146M        | 25.1      | 52.0           |
| LightLM (FFN)        | 146M        | 27.2      | 47.5           |
| LightLM (MoE)        | 144M        | 26.3      | 52.8           |

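
As a quick way to read the table, the sketch below counts, for each LightLM variant, which of the four baselines it matches or beats on each benchmark. The scores are copied from the table. The helper name `baselines_matched` and the "greater than or equal" criterion are illustrative choices, not part of the original evaluation.

```python
# (ARC-c, WinoGrande) scores copied from the results table.
RESULTS = {
    "GPT-neo-125M": (24.8, 50.7),
    "Pythia-160M": (25.3, 50.9),
    "RWKV-169M": (25.3, 51.5),
    "MobileLLM-125M": (27.1, 53.1),
    "LightLM (Attn)": (25.1, 52.0),
    "LightLM (FFN)": (27.2, 47.5),
    "LightLM (MoE)": (26.3, 52.8),
}
TASKS = ("ARC-c", "WinoGrande")
BASELINES = [name for name in RESULTS if not name.startswith("LightLM")]
LIGHTLM = [name for name in RESULTS if name.startswith("LightLM")]


def baselines_matched(model, task_index):
    """Return the baselines whose score the given model matches or exceeds."""
    score = RESULTS[model][task_index]
    return [b for b in BASELINES if score >= RESULTS[b][task_index]]


for model in LIGHTLM:
    for i, task in enumerate(TASKS):
        matched = baselines_matched(model, i)
        print(f"{model} on {task}: matches or beats "
              f"{len(matched)}/{len(BASELINES)} baselines")
```

Read this way, the FFN variant has the highest ARC-c score in the table but the lowest WinoGrande score, while the MoE variant trails only MobileLLM-125M on both benchmarks.
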
**Example Output**

Prompt: `"Hello, I am a language model,"`

```
Hello, I am a language model, and I can help you learn more about the language you are interested in.
Let's start with the basics.
```

```
Hello, I am a language model, and I can help you learn some new words and phrases. Maybe you could try
saying "hello" in English first, then move on to Spanish, ...
```

[🔗 View on GitHub](https://github.com/virg1n/LightLM)