---
license: apache-2.0
---

## Overview
LightLM is a series of three small language models trained on open-access data (Cosmopedia v2). We present three configurations (one with a Mixture-of-Experts feed-forward block and two dense) that explore how to distribute parameters between the attention and feed-forward layers. Despite a relatively modest training corpus of ~28B tokens, these models approach or surpass the performance of other models in their parameter range (e.g., MobileLLM-125M, GPT-neo-125M).
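
As a rough guide to where the parameters go in the configurations detailed below, the sketch uses the standard dense-transformer estimates of 4·d_model·d_attn for the Q/K/V/output projections and 2·d_model·d_ffn for a two-matrix FFN. The model width is assumed equal to the attention dim and biases, embeddings, and norms are ignored, so the numbers are illustrative only:

```python
# Illustrative per-layer parameter estimates. Assumes standard Q/K/V/output
# attention projections, a two-matrix FFN, no biases, and model width equal
# to the attention dim; embeddings and norms are ignored. None of this is
# confirmed by the repo, so treat the totals as order-of-magnitude guides.

def attn_params(d_model: int, d_attn: int) -> int:
    return 4 * d_model * d_attn  # Q, K, V and output projections

def ffn_params(d_model: int, d_ffn: int) -> int:
    return 2 * d_model * d_ffn   # up- and down-projection

print(f"Model Attn: {34 * (attn_params(832, 832) + ffn_params(832, 556)):,}")
print(f"Model FFN:  {32 * (attn_params(512, 512) + ffn_params(512, 2048)):,}")
# Model Attn: 125,598,720
# Model FFN:  100,663,296
```

Under these assumptions, Model Attn spends roughly three quarters of its per-layer budget on attention while Model FFN puts about two thirds of it into the FFN; embeddings and norms make up the gap to the ~146M totals in the results table below.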


1. **Model 1 ([Model Attn](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20Attn))**
   - **Layers**: 34  
   - **Attention dim**: 832  
   - **FFN dim**: 556  
   - **Context length**: 1536  

2. **Model 2 ([Model FFN](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20FFN))**
   - **Layers**: 32  
   - **Attention dim**: 512  
   - **FFN dim**: 512 × 4 = 2048  
   - **Context length**: 1536  

3. **Model 3 ([Model MoE 2+1](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20MoE%202%2B1))**  
   - **Layers**: 32  
   - **Attention dim**: 384 (experimental setting)  
   - **FFN**: 2 routed experts + 1 shared expert  
     - Each expert has 512 × 2 = 1024 hidden units  
     - 100% of parameters are active; the router assigns soft per-token mixing weights to the routed experts (see the sketch after this list)  
   - **Context length**: 1024  
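
The "2+1" routing scheme of Model 3 is easiest to see in code. Below is a minimal PyTorch sketch of one such feed-forward block, reconstructed only from the bullets above (soft per-token router weights mix two routed experts, and the shared expert is always applied, so every parameter stays active); the actual implementation in the GitHub repo may differ in activation, gating, and normalization details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """'2 routed + 1 shared expert' FFN block, sketched from the bullets above.

    Every parameter stays active: instead of hard top-k selection, the router
    produces soft per-token weights that mix the two routed experts, and the
    shared expert is always added. (Hypothetical reconstruction, not the
    repo's actual code.)
    """

    def __init__(self, d_model: int = 384, d_hidden: int = 1024, n_routed: int = 2):
        super().__init__()

        def make_expert() -> nn.Sequential:
            # Plain two-matrix FFN; the real activation/gating is not stated.
            return nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        self.routed = nn.ModuleList([make_expert() for _ in range(n_routed)])
        self.shared = make_expert()
        self.router = nn.Linear(d_model, n_routed)  # per-token routing logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        weights = F.softmax(self.router(x), dim=-1)                 # (B, S, E)
        experts = torch.stack([e(x) for e in self.routed], dim=-1)  # (B, S, D, E)
        mixed = (experts * weights.unsqueeze(-2)).sum(dim=-1)       # (B, S, D)
        return mixed + self.shared(x)  # shared expert is always applied

# Shape check with Model 3's stated dims (model width assumed equal to the
# 384 attention dim; each expert has the stated 1024 hidden units).
out = MoEFeedForward()(torch.randn(2, 16, 384))
print(out.shape)  # torch.Size([2, 16, 384])
```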



## Results
| **Model**            | **#Params** | **ARC-c** | **WinoGrande** |
|----------------------|-------------|-----------|----------------|
| GPT-neo-125M         | 125M        | 24.8      | 50.7           |
| Pythia-160M          | 162M        | 25.3      | 50.9           |
| RWKV-169M            | 169M        | 25.3      | 51.5           |
| MobileLLM-125M       | 125M        | 27.1      | 53.1           |
| LightLM (Attn)       | 146M        | 25.1      | 52.0           |
| LightLM (FFN)        | 146M        | 27.2      | 47.5           |
| LightLM (MoE)        | 144M        | 26.3      | 52.8           |


**Example Outputs** (two sampled completions)  
Prompt: `"Hello, I am a language model,"`  
```
Hello, I am a language model, and I can help you learn more about the language you are interested in. 
Let's start with the basics.
```
```
Hello, I am a language model, and I can help you learn some new words and phrases. Maybe you could try 
saying "hello" in English first, then move on to Spanish, ...
```
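
Completions like those above come from an ordinary autoregressive sampling loop. Below is a minimal sketch in plain PyTorch; `model` and `tokenizer` are placeholders for whatever the loading code in the GitHub repo returns (whether the checkpoints load directly through `transformers` is not stated here), and the `.logits` access assumes a Hugging Face-style output object.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 40,
             temperature: float = 0.8) -> str:
    """Temperature sampling, one token at a time (no KV cache, for brevity)."""
    ids = tokenizer.encode(prompt, return_tensors="pt")
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]  # assumes an HF-style output
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0])

# `model` / `tokenizer` are placeholders for the repo's loading code:
# print(generate(model, tokenizer, "Hello, I am a language model,"))
```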

[🔗 View on GitHub](https://github.com/virg1n/LightLM)