Virg1n committed (verified) · Commit 552a8ca · 1 Parent(s): 5df532c

Update README.md

Files changed (1): README.md (+54 −3)
---
license: apache-2.0
---

## Overview
LightLM is a series of three small language models trained on open-access data (Cosmopedia v2). We present three configurations (one with Mixture-of-Experts and two without) that explore how to distribute parameters between the attention and feed-forward layers. Despite a relatively modest training corpus of ~28B tokens, these models approach or surpass the performance of other models in their parameter range (e.g., MobileLLM, GPT-neo-125M).
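As rough, back-of-the-envelope arithmetic (under my own assumptions, not the repo's code: standard multi-head attention with four `d_model × d_model` projections, a plain two-matrix FFN, and biases and embeddings ignored), the per-layer parameter split of the attention-heavy and FFN-heavy configurations can be compared:

```python
# Illustrative per-layer parameter estimates for the attention/FFN trade-off.
# Assumptions (not confirmed by the repo): standard multi-head attention with
# Q, K, V and output projections, and a plain two-matrix FFN; biases ignored.

def attn_params(d_model: int) -> int:
    # four d_model x d_model projection matrices (Q, K, V, output)
    return 4 * d_model * d_model

def ffn_params(d_model: int, d_ffn: int) -> int:
    # up-projection and down-projection
    return 2 * d_model * d_ffn

# Model Attn: wide attention (832), narrow FFN (556)
attn_heavy = attn_params(832) + ffn_params(832, 556)

# Model FFN: narrow attention (512), wide FFN (2048)
ffn_heavy = attn_params(512) + ffn_params(512, 2048)

print(attn_heavy, ffn_heavy)
```

With these assumptions, Model Attn spends most of each layer's budget on attention projections, while Model FFN shifts the majority into the feed-forward block.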

1. **Model 1 ([Model Attn](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20Attn))**
   - **Layers**: 34
   - **Attention dim**: 832
   - **FFN dim**: 556
   - **Context length**: 1536

2. **Model 2 ([Model FFN](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20FFN))**
   - **Layers**: 32
   - **Attention dim**: 512
   - **FFN dim**: 512 × 4 = 2048
   - **Context length**: 1536

3. **Model 3 ([Model MoE 2+1](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20MoE%202%2B1))**
   - **Layers**: 32
   - **Attention dim**: 384 (experimental setting)
   - **FFN**: 2 routed experts + 1 shared expert
     - Each expert has 512 × 2 = 1024 hidden units
     - 100% of parameters are active; the router assigns per-token expert weights
   - **Context length**: 1024
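The "2 routed + 1 shared expert" layer in Model 3 can be sketched as follows. This is an illustrative NumPy sketch under my own assumptions (ReLU experts, softmax router), not the repo's actual implementation; with only two routed experts and soft per-token weights, every parameter receives a nonzero weight, consistent with "100% of parameters are active":

```python
import numpy as np

# Illustrative MoE FFN: 1 shared expert plus 2 softly routed experts.
# Shapes and activation choices are assumptions, not the repo's code.
rng = np.random.default_rng(0)
d_model, d_hidden = 384, 1024  # 512 * 2 = 1024 hidden units per expert

def make_expert():
    w1 = rng.normal(0, 0.02, (d_model, d_hidden))
    w2 = rng.normal(0, 0.02, (d_hidden, d_model))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2  # ReLU FFN (assumed)

shared = make_expert()
routed = [make_expert(), make_expert()]
w_router = rng.normal(0, 0.02, (d_model, len(routed)))

def moe_ffn(x):
    # x: (tokens, d_model); router produces per-token weights over experts
    logits = x @ w_router
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)  # softmax over experts
    out = shared(x)  # shared expert always contributes
    for i, expert in enumerate(routed):
        out = out + probs[:, i:i + 1] * expert(x)
    return out

tokens = rng.normal(size=(5, d_model))
print(moe_ffn(tokens).shape)  # (5, 384)
```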
29
+
30
+
31
+ ## Results
32
+ | **Model** | **#Params** | **ARC-c** | **WinoGrande** |
33
+ |----------------------|-------------|-----------|----------------|
34
+ | GPT-neo-125M | 125M | 24.8 | 50.7 |
35
+ | Pythia-160M | 162M | 25.3 | 50.9 |
36
+ | RWKV-169M | 169M | 25.3 | 51.5 |
37
+ | MobileLLM-125M | 125M | 27.1 | 53.1 |
38
+ | LightLM (Attn) | 146M | 25.1 | 52.0 |
39
+ | LightLM (FFN) | 146M | 27.2 | 47.5 |
40
+ | LightLM (MoE) | 144M | 26.3 | 52.8 |
41
+
42
+
43
+ **Example Output**
44
+ Prompt: `"Hello, I am a language model,"`
45
+ ```
46
+ Hello, I am a language model, and I can help you learn more about the language you are interested in.
47
+ Let's start with the basics.
48
+ ```
49
+ ```
50
+ Hello, I am a language model, and I can help you learn some new words and phrases. Maybe you could try
51
+ saying "hello" in English first, then move on to Spanish, ...
52
+ ```
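Completions like these come from autoregressive decoding. As a minimal sketch of that loop (with a toy stand-in for the model's forward pass — the repo's real inference code is not shown here), greedy decoding picks the highest-scoring next token at each step:

```python
# Minimal greedy decoding loop. `next_token_logits` is a toy stand-in for
# a real model forward pass, not LightLM's actual code.

def next_token_logits(token_ids):
    # Toy "model": always prefers the token (last + 1) mod vocab_size.
    vocab_size = 8
    logits = [0.0] * vocab_size
    logits[(token_ids[-1] + 1) % vocab_size] = 1.0
    return logits

def greedy_generate(prompt_ids, max_new_tokens):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids

print(greedy_generate([3], 4))  # [3, 4, 5, 6, 7]
```

Sampled (rather than greedy) decoding, which is what varied outputs like the two above suggest, would draw from the softmax of the logits instead of taking the argmax.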

[🔗 View on GitHub](https://github.com/virg1n/LightLM)