---
license: apache-2.0
---

## Overview
LightLM is a series of three language models trained on open-access data (Cosmopedia v2). The three configurations (two dense, one Mixture-of-Experts) aim to optimize how parameters are distributed between the attention and feed-forward (FFN) layers. Despite a relatively modest training corpus of ~28B tokens, these models approach, and on some benchmarks surpass, other models in their parameter range (e.g., MobileLLM-125M, GPT-neo-125M).
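To make the attention/FFN trade-off concrete, here is a rough sketch that counts the weights in one transformer block for each configuration listed in this README. It rests on assumptions the README does not state: "Attention dim" is taken as the model width, attention uses four `d × d` projections (Q, K, V, output) without biases, each FFN or expert is a gated (SwiGLU-style) block with three weight matrices, and the router is a single `d × n_routed` linear map. Embeddings, norms and the output head are ignored, so these per-layer figures will not reproduce the #Params column. `LayerConfig` and `per_layer_params` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    """Per-block sizes from the README (hypothetical container)."""
    name: str
    d_model: int       # "Attention dim", assumed to be the model width
    d_ffn: int         # FFN hidden units (per expert for the MoE model)
    n_routed: int = 0  # routed experts; the dense FFN / shared expert is always present


def per_layer_params(cfg: LayerConfig, ffn_matrices: int = 3) -> tuple[int, int]:
    """Return (attention, ffn) weight counts for one block, ignoring biases and norms."""
    # Assumption: Q, K, V and output projections are all d_model x d_model.
    attn = 4 * cfg.d_model ** 2
    # Assumption: each FFN/expert is a gated block with `ffn_matrices` weight matrices.
    ffn = (1 + cfg.n_routed) * ffn_matrices * cfg.d_model * cfg.d_ffn
    # Assumption: the router is one linear map from d_model to n_routed logits.
    ffn += cfg.n_routed * cfg.d_model
    return attn, ffn


CONFIGS = [
    LayerConfig("Model Attn", d_model=832, d_ffn=556),
    LayerConfig("Model FFN", d_model=512, d_ffn=2048),
    LayerConfig("Model MoE 2+1", d_model=384, d_ffn=1024, n_routed=2),
]

for cfg in CONFIGS:
    attn, ffn = per_layer_params(cfg)
    share = attn / (attn + ffn)
    print(f"{cfg.name:14s} attn={attn:>10,} ffn={ffn:>10,} attn share={share:.1%}")
```

Under these assumptions the three blocks end up close in size (about 4.1M to 4.2M weights each) but split very differently: roughly 67%, 25% and 14% of each block goes to attention. Passing `ffn_matrices=2` models a plain two-matrix FFN instead.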


1. Model 1 ([Model Attn](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20Attn))
   - Layers: 34
   - Attention dim: 832
   - FFN dim: 556
   - Context length: 1536

2. Model 2 ([Model FFN](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20FFN))
   - Layers: 32
   - Attention dim: 512
   - FFN dim: 512 × 4 = 2048
   - Context length: 1536

3. Model 3 ([Model MoE 2+1](https://huggingface.co/Virg1n/LightLM/tree/main/Model%20MoE%202%2B1))
   - Layers: 32
   - Attention dim: 384 (experimental setting)
   - FFN: 2 routed experts + 1 shared expert
   - Hidden units per expert: 512 × 2 = 1024
   - All parameters are active (100%): every expert runs on every token, and the router assigns per-token expert weights
   - Context length: 1024
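The MoE block in Model 3 can be sketched as a "dense" mixture: because all parameters are active, every expert is evaluated for every token, and the router only decides how much each routed expert contributes. This plain-Python sketch assumes details the README does not specify: two-layer SiLU experts, a softmax router over the two routed experts only, and an unweighted shared expert that is always added. All class and function names are hypothetical, and tiny dimensions are used so it runs instantly (Model 3 itself uses width 384 and 1024 hidden units per expert).

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def silu(v):
    return v / (1.0 + math.exp(-v))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class Expert:
    """Two-layer MLP d_model -> d_hidden -> d_model (SiLU is an assumption)."""

    def __init__(self, rng, d_model, d_hidden):
        self.w_in = random_matrix(rng, d_hidden, d_model)
        self.w_out = random_matrix(rng, d_model, d_hidden)

    def __call__(self, x):
        h = [silu(v) for v in matvec(self.w_in, x)]
        return matvec(self.w_out, h)


class DenseMoE:
    """1 shared expert + n routed experts; every expert runs on every token."""

    def __init__(self, rng, d_model, d_hidden, n_routed=2):
        self.shared = Expert(rng, d_model, d_hidden)
        self.routed = [Expert(rng, d_model, d_hidden) for _ in range(n_routed)]
        self.w_router = random_matrix(rng, n_routed, d_model)

    def router_weights(self, x):
        # Assumption: softmax over the routed experts only.
        return softmax(matvec(self.w_router, x))

    def __call__(self, x):
        weights = self.router_weights(x)
        out = self.shared(x)  # shared expert: always added, unweighted (assumption)
        for w, expert in zip(weights, self.routed):
            y = expert(x)  # no top-k selection: all routed experts are evaluated
            out = [o + w * yi for o, yi in zip(out, y)]
        return out


rng = random.Random(0)
layer = DenseMoE(rng, d_model=8, d_hidden=16)
token = [rng.uniform(-1.0, 1.0) for _ in range(8)]
print("router weights:", [round(w, 3) for w in layer.router_weights(token)])
print("output size:", len(layer(token)))
```

Unlike top-k routing, nothing is skipped here, so per-token compute is roughly that of a dense layer with three FFNs; the router only reweights the routed experts' outputs.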



## Results
| Model | #Params | ARC-c | WinoGrande |
|----------------|---------|-------|------------|
| GPT-neo-125M | 125M | 24.8 | 50.7 |
| Pythia-160M | 162M | 25.3 | 50.9 |
| RWKV-169M | 169M | 25.3 | 51.5 |
| MobileLLM-125M | 125M | 27.1 | 53.1 |
| LightLM (Attn) | 146M | 25.1 | 52.0 |
| LightLM (FFN) | 146M | 27.2 | 47.5 |
| LightLM (MoE) | 144M | 26.3 | 52.8 |


## Example Output
Prompt: `"Hello, I am a language model,"`
```
Hello, I am a language model, and I can help you learn more about the language you are interested in.
Let's start with the basics.
```

```
Hello, I am a language model, and I can help you learn some new words and phrases. Maybe you could try
saying "hello" in English first, then move on to Spanish, ...
```

[🔗 View on GitHub](https://github.com/virg1n/LightLM)
