---
license: llama2
---
# Toy LLaMA-39M
- This is a tiny LLaMA model pretrained with `seq_len=512` on [Recag/Rp_C4_55](https://huggingface.co/datasets/Recag/Rp_C4_55), a small subset of C4.
- Model architecture:
```json
{
"hidden_size": 512,
"intermediate_size": 2048,
"max_position_embeddings": 2048,
"num_attention_heads": 8,
"num_hidden_layers": 2,
"num_key_value_heads": 8
}
```
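- For reference, an equivalent model can be instantiated from scratch with `LlamaConfig` (a minimal sketch; `vocab_size` is not listed above and is assumed to be 32,000, the standard LLaMA-2 tokenizer size):
```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,  # assumption: standard LLaMA-2 tokenizer size, not stated in this card
    hidden_size=512,
    intermediate_size=2048,
    max_position_embeddings=2048,
    num_attention_heads=8,
    num_hidden_layers=2,
    num_key_value_heads=8,
)
model = LlamaForCausalLM(config)  # randomly initialized weights
print(f"{model.num_parameters():,} parameters")
```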
- Load model and tokenizer:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")
```
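- Quick generation smoke test with the loaded model (a sketch; the prompt and sampling settings are arbitrary):
```python
inputs = tokenizer("The quick brown fox", return_tensors="pt")
# Sample a short continuation from the 39M model
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```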
- Training script: [huggingface/transformers/examples/pytorch/language-modeling/run_clm.py](https://github.com/huggingface/transformers/blob/e9476832942a19cf99354776ef112babc83c139a/examples/pytorch/language-modeling/run_clm.py)
```python
# "train" split is created from the last 95% samples of original "train" subset
raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")
```
- Evaluation (`seq_len=512`):
| Dataset | Eval loss | Perplexity | Accuracy | block_size |
|----------------|-----------|------------|----------|------------|
| Recag/Rp_C4_55 | 3.63 | 37.78 | 0.3561 | 512 |
| Wikitext2 | 4.58 | 97.48 | 0.2719 | 512 |
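- The perplexity column is `exp(eval_loss)`, as `run_clm.py` computes it:
```python
import math

print(math.exp(3.6318))  # ≈ 37.78 (Recag/Rp_C4_55)
print(math.exp(4.5796))  # ≈ 97.48 (Wikitext2)
```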
- Evaluation command (Wikitext2):
```bash
python run_clm.py --model_name_or_path Cheng98/llama-39m \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--block_size 512 \
--do_eval \
--output_dir ./results
```
- Evaluation on Recag/Rp_C4_55 (`seq_len=512`):
```python
# "validation" split is created from the first 5% samples of original "train" subset
raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")
```
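An equivalent invocation that lets `run_clm.py` build this split itself (a sketch, assuming the dataset ships without a `validation` split, in which case the script carves one out of the first N% of `train` via `--validation_split_percentage`):
```bash
python run_clm.py --model_name_or_path Cheng98/llama-39m \
    --dataset_name Recag/Rp_C4_55 \
    --validation_split_percentage 5 \
    --block_size 512 \
    --do_eval \
    --output_dir ./results
```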
Results:
```json
{
"eval_accuracy": 0.3561766818954313,
"eval_loss": 3.6318140029907227,
"eval_runtime": 190.8411,
"eval_samples": 19413,
"eval_samples_per_second": 101.723,
"eval_steps_per_second": 1.593,
"perplexity": 37.7812898658763
}
```
- Evaluation on Wikitext2 (`seq_len=512`):
```json
{
"eval_accuracy": 0.2718795201225219,
"eval_loss": 4.579628944396973,
"eval_runtime": 3.939,
"eval_samples": 575,
"eval_samples_per_second": 145.976,
"eval_steps_per_second": 0.762,
"perplexity": 97.47821765687856
}
```