---
license: llama2
---

# Toy LLaMA-39M

- This is a tiny LLaMA model pretrained on [Recag/Rp_C4_55](https://huggingface.co/datasets/Recag/Rp_C4_55), a small subset of C4, with `seq_len=512`.

- Model architecture:

```json
{
  "hidden_size": 512,
  "intermediate_size": 2048,
  "max_position_embeddings": 2048,
  "num_attention_heads": 8,
  "num_hidden_layers": 2,
  "num_key_value_heads": 8
}
```

- Load model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")
```

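Once loaded, the model can be exercised with a quick generation smoke test; a minimal sketch (the prompt and `max_new_tokens` are arbitrary choices, and a 39M model will only produce rough text):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")

# Greedy decoding; this is a sanity check, not a quality benchmark
inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```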
- Training script: [huggingface/transformers/examples/pytorch/language-modeling/run_clm.py](https://github.com/huggingface/transformers/blob/e9476832942a19cf99354776ef112babc83c139a/examples/pytorch/language-modeling/run_clm.py)

```python
# "train" split is created from the last 95% samples of the original "train" subset
raw_datasets["train"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")
```

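The `train[:5%]` / `train[5%:]` slices partition the original train subset into disjoint validation and train splits. A minimal pure-Python sketch of that partition logic (illustrative only — the `datasets` library handles percent-boundary rounding internally, which may differ slightly from simple floor division):

```python
def split_indices(n_samples, pct=5):
    # First pct% of indices -> validation, the rest -> train
    # (mirrors split="train[:5%]" and split="train[5%:]")
    cut = n_samples * pct // 100
    return list(range(cut)), list(range(cut, n_samples))

# With 100 samples: 5 go to validation, 95 to train, no overlap
val_idx, train_idx = split_indices(100)
```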
- Evaluation (`seq_len=512`):

| Dataset        | Eval loss | Perplexity | Accuracy | block_size |
|----------------|-----------|------------|----------|------------|
| Recag/Rp_C4_55 | 3.63      | 37.78      | 0.3561   | 512        |
| Wikitext2      | 4.58      | 97.48      | 0.2719   | 512        |

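The perplexity column is simply `exp(eval_loss)` (this is how `run_clm.py` reports it), which is why the two columns move together. Checking both table rows against the full-precision results below:

```python
import math

# perplexity = exp(cross-entropy loss), using the values reported in this card
for loss, ppl in [(3.6318140029907227, 37.7812898658763),   # Recag/Rp_C4_55
                  (4.579628944396973, 97.47821765687856)]:  # Wikitext2
    assert math.isclose(math.exp(loss), ppl, rel_tol=1e-6)
```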
- Evaluation command (Wikitext2):

```bash
python run_clm.py \
  --model_name_or_path Cheng98/llama-39m \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --block_size 512 \
  --do_eval \
  --output_dir ./results
```

- Evaluation on Recag/Rp_C4_55 (`seq_len=512`):

```python
# "validation" split is created from the first 5% samples of the original "train" subset
raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")
```

  Results:

```json
{
  "eval_accuracy": 0.3561766818954313,
  "eval_loss": 3.6318140029907227,
  "eval_runtime": 190.8411,
  "eval_samples": 19413,
  "eval_samples_per_second": 101.723,
  "eval_steps_per_second": 1.593,
  "perplexity": 37.7812898658763
}
```

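`eval_accuracy` is shifted next-token accuracy: the prediction at position *t* is scored against the label at position *t+1*. A minimal NumPy sketch of that metric on toy tensors (illustrative only — the actual evaluation goes through the Hugging Face `evaluate` accuracy metric inside `run_clm.py`):

```python
import numpy as np

# Toy logits (batch=1, seq=5, vocab=4) and labels, for illustration only
logits = np.array([[[0.1, 2.0, 0.3, 0.1],
                    [1.5, 0.2, 0.1, 0.2],
                    [0.1, 0.1, 3.0, 0.2],
                    [0.4, 0.3, 0.2, 2.1],
                    [2.2, 0.1, 0.1, 0.1]]])
labels = np.array([[3, 1, 0, 2, 0]])

# Shift: drop the last logit and the first label, so position t predicts token t+1
preds = logits[:, :-1, :].argmax(-1)  # -> [[1, 0, 2, 3]]
targets = labels[:, 1:]               # -> [[1, 0, 2, 0]]
accuracy = (preds == targets).mean()  # 3 of 4 predictions match -> 0.75
```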
- Evaluation on Wikitext2 (`seq_len=512`):

```json
{
  "eval_accuracy": 0.2718795201225219,
  "eval_loss": 4.579628944396973,
  "eval_runtime": 3.939,
  "eval_samples": 575,
  "eval_samples_per_second": 145.976,
  "eval_steps_per_second": 0.762,
  "perplexity": 97.47821765687856
}
```