---
license: llama2
---

# Toy LLaMA-39M

- This is a tiny LLaMA model pretrained on [Recag/Rp_C4_55](https://huggingface.co/datasets/Recag/Rp_C4_55), a small subset of C4, with `seq_len=512`.

- Model architecture:

```json
{
  "hidden_size": 512,
  "intermediate_size": 2048,
  "max_position_embeddings": 2048,
  "num_attention_heads": 8,
  "num_hidden_layers": 2,
  "num_key_value_heads": 8
}
```

- Load model and tokenizer:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")
```

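Once loaded, the model can be exercised with a quick generation smoke test; a minimal sketch (the prompt and `max_new_tokens` are arbitrary choices, and a 39M model will only produce rough text):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")

# Greedy decoding; this is a sanity check, not a quality benchmark
inputs = tokenizer("The quick brown fox", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```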
- Training script: [huggingface/transformers/examples/pytorch/language-modeling/run_clm.py](https://github.com/huggingface/transformers/blob/e9476832942a19cf99354776ef112babc83c139a/examples/pytorch/language-modeling/run_clm.py)

```python
# "train" split is created from the last 95% samples of the original "train" subset
raw_datasets["train"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")
```

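The `train[:5%]` / `train[5%:]` slices partition the original train subset into disjoint validation and train splits. A minimal pure-Python sketch of that partition logic (illustrative only — the `datasets` library handles percent-boundary rounding internally, which may differ slightly from simple floor division):

```python
def split_indices(n_samples, pct=5):
    # First pct% of indices -> validation, the rest -> train
    # (mirrors split="train[:5%]" and split="train[5%:]")
    cut = n_samples * pct // 100
    return list(range(cut)), list(range(cut, n_samples))

# With 100 samples: 5 go to validation, 95 to train, no overlap
val_idx, train_idx = split_indices(100)
```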
- Evaluation (`seq_len=512`):

| Dataset        | Eval loss | Perplexity | Accuracy | block_size |
|----------------|-----------|------------|----------|------------|
| Recag/Rp_C4_55 | 3.63      | 37.78      | 0.3561   | 512        |
| Wikitext2      | 4.58      | 97.48      | 0.2719   | 512        |

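The perplexity column is simply `exp(eval_loss)` (this is how `run_clm.py` reports it), which is why the two columns move together. Checking both table rows against the full-precision results below:

```python
import math

# perplexity = exp(cross-entropy loss), using the values reported in this card
for loss, ppl in [(3.6318140029907227, 37.7812898658763),   # Recag/Rp_C4_55
                  (4.579628944396973, 97.47821765687856)]:  # Wikitext2
    assert math.isclose(math.exp(loss), ppl, rel_tol=1e-6)
```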
- Evaluation command (Wikitext2):

```bash
python run_clm.py \
  --model_name_or_path Cheng98/llama-39m \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --block_size 512 \
  --do_eval \
  --output_dir ./results
```

- Evaluation on Recag/Rp_C4_55 (`seq_len=512`):

```python
# "validation" split is created from the first 5% samples of the original "train" subset
raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")
```

  Results:

```json
{
  "eval_accuracy": 0.3561766818954313,
  "eval_loss": 3.6318140029907227,
  "eval_runtime": 190.8411,
  "eval_samples": 19413,
  "eval_samples_per_second": 101.723,
  "eval_steps_per_second": 1.593,
  "perplexity": 37.7812898658763
}
```

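`eval_accuracy` is shifted next-token accuracy: the prediction at position *t* is scored against the label at position *t+1*. A minimal NumPy sketch of that metric on toy tensors (illustrative only — the actual evaluation goes through the Hugging Face `evaluate` accuracy metric inside `run_clm.py`):

```python
import numpy as np

# Toy logits (batch=1, seq=5, vocab=4) and labels, for illustration only
logits = np.array([[[0.1, 2.0, 0.3, 0.1],
                    [1.5, 0.2, 0.1, 0.2],
                    [0.1, 0.1, 3.0, 0.2],
                    [0.4, 0.3, 0.2, 2.1],
                    [2.2, 0.1, 0.1, 0.1]]])
labels = np.array([[3, 1, 0, 2, 0]])

# Shift: drop the last logit and the first label, so position t predicts token t+1
preds = logits[:, :-1, :].argmax(-1)  # -> [[1, 0, 2, 3]]
targets = labels[:, 1:]               # -> [[1, 0, 2, 0]]
accuracy = (preds == targets).mean()  # 3 of 4 predictions match -> 0.75
```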
- Evaluation on Wikitext2 (`seq_len=512`):

```json
{
  "eval_accuracy": 0.2718795201225219,
  "eval_loss": 4.579628944396973,
  "eval_runtime": 3.939,
  "eval_samples": 575,
  "eval_samples_per_second": 145.976,
  "eval_steps_per_second": 0.762,
  "perplexity": 97.47821765687856
}
```