---
license: other
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: NousResearch/Meta-Llama-3-8B
model-index:
- name: llama3-conciser
  results: []
pipeline_tag: text2text-generation
datasets:
- chrislee973/llama3-conciser-dataset
---
|
|
|
|
|
|
|
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) |
|
|
<details><summary>See axolotl config</summary> |
|
|
|
|
|
axolotl version: `0.4.0` |
|
|
```yaml
###
# Model Configuration: LLaMA-3 8B
###

# Copied from the most recent modal llm-finetuning repo

base_model: NousResearch/Meta-Llama-3-8B
sequence_len: 4096

# base model weight quantization
load_in_8bit: true

# attention implementation
flash_attention: true

# finetuned adapter config
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head
# for details, see https://github.com/huggingface/peft/issues/334#issuecomment-1561727994

###
# Dataset Configuration: conciser
###

datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: conciser_dataset_50.jsonl
    ds_type: json
    type:
      # Each JSONL line contains instruction, text, and cleaned_text fields.
      # These get mapped to axolotl's instruction, input, and output tags.
      field_instruction: instruction
      field_input: text
      field_output: cleaned_text
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] {instruction}
        {input}
        [/INST]

# dataset formatting config
tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[RES]"
  - " [/RES]"

special_tokens:
  pad_token: <|end_of_text|>

val_set_size: 0.05

###
# Training Configuration
###

# random seed for better reproducibility
seed: 117

# optimizer config
optimizer: adamw_bnb_8bit
# optimizer: adamw_torch

learning_rate: 0.0001
lr_scheduler: cosine
num_epochs: 4
micro_batch_size: 2
gradient_accumulation_steps: 1
warmup_steps: 10

# axolotl saving config
dataset_prepared_path: last_run_prepared
output_dir: ./lora-out

# logging and eval config
logging_steps: 1
eval_steps: 0.05

# training performance optimization config
bf16: auto
tf32: false
gradient_checkpointing: true

###
# Miscellaneous Configuration
###

# when true, prevents over-writing the config from the CLI
strict: false

# "Don't mess with this, it's here for accelerate and torchrun" -- axolotl docs
local_rank:

# wandb logging config
wandb_project: llama3-conciser
wandb_name: llama3-4epochs-2batchsize-pushtohub

hub_model_id: chrislee973/llama3-conciser
```
|
|
|
|
|
</details><br> |
|
|
|
|
|
# llama3-conciser |
|
|
|
|
|
This model is a fine-tuned version of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on my [conciser dataset](https://huggingface.co/datasets/chrislee973/llama3-conciser-dataset). |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Text Revision task |
|
|
Given a paragraph of text from a transcript, the model lightly edits its sentences and phrases, improving the flow and readability of the text while preserving the speaker's original intent.
|
|
|
|
|
For example, given the following input text: |
|
|
```
I think I sort of deep down believed in what we were doing, and I did some analysis. I was like, okay, well, what would I go do if I wasn't doing this? It's like, well, I really like building things, and I like helping people communicate, and I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this. And I kind of like the one I have.
```
|
|
the revised output text is: |
|
|
```
I believed deep down in what we were doing. I did some analysis. What would I go do if I wasn’t doing this? I really like building things, helping people communicate, understanding what’s going on with people and the dynamics between them. If I sold this company, I’d just go build another one like this. I kind of like the one I have.
```
|
|
There are still some rough edges around the model as a result of my dataset being so tiny (just 50 examples). I hope to smooth these imperfections out and close the quality gap by adding many more examples to the dataset. |
|
|
|
|
|
## Usage |
|
|
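Below is a minimal inference sketch using `peft`'s `AutoPeftModelForCausalLM`. It assumes the tokenizer (including the added `[INST]`/`[RES]` control tokens) was pushed to the Hub alongside the adapter; the instruction string is a hypothetical placeholder, since the exact instruction used in the training dataset isn't reproduced in this card.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model with the LoRA adapter (and saved embed_tokens/lm_head) applied.
model = AutoPeftModelForCausalLM.from_pretrained(
    "chrislee973/llama3-conciser",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("chrislee973/llama3-conciser")

instruction = "Revise this transcript excerpt to be more concise."  # hypothetical placeholder
transcript = "I think I sort of deep down believed in what we were doing, ..."

# Mirror the `format` template from the axolotl config above.
prompt = f"[INST] {instruction}\n{transcript}\n[/INST]"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If generation emits `[RES]`/`[/RES]` markers around the revised text, strip them in post-processing.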
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 117
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 4
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
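
The total batch sizes above follow from micro_batch_size × num_devices × gradient_accumulation_steps = 2 × 2 × 1 = 4.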
|
|
|
|
|
### Training results |
|
|
|
|
|
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8738        | 0.0833 | 1    | 0.7897          |
| 1.2209        | 0.25   | 3    | 0.7878          |
| 0.8204        | 0.5    | 6    | 0.6336          |
| 0.6652        | 0.75   | 9    | 0.5303          |
| 0.4086        | 1.0    | 12   | 0.4836          |
| 0.3365        | 1.25   | 15   | 0.4733          |
| 0.3445        | 1.5    | 18   | 0.5132          |
| 0.3641        | 1.75   | 21   | 0.5146          |
| 0.1941        | 2.0    | 24   | 0.4939          |
| 0.1814        | 2.25   | 27   | 0.4863          |
| 0.1342        | 2.5    | 30   | 0.4969          |
| 0.1978        | 2.75   | 33   | 0.5141          |
| 0.1589        | 3.0    | 36   | 0.5222          |
| 0.1184        | 3.25   | 39   | 0.5258          |
| 0.1513        | 3.5    | 42   | 0.5182          |
| 0.1172        | 3.75   | 45   | 0.5155          |
| 0.0607        | 4.0    | 48   | 0.5174          |
|
|
|
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1