---
license: other
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: NousResearch/Meta-Llama-3-8B
model-index:
- name: llama3-conciser
results: []
pipeline_tag: text2text-generation
datasets:
- chrislee973/llama3-conciser-dataset
---
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.0`
```yaml
###
# Model Configuration: LLaMA-3 8B
###
# Copied from most recent modal llm-finetuning repo
base_model: NousResearch/Meta-Llama-3-8B
sequence_len: 4096
# base model weight quantization
load_in_8bit: true
# attention implementation
flash_attention: true
# finetuned adapter config
adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head
# for details, see https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
###
# Dataset Configuration: conciser
###
datasets:
  # This will be the path used for the data when it is saved to the Volume in the cloud.
  - path: conciser_dataset_50.jsonl
    ds_type: json
    type:
      # JSONL file contains instruction, text, cleaned_text fields per line.
      # These get mapped to axolotl's instruction, input, output tags.
      field_instruction: instruction
      field_input: text
      field_output: cleaned_text
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] {instruction}
        {input}
        [/INST]
# dataset formatting config
tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - " [/INST]"
  - "[RES]"
  - " [/RES]"
special_tokens:
  pad_token: <|end_of_text|>
val_set_size: 0.05
###
# Training Configuration
###
# random seed for better reproducibility
seed: 117
# optimizer config
optimizer: adamw_bnb_8bit
# optimizer: adamw_torch
learning_rate: 0.0001
lr_scheduler: cosine
num_epochs: 4
micro_batch_size: 2
gradient_accumulation_steps: 1
warmup_steps: 10
# axolotl saving config
dataset_prepared_path: last_run_prepared
output_dir: ./lora-out
# logging and eval config
logging_steps: 1
eval_steps: 0.05
# training performance optimization config
bf16: auto
tf32: false
gradient_checkpointing: true
###
# Miscellaneous Configuration
###
# when true, prevents over-writing the config from the CLI
strict: false
# "Don't mess with this, it's here for accelerate and torchrun" -- axolotl docs
local_rank:
# wandb logging config
wandb_project: llama3-conciser
wandb_name: llama3-4epochs-2batchsize-pushtohub
hub_model_id: chrislee973/llama3-conciser
```
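
A run with this config is typically launched with axolotl's standard CLI, e.g. `accelerate launch -m axolotl.cli.train config.yaml` (assuming the YAML above is saved as `config.yaml`).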
</details><br>
# llama3-conciser
This model is a fine-tuned version of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on my [conciser dataset](https://huggingface.co/datasets/chrislee973/llama3-conciser-dataset).
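Per the axolotl config above, each training example is a JSON line with `instruction`, `text`, and `cleaned_text` fields. A hypothetical record (the values here are illustrative, not taken from the actual dataset) looks like:

```json
{"instruction": "Lightly edit this transcript excerpt for flow and readability.", "text": "So I was like, you know, we should probably just go ahead and ship it, I guess.", "cleaned_text": "I thought we should just go ahead and ship it."}
```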
## Uses
### Text Revision task
Given a paragraph of text from a transcript, the model lightly edits its sentences and phrases,
improving the flow and readability of the text while preserving the speaker's original intent.
For example, given the following input text:
```
I think I sort of deep down believed in what we were doing, and I did some analysis. I was like, okay, well, what would I go do if I wasn't doing this? It's like, well, I really like building things, and I like helping people communicate, and I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this. And I kind of like the one I have.
```
the revised output text is:
```
I believed deep down in what we were doing. I did some analysis. What would I go do if I wasn’t doing this? I really like building things, helping people communicate, understanding what’s going on with people and the dynamics between them. If I sold this company, I’d just go build another one like this. I kind of like the one I have.
```
The model still has some rough edges because the dataset is so tiny (just 50 examples). I hope to smooth out these imperfections and close the quality gap by adding many more examples to the dataset.
## Usage
TODO: add sample inference code
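In the meantime, here is a minimal, untested sketch of what inference could look like, assuming the adapter loads with PEFT's `AutoPeftModelForCausalLM` and reusing the `[INST]` prompt format from the axolotl config above (the instruction string below is a placeholder, not necessarily the one used in training):

```python
# Minimal, untested inference sketch for the LoRA adapter.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "chrislee973/llama3-conciser"

# The adapter repo's tokenizer should include the added [INST]/[RES] tokens,
# since embed_tokens and lm_head were saved alongside the LoRA weights.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Placeholder instruction: substitute the instruction actually used in training.
instruction = "Lightly edit this transcript excerpt for flow and readability."
text = (
    "I think I sort of deep down believed in what we were doing, "
    "and I did some analysis."
)

# Mirror the prompt format from the axolotl config.
prompt = f"[INST] {instruction}\n{text}\n[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the generated continuation, skipping the prompt tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```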
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 117
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 4
- total_eval_batch_size: 4
- optimizer: AdamW (8-bit, via bitsandbytes) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
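The total train batch size of 4 follows from the config: micro_batch_size (2) × num_devices (2) × gradient_accumulation_steps (1).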
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8738 | 0.0833 | 1 | 0.7897 |
| 1.2209 | 0.25 | 3 | 0.7878 |
| 0.8204 | 0.5 | 6 | 0.6336 |
| 0.6652 | 0.75 | 9 | 0.5303 |
| 0.4086 | 1.0 | 12 | 0.4836 |
| 0.3365 | 1.25 | 15 | 0.4733 |
| 0.3445 | 1.5 | 18 | 0.5132 |
| 0.3641 | 1.75 | 21 | 0.5146 |
| 0.1941 | 2.0 | 24 | 0.4939 |
| 0.1814 | 2.25 | 27 | 0.4863 |
| 0.1342 | 2.5 | 30 | 0.4969 |
| 0.1978 | 2.75 | 33 | 0.5141 |
| 0.1589 | 3.0 | 36 | 0.5222 |
| 0.1184 | 3.25 | 39 | 0.5258 |
| 0.1513 | 3.5 | 42 | 0.5182 |
| 0.1172 | 3.75 | 45 | 0.5155 |
| 0.0607 | 4.0 | 48 | 0.5174 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1 |