Model Request
Hey, does anyone here train models or know someone who does? I'm trying to fine-tune the prithivMLmods/Deepthink-Llama-3-8B-Preview safetensors model using data from ErosCoder37/Eros-1.
I do have quite some finetuning experience. Based on your dataset, you are likely related to or part of @Enderchef's team. I recommend you take a look at https://huggingface.co/mradermacher/model_requests/discussions/920 and adapt the axolotl script I posted there.
Because I'm so nice, I even adapted the axolotl configuration for you. Just rent two GPUs with at least 24 GB of GPU memory each, like 2x 4090 from RunPod or a similar provider, and let it train for around 5 hours until it is done.
base_model: prithivMLmods/Deepthink-Llama-3-8B-Preview
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
datasets:
  - path: ErosCoder37/Eros-1
    chat_template: llama3
    type:
      system_prompt: ""
      field_system: system
      field_instruction: input
      field_output: output
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/lora-out
adapter: lora
lora_model_dir:
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0001
bf16: auto
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: <|end_of_text|>
It might be cheaper to rent a single A100 80 GB (maybe even a much cheaper 48 GB GPU will do), in which case just delete the fsdp and fsdp_config sections and add the following to speed up single-GPU training:
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true
Wait - Am I able to train it on a system prompt based on the system_prompt: "" instead of a dataset?
Usually your dataset would contain a system prompt, prompt and response for every row. Because your dataset lacks a system prompt, I set system_prompt to an empty string in the axolotl config so that training does not use one either. Training still uses your dataset; it just runs without any system prompt. Instead of an empty string, you could also hardcode any system prompt that fits your finetune, so that your finetune only gets activated on similar system prompts.
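For illustration, a hardcoded system prompt would only change the datasets section of the config. This is a rough sketch, not a tested setup: the prompt text is a made-up placeholder, and it assumes axolotl falls back to system_prompt whenever a row has no system field, which is the case for this dataset.

```yaml
datasets:
  - path: ErosCoder37/Eros-1
    chat_template: llama3
    type:
      # Hypothetical placeholder text; replace it with a prompt that fits your finetune.
      system_prompt: "You are a helpful coding assistant."
      # Assumption: rows without a "system" field use system_prompt above.
      field_system: system
      field_instruction: input
      field_output: output
```

If you go this route, you would probably want to send the same system prompt at inference time, since the finetune has only seen that one during training.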