---
license: apache-2.0
base_model: h2oai/h2o-danube2-1.8b-base
datasets:
- TIGER-Lab/MathInstruct
language:
- en
library_name: transformers
tags:
- llama-factory
- unsloth
---
# h2o-danube2 with ChatML template

This model was first fine-tuned with [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") on [TIGER-Lab/MathInstruct](https://huggingface.co/datasets/TIGER-Lab/MathInstruct) using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

## Template

```jinja
<|im_start|>system
You are a helpful assistant specialised in mathematics.<|im_end|>
<|im_start|>user
{{instruction}}<|im_end|>
<|im_start|>assistant
{{response}}<|im_end|>
```
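
As a quick illustration, a prompt in this template can be assembled with plain string formatting; the `build_prompt` helper and its default system message below are illustrative, not part of the model's code:

```python
def build_prompt(
    instruction: str,
    system: str = "You are a helpful assistant specialised in mathematics.",
) -> str:
    """Assemble a ChatML prompt matching the template above.

    The string ends with the assistant header (no <|im_end|>), so the
    model is left to generate the response.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

print(build_prompt("What is 7 * 8?"))
```

If the repository ships a chat template in `tokenizer_config.json`, calling the tokenizer's `apply_chat_template` method is the more robust route, since it keeps the prompt in sync with the template the model was trained on.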

## BAdam config

```yaml
### model
model_name_or_path: danube2-base-chatml

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 1
badam_start_block: 7
seed: 5772

### dataset
dataset: mathinstruct
template: ninja_chatml
cutoff_len: 8192
overwrite_cache: false
preprocessing_num_workers: 12

### output
output_dir: mathinstruct-chatml-badam
logging_steps: 5
save_steps: 1
save_strategy: epoch
plot_loss: true
overwrite_output_dir: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 0.000005
num_train_epochs: 1
lr_scheduler_type: cosine
warmup_ratio: 0.01
pure_bf16: true
flash_attn: fa2

### eval
val_size: 0.01
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 1000
```
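
One detail worth noting in the `### train` section: the micro-batch size and gradient accumulation combine into the effective batch size per optimizer step. A quick sanity check using the values from the config above:

```python
# Values taken from the BAdam config above.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4

# Number of sequences contributing to each optimizer step, per device
# (multiply by the number of GPUs when training with data parallelism).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16
```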