Commit f71dad9 (verified) by gofilipa, parent 647c053: Adding specs for training

Files changed: README.md (+57 −3)
---
license: gpl-3.0
base_model:
- openai-community/gpt2
---

Fine-tuning specs:

```python
training_params = SFTConfig(
    output_dir="checkpoints",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    learning_rate=1e-4,  # lowered from 2e-4 to 1e-4
    weight_decay=0.001,
    dataset_text_field="text",
    report_to="none",
    bf16=False,
    fp16=False,
    dataloader_pin_memory=False,
    remove_unused_columns=False,
    max_length=512,
    gradient_checkpointing=True,
    dataloader_num_workers=0,
    save_strategy="epoch",
    logging_steps=100,
    average_tokens_across_devices=False  # fix for single-device training
    # loss_type is omitted to avoid the warning; the trainer
    # automatically uses ForCausalLMLoss, which is correct here
)

# Configure the model for gradient-checkpointing compatibility
model.config.use_cache = False

trainer = SFTTrainer(
    model=model,
    train_dataset=ds['train'],
    processing_class=tokenizer,
    args=training_params
)
```
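For context, the block above assumes `model`, `tokenizer`, and `ds` already exist. A minimal setup sketch consistent with the card's `base_model` (the dataset name below is a placeholder, not from this repo):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Base model declared in the card metadata
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Placeholder: replace with the actual training dataset (not stated in this card)
ds = load_dataset("your-dataset")
```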

Training output:

```python
TrainOutput(
    global_step=16773,
    training_loss=2.056998251788356,
    metrics={
        'train_runtime': 3255.1858,
        'train_samples_per_second': 10.305,
        'train_steps_per_second': 5.153,
        'total_flos': 164188359936000.0,
        'train_loss': 2.056998251788356,
    },
)
```
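
As a sanity check, the reported metrics are internally consistent with the config above: each optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps = 2` examples, so 16773 steps over 3 epochs implies roughly 11,182 training examples (an inference from the numbers, not a figure stated in this card):

```python
# Relate the TrainOutput metrics to the SFTConfig values above.
global_step = 16773
num_train_epochs = 3
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
train_runtime = 3255.1858  # seconds

effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 2
steps_per_epoch = global_step // num_train_epochs  # 5591
inferred_dataset_size = steps_per_epoch * effective_batch  # ~11182 examples (inferred)

# Implied throughput matches the reported train_samples_per_second of 10.305
samples_per_second = global_step * effective_batch / train_runtime
print(steps_per_epoch, inferred_dataset_size, round(samples_per_second, 3))
```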