hardlyworking committed on
Commit 1929559 · verified · 1 Parent(s): 2a92fe4

End of training

Files changed (1)
  1. README.md +154 -0
README.md ADDED
@@ -0,0 +1,154 @@
---
library_name: transformers
license: apache-2.0
base_model: GreenerPastures/Basically-Human-4B
tags:
- axolotl
- generated_from_trainer
datasets:
- jeiku/Writing
- ResplendentAI/Sissification_Hypno_1k
- ResplendentAI/Synthetic_Soul_1k
model-index:
- name: AGI
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.10.0.dev0`
```yaml
base_model: GreenerPastures/Basically-Human-4B

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: jeiku/Writing
    type: completion
    field: text
  - path: ResplendentAI/Sissification_Hypno_1k
    type: alpaca
  - path: ResplendentAI/Synthetic_Soul_1k
    type: alpaca

chat_template: qwen3

val_set_size: 0
output_dir: ./outputs/out
dataset_prepared_path: last_run_prepared
shuffle_merged_datasets: true

hub_model_id: hardlyworking/AGI
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
liger_rope: true
liger_rms_norm: true
liger_layer_norm: true
liger_glu_activation: true
liger_fused_linear_cross_entropy: false
cut_cross_entropy: true

sequence_len: 8192
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

wandb_project: Qwen4B
wandb_entity:
wandb_watch:
wandb_name: Qwen4B
wandb_log_model:

evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: offload
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

deepspeed:

warmup_ratio:
saves_per_epoch: 1
debug:
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token:
```

</details><br>
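
The config above mixes one `completion`-format dataset with two `alpaca`-format datasets. For reference, here is a minimal sketch of one record in each format as Axolotl's standard loaders expect them; the example content is hypothetical:

```python
# Hypothetical records illustrating the two dataset formats named in the config.

# type: completion -> raw text used as-is; only the configured `field` ("text") is read.
completion_record = {
    "text": "The rain had not stopped for three days, and the river kept rising...",
}

# type: alpaca -> instruction/input/output triples; `input` may be empty.
# With `train_on_inputs: false`, loss is computed only on the `output` text.
alpaca_record = {
    "instruction": "Continue the story below for one more paragraph.",
    "input": "The rain had not stopped for three days...",
    "output": "By the fourth morning the levee road had vanished under grey water.",
}
```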

# AGI

This model is a fine-tuned version of [GreenerPastures/Basically-Human-4B](https://huggingface.co/GreenerPastures/Basically-Human-4B) on the jeiku/Writing, ResplendentAI/Sissification_Hypno_1k, and ResplendentAI/Synthetic_Soul_1k datasets.
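
Since the run pushes checkpoints to `hardlyworking/AGI` and uses the qwen3 chat template, inference should work through the standard `transformers` chat-template API. A minimal sketch (the prompt and generation settings are illustrative, not values from training):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hardlyworking/AGI"  # hub_model_id from the config above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Format the prompt with the model's chat template (qwen3 per the config).
messages = [{"role": "user", "content": "Write a short scene set in a rainy harbor town."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```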

## Model description

A full-parameter fine-tune of [GreenerPastures/Basically-Human-4B](https://huggingface.co/GreenerPastures/Basically-Human-4B), trained with Axolotl for four epochs at an 8192-token sequence length with sample packing. The config loads the base model in full precision (no LoRA adapters, no 4-bit or 8-bit loading).

## Intended uses & limitations

More information needed

## Training and evaluation data

Three datasets were merged and shuffled for training: jeiku/Writing (completion format, read from the `text` field), plus ResplendentAI/Sissification_Hypno_1k and ResplendentAI/Synthetic_Soul_1k (both Alpaca format). No validation split was held out (`val_set_size: 0`), so no evaluation results are reported below.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments (see the sketch after this list)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 4
- num_epochs: 4.0

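For anyone reproducing the optimizer and schedule outside Axolotl, a rough equivalent can be assembled from `bitsandbytes` and a `transformers` scheduler helper. This is a minimal sketch under stated assumptions: `model` is an already-loaded module, and `total_steps` is a placeholder, since the real step count depends on the dataset length after sample packing.

```python
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# 8-bit AdamW, matching `optimizer: adamw_bnb_8bit` and `weight_decay: 0.01`.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),  # assumes `model` has already been loaded
    lr=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

# Cosine decay with the 4 warmup steps reported above.
total_steps = 1000  # placeholder; actual count depends on packed dataset size and 4 epochs
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=4, num_training_steps=total_steps
)
```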

### Training results

No evaluation results were logged, as the run was configured without a validation split.

### Framework versions

- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.5.1
- Tokenizers 0.21.1