justinj92 committed · Commit c4a6f56 · verified · 1 Parent(s): a8302e3

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +36 -159
README.md CHANGED
@@ -1,174 +1,51 @@
1
  ---
2
- library_name: peft
3
- license: apache-2.0
4
  base_model: Qwen/Qwen3-0.6B
5
  tags:
6
- - axolotl
7
- - generated_from_trainer
8
- datasets:
9
- - open-r1/Mixture-of-Thoughts
10
- - NousResearch/Hermes-3-Dataset
11
- model-index:
12
- - name: Delphermes-0.6B-R1-LORA
13
- results: []
14
  ---
15
 
16
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
- should probably proofread and complete it, then remove this comment. -->
18
-
19
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
20
- <details><summary>See axolotl config</summary>
21
-
22
- axolotl version: `0.11.0`
23
- ```yaml
24
- # ==== MODEL ====
25
- base_model: Qwen/Qwen3-0.6B
26
- hub_model_id: justinj92/Delphermes-0.6B-R1-LORA
27
- strict: false
28
- chat_template: qwen3
29
-
30
- # ==== DATASETS (unchanged) ====
31
- datasets:
32
- - path: open-r1/Mixture-of-Thoughts
33
- name: all
34
- split: train
35
- type: chat_template
36
- field_messages: messages
37
- - path: NousResearch/Hermes-3-Dataset
38
- split: train
39
- type: chat_template
40
- field_messages: conversations
41
- message_property_mappings:
42
- role: from
43
- content: value
44
-
45
- val_set_size: 0.05
46
- output_dir: ./outputs/Delphermes-0.6B-R1-LORA
47
- dataset_prepared_path: last_run_prepared
48
-
49
- # ==== LENGTH / PACKING ====
50
- sequence_len: 8192
51
- sample_packing: true
52
- eval_sample_packing: true
53
- pad_to_sequence_len: true
54
- remove_unused_columns: true
55
-
56
- # ==== LoRA ====
57
- adapter: lora
58
- lora_r: 16
59
- lora_alpha: 64
60
- lora_dropout: 0.1
61
- lora_target_modules:
62
- - q_proj
63
- - k_proj
64
- - v_proj
65
- - o_proj
66
- - gate_proj
67
- - up_proj
68
- - down_proj
69
-
70
- # ==== OPTIMIZER & SCHEDULE ====
71
- optimizer: adamw_torch_fused
72
- learning_rate: 0.0002 # Aggressive scenario (4× tokens, sqrt scale). (Baseline: 0.0002; Moderate: 0.00028)
73
- lr_scheduler: cosine
74
- weight_decay: 0.0
75
- max_grad_norm: 1.0
76
- warmup_steps: 10 # Keep numeric; absolute steps per epoch shrink -> relative warmup % decreases; can raise to 30 if large batch.
77
-
78
- num_epochs: 3
79
-
80
- # ==== BATCHING (Aggressive) ====
81
- micro_batch_size: 4 # Change to 2 / 4 / 12 / 16 per scenario table
82
- gradient_accumulation_steps: 2 # Keep 1; raise only if chasing larger effective batch without OOM headroom.
83
-
84
- # ==== PRECISION / PERF ====
85
- bf16: true
86
- tf32: true
87
- flash_attention: true
88
- gradient_checkpointing: true
89
- gradient_checkpointing_kwargs:
90
- use_reentrant: false
91
-
92
- # Optionally enable if micro_batch_size > 12:
93
- # activation_checkpointing: true # (Axolotl flag if supported) or toggle in DS JSON.
94
-
95
- # ==== LOGGING ====
96
- wandb_project: updesh-ft
97
- logging_steps: 1
98
- evals_per_epoch: 2
99
- saves_per_epoch: 1
100
- save_first_step: true
101
- eval_max_new_tokens: 500
102
-
103
- # ==== DEEPSPEED ====
104
- deepspeed: deepspeed_configs/zero2_b200.json
105
-
106
- # ==== DISTRIBUTED CONTROL ====
107
- fsdp: []
108
- fsdp_config: {}
109
-
110
- # ==== QUANTIZATION (disabled) ====
111
- load_in_4bit: false
112
- load_in_8bit: true
113
-
114
- special_tokens:
115
- ```
116
-
117
- </details><br>
118
-
119
  # Delphermes-0.6B-R1-LORA
120
 
121
- This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on the open-r1/Mixture-of-Thoughts and the NousResearch/Hermes-3-Dataset datasets.
122
- It achieves the following results on the evaluation set:
123
- - Loss: 0.8526
124
 
125
- ## Model description
126
 
127
- More information needed
128
 
129
- ## Intended uses & limitations
130
 
131
- More information needed
132
 
133
- ## Training and evaluation data
134
-
135
- More information needed
136
-
137
- ## Training procedure
138
-
139
- ### Training hyperparameters
140
-
141
- The following hyperparameters were used during training:
142
- - learning_rate: 0.0002
143
- - train_batch_size: 4
144
- - eval_batch_size: 4
145
- - seed: 42
146
- - distributed_type: multi-GPU
147
- - num_devices: 8
148
- - gradient_accumulation_steps: 2
149
- - total_train_batch_size: 64
150
- - total_eval_batch_size: 32
151
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
152
- - lr_scheduler_type: cosine
153
- - lr_scheduler_warmup_steps: 10
154
- - training_steps: 6411
155
-
156
- ### Training results
157
-
158
- | Training Loss | Epoch | Step | Validation Loss |
159
- |:-------------:|:------:|:----:|:---------------:|
160
- | No log | 0 | 0 | 1.0617 |
161
- | 0.8758 | 0.5001 | 1069 | 0.8699 |
162
- | 0.8335 | 1.0 | 2138 | 0.8615 |
163
- | 0.8603 | 1.5001 | 3207 | 0.8571 |
164
- | 0.8178 | 2.0 | 4276 | 0.8541 |
165
- | 0.8527 | 2.5001 | 5345 | 0.8526 |
166
 
167
 
168
- ### Framework versions
169
 
170
- - PEFT 0.15.2
171
- - Transformers 4.53.1
172
- - Pytorch 2.7.0+cu128
173
- - Datasets 3.6.0
174
- - Tokenizers 0.21.2
 
1
  ---
2
+ language:
3
+ - ml
4
+ - en
5
  base_model: Qwen/Qwen3-0.6B
6
+ library_name: transformers
7
+ pipeline_tag: text-generation
8
  tags:
9
+ - malayalam
10
+ - text-generation
11
+ - lora
12
+ - merged
13
+ license: apache-2.0
14
  ---
15
 
16
  # Delphermes-0.6B-R1-LORA
17
 
18
+ This is a merged LoRA model based on Qwen/Qwen3-0.6B, fine-tuned for Malayalam language tasks.
19
 
20
+ ## Model Details
21
 
22
+ - **Base Model**: Qwen/Qwen3-0.6B
23
+ - **Language**: Malayalam (ml), English (en)
24
+ - **Type**: Merged LoRA model
25
+ - **Library**: transformers
26
 
27
+ ## Usage
28
 
29
+ ```python
30
+ from transformers import AutoTokenizer, AutoModelForCausalLM
31
+ import torch
32
 
33
+ model_name = "justinj92/Delphermes-0.6B-R1-LORA"
34
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
35
+ model = AutoModelForCausalLM.from_pretrained(
36
+     model_name,
37
+     torch_dtype=torch.float16,
38
+     device_map="auto"
39
+ )
40
 
41
+ # Example usage
42
+ text = "നമസ്കാരം"
43
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
44
+ outputs = model.generate(**inputs, max_length=100)
45
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
46
+ print(response)
47
+ ```
48
 
49
+ ## Training Details
50
 
51
+ This model was created by merging a LoRA adapter trained for Malayalam language understanding and generation.
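
The Training Details line above says the published weights come from merging a LoRA adapter into the base model. The sketch below is a rough illustration of what such a merge computes. It is not the author's merge script. It folds the low-rank update into a dense weight as `W + (alpha / r) * B @ A` and checks that the merged weight gives the same output as the base-plus-adapter path. The values `r = 16` and `alpha = 64` mirror the `lora_r` and `lora_alpha` entries in the removed axolotl config. The layer dimensions, random matrices, and helper names (`matmul`, `merge_lora`, `rand_matrix`) are hypothetical and chosen only for illustration. Dropout is ignored, as it would be at inference time.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def merge_lora(w, a, b, r, alpha):
    """Return W + (alpha / r) * (B @ A), the standard LoRA merge."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def rand_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# r and alpha mirror the removed axolotl config (lora_r: 16, lora_alpha: 64).
# The layer dimensions are toy values, not the real Qwen3-0.6B shapes.
R, ALPHA = 16, 64
D_OUT, D_IN = 8, 6

rng = random.Random(0)
W = rand_matrix(D_OUT, D_IN, rng)  # frozen base weight
A = rand_matrix(R, D_IN, rng)      # LoRA down-projection
B = rand_matrix(D_OUT, R, rng)     # LoRA up-projection
x = rand_matrix(D_IN, 1, rng)      # one input column vector

# Adapter path: base output plus the scaled low-rank update.
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
adapter_y = [[bo[0] + (ALPHA / R) * lo[0]] for bo, lo in zip(base_out, lora_out)]

# Merged path: a single dense matmul with the folded weight.
merged_y = matmul(merge_lora(W, A, B, R, ALPHA), x)

# The two paths should agree up to floating-point rounding.
max_diff = max(abs(m[0] - p[0]) for m, p in zip(merged_y, adapter_y))
print(f"max abs difference: {max_diff:.2e}")
```

Because the merged weight has the same shape as the base weight, a merged checkpoint loads with plain `transformers` and needs no adapter library at inference time, which is consistent with the `library_name` change from `peft` to `transformers` in this commit. In practice, PEFT's `merge_and_unload()` is the usual way to perform this fold on a real model; whether it was used here is not stated in the card.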