Hishambarakat committed · Commit 62e8962 · verified · 1 Parent(s): 98fe18d

Upload README.md with huggingface_hub

Files changed (1):

  1. README.md +32 -21

README.md CHANGED
@@ -122,7 +122,7 @@ The dataset was produced through a structured pipeline:
 * Controlled synthetic generation to expand coverage while keeping the same voice
 * A dialect rule-set (positive/negative constraints) to:
 
-  * encourage Bahraini lexical markers (e.g., وايد، جذي، هني، شلون، عقبها/بعدها، ما ضبط)
+  * encourage Bahraini lexical markers (e.g., وايد، جذي، هني، شلون، عقبها/بعدها)
   * discourage MSA scaffolding and overly formal connectors
   * keep responses short and practical
 * Template correctness via the ALLaM chat template, with EOS enforcement
@@ -146,26 +146,37 @@ Data was formatted using ALLaM’s chat template:
 
 Base configuration used during the run:
 
-* **Max sequence length:** 2048
-* **Optimizer:** `adamw_torch`
-* **LR:** 2e-5
-* **Scheduler:** cosine
-* **Warmup:** 0.1 of optimizer steps (computed as `warmup_steps`)
-* **Weight decay:** 0.01
-* **Max grad norm:** 1.0
-* **Batching:** `per_device_train_batch_size=4`, `gradient_accumulation_steps=16`
-* **Epochs:** 4
-* **Packing:** False
-* **Seed:** 42
-* **Precision:** fp16 on T4; bf16 on Ampere+
-* **Attention impl:** eager
-* **Gradient checkpointing:** enabled (`use_reentrant=False`)
-* **LoRA:**
-
-  * r=16
-  * alpha=32
-  * dropout=0.05
-  * target modules: `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`
+```yaml
+max_seq_length: 2048
+optimizer: adamw_torch
+learning_rate: 2e-5
+lr_scheduler: cosine
+warmup_ratio: 0.1
+weight_decay: 0.01
+max_grad_norm: 1.0
+per_device_train_batch_size: 4
+gradient_accumulation_steps: 16
+num_train_epochs: 4
+packing: false
+seed: 42
+precision: fp16 (T4) / bf16 (Ampere+)
+attention_implementation: eager
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+lora:
+  r: 16
+  alpha: 32
+  dropout: 0.05
+  target_modules:
+    - q_proj
+    - k_proj
+    - v_proj
+    - o_proj
+    - gate_proj
+    - up_proj
+    - down_proj
+```
 
 ### Notes on Tokenizer / Special Tokens
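One detail in the config above worth spelling out: `warmup_ratio: 0.1` is a fraction of *optimizer* steps, and with gradient accumulation one optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps` examples. A minimal sketch of that computation, using the README's values (the dataset size is a made-up placeholder, not from the diff):

```python
import math

# Hypothetical dataset size, for illustration only.
num_examples = 12_800

# Values from the README's training config.
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1

# One optimizer step consumes batch_size * accumulation examples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 64
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_optimizer_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(warmup_ratio * total_optimizer_steps)

print(effective_batch, total_optimizer_steps, warmup_steps)
```

With the placeholder dataset size this yields an effective batch of 64 and 80 warmup steps out of 800 total, which is how a `warmup_steps` value would be derived from the ratio before the cosine schedule takes over.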