Hishambarakat committed · Commit 62e8962 · verified · 1 Parent(s): 98fe18d

Upload README.md with huggingface_hub

Files changed (1):

  1. README.md +32 -21

README.md CHANGED
@@ -122,7 +122,7 @@ The dataset was produced through a structured pipeline:
 * Controlled synthetic generation to expand coverage while keeping the same voice
 * A dialect rule-set (positive/negative constraints) to:
 
-  * encourage Bahraini lexical markers (e.g., وايد، جذي، هني، شلون، عقبها/بعدها، ما ضبط)
+  * encourage Bahraini lexical markers (e.g., وايد، جذي، هني، شلون، عقبها/بعدها)
   * discourage MSA scaffolding and overly formal connectors
   * keep responses short and practical
 * Template correctness via the ALLaM chat template, with EOS enforcement
@@ -146,26 +146,37 @@ Data was formatted using ALLaM’s chat template:
 
 Base configuration used during the run:
 
-* **Max sequence length:** 2048
-* **Optimizer:** `adamw_torch`
-* **LR:** 2e-5
-* **Scheduler:** cosine
-* **Warmup:** 0.1 of optimizer steps (computed as `warmup_steps`)
-* **Weight decay:** 0.01
-* **Max grad norm:** 1.0
-* **Batching:** `per_device_train_batch_size=4`, `gradient_accumulation_steps=16`
-* **Epochs:** 4
-* **Packing:** False
-* **Seed:** 42
-* **Precision:** fp16 on T4; bf16 on Ampere+
-* **Attention impl:** eager
-* **Gradient checkpointing:** enabled (`use_reentrant=False`)
-* **LoRA:**
-
-  * r=16
-  * alpha=32
-  * dropout=0.05
-  * target modules: `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`
+```yaml
+max_seq_length: 2048
+optimizer: adamw_torch
+learning_rate: 2e-5
+lr_scheduler: cosine
+warmup_ratio: 0.1
+weight_decay: 0.01
+max_grad_norm: 1.0
+per_device_train_batch_size: 4
+gradient_accumulation_steps: 16
+num_train_epochs: 4
+packing: false
+seed: 42
+precision: fp16 (T4) / bf16 (Ampere+)
+attention_implementation: eager
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+lora:
+  r: 16
+  alpha: 32
+  dropout: 0.05
+  target_modules:
+    - q_proj
+    - k_proj
+    - v_proj
+    - o_proj
+    - gate_proj
+    - up_proj
+    - down_proj
+```
 
 ### Notes on Tokenizer / Special Tokens
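One detail in the config above worth spelling out: `warmup_ratio: 0.1` is a fraction of *optimizer* steps, and with gradient accumulation one optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps` examples. A minimal sketch of that computation, using the README's values (the dataset size is a made-up placeholder, not from the diff):

```python
import math

# Hypothetical dataset size, for illustration only.
num_examples = 12_800

# Values from the README's training config.
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_train_epochs = 4
warmup_ratio = 0.1

# One optimizer step consumes batch_size * accumulation examples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 64
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_optimizer_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(warmup_ratio * total_optimizer_steps)

print(effective_batch, total_optimizer_steps, warmup_steps)
```

With the placeholder dataset size this yields an effective batch of 64 and 80 warmup steps out of 800 total, which is how a `warmup_steps` value would be derived from the ratio before the cosine schedule takes over.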