bilalzafar committed
Commit 2cda2db · verified · 1 Parent(s): d1651dd

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -15,12 +15,12 @@ license: mit
 
 ---
 
 ## Preprocessing & class imbalance
- Sentences were **lowercased** (no stemming/lemmatization) and tokenized with the base tokenizer from [`bilalzafar/cb-bert-mlm`](https://huggingface.co/bilalzafar/cb-bert-mlm) using **max\_length=320** with truncation and **dynamic padding** via `DataCollatorWithPadding`. To address imbalance, training used **Focal Loss (γ=1.0)** with **class weights** computed from the *train* split (`class_weight="balanced"`) applied in the loss, plus a **WeightedRandomSampler** with √(inverse-frequency) **per-sample weights**.
+ Sentences were **lowercased** (no stemming/lemmatization) and tokenized with the base tokenizer from [`bilalzafar/cb-bert-mlm`](https://huggingface.co/bilalzafar/cb-bert-mlm) using **max\_length=320** with truncation and **dynamic padding** via `DataCollatorWithPadding`. To address imbalance, training used *Focal Loss (γ=1.0)* with **class weights** computed from the *train* split (`class_weight="balanced"`) applied in the loss, plus a *WeightedRandomSampler* with √(inverse-frequency) *per-sample weights*.
 
 ---
 
 ## Training procedure
- Training used **[`bilalzafar/cb-bert-mlm`](https://huggingface.co/bilalzafar/cb-bert-mlm)** as the base, with a 3-label **`AutoModelForSequenceClassification`** head. Optimization was **AdamW** (HF Trainer) with **learning rate 2e-5**, **batch size 16** (train/eval), and up to **8 epochs** with **early stopping (patience=2)**—best epoch \~**6**. A **warmup ratio of 0.06**, **weight decay 0.01**, and **fp16** precision were applied. Runs were seeded (**42**) and executed on **Google Colab (T4)**.
+ Training used **[`bilalzafar/cb-bert-mlm`](https://huggingface.co/bilalzafar/cb-bert-mlm)** as the base, with a 3-label `AutoModelForSequenceClassification` head. Optimization was *AdamW* (HF Trainer) with *learning rate 2e-5*, *batch size 16* (train/eval), and up to *8 epochs* with *early stopping (patience=2)*—best epoch \~*6*. A *warmup ratio of 0.06*, *weight decay 0.01*, and *fp16* precision were applied. Runs were seeded (*42*) and executed on *Google Colab (T4)*.
 
 ---
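
The preprocessing and imbalance handling described in the README hunk above combines several standard pieces. The snippet below is a minimal, illustrative sketch (not the model's actual training code) of how tokenization with `max_length=320` plus `DataCollatorWithPadding`, `class_weight="balanced"` class weights, √(inverse-frequency) sampling weights, and a γ=1.0 focal loss could fit together; the `text` column name, the toy label array, and the `encode`/`focal_loss` helpers are assumptions made for the example.

```python
# Illustrative sketch only (not the repository's training code).
import numpy as np
import torch
import torch.nn.functional as F
from torch.utils.data import WeightedRandomSampler
from sklearn.utils.class_weight import compute_class_weight
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bilalzafar/cb-bert-mlm")
collator = DataCollatorWithPadding(tokenizer)  # dynamic padding at batch time

def encode(batch):
    # Lowercase (per the README) and truncate to 320 tokens; padding is left to the collator.
    texts = [t.lower() for t in batch["text"]]  # "text" column name is an assumption
    return tokenizer(texts, truncation=True, max_length=320)

# Toy labels standing in for the real train-split labels (3 classes: 0, 1, 2).
train_labels = np.array([0, 0, 0, 0, 1, 1, 2, 0])

# Class weights as with sklearn's class_weight="balanced", computed on the train split.
class_weights = torch.tensor(
    compute_class_weight("balanced", classes=np.array([0, 1, 2]), y=train_labels),
    dtype=torch.float,
)

# sqrt(inverse-frequency) per-sample weights for the WeightedRandomSampler.
counts = np.bincount(train_labels, minlength=3)
sample_weights = np.sqrt(1.0 / counts[train_labels])
sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(sample_weights),
    replacement=True,
)

def focal_loss(logits, labels, gamma=1.0, weight=None):
    # Class-weighted cross-entropy, modulated by (1 - p_true)^gamma.
    ce = F.cross_entropy(logits, labels, weight=weight, reduction="none")
    p_true = torch.softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_true) ** gamma * ce).mean()
```

In an actual run these pieces would still need to be wired into the HF `Trainer`, for example by overriding `compute_loss` and `get_train_dataloader` in a subclass; the README does not show that wiring.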
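Likewise, the training-procedure paragraph maps onto a fairly standard `Trainer` configuration. The sketch below is a hedged reconstruction under stated assumptions, not the actual script: the `build_trainer` helper, output directory, and early-stopping metric are invented for illustration, and older `transformers` releases spell the evaluation flag `evaluation_strategy` rather than `eval_strategy`.

```python
# Illustrative sketch only: Trainer setup with the hyperparameters stated in the README.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

BASE = "bilalzafar/cb-bert-mlm"

def build_trainer(train_ds, eval_ds):
    # train_ds / eval_ds are assumed to be already-tokenized HF datasets.
    tokenizer = AutoTokenizer.from_pretrained(BASE)
    model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=3)
    args = TrainingArguments(
        output_dir="cb-bert-classifier",    # output path is an assumption
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=8,
        warmup_ratio=0.06,
        weight_decay=0.01,
        fp16=True,
        seed=42,
        eval_strategy="epoch",              # "evaluation_strategy" on older transformers
        save_strategy="epoch",
        load_best_model_at_end=True,        # lets early stopping restore the best epoch
        metric_for_best_model="eval_loss",  # metric choice is an assumption
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        data_collator=DataCollatorWithPadding(tokenizer),
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )

# Usage: trainer = build_trainer(train_ds, eval_ds); trainer.train()
```

With per-epoch evaluation and `load_best_model_at_end=True`, `EarlyStoppingCallback(early_stopping_patience=2)` halts once the metric fails to improve for two consecutive epochs, which is consistent with the best epoch landing around 6 of the 8 allowed.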