hellosindh committed
Commit e0f6e92 · verified · 1 Parent(s): e99b148

Upload folder using huggingface_hub

README.md CHANGED
@@ -7,272 +7,36 @@ tags:
7
  - bert
8
  - masked-language-modeling
9
  - from-scratch
10
- - nlp
11
- model-index:
12
- - name: sindhi-bert-base
13
- results:
14
- - task:
15
- type: fill-mask
16
- name: Masked Language Modeling
17
- metrics:
18
- - type: perplexity
19
- value: 28.46
20
- name: Perplexity (Session 3)
21
  ---
22
 
23
  # Sindhi-BERT-base
24
 
25
- The first BERT-style language model trained **from scratch** on Sindhi text, using a custom Sindhi BPE tokenizer with 32,000 pure Sindhi tokens.
26
-
27
- ---
28
 
29
  ## Training History
30
 
31
- | Session | Data | Epochs | Perplexity | Fill-Mask Quality | Time |
32
- |---|---|---|---|---|---|
33
- | Session 1 | 500K lines | 5 | 78.10 | 50% (5/10) | 301 min |
34
- | Session 2 | 1.5M lines | 3 | 41.62 | 70% (7/10) | 359 min |
35
- | **Session 3** | **1.49M lines (589MB clean)** | **2** | **28.46** | **80% (8/10)** | **224 min** |
36
-
37
- ---
38
-
39
- ## Model Details
40
-
41
- | Detail | Value |
42
- |---|---|
43
- | Architecture | RoBERTa-base |
44
- | Vocabulary | 32,000 tokens (pure Sindhi BPE) |
45
- | Hidden size | 768 |
46
- | Layers | 12 |
47
- | Attention heads | 12 |
48
- | Max length | 512 tokens |
49
- | Parameters | ~110M |
50
- | Language | Sindhi (sd) |
51
- | License | MIT |
52
-
53
- ---
54
-
55
- ## Session 3 Training Details
56
-
57
- | Detail | Value |
58
- |---|---|
59
- | Corpus size | 589 MB clean Sindhi text |
60
- | Total words | ~74 million |
61
- | Epochs | 2 |
62
- | Batch size | 64 (effective 256) |
63
- | Learning rate | 3e-5 |
64
- | LR scheduler | Cosine decay |
65
- | Warmup | 5% of total steps |
66
- | Precision | bf16 (A100) |
67
- | Gradient clipping | 1.5 |
68
- | Hardware | H100 GPU |
69
- | Training time | 224 minutes |
70
- | Eval loss | 3.348446 |
71
- | Perplexity | 28.46 |
72
-
73
- ---
74
-
75
- ## Fill-Mask Results — Session 3
76
-
77
- ### ✅ Correct Predictions (8/10)
78
-
79
- **1. Language identification**
80
- ```
81
- Input : سنڌي [MASK] دنيا جي قديم ٻولين مان ھڪ آھي
82
- ✅ Top 1 : ٻولي (language) — 40.90%
83
- Top 2 : ادب (literature) — 7.86%
84
- Top 3 : ٻوليءَ — 7.20%
85
- ```
86
-
87
- **2. People context**
88
- ```
89
- Input : پاڪستان ۾ سنڌي [MASK] گھڻي تعداد ۾ رھن ٿا
90
- ✅ Top 1 : ماڻهو (people) — 33.47%
91
- Top 2 : سنڌي — 2.65%
92
- Top 3 : ٻار (children) — 2.63%
93
- ```
94
-
95
- **3. City identification**
96
- ```
97
- Input : ڪراچي سنڌ جو سڀ کان وڏو [MASK] آھي
98
- ✅ Top 1 : شھر (city) — 16.72%
99
- Top 2 : حصو (part) — 7.02%
100
- Top 3 : ملڪ (country) — 4.06%
101
- ```
102
-
103
- **4. Direction context**
104
- ```
105
- Input : ھو پنھنجي [MASK] ڏانھن ويو
106
- ✅ Top 1 : گهر (home) — 11.67%
107
- Top 2 : ڳوٺ (village) — 6.63%
108
- Top 3 : منزل (destination) — 5.15%
109
- ```
110
-
111
- **5. Poet identification**
112
- ```
113
- Input : شاھه لطيف سنڌي [MASK] جو وڏو شاعر آھي
114
- ✅ Top 1 : شاعريءَ (poetry) — 25.77%
115
- Top 2 : ٻوليءَ (language) — 25.76%
116
- Top 3 : ادب (literature) — 13.00%
117
- ```
118
-
119
- **6. History context**
120
- ```
121
- Input : سنڌ جي [MASK] ڏاڍي پراڻي آھي
122
- ✅ Top 1 : تاريخ (history) — 16.04%
123
- Top 2 : ٻولي (language) — 3.88%
124
- Top 3 : ڌرتي (land) — 3.67%
125
- ```
126
-
127
- **7. Grammar word**
128
- ```
129
- Input : دنيا [MASK] گھڻي مصروف آھي
130
- ✅ Top 1 : ۾ (in) — 23.20%
131
- Top 2 : کي (to) — 17.54%
132
- Top 3 : جي (of) — 3.71%
133
- ```
134
-
135
- **8. Education context (close)**
136
- ```
137
- Input : استاد شاگردن کي [MASK] سيکاري ٿو
138
- ⚠️ Top 1 : استاد (teacher — repeats subject) — 15.87%
139
- ✅ Top 2 : تعليم (education) — 13.70%
140
- Top 3 : سبق (lesson) — 6.03%
141
- ```
142
-
143
- ---
144
-
145
- ### ❌ Incorrect Predictions (2/10)
146
-
147
- **9. School context (wrong)**
148
- ```
149
- Input : ٻار [MASK] ۾ پڙھن ٿا
150
- ❌ Top 1 : گهر (home) — 2.46% ← should be اسڪول (school)
151
- Top 2 : َ — 2.33% ← diacritic noise
152
- Top 3 : اکين (eyes) — 2.26%
153
- Expected : اسڪول (school) ← model needs more school context data
154
- ```
155
-
156
- **10. River context (close)**
157
- ```
158
- Input : سنڌو [MASK] سنڌ جي سڀيتا جو مرڪز رھيو آھي
159
- ⚠️ Top 1 : سڀيتا (civilization) — 15.54% ← repeats next word
160
- ✅ Top 2 : ندي (river) — 7.19% ← correct answer
161
- Top 3 : ۽ (and) — 5.82%
162
- Expected : ندي (river) ← correct but at Top 2
163
- ```
164
-
165
- ---
166
-
167
- ## Progress Across Sessions
168
-
169
- | Sentence | Session 1 | Session 2 | Session 3 |
170
- |---|---|---|---|
171
- | سنڌي ___ دنيا جي | ✅ ٻولي 15% | ✅ ٻولي 22% | ✅ ٻولي **40.90%** |
172
- | پاڪستان ۾ سنڌي ___ | ❌ | ✅ ماڻهو 49% | ✅ ماڻهو **33.47%** |
173
- | ڪراچي سنڌ جو ___ | ✅ Top 3 | ✅ شھر 9% | ✅ شھر **16.72%** |
174
- | ھو پنھنجي ___ ڏانھن | ⚠️ | ⚠️ | ✅ گهر **11.67%** |
175
- | شاھه لطيف ___ | ✅ | ✅ | ✅ شاعريءَ **25.77%** |
176
- | سنڌ جي ___ پراڻي | ✅ Top 2 | ✅ Top 1 | ✅ تاريخ **16.04%** |
177
- | استاد ___ سيکاري | ✅ تعليم | ❌ استاد | ⚠️ Top 2 تعليم |
178
- | ٻار ___ ۾ پڙھن | ❌ | ❌ | ❌ گهر |
179
- | دنيا ___ مصروف | ✅ ۾ | ✅ ۾ 38% | ✅ ۾ **23.20%** |
180
- | سنڌو ___ سنڌ جي | ❌ | ⚠️ Top 4 | ⚠️ Top 2 ندي |
181
- | **Score** | **50%** | **70%** | **80%** |
182
-
183
- ---
184
-
185
- ## Tokenizer
186
-
187
- Custom Sindhi BPE tokenizer — every Sindhi word stays as ONE token:
188
-
189
- ```python
190
- Input : سنڌي ٻولي دنيا جي قديم ٻولين مان ھڪ آھي
191
- Tokens : ['▁سنڌي', '▁ٻولي', '▁دنيا', '▁جي', '▁قديم', '▁ٻولين', '▁مان', '▁ھڪ', '▁آھي']
192
- Count : 9 words = 9 tokens ✅
193
- ```
194
-
195
- Unlike mBERT or XLM-R, which split Sindhi words into multiple subword pieces, our tokenizer keeps each Sindhi word as a single token.
196
-
197
- ---
198
-
199
- ## Comparison With Other Models
200
-
201
- | Model | Type | Perplexity | Fill-mask Quality |
202
- |---|---|---|---|
203
- | mBERT fine-tuned | Multilingual | 4.19 | ❌ Predicts punctuation |
204
- | XLM-R fine-tuned | Multilingual | 5.88 | ✅ 80% correct |
205
- | SindhiBERT Session 1 | Sindhi only | 78.10 | ✅ 50% |
206
- | SindhiBERT Session 2 | Sindhi only | 41.62 | ✅ 70% |
207
- | **SindhiBERT Session 3** | **Sindhi only** | **28.46** | **✅ 80%** |
208
-
209
- > Note: mBERT/XLM-R perplexity is low because they start from pretrained multilingual weights. SindhiBERT starts from zero and learns pure Sindhi — its predictions are always real Sindhi words, never punctuation or non-Sindhi tokens.
210
-
211
- ---
212
 
213
  ## Usage
214
 
215
  ```python
216
- from transformers import AutoModelForMaskedLM
217
- import sentencepiece as spm
218
- import torch
219
  import torch.nn.functional as F
220
  from huggingface_hub import hf_hub_download
221
 
222
- # Load model
223
- model = AutoModelForMaskedLM.from_pretrained('hellosindh/sindhi-bert-base')
224
- model.eval()
225
-
226
- # Load tokenizer
227
- sp_path = hf_hub_download('hellosindh/sindhi-bert-base', 'sindhi_bpe_32k.model')
228
- sp = spm.SentencePieceProcessor()
229
- sp.Load(sp_path)
230
-
231
- # Constants
232
  MASK_ID = 32000
233
  BOS_ID = 2
234
  EOS_ID = 3
235
- VOCAB_SIZE = 32000
236
-
237
- def fill_mask(sentence, top_k=5):
238
- parts = sentence.split('[MASK]')
239
- left_ids = sp.EncodeAsIds(parts[0].strip())
240
- right_ids = sp.EncodeAsIds(parts[1].strip())
241
- input_ids = [BOS_ID] + left_ids + [MASK_ID] + right_ids + [EOS_ID]
242
- mask_pos = len(left_ids) + 1
243
- tensor = torch.tensor([input_ids])
244
- with torch.no_grad():
245
- logits = model(tensor).logits[0, mask_pos]
246
- logits[MASK_ID] = -float('inf')
247
- probs = F.softmax(logits[:VOCAB_SIZE], dim=-1)
248
- top_probs, top_ids = torch.topk(probs, top_k)
249
- for prob, idx in zip(top_probs, top_ids):
250
- word = sp.IdToPiece(idx.item()).replace('▁', '')
251
- print(f'{word:<20} {prob.item()*100:.2f}%')
252
 
253
- # Example
254
- fill_mask('سنڌي [MASK] دنيا جي قديم ٻولين مان ھڪ آھي')
255
- # ٻولي 40.90%
256
- # ادب 7.86%
257
  ```
258
-
259
- ---
260
-
261
- ## Roadmap
262
-
263
- - [x] Custom Sindhi BPE tokenizer (32K vocab)
264
- - [x] Session 1 — 500K lines, 5 epochs, PPL 78.10
265
- - [x] Session 2 — 1.5M lines, 3 epochs, PPL 41.62
266
- - [x] Session 3 — 589MB clean corpus, 2 epochs, PPL 28.46
267
- - [ ] Session 4 — more data + 3 epochs → target PPL ~18
268
- - [ ] Session 5 — fine-tune lower LR → target PPL ~12
269
- - [ ] Spell checker fine-tuning
270
- - [ ] Next word prediction
271
- - [ ] Named entity recognition
272
- - [ ] Sindhi chatbot
273
-
274
- ---
275
-
276
- ## About
277
-
278
- The corpus was carefully cleaned using a custom pipeline including Unicode normalization, script standardization, he-character normalization (ھ/ه/ہ), and word-level corrections using a 9,355-entry Sindhi dictionary.
 
7
  - bert
8
  - masked-language-modeling
9
  - from-scratch
10
  ---
11
 
12
  # Sindhi-BERT-base
13
 
14
+ First BERT-style model trained from scratch on Sindhi text.
 
 
15
 
16
  ## Training History
17
 
18
+ | Session | Data | Epochs | PPL | Notes |
19
+ |---|---|---|---|---|
20
+ | S1 | 500K lines | 5 | 78.10 | from scratch |
21
+ | S2 | 1.5M lines | 3 | 41.62 | continued |
22
+ | S3 | 1.49M lines | 2 | 28.46 | bf16, cosine LR |
23
+ | S4 | 87M words | 3 | 35.42 | grouped context |
24
 
25
  ## Usage
26
 
27
  ```python
28
+ from transformers import RobertaForMaskedLM
29
+ import sentencepiece as spm, torch
 
30
  import torch.nn.functional as F
31
  from huggingface_hub import hf_hub_download
32
 
33
+ REPO = "hellosindh/sindhi-bert-base"
34
  MASK_ID = 32000
35
  BOS_ID = 2
36
  EOS_ID = 3
37
 
38
+ model = RobertaForMaskedLM.from_pretrained(REPO)
39
+ sp_path = hf_hub_download(REPO, "sindhi_bpe_32k.model")
40
+ sp = spm.SentencePieceProcessor()
41
+ sp.Load(sp_path)
42
  ```
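The added Usage block loads the model and tokenizer but stops before making a prediction, even though it defines `MASK_ID`, `BOS_ID`, and `EOS_ID`. Below is a minimal fill-mask sketch adapted from the `fill_mask` helper that the previous version of this README carried; it assumes the variables loaded in the block above (`model`, `sp`) and the special-token IDs stated there, and adds a `model.eval()` call for inference.

```python
model.eval()  # disable dropout for inference

def fill_mask(sentence, top_k=5):
    # Encode the text on each side of the [MASK] placeholder with the SentencePiece tokenizer.
    left, right = sentence.split('[MASK]')
    left_ids = sp.EncodeAsIds(left.strip())
    right_ids = sp.EncodeAsIds(right.strip())
    input_ids = [BOS_ID] + left_ids + [MASK_ID] + right_ids + [EOS_ID]
    mask_pos = len(left_ids) + 1  # index of the [MASK] token (BOS sits at position 0)
    with torch.no_grad():
        logits = model(torch.tensor([input_ids])).logits[0, mask_pos]
    logits[MASK_ID] = -float('inf')            # never propose the mask token itself
    probs = F.softmax(logits[:32000], dim=-1)  # keep only the 32K SentencePiece vocabulary
    top_probs, top_ids = torch.topk(probs, top_k)
    for prob, idx in zip(top_probs, top_ids):
        word = sp.IdToPiece(idx.item()).replace('▁', '')
        print(f'{word:<20} {prob.item()*100:.2f}%')

fill_mask('سنڌي [MASK] دنيا جي قديم ٻولين مان ھڪ آھي')
```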
 
checkpoint-3924/config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "add_cross_attention": false,
3
+ "architectures": [
4
+ "RobertaForMaskedLM"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 1,
8
+ "classifier_dropout": null,
9
+ "dtype": "float32",
10
+ "eos_token_id": 2,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 768,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "is_decoder": false,
17
+ "layer_norm_eps": 1e-12,
18
+ "max_position_embeddings": 514,
19
+ "model_type": "roberta",
20
+ "num_attention_heads": 12,
21
+ "num_hidden_layers": 12,
22
+ "pad_token_id": 0,
23
+ "tie_word_embeddings": true,
24
+ "transformers_version": "5.0.0",
25
+ "type_vocab_size": 1,
26
+ "use_cache": false,
27
+ "vocab_size": 32001
28
+ }
checkpoint-3924/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66c9b40b4d1b2943a622be928e3f8beb231f2cf80d2acbe19352c740edfa76b9
3
+ size 442633860
checkpoint-3924/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8cdbb31b8e427b2d5c5d5dce127c362cb391d70f8282995b2a405651b6695774
3
+ size 885391563
checkpoint-3924/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:35f5af9b38d87cb532b16dd4de5175c2910bc86cf1976c6ccc3668da1c53606d
3
+ size 14645
checkpoint-3924/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8eef8b1a8fe3ca13b13452c68d049d5772a114b25d47fd7c271209bdd37c174b
3
+ size 1465
checkpoint-3924/trainer_state.json ADDED
@@ -0,0 +1,332 @@
1
+ {
2
+ "best_global_step": 3924,
3
+ "best_metric": 3.56946063041687,
4
+ "best_model_checkpoint": "sindhibert_session4/checkpoint-3924",
5
+ "epoch": 2.0,
6
+ "eval_steps": 1962,
7
+ "global_step": 3924,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.05098139179199592,
14
+ "grad_norm": 4.590001106262207,
15
+ "learning_rate": 5.609065155807366e-06,
16
+ "loss": 15.86372314453125,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.10196278358399184,
21
+ "grad_norm": 5.000253677368164,
22
+ "learning_rate": 1.1274787535410765e-05,
23
+ "loss": 15.6683056640625,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.15294417537598776,
28
+ "grad_norm": 5.164661407470703,
29
+ "learning_rate": 1.6940509915014164e-05,
30
+ "loss": 15.58547607421875,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.20392556716798368,
35
+ "grad_norm": 4.895200729370117,
36
+ "learning_rate": 1.999658933249201e-05,
37
+ "loss": 15.5261376953125,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.2549069589599796,
42
+ "grad_norm": 5.010247707366943,
43
+ "learning_rate": 1.9965659596003744e-05,
44
+ "loss": 15.493291015625,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.3058883507519755,
49
+ "grad_norm": 4.85853910446167,
50
+ "learning_rate": 1.990261043359342e-05,
51
+ "loss": 15.43971435546875,
52
+ "step": 600
53
+ },
54
+ {
55
+ "epoch": 0.35686974254397147,
56
+ "grad_norm": 4.788653373718262,
57
+ "learning_rate": 1.9807645053376055e-05,
58
+ "loss": 15.409666748046876,
59
+ "step": 700
60
+ },
61
+ {
62
+ "epoch": 0.40785113433596737,
63
+ "grad_norm": 4.742185592651367,
64
+ "learning_rate": 1.968106952977309e-05,
65
+ "loss": 15.346304931640624,
66
+ "step": 800
67
+ },
68
+ {
69
+ "epoch": 0.45883252612796327,
70
+ "grad_norm": 4.758422374725342,
71
+ "learning_rate": 1.9523291817031276e-05,
72
+ "loss": 15.344024658203125,
73
+ "step": 900
74
+ },
75
+ {
76
+ "epoch": 0.5098139179199592,
77
+ "grad_norm": 4.854381084442139,
78
+ "learning_rate": 1.933482043438185e-05,
79
+ "loss": 15.307811279296875,
80
+ "step": 1000
81
+ },
82
+ {
83
+ "epoch": 0.5607953097119551,
84
+ "grad_norm": 4.7934041023254395,
85
+ "learning_rate": 1.9116262827077703e-05,
86
+ "loss": 15.254422607421875,
87
+ "step": 1100
88
+ },
89
+ {
90
+ "epoch": 0.611776701503951,
91
+ "grad_norm": 4.670731544494629,
92
+ "learning_rate": 1.88683234085909e-05,
93
+ "loss": 15.23345703125,
94
+ "step": 1200
95
+ },
96
+ {
97
+ "epoch": 0.6627580932959469,
98
+ "grad_norm": 4.993561267852783,
99
+ "learning_rate": 1.8591801290280664e-05,
100
+ "loss": 15.2450927734375,
101
+ "step": 1300
102
+ },
103
+ {
104
+ "epoch": 0.7137394850879429,
105
+ "grad_norm": 4.720964431762695,
106
+ "learning_rate": 1.8287587705849013e-05,
107
+ "loss": 15.1839599609375,
108
+ "step": 1400
109
+ },
110
+ {
111
+ "epoch": 0.7647208768799388,
112
+ "grad_norm": 5.050419330596924,
113
+ "learning_rate": 1.7956663138885173e-05,
114
+ "loss": 15.164833984375,
115
+ "step": 1500
116
+ },
117
+ {
118
+ "epoch": 0.8157022686719347,
119
+ "grad_norm": 4.826648712158203,
120
+ "learning_rate": 1.760009416275661e-05,
121
+ "loss": 15.130496826171875,
122
+ "step": 1600
123
+ },
124
+ {
125
+ "epoch": 0.8666836604639306,
126
+ "grad_norm": 4.858438014984131,
127
+ "learning_rate": 1.721903000303185e-05,
128
+ "loss": 15.125797119140625,
129
+ "step": 1700
130
+ },
131
+ {
132
+ "epoch": 0.9176650522559265,
133
+ "grad_norm": 4.9611430168151855,
134
+ "learning_rate": 1.6814698833514326e-05,
135
+ "loss": 15.13617431640625,
136
+ "step": 1800
137
+ },
138
+ {
139
+ "epoch": 0.9686464440479226,
140
+ "grad_norm": 4.663859844207764,
141
+ "learning_rate": 1.63884038178253e-05,
142
+ "loss": 15.072591552734375,
143
+ "step": 1900
144
+ },
145
+ {
146
+ "epoch": 1.0,
147
+ "eval_loss": 3.636704444885254,
148
+ "eval_runtime": 8.0138,
149
+ "eval_samples_per_second": 632.91,
150
+ "eval_steps_per_second": 9.983,
151
+ "step": 1962
152
+ },
153
+ {
154
+ "epoch": 1.0193729288809585,
155
+ "grad_norm": 4.863068103790283,
156
+ "learning_rate": 1.5941518909293737e-05,
157
+ "loss": 14.968798828125,
158
+ "step": 2000
159
+ },
160
+ {
161
+ "epoch": 1.0703543206729544,
162
+ "grad_norm": 5.036495685577393,
163
+ "learning_rate": 1.5475484422690282e-05,
164
+ "loss": 15.0290869140625,
165
+ "step": 2100
166
+ },
167
+ {
168
+ "epoch": 1.1213357124649503,
169
+ "grad_norm": 5.248174667358398,
170
+ "learning_rate": 1.4991802392077543e-05,
171
+ "loss": 15.004036865234376,
172
+ "step": 2200
173
+ },
174
+ {
175
+ "epoch": 1.1723171042569462,
176
+ "grad_norm": 4.950564384460449,
177
+ "learning_rate": 1.4492031729738489e-05,
178
+ "loss": 15.002611083984375,
179
+ "step": 2300
180
+ },
181
+ {
182
+ "epoch": 1.2232984960489421,
183
+ "grad_norm": 4.509192943572998,
184
+ "learning_rate": 1.3977783201785732e-05,
185
+ "loss": 14.96060302734375,
186
+ "step": 2400
187
+ },
188
+ {
189
+ "epoch": 1.274279887840938,
190
+ "grad_norm": 4.900182723999023,
191
+ "learning_rate": 1.3450714236645352e-05,
192
+ "loss": 14.971297607421874,
193
+ "step": 2500
194
+ },
195
+ {
196
+ "epoch": 1.325261279632934,
197
+ "grad_norm": 5.138764381408691,
198
+ "learning_rate": 1.2912523583147625e-05,
199
+ "loss": 14.928385009765625,
200
+ "step": 2600
201
+ },
202
+ {
203
+ "epoch": 1.3762426714249298,
204
+ "grad_norm": 4.894199848175049,
205
+ "learning_rate": 1.2364945835441636e-05,
206
+ "loss": 14.938167724609375,
207
+ "step": 2700
208
+ },
209
+ {
210
+ "epoch": 1.4272240632169257,
211
+ "grad_norm": 4.8737921714782715,
212
+ "learning_rate": 1.1809745842380042e-05,
213
+ "loss": 14.923902587890625,
214
+ "step": 2800
215
+ },
216
+ {
217
+ "epoch": 1.4782054550089216,
218
+ "grad_norm": 4.8258819580078125,
219
+ "learning_rate": 1.1248713019392635e-05,
220
+ "loss": 14.89677001953125,
221
+ "step": 2900
222
+ },
223
+ {
224
+ "epoch": 1.5291868468009175,
225
+ "grad_norm": 4.769787788391113,
226
+ "learning_rate": 1.0683655581181524e-05,
227
+ "loss": 14.87692626953125,
228
+ "step": 3000
229
+ },
230
+ {
231
+ "epoch": 1.5801682385929134,
232
+ "grad_norm": 4.92316198348999,
233
+ "learning_rate": 1.0116394713826117e-05,
234
+ "loss": 14.849693603515625,
235
+ "step": 3100
236
+ },
237
+ {
238
+ "epoch": 1.6311496303849093,
239
+ "grad_norm": 4.873258590698242,
240
+ "learning_rate": 9.548758705081177e-06,
241
+ "loss": 14.833634033203126,
242
+ "step": 3200
243
+ },
244
+ {
245
+ "epoch": 1.6821310221769055,
246
+ "grad_norm": 4.738825798034668,
247
+ "learning_rate": 8.98257705178612e-06,
248
+ "loss": 14.85665283203125,
249
+ "step": 3300
250
+ },
251
+ {
252
+ "epoch": 1.7331124139689014,
253
+ "grad_norm": 4.907736778259277,
254
+ "learning_rate": 8.419674563377416e-06,
255
+ "loss": 14.8664599609375,
256
+ "step": 3400
257
+ },
258
+ {
259
+ "epoch": 1.7840938057608973,
260
+ "grad_norm": 4.977413177490234,
261
+ "learning_rate": 7.861865480508541e-06,
262
+ "loss": 14.83008056640625,
263
+ "step": 3500
264
+ },
265
+ {
266
+ "epoch": 1.8350751975528932,
267
+ "grad_norm": 4.792273044586182,
268
+ "learning_rate": 7.310947627733231e-06,
269
+ "loss": 14.81404541015625,
270
+ "step": 3600
271
+ },
272
+ {
273
+ "epoch": 1.886056589344889,
274
+ "grad_norm": 4.84648323059082,
275
+ "learning_rate": 6.768696619097996e-06,
276
+ "loss": 14.831793212890625,
277
+ "step": 3700
278
+ },
279
+ {
280
+ "epoch": 1.9370379811368852,
281
+ "grad_norm": 4.854404449462891,
282
+ "learning_rate": 6.236860135319321e-06,
283
+ "loss": 14.826976318359375,
284
+ "step": 3800
285
+ },
286
+ {
287
+ "epoch": 1.988019372928881,
288
+ "grad_norm": 4.615888595581055,
289
+ "learning_rate": 5.717152290990302e-06,
290
+ "loss": 14.767562255859374,
291
+ "step": 3900
292
+ },
293
+ {
294
+ "epoch": 2.0,
295
+ "eval_loss": 3.56946063041687,
296
+ "eval_runtime": 8.0481,
297
+ "eval_samples_per_second": 630.208,
298
+ "eval_steps_per_second": 9.94,
299
+ "step": 3924
300
+ }
301
+ ],
302
+ "logging_steps": 100,
303
+ "max_steps": 5886,
304
+ "num_input_tokens_seen": 0,
305
+ "num_train_epochs": 3,
306
+ "save_steps": 1962,
307
+ "stateful_callbacks": {
308
+ "EarlyStoppingCallback": {
309
+ "args": {
310
+ "early_stopping_patience": 3,
311
+ "early_stopping_threshold": 0.0
312
+ },
313
+ "attributes": {
314
+ "early_stopping_patience_counter": 0
315
+ }
316
+ },
317
+ "TrainerControl": {
318
+ "args": {
319
+ "should_epoch_stop": false,
320
+ "should_evaluate": false,
321
+ "should_log": false,
322
+ "should_save": true,
323
+ "should_training_stop": false
324
+ },
325
+ "attributes": {}
326
+ }
327
+ },
328
+ "total_flos": 2.643322074019246e+17,
329
+ "train_batch_size": 64,
330
+ "trial_name": null,
331
+ "trial_params": null
332
+ }
checkpoint-3924/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:accc825ca2e280888c9eed825fcb7985700c1fb466ed8b16208ff9e7b14f1318
3
+ size 5137
checkpoint-5886/config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "add_cross_attention": false,
3
+ "architectures": [
4
+ "RobertaForMaskedLM"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 1,
8
+ "classifier_dropout": null,
9
+ "dtype": "float32",
10
+ "eos_token_id": 2,
11
+ "hidden_act": "gelu",
12
+ "hidden_dropout_prob": 0.1,
13
+ "hidden_size": 768,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 3072,
16
+ "is_decoder": false,
17
+ "layer_norm_eps": 1e-12,
18
+ "max_position_embeddings": 514,
19
+ "model_type": "roberta",
20
+ "num_attention_heads": 12,
21
+ "num_hidden_layers": 12,
22
+ "pad_token_id": 0,
23
+ "tie_word_embeddings": true,
24
+ "transformers_version": "5.0.0",
25
+ "type_vocab_size": 1,
26
+ "use_cache": false,
27
+ "vocab_size": 32001
28
+ }
checkpoint-5886/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:170894fbff2599922589dc645dfc871455543fe1f1fa33d3381f8353cf0b2a5b
3
+ size 442633860
checkpoint-5886/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:49f438e933e34f365171b080043f51c3931028fb9b12b84462700e4fec8ed022
3
+ size 885391563
checkpoint-5886/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5b568051719bceb1b41126c825c8846c1625bce2c01817c9c4450273020cfb29
3
+ size 14645
checkpoint-5886/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a8e0fb9255f3eabc9bbca3c948e4f71fe410e407e554e26add1a06864fa8f902
3
+ size 1465
checkpoint-5886/trainer_state.json ADDED
@@ -0,0 +1,473 @@
1
+ {
2
+ "best_global_step": 5886,
3
+ "best_metric": 3.5591108798980713,
4
+ "best_model_checkpoint": "sindhibert_session4/checkpoint-5886",
5
+ "epoch": 3.0,
6
+ "eval_steps": 1962,
7
+ "global_step": 5886,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.05098139179199592,
14
+ "grad_norm": 4.590001106262207,
15
+ "learning_rate": 5.609065155807366e-06,
16
+ "loss": 15.86372314453125,
17
+ "step": 100
18
+ },
19
+ {
20
+ "epoch": 0.10196278358399184,
21
+ "grad_norm": 5.000253677368164,
22
+ "learning_rate": 1.1274787535410765e-05,
23
+ "loss": 15.6683056640625,
24
+ "step": 200
25
+ },
26
+ {
27
+ "epoch": 0.15294417537598776,
28
+ "grad_norm": 5.164661407470703,
29
+ "learning_rate": 1.6940509915014164e-05,
30
+ "loss": 15.58547607421875,
31
+ "step": 300
32
+ },
33
+ {
34
+ "epoch": 0.20392556716798368,
35
+ "grad_norm": 4.895200729370117,
36
+ "learning_rate": 1.999658933249201e-05,
37
+ "loss": 15.5261376953125,
38
+ "step": 400
39
+ },
40
+ {
41
+ "epoch": 0.2549069589599796,
42
+ "grad_norm": 5.010247707366943,
43
+ "learning_rate": 1.9965659596003744e-05,
44
+ "loss": 15.493291015625,
45
+ "step": 500
46
+ },
47
+ {
48
+ "epoch": 0.3058883507519755,
49
+ "grad_norm": 4.85853910446167,
50
+ "learning_rate": 1.990261043359342e-05,
51
+ "loss": 15.43971435546875,
52
+ "step": 600
53
+ },
54
+ {
55
+ "epoch": 0.35686974254397147,
56
+ "grad_norm": 4.788653373718262,
57
+ "learning_rate": 1.9807645053376055e-05,
58
+ "loss": 15.409666748046876,
59
+ "step": 700
60
+ },
61
+ {
62
+ "epoch": 0.40785113433596737,
63
+ "grad_norm": 4.742185592651367,
64
+ "learning_rate": 1.968106952977309e-05,
65
+ "loss": 15.346304931640624,
66
+ "step": 800
67
+ },
68
+ {
69
+ "epoch": 0.45883252612796327,
70
+ "grad_norm": 4.758422374725342,
71
+ "learning_rate": 1.9523291817031276e-05,
72
+ "loss": 15.344024658203125,
73
+ "step": 900
74
+ },
75
+ {
76
+ "epoch": 0.5098139179199592,
77
+ "grad_norm": 4.854381084442139,
78
+ "learning_rate": 1.933482043438185e-05,
79
+ "loss": 15.307811279296875,
80
+ "step": 1000
81
+ },
82
+ {
83
+ "epoch": 0.5607953097119551,
84
+ "grad_norm": 4.7934041023254395,
85
+ "learning_rate": 1.9116262827077703e-05,
86
+ "loss": 15.254422607421875,
87
+ "step": 1100
88
+ },
89
+ {
90
+ "epoch": 0.611776701503951,
91
+ "grad_norm": 4.670731544494629,
92
+ "learning_rate": 1.88683234085909e-05,
93
+ "loss": 15.23345703125,
94
+ "step": 1200
95
+ },
96
+ {
97
+ "epoch": 0.6627580932959469,
98
+ "grad_norm": 4.993561267852783,
99
+ "learning_rate": 1.8591801290280664e-05,
100
+ "loss": 15.2450927734375,
101
+ "step": 1300
102
+ },
103
+ {
104
+ "epoch": 0.7137394850879429,
105
+ "grad_norm": 4.720964431762695,
106
+ "learning_rate": 1.8287587705849013e-05,
107
+ "loss": 15.1839599609375,
108
+ "step": 1400
109
+ },
110
+ {
111
+ "epoch": 0.7647208768799388,
112
+ "grad_norm": 5.050419330596924,
113
+ "learning_rate": 1.7956663138885173e-05,
114
+ "loss": 15.164833984375,
115
+ "step": 1500
116
+ },
117
+ {
118
+ "epoch": 0.8157022686719347,
119
+ "grad_norm": 4.826648712158203,
120
+ "learning_rate": 1.760009416275661e-05,
121
+ "loss": 15.130496826171875,
122
+ "step": 1600
123
+ },
124
+ {
125
+ "epoch": 0.8666836604639306,
126
+ "grad_norm": 4.858438014984131,
127
+ "learning_rate": 1.721903000303185e-05,
128
+ "loss": 15.125797119140625,
129
+ "step": 1700
130
+ },
131
+ {
132
+ "epoch": 0.9176650522559265,
133
+ "grad_norm": 4.9611430168151855,
134
+ "learning_rate": 1.6814698833514326e-05,
135
+ "loss": 15.13617431640625,
136
+ "step": 1800
137
+ },
138
+ {
139
+ "epoch": 0.9686464440479226,
140
+ "grad_norm": 4.663859844207764,
141
+ "learning_rate": 1.63884038178253e-05,
142
+ "loss": 15.072591552734375,
143
+ "step": 1900
144
+ },
145
+ {
146
+ "epoch": 1.0,
147
+ "eval_loss": 3.636704444885254,
148
+ "eval_runtime": 8.0138,
149
+ "eval_samples_per_second": 632.91,
150
+ "eval_steps_per_second": 9.983,
151
+ "step": 1962
152
+ },
153
+ {
154
+ "epoch": 1.0193729288809585,
155
+ "grad_norm": 4.863068103790283,
156
+ "learning_rate": 1.5941518909293737e-05,
157
+ "loss": 14.968798828125,
158
+ "step": 2000
159
+ },
160
+ {
161
+ "epoch": 1.0703543206729544,
162
+ "grad_norm": 5.036495685577393,
163
+ "learning_rate": 1.5475484422690282e-05,
164
+ "loss": 15.0290869140625,
165
+ "step": 2100
166
+ },
167
+ {
168
+ "epoch": 1.1213357124649503,
169
+ "grad_norm": 5.248174667358398,
170
+ "learning_rate": 1.4991802392077543e-05,
171
+ "loss": 15.004036865234376,
172
+ "step": 2200
173
+ },
174
+ {
175
+ "epoch": 1.1723171042569462,
176
+ "grad_norm": 4.950564384460449,
177
+ "learning_rate": 1.4492031729738489e-05,
178
+ "loss": 15.002611083984375,
179
+ "step": 2300
180
+ },
181
+ {
182
+ "epoch": 1.2232984960489421,
183
+ "grad_norm": 4.509192943572998,
184
+ "learning_rate": 1.3977783201785732e-05,
185
+ "loss": 14.96060302734375,
186
+ "step": 2400
187
+ },
188
+ {
189
+ "epoch": 1.274279887840938,
190
+ "grad_norm": 4.900182723999023,
191
+ "learning_rate": 1.3450714236645352e-05,
192
+ "loss": 14.971297607421874,
193
+ "step": 2500
194
+ },
195
+ {
196
+ "epoch": 1.325261279632934,
197
+ "grad_norm": 5.138764381408691,
198
+ "learning_rate": 1.2912523583147625e-05,
199
+ "loss": 14.928385009765625,
200
+ "step": 2600
201
+ },
202
+ {
203
+ "epoch": 1.3762426714249298,
204
+ "grad_norm": 4.894199848175049,
205
+ "learning_rate": 1.2364945835441636e-05,
206
+ "loss": 14.938167724609375,
207
+ "step": 2700
208
+ },
209
+ {
210
+ "epoch": 1.4272240632169257,
211
+ "grad_norm": 4.8737921714782715,
212
+ "learning_rate": 1.1809745842380042e-05,
213
+ "loss": 14.923902587890625,
214
+ "step": 2800
215
+ },
216
+ {
217
+ "epoch": 1.4782054550089216,
218
+ "grad_norm": 4.8258819580078125,
219
+ "learning_rate": 1.1248713019392635e-05,
220
+ "loss": 14.89677001953125,
221
+ "step": 2900
222
+ },
223
+ {
224
+ "epoch": 1.5291868468009175,
225
+ "grad_norm": 4.769787788391113,
226
+ "learning_rate": 1.0683655581181524e-05,
227
+ "loss": 14.87692626953125,
228
+ "step": 3000
229
+ },
230
+ {
231
+ "epoch": 1.5801682385929134,
232
+ "grad_norm": 4.92316198348999,
233
+ "learning_rate": 1.0116394713826117e-05,
234
+ "loss": 14.849693603515625,
235
+ "step": 3100
236
+ },
237
+ {
238
+ "epoch": 1.6311496303849093,
239
+ "grad_norm": 4.873258590698242,
240
+ "learning_rate": 9.548758705081177e-06,
241
+ "loss": 14.833634033203126,
242
+ "step": 3200
243
+ },
244
+ {
245
+ "epoch": 1.6821310221769055,
246
+ "grad_norm": 4.738825798034668,
247
+ "learning_rate": 8.98257705178612e-06,
248
+ "loss": 14.85665283203125,
249
+ "step": 3300
250
+ },
251
+ {
252
+ "epoch": 1.7331124139689014,
253
+ "grad_norm": 4.907736778259277,
254
+ "learning_rate": 8.419674563377416e-06,
255
+ "loss": 14.8664599609375,
256
+ "step": 3400
257
+ },
258
+ {
259
+ "epoch": 1.7840938057608973,
260
+ "grad_norm": 4.977413177490234,
261
+ "learning_rate": 7.861865480508541e-06,
262
+ "loss": 14.83008056640625,
263
+ "step": 3500
264
+ },
265
+ {
266
+ "epoch": 1.8350751975528932,
267
+ "grad_norm": 4.792273044586182,
268
+ "learning_rate": 7.310947627733231e-06,
269
+ "loss": 14.81404541015625,
270
+ "step": 3600
271
+ },
272
+ {
273
+ "epoch": 1.886056589344889,
274
+ "grad_norm": 4.84648323059082,
275
+ "learning_rate": 6.768696619097996e-06,
276
+ "loss": 14.831793212890625,
277
+ "step": 3700
278
+ },
279
+ {
280
+ "epoch": 1.9370379811368852,
281
+ "grad_norm": 4.854404449462891,
282
+ "learning_rate": 6.236860135319321e-06,
283
+ "loss": 14.826976318359375,
284
+ "step": 3800
285
+ },
286
+ {
287
+ "epoch": 1.988019372928881,
288
+ "grad_norm": 4.615888595581055,
289
+ "learning_rate": 5.717152290990302e-06,
290
+ "loss": 14.767562255859374,
291
+ "step": 3900
292
+ },
293
+ {
294
+ "epoch": 2.0,
295
+ "eval_loss": 3.56946063041687,
296
+ "eval_runtime": 8.0481,
297
+ "eval_samples_per_second": 630.208,
298
+ "eval_steps_per_second": 9.94,
299
+ "step": 3924
300
+ },
301
+ {
302
+ "epoch": 2.038745857761917,
303
+ "grad_norm": 5.015805721282959,
304
+ "learning_rate": 5.211248109971254e-06,
305
+ "loss": 14.695634765625,
306
+ "step": 4000
307
+ },
308
+ {
309
+ "epoch": 2.089727249553913,
310
+ "grad_norm": 4.800245761871338,
311
+ "learning_rate": 4.720778126770141e-06,
312
+ "loss": 14.764068603515625,
313
+ "step": 4100
314
+ },
315
+ {
316
+ "epoch": 2.140708641345909,
317
+ "grad_norm": 4.756154537200928,
318
+ "learning_rate": 4.247323131312676e-06,
319
+ "loss": 14.755054931640625,
320
+ "step": 4200
321
+ },
322
+ {
323
+ "epoch": 2.191690033137905,
324
+ "grad_norm": 4.989803314208984,
325
+ "learning_rate": 3.7924090740397178e-06,
326
+ "loss": 14.760721435546875,
327
+ "step": 4300
328
+ },
329
+ {
330
+ "epoch": 2.2426714249299007,
331
+ "grad_norm": 4.568801403045654,
332
+ "learning_rate": 3.3575021477529313e-06,
333
+ "loss": 14.72455810546875,
334
+ "step": 4400
335
+ },
336
+ {
337
+ "epoch": 2.2936528167218966,
338
+ "grad_norm": 4.871072769165039,
339
+ "learning_rate": 2.944004062059924e-06,
340
+ "loss": 14.743800048828126,
341
+ "step": 4500
342
+ },
343
+ {
344
+ "epoch": 2.3446342085138925,
345
+ "grad_norm": 4.790256500244141,
346
+ "learning_rate": 2.5532475256494073e-06,
347
+ "loss": 14.7241162109375,
348
+ "step": 4600
349
+ },
350
+ {
351
+ "epoch": 2.3956156003058884,
352
+ "grad_norm": 4.770144462585449,
353
+ "learning_rate": 2.186491950957048e-06,
354
+ "loss": 14.711162109375,
355
+ "step": 4700
356
+ },
357
+ {
358
+ "epoch": 2.4465969920978843,
359
+ "grad_norm": 4.44427490234375,
360
+ "learning_rate": 1.8449193950659018e-06,
361
+ "loss": 14.72890625,
362
+ "step": 4800
363
+ },
364
+ {
365
+ "epoch": 2.49757838388988,
366
+ "grad_norm": 4.664465427398682,
367
+ "learning_rate": 1.5296307499239903e-06,
368
+ "loss": 14.713804931640626,
369
+ "step": 4900
370
+ },
371
+ {
372
+ "epoch": 2.548559775681876,
373
+ "grad_norm": 4.861291408538818,
374
+ "learning_rate": 1.2416421941579448e-06,
375
+ "loss": 14.730694580078126,
376
+ "step": 5000
377
+ },
378
+ {
379
+ "epoch": 2.599541167473872,
380
+ "grad_norm": 4.662012577056885,
381
+ "learning_rate": 9.818819179185713e-07,
382
+ "loss": 14.70477294921875,
383
+ "step": 5100
384
+ },
385
+ {
386
+ "epoch": 2.650522559265868,
387
+ "grad_norm": 4.803001403808594,
388
+ "learning_rate": 7.511871313142238e-07,
389
+ "loss": 14.7314208984375,
390
+ "step": 5200
391
+ },
392
+ {
393
+ "epoch": 2.701503951057864,
394
+ "grad_norm": 4.746646404266357,
395
+ "learning_rate": 5.503013660737899e-07,
396
+ "loss": 14.70580810546875,
397
+ "step": 5300
398
+ },
399
+ {
400
+ "epoch": 2.7524853428498597,
401
+ "grad_norm": 4.867108345031738,
402
+ "learning_rate": 3.798720791360988e-07,
403
+ "loss": 14.710306396484375,
404
+ "step": 5400
405
+ },
406
+ {
407
+ "epoch": 2.8034667346418556,
408
+ "grad_norm": 4.6949992179870605,
409
+ "learning_rate": 2.404485658893807e-07,
410
+ "loss": 14.725491943359375,
411
+ "step": 5500
412
+ },
413
+ {
414
+ "epoch": 2.8544481264338515,
415
+ "grad_norm": 4.641607284545898,
416
+ "learning_rate": 1.3248018978643695e-07,
417
+ "loss": 14.7078369140625,
418
+ "step": 5600
419
+ },
420
+ {
421
+ "epoch": 2.905429518225848,
422
+ "grad_norm": 4.756202220916748,
423
+ "learning_rate": 5.6314934041501455e-08,
424
+ "loss": 14.697396240234376,
425
+ "step": 5700
426
+ },
427
+ {
428
+ "epoch": 2.9564109100178433,
429
+ "grad_norm": 4.691574573516846,
430
+ "learning_rate": 1.2198280076668455e-08,
431
+ "loss": 14.694278564453125,
432
+ "step": 5800
433
+ },
434
+ {
435
+ "epoch": 3.0,
436
+ "eval_loss": 3.5591108798980713,
437
+ "eval_runtime": 8.0338,
438
+ "eval_samples_per_second": 631.333,
439
+ "eval_steps_per_second": 9.958,
440
+ "step": 5886
441
+ }
442
+ ],
443
+ "logging_steps": 100,
444
+ "max_steps": 5886,
445
+ "num_input_tokens_seen": 0,
446
+ "num_train_epochs": 3,
447
+ "save_steps": 1962,
448
+ "stateful_callbacks": {
449
+ "EarlyStoppingCallback": {
450
+ "args": {
451
+ "early_stopping_patience": 3,
452
+ "early_stopping_threshold": 0.0
453
+ },
454
+ "attributes": {
455
+ "early_stopping_patience_counter": 0
456
+ }
457
+ },
458
+ "TrainerControl": {
459
+ "args": {
460
+ "should_epoch_stop": false,
461
+ "should_evaluate": false,
462
+ "should_log": false,
463
+ "should_save": true,
464
+ "should_training_stop": true
465
+ },
466
+ "attributes": {}
467
+ }
468
+ },
469
+ "total_flos": 3.964983111028869e+17,
470
+ "train_batch_size": 64,
471
+ "trial_name": null,
472
+ "trial_params": null
473
+ }
checkpoint-5886/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:accc825ca2e280888c9eed825fcb7985700c1fb466ed8b16208ff9e7b14f1318
3
+ size 5137
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:af548243f2a2884a4a369c6b04c497110cb9a587cea0a5041e9a0820c72889ef
3
  size 442633860
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:170894fbff2599922589dc645dfc871455543fe1f1fa33d3381f8353cf0b2a5b
3
  size 442633860
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dc38b2eea3f8755ab49032af3c555b4a3e9c23274e629dd4c763171401716a57
3
  size 5137
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:accc825ca2e280888c9eed825fcb7985700c1fb466ed8b16208ff9e7b14f1318
3
  size 5137
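A closing note on the checkpoint folders added in this commit: each one is a full RobertaForMaskedLM snapshot, and its trainer_state.json records the eval loss behind the perplexity column in the updated table (the earlier README paired eval loss 3.348446 with perplexity 28.46, i.e. perplexity = exp(eval loss)). The sketch below is illustrative, not something this commit documents: it assumes `from_pretrained` can reach the checkpoint folder via the `subfolder` argument, and it exponentiates the `best_metric` recorded for checkpoint-5886, which lands close to the S4 figure in the table.

```python
import math

from transformers import RobertaForMaskedLM

# Load one of the intermediate checkpoints added in this commit
# (the main revision loads without the `subfolder` argument).
ckpt = RobertaForMaskedLM.from_pretrained(
    "hellosindh/sindhi-bert-base", subfolder="checkpoint-5886"
)

# trainer_state.json reports best_metric (eval loss) 3.5591 for this checkpoint;
# perplexity is its exponential.
print(f"perplexity ≈ {math.exp(3.5591108798980713):.2f}")
```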