tekkmaven/flint-1.2B

Text Generation

small-language-model

thought-action-pretraining


Commit History

Training status: step 256000

4e8123c
verified

tekkmaven committed 2 days ago

Checkpoint weights: step 256000 (1667MB)

4402a06
verified

tekkmaven committed 2 days ago

Training status: step 240000

2704215
verified

tekkmaven committed 2 days ago

Checkpoint weights: step 240000 (1667MB)

d69c665
verified

tekkmaven committed 2 days ago

Training status: step 224000

c9c1faf
verified

tekkmaven committed 2 days ago

Checkpoint weights: step 224000 (1667MB)

e2fa33d
verified

tekkmaven committed 2 days ago

Training status: step 208000

01efe26
verified

tekkmaven committed 2 days ago

Checkpoint weights: step 208000 (1666MB)

e2bef11
verified

tekkmaven committed 2 days ago

Training status: step 192000

b087da5
verified

tekkmaven committed 2 days ago

Checkpoint weights: step 192000 (1665MB)

190759f
verified

tekkmaven committed 2 days ago

Training status: step 176000

ad408e0
verified

tekkmaven committed 2 days ago

Checkpoint weights: step 176000 (1666MB)

270c549
verified

tekkmaven committed 2 days ago

Training status: step 160000

7fc6928
verified

tekkmaven committed 3 days ago

Checkpoint weights: step 160000 (1666MB)

3054872
verified

tekkmaven committed 3 days ago

Training status: step 144000

dabb0f9
verified

tekkmaven committed 3 days ago

Checkpoint weights: step 144000 (1666MB)

3f7e2e0
verified

tekkmaven committed 3 days ago

Training status: step 128000

07469ea
verified

tekkmaven committed 3 days ago

Checkpoint weights: step 128000 (1666MB)

8d3a460
verified

tekkmaven committed 3 days ago

Training status: step 112000

107d7e2
verified

tekkmaven committed 3 days ago

Checkpoint weights: step 112000 (1664MB)

42f2552
verified

tekkmaven committed 3 days ago

Training status: step 96000

aebd4fb
verified

tekkmaven committed 13 days ago

Checkpoint weights: step 96000 (1666MB)

7871a7c
verified

tekkmaven committed 13 days ago

Training status: step 80000

0e4bcd7
verified

tekkmaven committed 14 days ago

Checkpoint weights: step 80000 (1666MB)

6863f70
verified

tekkmaven committed 14 days ago

Training status: step 64000

114bae5
verified

tekkmaven committed 14 days ago

Checkpoint weights: step 64000 (1666MB)

79ca53f
verified

tekkmaven committed 14 days ago

Training status: step 48000

4bf1029
verified

tekkmaven committed 14 days ago

Checkpoint weights: step 48000 (1666MB)

9063b83
verified

tekkmaven committed 14 days ago

Remove stale training_status.json from diverged run

df69066
verified

tekkmaven committed 14 days ago

Always pass --resume so train.py checks Hub when local is empty

abb62fc
verified

tekkmaven committed 14 days ago

Auto-resume from Hub when local checkpoints are cleared by Kaggle

Now --resume checks: local disk first → Hub download fallback → fresh start
No more lost progress across Kaggle sessions.

b2c7b9b
verified

tekkmaven committed 14 days ago
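
The resume order described in this commit (local disk, then Hub, then fresh start) could be sketched roughly as follows. This is an illustration only, not the code in train.py: the function names, the `step_*.pt` file naming, and the `download_from_hub` callable are all hypothetical stand-ins.

```python
from pathlib import Path
from typing import Callable, Optional


def _step_of(path: Path) -> int:
    # Hypothetical naming scheme: "step_48000.pt" -> 48000
    return int(path.stem.split("_")[1])


def resolve_resume_checkpoint(
    local_dir: Path,
    download_from_hub: Callable[[], Optional[Path]],
) -> Optional[Path]:
    """Return the checkpoint to resume from, or None for a fresh start."""
    # 1. Prefer the highest-step checkpoint already on local disk.
    #    Sorting by the parsed step avoids "step_128000" < "step_16000".
    local = sorted(local_dir.glob("step_*.pt"), key=_step_of) if local_dir.is_dir() else []
    if local:
        return local[-1]

    # 2. Local disk is empty (for example, a new session): fall back to the Hub.
    #    The callable is assumed to return a downloaded path, or None if nothing exists.
    downloaded = download_from_hub()
    if downloaded is not None:
        return downloaded

    # 3. Nothing found anywhere: the caller starts training from scratch.
    return None
```

Passing the download step in as a callable keeps the fallback order testable without network access.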

Training status: step 32000

523eb1c
verified

tekkmaven committed 14 days ago

Checkpoint weights: step 32000 (1668MB)

dcd216f
verified

tekkmaven committed 14 days ago

Training status: step 16000

b7e64d7
verified

tekkmaven committed 14 days ago

Checkpoint weights: step 16000 (1671MB)

7c08bcf
verified

tekkmaven committed 14 days ago

Fix divergence: peak_lr 1e-3 → 3e-4, max_grad_norm 1.0 → 0.5 (batch=8 too small for high LR)

9492039
verified

tekkmaven committed 15 days ago
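
For readers unfamiliar with the second knob in this fix, the sketch below shows what a `max_grad_norm` threshold usually means, assuming the common clip-by-global-norm convention. The log does not confirm which clipping scheme train.py uses, and `clip_by_global_norm` is a hypothetical helper written in plain Python for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Only shrink; gradients already inside the threshold are left unchanged.
    scale = min(1.0, max_norm / total_norm) if total_norm > 0 else 1.0
    return [g * scale for g in grads], total_norm


# A gradient with norm 5.0 is scaled down to norm 0.5 under the new threshold,
# versus norm 1.0 under the old one, so each update step is half as large.
clipped_new, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=0.5)
clipped_old, _ = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Under this convention, halving the threshold halves the largest permitted update, which compounds with the lower peak learning rate.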

Delete stale checkpoint from diverged run

554b735
verified

tekkmaven committed 15 days ago

Training status: step 16000

7d6db81
verified

tekkmaven committed 15 days ago

Checkpoint weights: step 16000 (1688MB)

52f418f
verified

tekkmaven committed 15 days ago

Fix data.py: handle edge cases, add fallback for failed datasets, fix orca-agentinstruct split names

a92ed74
verified

tekkmaven committed 15 days ago

CRITICAL FIX: Wire up real data pipeline (was training on random tokens!)

Changes:
- Replace random.randint batch with TAPDataPipeline streaming real data
- Add tokenizer initialization with TAP special tokens
- Add pad_fraction diagnostic to detect data issues
- Keep upd_rms diagnostic
- Stage-aware curriculum switching
- Proper pad token masking in loss (was already in build_model_mtp)

This fixes the root cause of loss stuck at ln(vocab)=10.85

bd50131
verified

tekkmaven committed 15 days ago
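
The ln(vocab) figure in this commit is the loss of a uniform guess: when targets are random tokens there is nothing to learn, so the best a model can do is assign probability 1/V to each of V tokens, which gives a cross-entropy of ln(V) nats. The sketch below reproduces that arithmetic and shows one possible form of a pad-fraction check. The helper names and the `pad_fraction` implementation are assumptions for illustration, not the repository's code.

```python
import math


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy in nats of a model that gives every token equal probability."""
    return math.log(vocab_size)


def implied_vocab_size(loss_nats: float) -> int:
    """Vocabulary size whose uniform-guess loss equals loss_nats."""
    return round(math.exp(loss_nats))


def pad_fraction(token_ids, pad_id):
    """Share of positions in a batch (list of equal-length rows) that are padding."""
    flat = [t for row in token_ids for t in row]
    return sum(t == pad_id for t in flat) / len(flat)


# A plateau at 10.85 nats corresponds to a vocabulary of roughly 51,500 tokens,
# if the figure in the commit message is accurate to two decimals.
approx_vocab = implied_vocab_size(10.85)

# Hypothetical batch with pad_id=0: 5 of 8 positions are padding.
frac = pad_fraction([[1, 2, 0, 0], [3, 0, 0, 0]], pad_id=0)
```

A loss that sits at this baseline from the first step onward points at the data or labels rather than at the optimizer settings.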

Remove stale training_status.json from old run

3f51ef2
verified

tekkmaven committed 15 days ago

Delete stale checkpoints from stuck training (loss=10.95, bad init + low LR)

59033e2
verified

tekkmaven committed 15 days ago

Fix config.py default peak_lr to 1e-3

32dd395
verified

tekkmaven committed 15 days ago

Fix stuck training: peak_lr 5e-4 → 1e-3, increase weight decay for stability

242a666
verified

tekkmaven committed 15 days ago

Fix stuck loss: increase LR to 1e-3, fix initialization scales, add diagnostics

4f792d0
verified

tekkmaven committed 15 days ago

Training status: step 176000

f02ec75
verified

tekkmaven committed 16 days ago

Checkpoint weights: step 176000 (1670MB)

e696556
verified

tekkmaven committed 16 days ago

Training status: step 160000

25117be
verified

tekkmaven committed 16 days ago

Checkpoint weights: step 160000 (1678MB)

a3de037
verified

tekkmaven committed 16 days ago