tekkmaven
/
flint-1.2B
Text Generation
JAX
8 datasets
English
llama
reasoning
tool-use
agentic
thinking
pretrained
tpu
small-language-model
thought-action-pretraining
ml-intern
arxiv:
10 papers
License:
apache-2.0
main
flint-1.2B
203 kB
1 contributor
History:
77 commits
tekkmaven
Fix divergence: peak_lr 1e-3 → 3e-4, max_grad_norm 1.0 → 0.5 (batch=8 too small for high LR)
9492039
verified
about 10 hours ago
configs
Upload configs/flint_40h.yaml
3 days ago
.gitattributes
Safe
1.52 kB
initial commit
4 days ago
README.md
Safe
9.66 kB
Update ML Intern artifact metadata
4 days ago
SCALING_GUIDE.md
Safe
3.43 kB
Upload SCALING_GUIDE.md
3 days ago
checkpointing.py
Safe
9.74 kB
Fix: push full model weights to Hub at hub_push_interval (not just status JSON)
2 days ago
config.json
Safe
996 Bytes
Fix: config.json vocab_size 49216→49280
3 days ago
config.py
6.63 kB
Fix config.py default peak_lr to 1e-3
about 15 hours ago
config.yaml
3.08 kB
Fix divergence: peak_lr 1e-3 → 3e-4, max_grad_norm 1.0 → 0.5 (batch=8 too small for high LR)
about 10 hours ago
convert_to_hf.py
Safe
5.82 kB
Upload convert_to_hf.py
3 days ago
data.py
16.3 kB
Fix data.py: handle edge cases, add fallback for failed datasets, fix orca-agentinstruct split names
about 14 hours ago
data_pipeline.py
Safe
19.2 kB
Upload data_pipeline.py
3 days ago
evaluate.py
Safe
11.2 kB
Add evaluation script for benchmarking
4 days ago
export_hf.py
Safe
9.4 kB
Add model export utility (export_hf.py) for converting JAX weights to HF safetensors
4 days ago
muon.py
Safe
11.6 kB
Add Muon optimizer implementation (muon.py)
4 days ago
requirements.txt
Safe
525 Bytes
Upload requirements.txt
3 days ago
run_kaggle.py
Safe
5.78 kB
Fix: detect both old orbax and new numpy checkpoints, warn about incompatible format
3 days ago
train.py
17.3 kB
CRITICAL FIX: Wire up real data pipeline (was training on random tokens!)
Changes:
- Replace random.randint batch with TAPDataPipeline streaming real data
- Add tokenizer initialization with TAP special tokens
- Add pad_fraction diagnostic to detect data issues
- Keep upd_rms diagnostic
- Stage-aware curriculum switching
- Proper pad token masking in loss (was already in build_model_mtp)
This fixes the root cause of loss stuck at ln(vocab)=10.85
about 14 hours ago
train_flint.py
Safe
34.4 kB
Upload train_flint.py
3 days ago
training_status.json
202 Bytes
Training status: step 16000
about 12 hours ago
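
The train.py commit message above attributes a loss plateau near ln(vocab) to training on random tokens. A minimal sketch of that sanity check follows, assuming the vocabulary size of 49280 named in the config.json commit message. When targets carry no learnable signal, the best a model can do is predict a uniform distribution, and the cross-entropy of a uniform prediction is ln(vocab_size) nats. The function names and the tolerance are hypothetical and are not taken from the repository's code.

```python
import math

VOCAB_SIZE = 49280  # assumption: value named in the config.json commit message


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy in nats when every token gets probability 1 / vocab_size."""
    return math.log(vocab_size)


def is_stuck_at_baseline(loss: float, vocab_size: int, tol: float = 0.1) -> bool:
    """Hypothetical helper: flag a loss within `tol` nats of the uniform baseline."""
    return abs(loss - uniform_baseline_loss(vocab_size)) <= tol


baseline = uniform_baseline_loss(VOCAB_SIZE)
print(f"uniform baseline: {baseline:.2f} nats")

# 10.85 is the loss value quoted in the commit message.
print(is_stuck_at_baseline(10.85, VOCAB_SIZE))
```

A check like this could run early in training: a loss that stays within the tolerance of the baseline after many steps suggests the model is not seeing real data, or that labels and inputs are misaligned.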