---
library_name: transformers
datasets:
  - tinycompany/Instruct-Is-All-You-Need
base_model:
  - tinycompany/BiBo-Mini-v0.9x
license: apache-2.0
---

Superb multilingual performance in Hindi, English, and Hinglish.

Should help set a good base for thinking models.

Has native thinking in Hinglish and English.

Trained on a TPU v4-8.

  • Active Params: 1.7B (including embedding layer)
  • Specialized tokenizer (fhai50032/QTK-81K) for better tokenization of Hindi, English, math & code
  • Tied embeddings
  • Torch-XLA (SPMD)
  • Flash-Attention (block size = 512)
  • 6B tokens trained
  • Training time = 32h
  • AdamW optimizer
  • Cosine scheduler
  • Batch_Size = 72
  • Max_Seq_Len = 2048
  • Packed = True
  • Min_lr = 0
  • Max_lr = 3e-4
  • Epochs = 2
  • Final Val_loss = 1.04x
  • Final running loss = 0.9x
  • Weight decay = 0.05
  • Llama architecture
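The cosine schedule above (Max_lr = 3e-4 decaying to Min_lr = 0) can be sketched as below; the warmup fraction and total-step count are illustrative assumptions, not values reported for this training run:

```python
import math

def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=0.0, warmup_steps=0):
    """Cosine decay from max_lr to min_lr, with optional linear warmup.

    max_lr / min_lr match the card (3e-4 -> 0); warmup_steps is an
    illustrative knob, not stated in the card.
    """
    if warmup_steps and step < warmup_steps:
        return max_lr * step / warmup_steps  # linear warmup (assumption)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# LR starts at max_lr and decays to min_lr = 0 by the final step.
print(cosine_lr(0, 1000))     # 0.0003
print(cosine_lr(1000, 1000))  # ~0.0
```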

Average Training Throughput

  • 42,000 tokens / second
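A quick sanity check on these numbers: with Batch_Size = 72 and Max_Seq_Len = 2048, one packed step carries 72 × 2048 = 147,456 tokens, so ~42,000 tokens/second works out to roughly 3.5 seconds per step:

```python
# Values taken from the card; seconds-per-step is derived, not reported.
batch_size = 72
max_seq_len = 2048
throughput = 42_000  # tokens/second

tokens_per_step = batch_size * max_seq_len
seconds_per_step = tokens_per_step / throughput

print(tokens_per_step)             # 147456
print(round(seconds_per_step, 2))  # 3.51
```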

Evals will be added to the repo directory.

Compute Provided by Google ;)

❤️ TRC ❤️ Google