---
license: apache-2.0
datasets:
  - OLMo-Coding/starcoder-python-instruct
language:
  - en
pipeline_tag: text-generation
tags:
  - tiny-model
  - cinnabarlm
  - python
  - code
  - tiny-llm
  - tiny-lm
  - tinylm
  - tinyllm
---

# CinnabarLM Python

CinnabarLM Python is a tiny, ~4M-parameter code LLM trained for about 38 minutes on a T4 GPU on Google Colab! The checkpoint is only 16 MB, and this release is Llama-based!

## Why?

Because building tiny LLMs is a good idea. Others have already done it with MicroLM, Spark 4 5M, and Tenete 8M, so I wanted to try it myself!

## Differences from Preview

- It's now Llama-based; Preview used a custom architecture.
- And of course, it's stable now: it no longer generates gibberish or jumbles of words!

## Model Configurations

| Parameter | Value |
| --- | --- |
| Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) |
| Vocabulary size | 4096 tokens |
| Batch size | 4 × 8 = 32 |
| Context window | 2048 tokens |
| `hidden_size` | 192 |
| `intermediate_size` | 192 |
| `num_hidden_layers` | 6 |
| `num_attention_heads` | 6 |
| `max_position_embeddings` | 2048 |
| `rms_norm_eps` | 1e-5 |
| `initializer_range` | 0.02 |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
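As a sanity check on the "~4M parameters" figure, here is a back-of-the-envelope count for a standard Llama-style architecture with the configuration above. This is a sketch, not an exact count from the released checkpoint: it assumes bias-free projections, full-size key/value heads, and counts rotary embeddings as parameter-free.

```python
# Rough parameter count for a Llama-style model with the config above.
hidden, intermediate, layers, vocab = 192, 192, 6, 4096

embeddings = vocab * hidden        # input embedding table
lm_head = vocab * hidden           # untied output projection (tie_word_embeddings=False)
attention = 4 * hidden * hidden    # q, k, v, o projections (no bias)
mlp = 3 * hidden * intermediate    # gate, up, down projections
norms = 2 * hidden                 # two RMSNorm weights per layer
per_layer = attention + mlp + norms

total = embeddings + lm_head + layers * per_layer + hidden  # + final norm
print(f"{total:,}")  # a few million parameters
```

The bulk of the parameters sit in the two untied embedding matrices; with such a small `hidden_size`, the transformer layers themselves contribute only about half the total.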

## Training Configurations

| Hyperparameter | Value |
| --- | --- |
| `output_dir` | "./cinnabarlm-v2" |
| `max_steps` | 10000 |
| `per_device_train_batch_size` | 8 |
| `gradient_accumulation_steps` | 4 |
| `learning_rate` | 6e-4 |
| `weight_decay` | 0.01 |
| `warmup_steps` | 500 |
| `lr_scheduler_type` | "cosine" |
| `logging_steps` | 100 |
| `save_steps` | 2000 |
| `fp16` | True |
| `save_total_limit` | 2 |
| `prediction_loss_only` | True |
| `logging_first_step` | True |
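The table above maps directly onto `transformers.TrainingArguments` keyword arguments. A minimal sketch, showing how the "4 × 8 = 32" batch size in the model table falls out of these settings:

```python
# Training settings reconstructed from the table above, in the form you
# would pass to transformers.TrainingArguments(**training_args).
training_args = {
    "output_dir": "./cinnabarlm-v2",
    "max_steps": 10_000,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "learning_rate": 6e-4,
    "weight_decay": 0.01,
    "warmup_steps": 500,
    "lr_scheduler_type": "cosine",
    "logging_steps": 100,
    "save_steps": 2000,
    "fp16": True,
    "save_total_limit": 2,
    "prediction_loss_only": True,
    "logging_first_step": True,
}

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 32 sequences per optimizer step
```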

## Limitations

- Not instruction-tuned: it's only a base model, so it only completes text.
- Python-only: it's trained only on Python code (from The Stack).

## Some other details

- It was trained on ~70 million tokens from The Stack.
- The name "CinnabarLM" combines "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (language model).
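For scale, here is what the batch size and context window from the tables above imply per optimizer step. This assumes every sequence is packed to the full 2048-token context, which may not match the real data pipeline:

```python
# Back-of-the-envelope token throughput, assuming full-length sequences.
batch_size = 32   # 4 x 8 from the training configuration
context = 2048    # max_position_embeddings
steps = 10_000    # max_steps

tokens_per_step = batch_size * context
total_positions = tokens_per_step * steps
print(tokens_per_step)     # 65,536 token positions per optimizer step
print(total_positions)     # 655,360,000 over the whole run
```

Under that (strong) assumption, a ~70-million-token corpus would be seen roughly nine times over the 10,000 steps.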