| --- |
| license: apache-2.0 |
| datasets: |
| - OLMo-Coding/starcoder-python-instruct |
| language: |
| - en |
| pipeline_tag: text-generation |
| tags: |
| - tiny-model |
| - cinnabarlm |
| - python |
| - code |
| - tiny-llm |
| - tiny-lm |
| - tinylm |
| - tinyllm |
| --- |
| |
| # CinnabarLM Python |
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a single T4 GPU (on Colab)! It's only 16 MB on disk, and it's now Llama-based!
|
|
| # Why? |
Because it's a good idea to make tiny LLMs. Some people have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't made one myself yet!
|
|
| # Differences from Preview |
* It's now Llama-based; the Preview was a custom architecture
* And of course, it's stable now (it no longer generates gibberish or a jumbled mess of words)!
|
|
| # Model Configurations |
| | Parameter | Value | |
| |---|---| |
| | Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) | |
| | Vocabulary Size | 4096 tokens | |
| | Batch Size | 4 x 8 = 32 | |
| Context Window | 2048 tokens |
| | `hidden_size` | 192 | |
| | `intermediate_size` | 192 | |
| | `num_hidden_layers` | 6 | |
| | `num_attention_heads` | 6 | |
| | `max_position_embeddings` | 2048 | |
| | `rms_norm_eps` | `1e-5` | |
| | `initializer_range` | 0.02 | |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
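
For reference, here's a rough sketch of how these settings map onto a Hugging Face `LlamaConfig`. It's reconstructed from the table above rather than copied from the training notebook; anything not listed in the table (e.g. `num_key_value_heads`) is left at its default.

```python
from transformers import LlamaConfig

# Approximate config reconstructed from the table above.
# Values not listed there are left at the LlamaConfig defaults.
config = LlamaConfig(
    vocab_size=4096,
    hidden_size=192,
    intermediate_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,
    initializer_range=0.02,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
)
```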
|
|
| # Training Configurations |
| | Hyperparameter | Value | |
| |---|---| |
| | `output_dir` | "./cinnabarlm-v2" | |
| | `max_steps` | 10000 | |
| | `per_device_train_batch_size` | 8 | |
| | `gradient_accumulation_steps` | 4 | |
| | `learning_rate` | 6e-4 | |
| | `weight_decay` | 0.01 | |
| | `warmup_steps` | 500 | |
| | `lr_scheduler_type` | "cosine" | |
| | `logging_steps` | 100 | |
| | `save_steps` | 2000 | |
| | `fp16` | True | |
| | `save_total_limit` | 2 | |
| | `prediction_loss_only` | True | |
| | `logging_first_step` | True | |
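
And the same table as a `TrainingArguments` sketch (again, reconstructed from the values above, not copied from the actual training script):

```python
from transformers import TrainingArguments

# Sketch of the training hyperparameters from the table above.
# Effective batch size is 8 * 4 = 32, matching the model table.
training_args = TrainingArguments(
    output_dir="./cinnabarlm-v2",
    max_steps=10000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=6e-4,
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",
    logging_steps=100,
    save_steps=2000,
    fp16=True,
    save_total_limit=2,
    prediction_loss_only=True,
    logging_first_step=True,
)
```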
|
|
| # Limitations |
* **Not Instruction-Tuned:** It's only a base model, so it only completes text (see the completion example below).
| * **Python-Only:** It's trained on Python code (The Stack). |
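
Here's a minimal completion example with `transformers`. The repo id below is a placeholder; swap in this model's actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; replace with this model's actual Hub path.
model_id = "your-username/cinnabarlm-python"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# It's a base model, so just give it the start of some Python code to complete.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```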
# Some Other Details
* It's trained on ~70 million tokens of [The Stack](https://huggingface.co/datasets/OLMo-Coding/starcoder-python-instruct).
* The name "CinnabarLM" comes from combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).