| --- |
| license: apache-2.0 |
| datasets: |
| - OLMo-Coding/starcoder-python-instruct |
| language: |
| - en |
| pipeline_tag: text-generation |
| tags: |
| - tiny-model |
| - cinnabarlm |
| - python |
| - code |
| - tiny-llm |
| - tiny-lm |
| - tinylm |
| - tinyllm |
| --- |
| |
| # CinnabarLM Python |
CinnabarLM Python is a tiny, 4M-parameter code LLM trained for ~38 minutes on a single T4 GPU (on Colab)! It's only 16 MB on disk, and it's now Llama-based!
|
|
| # Why? |
Because it's a good idea to make tiny LLMs. Some people have already done it with [MicroLM](https://huggingface.co/CromIA/MicroLM-1M), [Spark 4 5M](https://huggingface.co/LH-Tech-AI/Spark-5M-Base-v4) and [Tenete 8M](https://huggingface.co/Harley-ml/Tenete-8M), but I hadn't made one myself yet!
|
|
| # Differences from Preview |
* It's now Llama-based; the Preview was a custom architecture
* And of course, it's stable now (it no longer generates gibberish or a jumbled mess of words)!
|
|
| # Model Configurations |
| | Parameter | Value | |
| |---|---| |
| | Tokenizer | Llama 3's tokenizer (Tiktoken / BPE) | |
| | Vocabulary Size | 4096 tokens | |
| | Batch Size | 4 x 8 = 32 | |
| Context Window | 2048 tokens |
| | `hidden_size` | 192 | |
| | `intermediate_size` | 192 | |
| | `num_hidden_layers` | 6 | |
| | `num_attention_heads` | 6 | |
| | `max_position_embeddings` | 2048 | |
| | `rms_norm_eps` | `1e-5` | |
| | `initializer_range` | 0.02 | |
| `use_cache` | True |
| `tie_word_embeddings` | False |
| `rope_theta` | 10000.0 |
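
For reference, here's a rough sketch of how these settings map onto a Hugging Face `LlamaConfig`. It's reconstructed from the table above rather than copied from the training notebook; anything not listed in the table (e.g. `num_key_value_heads`) is left at its default.

```python
from transformers import LlamaConfig

# Approximate config reconstructed from the table above.
# Values not listed there are left at the LlamaConfig defaults.
config = LlamaConfig(
    vocab_size=4096,
    hidden_size=192,
    intermediate_size=192,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,
    initializer_range=0.02,
    use_cache=True,
    tie_word_embeddings=False,
    rope_theta=10000.0,
)
```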
|
|
| # Training Configurations |
| | Hyperparameter | Value | |
| |---|---| |
| | `output_dir` | "./cinnabarlm-v2" | |
| | `max_steps` | 10000 | |
| | `per_device_train_batch_size` | 8 | |
| | `gradient_accumulation_steps` | 4 | |
| | `learning_rate` | 6e-4 | |
| | `weight_decay` | 0.01 | |
| | `warmup_steps` | 500 | |
| | `lr_scheduler_type` | "cosine" | |
| | `logging_steps` | 100 | |
| | `save_steps` | 2000 | |
| | `fp16` | True | |
| | `save_total_limit` | 2 | |
| | `prediction_loss_only` | True | |
| | `logging_first_step` | True | |
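
And the same table as a `TrainingArguments` sketch (again, reconstructed from the values above, not copied from the actual training script):

```python
from transformers import TrainingArguments

# Sketch of the training hyperparameters from the table above.
# Effective batch size is 8 * 4 = 32, matching the model table.
training_args = TrainingArguments(
    output_dir="./cinnabarlm-v2",
    max_steps=10000,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=6e-4,
    weight_decay=0.01,
    warmup_steps=500,
    lr_scheduler_type="cosine",
    logging_steps=100,
    save_steps=2000,
    fp16=True,
    save_total_limit=2,
    prediction_loss_only=True,
    logging_first_step=True,
)
```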
|
|
| # Limitations |
* **Not Instruction-Tuned:** It's only a base model, so it only completes text (see the completion example below).
| * **Python-Only:** It's trained on Python code (The Stack). |
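
Here's a minimal completion example with `transformers`. The repo id below is a placeholder; swap in this model's actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; replace with this model's actual Hub path.
model_id = "your-username/cinnabarlm-python"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# It's a base model, so just give it the start of some Python code to complete.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```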
# Some Other Details
* It's trained on ~70 million tokens of [The Stack](https://huggingface.co/datasets/OLMo-Coding/starcoder-python-instruct).
* The name "CinnabarLM" comes from combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) with "LM" (Language Model).