---
license: mit
datasets:
- tokyotech-llm/swallow-code-v2
base_model:
- meta-llama/Llama-3.2-1B
- dudeperf3ct/codellm-tokenizer
---

Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/

Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan

This repository contains checkpoints saved every 5k steps from a pretraining run over 9.8B tokens, using:

- Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/
- Dataset: [`tokyotech-llm/swallow-code-v2`](https://huggingface.co/datasets/tokyotech-llm/swallow-code-v2)
- Model architecture: Llama 3.2 1B (1 billion parameters)

The linked repo contains detailed steps on how to run evaluation using the PyTorch DCP checkpoints.
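As a starting point, here is a minimal sketch of pulling the pieces together for offline inspection. The tokenizer and dataset IDs come from this card; the checkpoint directory name (`checkpoint/step-5000`) and the `train` split are assumptions, and PyTorch's `dcp_to_torch_save` utility (available since PyTorch 2.2) converts a sharded DCP checkpoint folder into a single `torch.save` file. Note the resulting state-dict keys follow the training framework's naming, not Hugging Face's, so see the repo's evaluation instructions for the full conversion.

```python
# Minimal sketch: load the tokenizer and dataset named in this card, and
# convert a DCP checkpoint folder into a single torch.save file for inspection.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dudeperf3ct/codellm-tokenizer")
# Streaming avoids downloading the full corpus; "train" split is assumed.
dataset = load_dataset("tokyotech-llm/swallow-code-v2", split="train", streaming=True)

# Convert a sharded DCP checkpoint (one folder per saved step) into a
# regular torch.save file. "checkpoint/step-5000" is a placeholder path.
dcp_to_torch_save("checkpoint/step-5000", "step-5000.pt")
state_dict = torch.load("step-5000.pt", map_location="cpu")
print(sorted(state_dict.keys())[:10])  # inspect parameter names
```

The same conversion is also available from the command line via `python -m torch.distributed.checkpoint.format_utils dcp_to_torch <dcp_dir> <out.pt>`.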