---
license: mit
datasets:
  - tokyotech-llm/swallow-code-v2
base_model:
  - meta-llama/Llama-3.2-1B
  - dudeperf3ct/codellm-tokenizer
---

Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/

Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan

This repository contains checkpoints saved every 5k steps from a pretraining run on 9.8B tokens using TorchTitan.

The repository also contains detailed steps on how to run evaluation using PyTorch DCP checkpoints.
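
For reference, below is a minimal sketch of one way to consolidate a sharded DCP checkpoint into a single file for single-process evaluation. It uses PyTorch's `torch.distributed.checkpoint.format_utils.dcp_to_torch_save` utility; the paths are hypothetical placeholders, and the repo linked above documents the exact steps used for this run.

```python
# Minimal sketch (not the repo's exact evaluation script): convert a PyTorch
# DCP (Distributed Checkpoint) directory into a single torch.save file so it
# can be loaded for evaluation on one GPU/CPU. Paths below are placeholders.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_dir = "checkpoints/step-5000"          # hypothetical DCP checkpoint directory
output_path = "checkpoints/step-5000.pt"   # consolidated torch.save output

# Consolidate the sharded DCP checkpoint into one .pt file.
dcp_to_torch_save(dcp_dir, output_path)

# The resulting file holds the saved state dict; load it like any torch checkpoint.
state_dict = torch.load(output_path, map_location="cpu")
print(list(state_dict.keys())[:5])
```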