---
license: mit
datasets:
- tokyotech-llm/swallow-code-v2
base_model:
- meta-llama/Llama-3.2-1B
- dudeperf3ct/codellm-tokenizer
---

Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/

Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan

This repository contains checkpoints saved every 5k steps from a 9.8B-token pretraining run, using the following (a short loading sketch follows the list):

- Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/
- Dataset: [`tokyotech-llm/swallow-code-v2`](https://huggingface.co/datasets/tokyotech-llm/swallow-code-v2)
- Model Architecture: Llama 3.2 1B (1 billion parameters)
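
For convenience, here is a minimal sketch (assumed usage, not code taken from the repo) of pulling the tokenizer and dataset above from the Hugging Face Hub; the `text` column name is an assumption:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Custom code tokenizer built in part 1 of the writeup
tokenizer = AutoTokenizer.from_pretrained("dudeperf3ct/codellm-tokenizer")

# Stream the pretraining corpus instead of downloading it in full
dataset = load_dataset("tokyotech-llm/swallow-code-v2", split="train", streaming=True)

# Tokenize one sample (assumes the dataset exposes a "text" column)
sample = next(iter(dataset))
print(tokenizer(sample["text"])["input_ids"][:20])
```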

The repository contains detailed steps on how to run evaluation using PyTorch DCP checkpoints.
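
As a rough sketch of working with these checkpoints (the paths below are hypothetical; the repo documents the actual procedure), a sharded DCP checkpoint can be collapsed into a single `torch.save` file with PyTorch's format utilities (PyTorch >= 2.2):

```python
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Collapse a sharded DCP checkpoint directory (e.g. the step-5000 save)
# into a single torch.save file that plain torch.load can read.
dcp_to_torch_save("checkpoints/step-5000", "checkpoints/step-5000.pt")

# Inspect the flattened state dict before wiring it into an eval harness.
state_dict = torch.load("checkpoints/step-5000.pt")
print(list(state_dict.keys())[:10])
```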