---
license: mit
datasets:
- tokyotech-llm/swallow-code-v2
base_model:
- meta-llama/Llama-3.2-1B
- dudeperf3ct/codellm-tokenizer
---

Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/

Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan

This repository contains checkpoints saved every 5k steps from a pretraining run over 9.8B tokens, using:

- Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/ (see the loading sketch after this list)
- Dataset: [`tokyotech-llm/swallow-code-v2`](https://huggingface.co/datasets/tokyotech-llm/swallow-code-v2)
- Model architecture: Llama 3.2 1B (1 billion parameters)

The linked GitHub repository contains detailed steps on how to run evaluation using the PyTorch DCP checkpoints.
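
Because the checkpoints are sharded DCP folders rather than single `.pt` files, one way to use them outside a distributed job is to consolidate a step folder first. Below is a minimal sketch using PyTorch's `torch.distributed.checkpoint.format_utils`; the checkpoint path is hypothetical, and the repo's own evaluation steps remain the reference:

```python
# Minimal sketch: consolidate a sharded DCP checkpoint folder into a single
# torch.save file that torch.load can read for evaluation.
# "checkpoints/step-5000" is a hypothetical path -- substitute a real folder.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_dir = "checkpoints/step-5000"      # hypothetical DCP checkpoint folder
out_path = "checkpoints/step-5000.pt"  # consolidated single-file output

dcp_to_torch_save(dcp_dir, out_path)

# Inspect the consolidated state dict on CPU.
state_dict = torch.load(out_path, map_location="cpu", weights_only=False)
print(list(state_dict.keys())[:10])
```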