dudeperf3ct committed
Commit 37df457 · verified · 1 Parent(s): 6b7afa9

Update README.md

Files changed (1)
  1. README.md +20 -3
README.md CHANGED
@@ -1,3 +1,20 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - tokyotech-llm/swallow-code-v2
+ base_model:
+ - meta-llama/Llama-3.2-1B
+ - dudeperf3ct/codellm-tokenizer
+ ---
+
+ Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/
+
+ Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan
+
+ This contains checkpoints saved every 5k steps from a pretraining run over 9.8B tokens, using:
+
+ - Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/
+ - Dataset: [`tokyotech-llm/swallow-code-v2`](https://huggingface.co/datasets/tokyotech-llm/swallow-code-v2)
+ - Model architecture: Llama 3.2 1B (1 billion parameters)
+
+ The repository contains detailed steps on how to run evaluation using PyTorch DCP checkpoints.
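
For readers who want a single-file checkpoint before following the evaluation steps, the sketch below shows one possible way to consolidate a PyTorch DCP checkpoint with `dcp_to_torch_save` from `torch.distributed.checkpoint.format_utils`. The `checkpoints/step-<N>` folder layout, the 5000-step default, and the helper names `checkpoint_dir` and `convert_to_torch_save` are assumptions for illustration. They are not taken from the repository, whose own instructions take precedence.

```python
from pathlib import Path


def checkpoint_dir(root: str, step: int, interval: int = 5000) -> Path:
    """Return the assumed folder for the checkpoint saved at `step`."""
    if step <= 0 or step % interval != 0:
        raise ValueError(
            f"step must be a positive multiple of {interval}, got {step}"
        )
    # Assumption: checkpoints are stored as <root>/step-<N>.
    return Path(root) / f"step-{step}"


def convert_to_torch_save(dcp_dir: Path, out_file: Path) -> None:
    """Consolidate a sharded DCP checkpoint into one torch.save file."""
    # Imported lazily so the path helper works without PyTorch installed.
    from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

    dcp_to_torch_save(str(dcp_dir), str(out_file))


# Example usage (requires PyTorch and a downloaded checkpoint folder):
#   src = checkpoint_dir("checkpoints", 10_000)
#   convert_to_torch_save(src, Path("step-10000.pt"))
```

The resulting file can then be read with `torch.load`. Whether its keys match a given model definition depends on how the training run saved its state, so check the key names before loading them into a model.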