---
license: mit
datasets:
- tokyotech-llm/swallow-code-v2
base_model:
- meta-llama/Llama-3.2-1B
- dudeperf3ct/codellm-tokenizer
---

Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/

Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan

This repository contains checkpoints saved every 5k steps from a 9.8B-token pretraining run, using:

- Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/
- Dataset: [`tokyotech-llm/swallow-code-v2`](https://huggingface.co/datasets/tokyotech-llm/swallow-code-v2)
- Model architecture: Llama 3.2 1B (1 billion parameters)

The linked GitHub repo contains detailed steps on how to run evaluation using PyTorch DCP (Distributed Checkpoint) checkpoints.
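
As a minimal sketch of working with these checkpoints (the paths below are hypothetical; follow the repo's instructions for the exact layout), a sharded DCP checkpoint directory can be consolidated into a single `torch.save` file using PyTorch's built-in conversion utility:

```python
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Hypothetical path to one DCP checkpoint directory, e.g. a step-5000 snapshot.
dcp_checkpoint_dir = "outputs/checkpoint/step-5000"

# Consolidate the sharded DCP files into a single .pt file that can be
# loaded with torch.load(...) for offline evaluation.
dcp_to_torch_save(dcp_checkpoint_dir, "outputs/step-5000.pt")
```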