---
license: mit
datasets:
- tokyotech-llm/swallow-code-v2
base_model:
- meta-llama/Llama-3.2-1B
- dudeperf3ct/codellm-tokenizer
---

Writeup: https://dudeperf3ct.github.io/projects/train_llm_part2/

Repo: https://github.com/dudeperf3ct/minicode-llm/tree/main/codellm_pretrain/torch_titan

This repository contains checkpoints saved every 5k steps from a 9.8B-token pretraining run, using the following (a short loading sketch follows the list):

- Custom tokenizer: https://dudeperf3ct.github.io/projects/train_llm_part1/
- Dataset: [`tokyotech-llm/swallow-code-v2`](https://huggingface.co/datasets/tokyotech-llm/swallow-code-v2)
- Model Architecture: Llama 3.2 1B (1 billion parameters)
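
For convenience, here is a minimal sketch (assumed usage, not code taken from the repo) of pulling the tokenizer and dataset above from the Hugging Face Hub; the `text` column name is an assumption:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Custom code tokenizer built in part 1 of the writeup
tokenizer = AutoTokenizer.from_pretrained("dudeperf3ct/codellm-tokenizer")

# Stream the pretraining corpus instead of downloading it in full
dataset = load_dataset("tokyotech-llm/swallow-code-v2", split="train", streaming=True)

# Tokenize one sample (assumes the dataset exposes a "text" column)
sample = next(iter(dataset))
print(tokenizer(sample["text"])["input_ids"][:20])
```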

The repository contains detailed steps on how to run evaluation using PyTorch DCP checkpoints.
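
As a rough sketch of working with these checkpoints (the paths below are hypothetical; the repo documents the actual procedure), a sharded DCP checkpoint can be collapsed into a single `torch.save` file with PyTorch's format utilities (PyTorch >= 2.2):

```python
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

# Collapse a sharded DCP checkpoint directory (e.g. the step-5000 save)
# into a single torch.save file that plain torch.load can read.
dcp_to_torch_save("checkpoints/step-5000", "checkpoints/step-5000.pt")

# Inspect the flattened state dict before wiring it into an eval harness.
state_dict = torch.load("checkpoints/step-5000.pt")
print(list(state_dict.keys())[:10])
```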