# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Working Principles: Team Play

Always fan out parallelizable work to sub-agents.
- Complex code writing / design decisions → model: sonnet
- Fast exploration, lookups, and simple file creation → model: haiku
- When an agent finishes, review its results; re-invoke it with resume if needed.
- Example: run the model implementation (sonnet), data scripts (sonnet), and config files (haiku) concurrently.
## Project Purpose

A small-scale LLM (Large Language Model) experimentation project. On an 8× NVIDIA B200 GPU machine, implement and run LLM pretraining and/or fine-tuning from scratch.
## Hardware Environment

| Item | Spec |
|---|---|
| GPU | 8× NVIDIA B200 (183 GB VRAM each, ~1.47 TB total) |
| RAM | 2.2 TB |
| CUDA | 13.0 |
| Storage (work) | /PROJECT/0325120031_A/ghong/taketimes/ → 3.5 TB total, 2.2 TB free |
| Storage (home) | /home/ghong → 5 GB (small code files only) |
Caution: large files such as checkpoints and datasets must be stored under /PROJECT/0325120031_A/ghong/taketimes/llm-bang/. Do not exceed the home directory (/home/ghong) quota.
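Hugging Face libraries cache models and datasets under the home directory by default (~/.cache/huggingface), so it is worth pointing their caches at the large work volume. `HF_HOME` and `HF_DATASETS_CACHE` are the standard Hugging Face environment variables; the `hf_cache` subdirectory name below is only an example, not an existing path in this repo:

```shell
# Point Hugging Face caches at the large work volume instead of /home/ghong.
# HF_HOME is the official cache-root variable; "hf_cache" is an example name.
export HF_HOME=/PROJECT/0325120031_A/ghong/taketimes/llm-bang/hf_cache
# datasets follows HF_HOME, but its cache can also be pinned explicitly:
export HF_DATASETS_CACHE="$HF_HOME/datasets"
```

Adding these lines to the shell profile keeps every from_pretrained / load_dataset call off the 5 GB home quota.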
## Pre-installed Libraries

```
torch            2.10.0a0+b4e4ee81d3.nv25.12   # NVIDIA custom build (B200-optimized)
flash_attn       2.7.4.post1+25.12             # FlashAttention-2 available
datasets         4.4.1
tokenizers       0.22.1
huggingface_hub  1.2.3
```

Warning: PyTorch here is an NVIDIA custom build (nv25.12). Reinstalling it with `pip install torch` can break the B200 optimizations → never reinstall PyTorch.
## Libraries to Install Additionally

```bash
pip install transformers accelerate peft trl deepspeed bitsandbytes sentencepiece wandb
```
## Recommended Project Structure

```
llm-bang/
├── CLAUDE.md
├── data/         # training data (raw text, preprocessed)
├── tokenizer/    # tokenizer training & artifacts
├── model/        # model architecture definitions (nn.Module)
├── train/        # training scripts (single GPU / DDP / FSDP)
├── eval/         # evaluation scripts (perplexity, downstream tasks)
├── configs/      # YAML/JSON training config files
└── checkpoints/  # model checkpoints (large files)
```
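To make the configs/ directory concrete, here is a minimal sketch of a JSON config loader. The schema, defaults, and `load_config` helper are all hypothetical illustrations, not code that exists in the repo:

```python
import json
from pathlib import Path

# Hypothetical defaults for a small LM run; every field name is illustrative.
DEFAULTS = {
    "model": {"n_layer": 12, "n_head": 12, "d_model": 768, "vocab_size": 32000},
    "train": {"lr": 3e-4, "batch_size": 64, "max_steps": 100000, "dtype": "bf16"},
}

def load_config(path):
    """Read a JSON file from configs/ and overlay it on DEFAULTS (one level deep)."""
    cfg = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in json.loads(Path(path).read_text()).items():
        cfg.setdefault(section, {}).update(values)
    return cfg
```

A YAML variant would only need json.loads swapped for yaml.safe_load (PyYAML).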
## Multi-GPU Training Run Patterns

```bash
# torchrun (DDP): 8 GPUs
torchrun --nproc_per_node=8 train/pretrain.py --config configs/small_lm.yaml

# Single-GPU test
python train/pretrain.py --config configs/small_lm.yaml --device cuda:0

# FSDP (model sharding, for larger models)
torchrun --nproc_per_node=8 train/pretrain.py --config configs/large_lm.yaml --strategy fsdp
```
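torchrun communicates the process layout to each worker through environment variables (RANK, LOCAL_RANK, WORLD_SIZE), which is how one script can serve all three launch patterns above. The helper below is a sketch of that wiring under those standard variables, not the actual contents of train/pretrain.py:

```python
import os

def dist_info():
    """Rank/world info from the env vars torchrun sets; a plain `python`
    launch falls back to single-process defaults."""
    return (
        int(os.environ.get("RANK", 0)),
        int(os.environ.get("LOCAL_RANK", 0)),
        int(os.environ.get("WORLD_SIZE", 1)),
    )

rank, local_rank, world_size = dist_info()
is_main = rank == 0              # e.g. only rank 0 writes logs/checkpoints
device = f"cuda:{local_rank}"    # one GPU per local process
```

With `--nproc_per_node=8`, torchrun starts 8 such processes, each seeing a different LOCAL_RANK from 0 to 7.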
## Model Scale Guide (for This Hardware)

| Model size | Recommended strategy | Min. GPUs |
|---|---|---|
| ~1B params | DDP, bf16 | 1 |
| ~7B params | DDP or FSDP, bf16 | 2–4 |
| ~13B params | FSDP, bf16/fp8 | 4 |
| ~70B params | FSDP + ZeRO-3, bf16/fp8 | 8 |

The B200 supports FP8 natively → torch.float8_e4m3fn can be used during training.
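The GPU counts in the table follow from a standard back-of-the-envelope estimate (an assumption used here, not stated in this file): with bf16 weights and gradients plus fp32 AdamW state, training needs roughly 16 bytes per parameter before activations are counted:

```python
GB = 1024**3

def train_mem_gb(n_params, w_bytes=2, g_bytes=2, opt_bytes=12):
    """Rough training footprint in GB: bf16 weights (2 B) + bf16 grads (2 B)
    + fp32 AdamW state (4 B master copy + 4 B momentum + 4 B variance = 12 B)
    per parameter. Activations and framework overhead are not counted."""
    return n_params * (w_bytes + g_bytes + opt_bytes) / GB

print(round(train_mem_gb(1e9)))   # ~1B model: ≈ 15 GB, fits easily on one B200
print(round(train_mem_gb(7e9)))   # ~7B model: ≈ 104 GB, fits but FSDP frees headroom
print(round(train_mem_gb(70e9)))  # ~70B model: ≈ 1043 GB, must shard across all 8 GPUs
```

Sharded evenly over 8 B200s, even the ~70B case stays near 130 GB per GPU, under the 183 GB per-card limit, which is why the table only demands all 8 GPUs at that scale.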
## Reference (Previous Project)

/PROJECT/0325120031_A/ghong/taketimes/_deprecated/work/ → a 2CRM measured-thickness prediction project (LightGBM, ClickHouse).
Consult it when domain data (factory sensors, coil grades) is needed.