
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Working Principles: Teamplay

Always distribute parallelizable work to subagents.

  • Complex code writing / design decisions → model: sonnet
  • Quick exploration, lookups, simple file writing → model: haiku
  • Collect results when an agent finishes; re-invoke with resume if needed
  • Example: run the model implementation (sonnet), data scripts (sonnet), and config files (haiku) concurrently

ν”„λ‘œμ νŠΈ λͺ©μ 

μ†Œκ·œλͺ¨ LLM(Large Language Model) μ‹€ν—˜ ν”„λ‘œμ νŠΈ. 8Γ— NVIDIA B200 GPU ν™˜κ²½μ—μ„œ LLM μ‚¬μ „ν•™μŠ΅(pretraining) λ˜λŠ” νŒŒμΈνŠœλ‹(fine-tuning) 을 직접 κ΅¬ν˜„ν•˜κ³  μ‹€ν—˜ν•œλ‹€.


Hardware Environment

| Item | Spec |
| --- | --- |
| GPU | 8× NVIDIA B200 (183 GB VRAM each, ~1.47 TB total) |
| RAM | 2.2 TB |
| CUDA | 13.0 |
| Storage (work) | /PROJECT/0325120031_A/ghong/taketimes/ (3.5 TB, 2.2 TB free) |
| Storage (home) | /home/ghong (5 GB, small code files only) |

Caution: large files such as checkpoints and datasets must be stored under /PROJECT/0325120031_A/ghong/taketimes/llm-bang/. Do not exceed the home directory (/home/ghong) quota.
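The storage rule above can be enforced in code. A minimal sketch, assuming the helper name and directory layout are illustrative (they are not part of the repo): resolve every checkpoint directory under the large /PROJECT volume and refuse any path that would land in the 5 GB home quota.

```python
# Sketch: keep checkpoints on the /PROJECT volume, never in /home/ghong.
# checkpoint_dir() and the layout under checkpoints/ are hypothetical helpers.
import os

PROJECT_ROOT = "/PROJECT/0325120031_A/ghong/taketimes/llm-bang"
HOME_ROOT = "/home/ghong"

def checkpoint_dir(run_name: str) -> str:
    """Return a checkpoint directory for this run under the project volume."""
    path = os.path.abspath(os.path.join(PROJECT_ROOT, "checkpoints", run_name))
    # Guard: never write checkpoints into the tiny home directory.
    if os.path.commonpath([path, HOME_ROOT]) == HOME_ROOT:
        raise ValueError(f"refusing to write checkpoints under {HOME_ROOT}")
    return path

print(checkpoint_dir("small_lm-run1"))
```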


사전 μ„€μΉ˜λœ 라이브러리

torch          2.10.0a0+b4e4ee81d3.nv25.12   # NV μ»€μŠ€ν…€ λΉŒλ“œ (B200 μ΅œμ ν™”)
flash_attn     2.7.4.post1+25.12              # FlashAttention-2 μ‚¬μš© κ°€λŠ₯
datasets       4.4.1
tokenizers     0.22.1
huggingface_hub 1.2.3

κ²½κ³ : PyTorchλŠ” NVIDIA μ»€μŠ€ν…€ λΉŒλ“œ(nv25.12)κ°€ μ„€μΉ˜λ¨. pip install torch 둜 μž¬μ„€μΉ˜ν•˜λ©΄ B200 μ΅œμ ν™”κ°€ 깨질 수 있음 β€” PyTorch μž¬μ„€μΉ˜ κΈˆμ§€.
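A quick sanity check can catch an accidental reinstall before a long training run. This is a sketch under one assumption: the NVIDIA container builds carry an `.nv` tag in the version string, as in the version listed above.

```python
# Sanity-check sketch: confirm torch is still the NVIDIA build before training.
# Assumption: NVIDIA container builds embed an ".nv" tag in the version string.
def is_nvidia_build(version: str) -> bool:
    """True if the version string carries an NVIDIA container tag like 'nv25.12'."""
    return ".nv" in version

# The preinstalled version string from this environment passes:
assert is_nvidia_build("2.10.0a0+b4e4ee81d3.nv25.12")
# A plain PyPI wheel would fail the check:
assert not is_nvidia_build("2.10.0")
```

In a training script this would be invoked as `assert is_nvidia_build(torch.__version__)` right after importing torch.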

Additional Libraries to Install

pip install transformers accelerate peft trl deepspeed bitsandbytes sentencepiece wandb

ꢌμž₯ ν”„λ‘œμ νŠΈ ꡬ쑰

llm-bang/
β”œβ”€β”€ CLAUDE.md
β”œβ”€β”€ data/               # ν•™μŠ΅ 데이터 (원본 ν…μŠ€νŠΈ, μ „μ²˜λ¦¬ μ™„λ£Œλ³Έ)
β”œβ”€β”€ tokenizer/          # ν† ν¬λ‚˜μ΄μ € ν•™μŠ΅Β·μ €μž₯
β”œβ”€β”€ model/              # λͺ¨λΈ μ•„ν‚€ν…μ²˜ μ •μ˜ (nn.Module)
β”œβ”€β”€ train/              # ν•™μŠ΅ 슀크립트 (단일 GPU / DDP / FSDP)
β”œβ”€β”€ eval/               # 평가 슀크립트 (perplexity, downstream task)
β”œβ”€β”€ configs/            # YAML/JSON ν•™μŠ΅ μ„€μ • 파일
└── checkpoints/        # λͺ¨λΈ 체크포인트 (λŒ€μš©λŸ‰)
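A config under configs/ might look like the following. This is a hypothetical sketch shown as JSON and loaded with the stdlib; every field name (n_layers, micro_batch_size, ...) is an illustrative assumption, not a fixed schema of this repo.

```python
# Illustrative config-loading sketch; all field names are assumptions.
import json

SMALL_LM_CONFIG = """
{
  "model": {"n_layers": 12, "n_heads": 12, "d_model": 768, "vocab_size": 32000},
  "train": {"micro_batch_size": 16, "lr": 3e-4, "max_steps": 100000, "precision": "bf16"}
}
"""

def load_config(text: str) -> dict:
    cfg = json.loads(text)
    # Minimal validation: every run needs at least a model and a train section.
    for key in ("model", "train"):
        if key not in cfg:
            raise KeyError(f"missing config section: {key}")
    return cfg

cfg = load_config(SMALL_LM_CONFIG)
print(cfg["model"]["d_model"])  # → 768
```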

Multi-GPU Training Patterns

# torchrun (DDP), all 8 GPUs
torchrun --nproc_per_node=8 train/pretrain.py --config configs/small_lm.yaml

# single-GPU test
python train/pretrain.py --config configs/small_lm.yaml --device cuda:0

# FSDP (model sharding, for large models)
torchrun --nproc_per_node=8 train/pretrain.py --config configs/large_lm.yaml --strategy fsdp
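The commands above imply a small CLI surface for train/pretrain.py. A minimal argument-parsing sketch of that surface follows; the actual script's flags and defaults are an assumption here, not the repo's real interface.

```python
# Sketch of the CLI implied by the commands above (--config, --device, --strategy).
# The real train/pretrain.py interface is an assumption, not confirmed by the repo.
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="LLM pretraining entry point")
    p.add_argument("--config", required=True, help="YAML/JSON config under configs/")
    p.add_argument("--device", default=None, help="e.g. cuda:0 for a single-GPU test")
    p.add_argument("--strategy", choices=["ddp", "fsdp"], default="ddp",
                   help="distributed strategy; under torchrun, rank/world size come from env vars")
    return p

args = build_parser().parse_args(
    ["--config", "configs/large_lm.yaml", "--strategy", "fsdp"]
)
print(args.config, args.strategy)
```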

λͺ¨λΈ 규λͺ¨ κ°€μ΄λ“œ (ν•˜λ“œμ›¨μ–΄ κΈ°μ€€)

λͺ¨λΈ 크기 μΆ”μ²œ μ „λž΅ μ΅œμ†Œ GPU 수
~1B param DDP, bf16 1 GPU
~7B param DDP λ˜λŠ” FSDP, bf16 2–4 GPU
~13B param FSDP, bf16/fp8 4 GPU
~70B param FSDP + ZeRO-3, bf16/fp8 8 GPU
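The GPU counts above can be sanity-checked with a rough per-parameter memory estimate. A sketch under a common rule of thumb (not a measurement from this cluster): bf16 weights and grads plus fp32 Adam states come to roughly 16 bytes per parameter, and activations come on top, so the result is a floor rather than the table's recommendation.

```python
# Rough training-memory floor: bf16 weights (2 B) + bf16 grads (2 B)
# + fp32 Adam m, v, and master weights (3 × 4 B) ≈ 16 B per parameter.
# Activation memory and fragmentation are ignored, so the table above
# recommends more GPUs than this floor for the larger models.
import math

BYTES_PER_PARAM = 16
B200_VRAM_GB = 183

def min_gpus(n_params: float) -> int:
    """Smallest GPU count whose combined VRAM covers optimizer-state memory."""
    total_gb = n_params * BYTES_PER_PARAM / 1e9
    return max(1, math.ceil(total_gb / B200_VRAM_GB))

for size in (1e9, 7e9, 13e9, 70e9):
    print(f"{size / 1e9:.0f}B params -> >= {min_gpus(size)} GPU(s)")
```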

The B200 supports FP8 natively, so torch.float8_e4m3fn can be used during training.
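For intuition on the e4m3fn format's dynamic range, its largest finite value can be derived from the format itself with the stdlib; this is a property of the 8-bit encoding, not of any particular library.

```python
# float8_e4m3fn layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
# The "fn" (finite) variant reclaims the usual infinity encodings for numbers,
# so the largest finite value is exponent field 1111 with mantissa 110
# (mantissa 111 at that exponent is reserved for NaN).
def e4m3fn_max() -> float:
    bias = 7
    max_exponent = 0b1111 - bias   # = 8
    max_mantissa = 1 + 6 / 8       # mantissa bits 110 → 1.75
    return max_mantissa * 2 ** max_exponent

print(e4m3fn_max())  # → 448.0
```

The narrow ±448 range is why FP8 training typically pairs the dtype with per-tensor scaling factors.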


Reference (Previous Project)

/PROJECT/0325120031_A/ghong/taketimes/_deprecated/work/ holds a 2CRM thickness prediction project based on measured values (LightGBM, ClickHouse). Consult it when domain data (plant sensors, coil grades) is needed.