Upload source/CLAUDE.md with huggingface_hub

#31
Files changed (1)
  1. source/CLAUDE.md +104 -0
source/CLAUDE.md ADDED

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Working Principles: Teamwork

**Always distribute parallelizable work to sub-agents.**

- Complex code writing / design decisions → `model: sonnet`
- Fast exploration, lookups, simple file creation → `model: haiku`
- Collect results when an agent completes; re-invoke with `resume` if needed
- Example: run model implementation (sonnet) + data scripts (sonnet) + config files (haiku) concurrently

---

## Project Purpose

A small-scale LLM (Large Language Model) experimentation project.
Implement and experiment with LLM **pretraining** or **fine-tuning** directly on an 8× NVIDIA B200 GPU machine.

---

## Hardware Environment

| Item | Spec |
|------|------|
| GPU | 8× NVIDIA B200 (183 GB VRAM each, **~1.47 TB total**) |
| RAM | 2.2 TB |
| CUDA | 13.0 |
| Storage (work) | `/PROJECT/0325120031_A/ghong/taketimes/` → 3.5 TB, 2.2 TB free |
| Storage (home) | `/home/ghong` → 5 GB (small code files only) |

**Caution**: Large files such as checkpoints and datasets must be stored under `/PROJECT/0325120031_A/ghong/taketimes/llm-bang/`. Be careful not to exceed the home directory (`/home/ghong`) quota.
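
A minimal sketch (standard library only; paths from the table above) for checking free space before writing large artifacts:

```python
import shutil

# Check free space on the work volume vs. the tiny home quota
# before writing checkpoints or datasets (paths from the table above).
for path in ["/PROJECT/0325120031_A/ghong/taketimes/", "/home/ghong"]:
    total, _, free = shutil.disk_usage(path)
    print(f"{path}: {free / 1e12:.2f} TB free of {total / 1e12:.2f} TB")
```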

---

## Pre-installed Libraries

```
torch           2.10.0a0+b4e4ee81d3.nv25.12   # NVIDIA custom build (B200-optimized)
flash_attn      2.7.4.post1+25.12             # FlashAttention-2 available
datasets        4.4.1
tokenizers      0.22.1
huggingface_hub 1.2.3
```

> **Warning**: PyTorch here is an NVIDIA custom build (`nv25.12`). Reinstalling it with `pip install torch` can break the B200 optimizations. Do not reinstall PyTorch.

## Additional Libraries to Install

```bash
pip install transformers accelerate peft trl deepspeed bitsandbytes sentencepiece wandb
```

Note: some of these packages declare `torch` as a dependency, so verify that the install does not replace the preinstalled custom build (see the warning above).

---

## Recommended Project Structure

```
llm-bang/
├── CLAUDE.md
├── data/         # training data (raw text, preprocessed versions)
├── tokenizer/    # tokenizer training and storage
├── model/        # model architecture definitions (nn.Module)
├── train/        # training scripts (single GPU / DDP / FSDP)
├── eval/         # evaluation scripts (perplexity, downstream tasks)
├── configs/      # YAML/JSON training config files
└── checkpoints/  # model checkpoints (large files)
```

---

## Multi-GPU Training Launch Patterns

```bash
# torchrun (DDP): 8 GPUs
torchrun --nproc_per_node=8 train/pretrain.py --config configs/small_lm.yaml

# single-GPU test
python train/pretrain.py --config configs/small_lm.yaml --device cuda:0

# FSDP (model sharding, for large models)
torchrun --nproc_per_node=8 train/pretrain.py --config configs/large_lm.yaml --strategy fsdp
```
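
For reference, a minimal sketch of what `train/pretrain.py` needs so the `torchrun` commands above work; the actual script body, config schema, and `--strategy` handling are assumptions, not an existing implementation:

```python
import argparse
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--device", default=None)     # e.g. cuda:0 for single-GPU tests
    parser.add_argument("--strategy", default="ddp")  # ddp | fsdp (hypothetical flag)
    args = parser.parse_args()

    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each spawned process.
    distributed = "RANK" in os.environ
    if distributed:
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
    else:
        device = torch.device(args.device or "cuda:0")

    model = torch.nn.Linear(1024, 1024).to(device)  # placeholder for the real model
    if distributed:
        model = DDP(model, device_ids=[device.index])

    # ... load config (args.config), build the dataloader, run the training loop ...

    if distributed:
        dist.destroy_process_group()

if __name__ == "__main__":
    main()
```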

---

## Model Scale Guide (for this hardware)

| Model size | Recommended strategy | Min. GPUs |
|------------|----------------------|-----------|
| ~1B params | DDP, bf16 | 1 GPU |
| ~7B params | DDP or FSDP, bf16 | 2–4 GPUs |
| ~13B params | FSDP, bf16/fp8 | 4 GPUs |
| ~70B params | FSDP + ZeRO-3, bf16/fp8 | 8 GPUs |

The B200 supports FP8 natively, so `torch.float8_e4m3fn` can be used during training.
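
As a starting point, a minimal sketch of the FP8 dtype round trip (dtype mechanics only; production FP8 training usually goes through a dedicated library, which is an assumption here since none is listed above):

```python
import torch

# Cast a bf16 tensor down to FP8 (E4M3) and back, then inspect the
# quantization error. FP8 tensors support only a limited op set, so
# they are typically used for storage and matmul inputs, not all math.
x = torch.randn(4, 4, device="cuda", dtype=torch.bfloat16)
x_fp8 = x.to(torch.float8_e4m3fn)
x_back = x_fp8.to(torch.bfloat16)
print(x_fp8.dtype, (x - x_back).abs().max())
```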

---

## Reference (Previous Project)

`/PROJECT/0325120031_A/ghong/taketimes/_deprecated/work/`: a 2CRM measured-thickness prediction project (LightGBM, ClickHouse).
Consult it when domain data (plant sensors, coil grades) is needed.