Upload source/CLAUDE.md with huggingface_hub
#31
by somebody-to-love - opened
- source/CLAUDE.md +104 -0
source/CLAUDE.md
ADDED
|
@@ -0,0 +1,104 @@
|
| 1 |
+
# CLAUDE.md
|
| 2 |
+
|
| 3 |
+
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
| 4 |
+
|
| 5 |
+
## Working Principles: Team Play
|
| 6 |
+
|
| 7 |
+
**Always distribute work that can be parallelized to sub-agents.**
|
| 8 |
+
|
| 9 |
+
- Complex code writing / design decisions → `model: sonnet`
|
| 10 |
+
- Fast exploration · lookups · simple file creation → `model: haiku`
|
| 11 |
+
- After an agent finishes, verify its results; re-invoke it with `resume` if needed
|
| 12 |
+
- Example: run model implementation (sonnet) + data script (sonnet) + config file (haiku) concurrently
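The routing rule in the list above can be sketched as a small dispatcher. This is only an illustration: `run_agent`, the `"complex"`/`"quick"` labels, and `dispatch` are hypothetical names, and `run_agent` is a stub that echoes its input rather than launching a real sub-agent.

```python
from concurrent.futures import ThreadPoolExecutor

# Routing rule from the list above; the "kind" labels are hypothetical.
MODEL_BY_KIND = {
    "complex": "sonnet",  # complex code writing, design decisions
    "quick": "haiku",     # fast exploration, lookups, simple file creation
}


def run_agent(task: str, model: str) -> str:
    """Hypothetical stand-in for launching a sub-agent; it only echoes its input."""
    return f"[{model}] {task}"


def dispatch(tasks):
    """Run (task, kind) pairs concurrently and return results in input order."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(run_agent, task, MODEL_BY_KIND[kind]) for task, kind in tasks]
        # Collect in submission order so results can be checked against their tasks.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for line in dispatch([
        ("model implementation", "complex"),
        ("data script", "complex"),
        ("config file", "quick"),
    ]):
        print(line)
```

Collecting results in submission order keeps the "verify results" step simple, since each result lines up with the task that produced it.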
|
| 13 |
+
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
## Project Purpose
|
| 17 |
+
|
| 18 |
+
A small-scale LLM (Large Language Model) experiment project.
|
| 19 |
+
Implement and run LLM **pretraining** or **fine-tuning** directly on an 8× NVIDIA B200 GPU environment.
|
| 20 |
+
|
| 21 |
+
---
|
| 22 |
+
|
| 23 |
+
## Hardware Environment
|
| 24 |
+
|
| 25 |
+
| Item | Specification |
|
| 26 |
+
|------|------|
|
| 27 |
+
| GPU | 8× NVIDIA B200 (183 GB VRAM each, **~1.47 TB total**) |
|
| 28 |
+
| RAM | 2.2 TB |
|
| 29 |
+
| CUDA | 13.0 |
|
| 30 |
+
| Storage (work) | `/PROJECT/0325120031_A/ghong/taketimes/`: 3.5 TB, 2.2 TB free |
|
| 31 |
+
| Storage (home) | `/home/ghong`: 5 GB (small code files only) |
|
| 32 |
+
|
| 33 |
+
**Caution**: Large files such as checkpoints and datasets must be stored under `/PROJECT/0325120031_A/ghong/taketimes/llm-bang/`. Take care not to exceed the home directory (`/home/ghong`) quota.
|
| 34 |
+
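The storage rule above could be enforced in code before anything large is written. The following is a minimal sketch, not part of the repository: `is_under`, `check_output_dir`, and the 50 GB default threshold are assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Project storage root taken from the caution above.
PROJECT_ROOT = Path("/PROJECT/0325120031_A/ghong/taketimes/llm-bang")


def is_under(path: Path, root: Path) -> bool:
    """Return True if `path` is `root` itself or located somewhere below it."""
    path, root = path.resolve(), root.resolve()
    return path == root or root in path.parents


def check_output_dir(path: Path, root: Path = PROJECT_ROOT, min_free_gb: float = 50.0) -> Path:
    """Reject output directories outside `root` or on a nearly full filesystem."""
    if not is_under(path, root):
        raise ValueError(f"{path} is outside {root}; large files must not go elsewhere")
    path.mkdir(parents=True, exist_ok=True)
    free_gb = shutil.disk_usage(path).free / 1e9  # free bytes converted to GB
    if free_gb < min_free_gb:
        raise RuntimeError(f"only {free_gb:.1f} GB free under {path}")
    return path
```

A training script might call `check_output_dir(PROJECT_ROOT / "checkpoints")` once at startup, so a wrong path fails immediately instead of filling the 5 GB home directory mid-run.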
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## Pre-installed Libraries
|
| 38 |
+
|
| 39 |
+
```
|
| 40 |
+
torch 2.10.0a0+b4e4ee81d3.nv25.12 # NV custom build (optimized for B200)
|
| 41 |
+
flash_attn 2.7.4.post1+25.12 # FlashAttention-2 available
|
| 42 |
+
datasets 4.4.1
|
| 43 |
+
tokenizers 0.22.1
|
| 44 |
+
huggingface_hub 1.2.3
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
> **Warning**: PyTorch is installed as an NVIDIA custom build (`nv25.12`). Reinstalling it with `pip install torch` can break the B200 optimizations → do not reinstall PyTorch.
|
| 48 |
+
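One way to notice an accidental reinstall is to check whether the installed version string still carries the `nv` tag. The sketch below assumes the tag always appears in the local version part (after `+`), as it does in `2.10.0a0+b4e4ee81d3.nv25.12`; `is_nvidia_build` and `installed_version` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional


def is_nvidia_build(version: str) -> bool:
    """Heuristic: look for an `nv<digits>` segment in the local version part."""
    _, _, local = version.partition("+")
    return any(part.startswith("nv") and part[2:].isdigit() for part in local.split("."))


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("torch")
    if version is None:
        print("torch is not installed")
    elif is_nvidia_build(version):
        print(f"torch {version}: NVIDIA custom build detected")
    else:
        print(f"torch {version}: no nv tag found, the custom build may have been replaced")
```

Running this after any `pip install` gives a quick signal that a dependency has pulled in a stock wheel.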
|
| 49 |
+
## Libraries That Need Additional Installation
|
| 50 |
+
|
| 51 |
+
```bash
|
| 52 |
+
pip install transformers accelerate peft trl deepspeed bitsandbytes sentencepiece wandb
|
| 53 |
+
```
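Several of these packages depend on `torch`, so pip may try to replace the custom build while resolving them. One possible safeguard, sketched here under that assumption, is to pin the already-installed versions in a pip constraints file first. `constraint_line`, `write_constraints`, and the file name `constraints.txt` are hypothetical.

```python
from importlib import metadata
from pathlib import Path


def constraint_line(package: str, version: str) -> str:
    """Format an exact pin in the `name==version` form used by constraints files."""
    return f"{package}=={version}"


def write_constraints(path: Path, packages) -> list:
    """Pin every listed package that is currently installed; skip the rest."""
    lines = []
    for name in packages:
        try:
            lines.append(constraint_line(name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            continue  # nothing to pin if the package is absent
    path.write_text("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    print(write_constraints(Path("constraints.txt"), ["torch", "flash_attn"]))
```

The resulting file can then be passed to pip with its `-c` option, for example `pip install -c constraints.txt transformers accelerate`. Whether resolution succeeds still depends on the version ranges each package declares for `torch`.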
|
| 54 |
+
|
| 55 |
+
---
|
| 56 |
+
|
| 57 |
+
## Recommended Project Structure
|
| 58 |
+
|
| 59 |
+
```
|
| 60 |
+
llm-bang/
|
| 61 |
+
├── CLAUDE.md
|
| 62 |
+
├── data/          # training data (raw text, preprocessed output)
|
| 63 |
+
├── tokenizer/     # tokenizer training and saved artifacts
|
| 64 |
+
├── model/         # model architecture definitions (nn.Module)
|
| 65 |
+
├── train/         # training scripts (single GPU / DDP / FSDP)
|
| 66 |
+
├── eval/          # evaluation scripts (perplexity, downstream tasks)
|
| 67 |
+
├── configs/       # YAML/JSON training config files
|
| 68 |
+
└── checkpoints/   # model checkpoints (large)
|
| 69 |
+
```
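The layout above can be created in one step. This is a minimal sketch using only the directory names from the tree; the `scaffold` function is a hypothetical helper.

```python
from pathlib import Path

# Directory names copied from the recommended structure above.
SUBDIRS = ["data", "tokenizer", "model", "train", "eval", "configs", "checkpoints"]


def scaffold(root: Path) -> list:
    """Create the recommended subdirectories under `root`; existing ones are left alone."""
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in scaffold(Path("llm-bang")):
        print(path)
```

Because `exist_ok=True` is set, running it again on an existing checkout does nothing harmful.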
|
| 70 |
+
|
| 71 |
+
---
|
| 72 |
+
|
| 73 |
+
## Multi-GPU Training Launch Patterns
|
| 74 |
+
|
| 75 |
+
```bash
|
| 76 |
+
# torchrun (DDP), 8 GPUs
|
| 77 |
+
torchrun --nproc_per_node=8 train/pretrain.py --config configs/small_lm.yaml
|
| 78 |
+
|
| 79 |
+
# Single-GPU test
|
| 80 |
+
python train/pretrain.py --config configs/small_lm.yaml --device cuda:0
|
| 81 |
+
|
| 82 |
+
# FSDP (model sharding, large models)
|
| 83 |
+
torchrun --nproc_per_node=8 train/pretrain.py --config configs/large_lm.yaml --strategy fsdp
|
| 84 |
+
```
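The three commands above imply that `train/pretrain.py` accepts `--config`, `--device`, and `--strategy`. How it does so is not shown, so the following is only a sketch of one possible entry point. It relies on the `LOCAL_RANK` and `WORLD_SIZE` environment variables that `torchrun` exports; the default strategy of `ddp` and the helper names are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags used in the launch commands above."""
    parser = argparse.ArgumentParser(description="Pretraining entry point (sketch)")
    parser.add_argument("--config", required=True, help="path to a YAML/JSON config")
    parser.add_argument("--device", default=None, help="device for single-GPU runs, e.g. cuda:0")
    parser.add_argument("--strategy", choices=["ddp", "fsdp"], default="ddp")  # default is an assumption
    return parser.parse_args(argv)


def resolve_device(args, env=os.environ):
    """Return (device string, world size) for the current process."""
    world_size = int(env.get("WORLD_SIZE", "1"))
    if "LOCAL_RANK" in env:
        # Launched by torchrun: one process per GPU, indexed by LOCAL_RANK.
        return f"cuda:{int(env['LOCAL_RANK'])}", world_size
    # Plain `python` launch: honor --device, falling back to the first GPU.
    return args.device or "cuda:0", world_size


if __name__ == "__main__":
    args = parse_args()
    device, world_size = resolve_device(args)
    print(f"config={args.config} strategy={args.strategy} device={device} world_size={world_size}")
```

Passing `env` as a parameter keeps the device logic testable without actually launching `torchrun`.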
|
| 85 |
+
|
| 86 |
+
---
|
| 87 |
+
|
| 88 |
+
## Model Size Guide (Based on This Hardware)
|
| 89 |
+
|
| 90 |
+
| Model size | Recommended strategy | Minimum GPUs |
|
| 91 |
+
|-----------|-----------|------------|
|
| 92 |
+
| ~1B param | DDP, bf16 | 1 GPU |
|
| 93 |
+
| ~7B param | DDP or FSDP, bf16 | 2–4 GPU |
|
| 94 |
+
| ~13B param | FSDP, bf16/fp8 | 4 GPU |
|
| 95 |
+
| ~70B param | FSDP + ZeRO-3, bf16/fp8 | 8 GPU |
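The table can be sanity-checked with a rough memory estimate. The sketch below assumes about 16 bytes per parameter for mixed-precision Adam training (bf16 weights and gradients plus fp32 master weights and two optimizer moments) and ignores activations entirely, so it gives a lower bound, not a recommendation. The function names are hypothetical.

```python
import math

GPU_MEMORY_GB = 183   # per-GPU VRAM from the hardware table
BYTES_PER_PARAM = 16  # assumption: 2 (bf16 weights) + 2 (bf16 grads) + 12 (fp32 master weights and Adam moments)


def training_state_gb(params_billion, bytes_per_param=BYTES_PER_PARAM):
    """Memory for weights, gradients, and optimizer state in GB (activations excluded)."""
    return params_billion * bytes_per_param  # 1e9 params times bytes, divided by 1e9 bytes per GB


def min_gpus_fully_sharded(params_billion, gpu_gb=GPU_MEMORY_GB):
    """Lower bound on GPU count if the training state is sharded evenly."""
    return math.ceil(training_state_gb(params_billion) / gpu_gb)


for size in (1, 7, 13, 70):
    print(f"~{size}B: {training_state_gb(size):.0f} GB state, >= {min_gpus_fully_sharded(size)} GPU(s)")
```

Under these assumptions a ~70B model needs roughly 1120 GB for training state alone, which is at least 7 of the 183 GB GPUs before any activations. That is consistent with the table listing all 8 GPUs for that size.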
|
| 96 |
+
|
| 97 |
+
B200 supports FP8 natively → `torch.float8_e4m3fn` can be used during training.
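FP8 has a very narrow dynamic range, which is why FP8 training usually relies on scaling tensors before casting. The pure-Python sketch below derives the E4M3FN limits from the bit layout (4 exponent bits, 3 mantissa bits, bias 7). It needs no PyTorch, and `scale_for` is a hypothetical helper showing one common per-tensor scaling rule.

```python
def e4m3fn_max() -> float:
    """Largest finite E4M3FN value."""
    exponent_bits, mantissa_bits, bias = 4, 3, 7
    max_exponent = (1 << exponent_bits) - 1  # 15: the format has no infinities, so the top exponent is usable
    max_mantissa = (1 << mantissa_bits) - 2  # 0b110: the all-ones mantissa at this exponent encodes NaN
    return 2.0 ** (max_exponent - bias) * (1 + max_mantissa / (1 << mantissa_bits))


def e4m3fn_min_normal() -> float:
    """Smallest positive normal value: exponent field 1, mantissa 0."""
    return 2.0 ** (1 - 7)


def scale_for(amax: float) -> float:
    """Per-tensor scale that maps the largest absolute value onto the FP8 maximum."""
    return e4m3fn_max() / amax if amax > 0 else 1.0


if __name__ == "__main__":
    print(e4m3fn_max(), e4m3fn_min_normal(), scale_for(3.5))
```

The maximum works out to 448 and the smallest normal value to 2^-6, so values outside that span overflow or lose precision unless they are rescaled first.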
|
| 98 |
+
|
| 99 |
+
---
|
| 100 |
+
|
| 101 |
+
## Reference (Previous Project)
|
| 102 |
+
|
| 103 |
+
`/PROJECT/0325120031_A/ghong/taketimes/_deprecated/work/`: a project that predicted measured 2CRM thickness values (LightGBM, ClickHouse).
|
| 104 |
+
Refer to it when domain data (factory sensors, coil grades) is needed.
|