
EOS Token Handling: Full Audit Report

Date: 2026-02-26
Audit target: /PROJECT/0325120031_A/ghong/taketimes/llm-bang/
Problem: After "### 답변:" ("### Answer:"), the SFT model repeats "### 질문:" ("### Question:") (repetition rate 57%)


Summary of Conclusions

🔴 Root cause: prompt template mismatch at inference time (not an EOS bug)

| Item | Training template | Inference template (test_generation_params.py) |
|---|---|---|
| User tag | `<\|user\|>\n{instruction}\n` | `### 질문: {instruction}\n` |
| Assistant tag | `<\|assistant\|>\n` | `### 답변:` |
| End token | `</s>` (EOS, id=2) | None (stop_strings attempted as a substitute) |

The model was trained on the <|user|> / <|assistant|> format, but at inference time it is called with the ### 질문: / ### 답변: format.
To the model, ### 질문: and ### 답변: are ordinary text, so it has no reason to emit EOS and repeats indefinitely.


Detailed Audit Results

✅ Checkpoint 1: SFTDataset appends an EOS token to the end of the response

Result: OK

sft_dataset.py, lines ~52 and ~87:

response = f"{output}{_EOS_STRING}"   # _EOS_STRING = "</s>"
response = f"{content}{_EOS_STRING}"  # same for the conversation format

Verified in practice: response_ids[-1] == 2 (EOS) ✓

✅ Checkpoint 2: the EOS token's label is a training target

Result: OK

sft_dataset.py, lines ~144-152:

resp_label_start = max(0, resp_start - 1)  # shift left by one position (causal LM convention)
resp_label_end = resp_label_start + len(response_ids)
labels[resp_label_start:resp_label_end] = response_ids

  • labels[resp_label_end - 1] = EOS (2): EOS is included in the training targets ✓
  • logits[position of the last response token] → trained to predict EOS ✓
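The one-position shift is easier to see on a toy sequence. The sketch below reuses the three lines quoted above inside a hypothetical `build_labels` function; the token ids are made up for illustration, and only EOS id=2 and the ignore value -1 come from this report.

```python
IGNORE_INDEX = -1  # label value skipped by the loss, per Checkpoint 3
EOS_ID = 2         # id of </s>, per this report


def build_labels(prompt_ids, response_ids):
    """Return (input_ids, labels), with labels already shifted one step left."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    resp_start = len(prompt_ids)
    resp_label_start = max(0, resp_start - 1)
    resp_label_end = resp_label_start + len(response_ids)
    labels[resp_label_start:resp_label_end] = response_ids
    return input_ids, labels


# Made-up ids: three prompt tokens, then two response tokens followed by EOS.
input_ids, labels = build_labels([11, 12, 13], [21, 22, EOS_ID])
print(input_ids)  # [11, 12, 13, 21, 22, 2]
print(labels)     # [-1, -1, 21, 22, 2, -1]
```

Position 4 holds the last content token (22) and its label is 2, so that position is trained to predict EOS.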

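✅ Checkpoint 3: prompt labels = -1 (ignored)

Result: OK

labels is initialized to -1 and only the response region is overwritten, so the prompt positions stay at -1 ✓ (because of the one-position shift in Checkpoint 2, the last prompt position carries the first response token as its label, which is the intended causal-LM target).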

✅ Checkpoint 4: EOS loss from truncation

Result: negligible

  • Only 61 of 159,125 samples (0.04%) exceed max_seq_len=4096
  • EOS can be cut off only in those 61 samples, so this is unrelated to the 57% repetition rate
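As a quick sanity check on the figures above, the share of samples that can lose EOS can be compared with the observed repetition rate. The numbers are the ones quoted in this report; the variable names are illustrative.

```python
total_samples = 159_125  # SFT samples, from this audit
over_limit = 61          # samples longer than max_seq_len=4096
repetition_rate = 0.57   # observed repetition rate

truncated_share = over_limit / total_samples
print(f"{truncated_share:.4%}")                   # 0.0383%
print(round(repetition_rate / truncated_share))   # 1487
```

The repetition rate is roughly 1,500 times larger than the share of samples that truncation could affect.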

⚠️ Checkpoint 5: special tokens not registered in the tokenizer

Result: minor issue

  • <|user|> → token_to_id() = None (not a special token; split into subwords)
  • <|assistant|> → None (same)
  • </s> → id=2 ✓ (registered correctly)

<|user|> / <|assistant|> are not single tokens; they are split into subword pieces.
This works as long as training and inference use the same tokenizer, but registering them as single special tokens would be more robust.
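The check above can be repeated for any list of tags. A minimal sketch, assuming only that the tokenizer object exposes `token_to_id()` returning None for unknown tokens, as the results above suggest; `unregistered_tokens` and the stub class are hypothetical and exist only so the example runs without model files.

```python
def unregistered_tokens(tokenizer, tokens):
    """Return the tokens that the vocabulary does not map to a single id."""
    return [t for t in tokens if tokenizer.token_to_id(t) is None]


class _StubTokenizer:
    """Stand-in for the real tokenizer; only mimics token_to_id()."""

    def __init__(self, vocab):
        self._vocab = vocab

    def token_to_id(self, token):
        return self._vocab.get(token)


stub = _StubTokenizer({"</s>": 2})  # mirrors the audit: only </s> is registered
print(unregistered_tokens(stub, ["<|user|>", "<|assistant|>", "</s>"]))
# ['<|user|>', '<|assistant|>']
```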

🔴 Checkpoint 6: inference prompt format mismatch (root cause)

eval/test_generation_params.py:

"### 질문: 한국의 수도는 어디인가요?\n### 답변:",  # "### Question: What is the capital of Korea?\n### Answer:"

eval/comprehensive_eval.py:

"한국의 수도는",  # raw text with no template ("The capital of Korea is")

Trained format (the example asks "What is the capital of Korea?" and answers "It is Seoul."):

<|user|>
한국의 수도는 어디인가요?
<|assistant|>
서울입니다.</s>

Correct prompt at inference time:

<|user|>
한국의 수도는 어디인가요?
<|assistant|>

Fixes

Fix 1: Correct the inference prompt template (required, no retraining needed)

In eval/test_generation_params.py and eval/comprehensive_eval.py, change the prompts to match the SFT training template:

# Before (WRONG)
prompt = "### 질문: 한국의 수도는 어디인가요?\n### 답변:"

# After (CORRECT)
prompt = "<|user|>\n한국의 수도는 어디인가요?\n<|assistant|>\n"
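To keep every eval script on the training template, the prompt can be built by one shared helper instead of hand-written strings. A minimal sketch; `build_sft_prompt` and the two constants are hypothetical names, not taken from the repository.

```python
# Hypothetical helper: the names below are illustrative, not from the repository.
USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"


def build_sft_prompt(instruction: str) -> str:
    """Wrap an instruction in the <|user|> / <|assistant|> training template."""
    return f"{USER_TAG}\n{instruction}\n{ASSISTANT_TAG}\n"


prompt = build_sft_prompt("한국의 수도는 어디인가요?")
print(repr(prompt))  # '<|user|>\n한국의 수도는 어디인가요?\n<|assistant|>\n'
```

Both eval scripts could then import the helper, so the template lives in a single place.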

Fix 2: Guarantee EOS on truncation (recommended, retraining required)

In sft_dataset.py, force an EOS after truncation:

# Current (EOS can be lost on truncation)
response_ids = response_ids[:allowed_response]

# Proposed (force EOS after truncation)
response_ids = response_ids[:allowed_response]
if response_ids and response_ids[-1] != self.eos_token_id:
    response_ids[-1] = self.eos_token_id  # overwrite the last token with EOS
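The same rule as a standalone function makes the edge cases easy to check. This is a sketch that assumes response_ids is a plain Python list of token ids; `truncate_with_eos` is a hypothetical name, and the ids other than EOS (2) are made up.

```python
def truncate_with_eos(response_ids, allowed_response, eos_token_id=2):
    """Truncate to allowed_response tokens and make sure the last one is EOS."""
    truncated = list(response_ids[:allowed_response])
    if truncated and truncated[-1] != eos_token_id:
        truncated[-1] = eos_token_id  # overwrite, so the length budget still holds
    return truncated


print(truncate_with_eos([5, 6, 7, 8, 2], allowed_response=3))  # [5, 6, 2]
print(truncate_with_eos([5, 6, 2], allowed_response=10))       # [5, 6, 2]
print(truncate_with_eos([5, 6, 2], allowed_response=0))        # []
```

Overwriting instead of appending drops one content token, but it keeps the sequence within the length limit.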

Fix 3: Register <|user|> / <|assistant|> as special tokens (optional, retraining required)

Adding them to the tokenizer as special tokens makes each one encode to a single token, which is more stable:

# List form, as in the tokenizers.Tokenizer API implied by token_to_id() above
tokenizer.add_special_tokens(["<|user|>", "<|assistant|>"])

Is Retraining Required?

| Fix | Retraining required | Effect |
|---|---|---|
| Fix 1: Correct the inference template | ❌ No | Expected to resolve the repetition problem (root cause) |
| Fix 2: Guarantee EOS on truncation | ⭕ Yes (affects only 0.04% of samples) | Negligible |
| Fix 3: Register special tokens | ⭕ Yes | Better long-term stability |

Immediate action: Fix 1 alone can resolve the repetition problem. No retraining needed.


Verification

python eval/generate.py \
    --checkpoint checkpoints/korean_1b_sft \
    --prompt $'<|user|>\n한국의 수도는 어디인가요?\n<|assistant|>\n' \
    --max_new_tokens 200 \
    --temperature 0.7

If the repetition stops and generation terminates normally at </s> (EOS), Fix 1 has succeeded.
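To compare runs before and after Fix 1 on more than one prompt, a small check can count how many outputs leak a prompt marker. The report does not say how the 57% figure was measured, so the definition below (an output counts as repeating if it contains a template marker) is an assumption, and `repetition_rate` is a hypothetical helper. The sample strings are hand-written for illustration, not model outputs.

```python
# Assumed definition: an output "repeats" if a prompt marker appears in the generated text.
LEAK_MARKERS = ("### 질문:", "<|user|>")


def repetition_rate(outputs, markers=LEAK_MARKERS):
    """Fraction of generated texts that contain any of the prompt markers."""
    if not outputs:
        return 0.0
    flagged = sum(1 for text in outputs if any(m in text for m in markers))
    return flagged / len(outputs)


# Hand-written examples: one clean answer, one that starts a new question turn.
samples = [
    "서울입니다.",
    "서울입니다.\n### 질문: 한국의 수도는 어디인가요?\n### 답변: 서울입니다.",
]
print(repetition_rate(samples))  # 0.5
```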