Qwen3-4B-KRX-Finance

Qwen3-4B-KRX-Finance is a PEFT adapter model built on Qwen3-4B-Instruct,
first trained with SFT (Supervised Fine-Tuning) and then further aligned with
DPO (Direct Preference Optimization), with the goal of improving answer quality
for Korean-language financial question answering.

๋ณธ ์ €์žฅ์†Œ์—๋Š” DPO ํ•™์Šต์ด ์™„๋ฃŒ๋œ ์–ด๋Œ‘ํ„ฐ๋งŒ ํฌํ•จ๋˜์–ด ์žˆ์œผ๋ฉฐ,
๋ฒ ์ด์Šค ๋ชจ๋ธ์€ ๋ณ„๋„๋กœ ๋กœ๋“œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.


1. ๋ชจ๋ธ ๊ฐœ์š”

  • Base Model
    unsloth/Qwen3-4B-Instruct-2507

  • Fine-tuning Pipeline

    1. Supervised Fine-Tuning (SFT)
    2. Direct Preference Optimization (DPO)
  • Release Type
    PEFT adapter only

  • Target Domain
    Korean financial question answering (interest rates, bonds, macroeconomics, and explanations of financial concepts)


2. Training Objectives

This model was trained with the following objectives.

  • Improve the accuracy and consistency of answers to Korean financial questions
  • Generate answers in a logically structured form rather than as simple enumerations
  • Align the model toward the preferred answer style for a given question
  • Build a stable language model usable for financial study, research, and explanation

3. SFT (Supervised Fine-Tuning)

3.1 SFT Overview

In the SFT stage, the model was trained to acquire the basic format
and domain knowledge of Korean financial question answering.

  • QA centered on explanations of financial concepts
  • Training direct, explanatory responses to Korean questions

3.2 SFT Training Data

  • Dataset
    https://huggingface.co/datasets/KRX-Data/Won-Instruct

  • Data Split

    • test_size = 0.05
    • After the train/test split, only the train portion was used for SFT
  • Data Characteristics

    • Korean financial question-answering data
    • Centered on interest rates, bonds, macroeconomics, and explanations of financial systems and concepts
    • Composed mainly of explanatory responses

3.3 SFT Results Summary

After SFT, the model showed the following characteristics.

  • Stable baseline responses to Korean financial questions
  • Improved understanding of financial domain terminology
  • However, some issues remained:
    • Variance in answer length
    • Repetitive statements
    • Occasional answers in non-preferred styles

DPO training was carried out to address these issues.


4. DPO (Direct Preference Optimization)

4.1 DPO Objectives

The goals of the DPO stage are as follows.

  • Align the model toward the better answer (chosen) for a given question
  • Strengthen the clarity, conciseness, and structural completeness of explanations
  • Reduce quality variance in the SFT model's outputs

4.2 DPO Training Data

  • Base Dataset
    https://huggingface.co/datasets/aiqwe/FinShibainu

  • Prompt Construction

    • 1,200 prompts sampled at random from the full dataset
    • Centered on finance-related questions
  • Chosen Responses

    • Generated with the gemini-3-flash-preview API
    • Selected for completeness of explanation, absence of repetition, and logical structure
  • Rejected Responses

    • Outputs of the base Qwen3-4B-Instruct model
    • Comparatively less structured and less refined responses

4.3 DPO Training Configuration

  • Policy Model
    The SFT-trained model

  • Reference Model
    Base Qwen3-4B-Instruct (frozen)

  • Beta
    0.2

  • Epochs
    2

  • Max Sequence Length
    2048

  • Optimization Method
    Direct Preference Optimization (DPO)
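The card does not state which training framework was used; as one plausible sketch, the configuration above maps onto TRL's `DPOTrainer` roughly as follows (`policy_model`, `ref_model`, and `preference_dataset` are placeholders for the objects described in Sections 4.2 and 4.3, and argument names follow recent TRL versions):

```python
# Hypothetical sketch only; not the author's actual training script.
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    beta=0.2,            # Beta (Section 4.3)
    num_train_epochs=2,  # Epochs (Section 4.3)
    max_length=2048,     # Max sequence length (Section 4.3)
    output_dir="qwen3-4b-krx-dpo",  # hypothetical path
)

trainer = DPOTrainer(
    model=policy_model,   # the SFT-trained model (policy)
    ref_model=ref_model,  # frozen base Qwen3-4B-Instruct (reference)
    args=config,
    train_dataset=preference_dataset,  # rows of {"prompt", "chosen", "rejected"}
)
trainer.train()
```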


5. Usage

5.1 Loading the Base Model and the Adapter

from unsloth import FastLanguageModel
from peft import PeftModel

# Load the base model; the adapter in this repository is applied on top.
base_model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-Instruct-2507",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the DPO-trained PEFT adapter from this repository.
model = PeftModel.from_pretrained(
    base_model,
    "mjun/Qwen3-4B-KRX-Finance",
)

model.eval()