metadata
language:
  - ko
license: apache-2.0
library_name: peft
pipeline_tag: image-text-to-text
base_model: unsloth/Qwen3.5-9B
base_model_relation: adapter
datasets:
  - Himedia-AI-01/pest-detection-korean
tags:
  - lora
  - peft
  - vision
  - image-classification
  - vision-language
  - korean
  - pest-detection
  - agriculture
  - qwen
  - qwen3.5
  - unsloth
  - multimodal
inference: false
model-index:
  - name: pest-detector-deploy
    results:
      - task:
          type: image-classification
          name: Korean Pest Image Classification
        dataset:
          type: Himedia-AI-01/pest-detection-korean
          name: Korean Pest Detection (19-class)
        metrics:
          - type: accuracy
            value: 0.9136
            name: Accuracy (1595-sample validation)
          - type: f1
            value: 0.9032
            name: F1 (macro)
          - type: f1
            value: 0.9134
            name: F1 (weighted)
          - type: precision
            value: 0.9088
            name: Precision (macro)
          - type: recall
            value: 0.9101
            name: Recall (macro)

Pest Detector: Korean 19-Class Vision-Language Classifier

A LoRA adapter for unsloth/Qwen3.5-9B, trained on the Himedia-AI-01/pest-detection-korean dataset. Given a photo, it outputs in Korean the name of one of 18 crop pests, or "정상" (normal, no pest).

| Metric | Value | Source |
| --- | --- | --- |
| Validation accuracy (1595 samples, FP16) | 91.36 % | Training-time evaluation |
| 57-sample bench (FP16, runtime PEFT) | 84.2 % | Reproducible recipe below |
| 57-sample bench (bnb NF4, runtime PEFT) | 80 % (from a 10-sample probe) | Bit-identical to FP16 on the easy classes |
| VRAM (FP16 + LoRA) | about 19.5 GB | RTX A5000 / 4090 |
| VRAM (bnb 4-bit + LoRA) | about 8.7 GB | RTX 3060 12GB / 4070 |

⚠ The one thing you must read before deploying

This LoRA cannot be deployed through GGUF / llama.cpp / Ollama. On the surface everything appears to work: convert_hf_to_gguf.py runs without errors, the GGUF file loads normally, and the server starts. The output, however, is <think>\n\n</think>\n\n?adgeadgeadge... (a degenerate attractor on token ID 58659). We saw the same symptom in every combination we tried: F16, Q5_K_M, Q4_K_M, with and without pre-permute, and with the linear_attn LoRA zeroed out.

Cause (skip this if you only want a working setup): this adapter trains linear_attn.in_proj_qkv|in_proj_z|in_proj_a|in_proj_b|out_proj (the Gated DeltaNet projections in the Qwen3.5 hybrid architecture, 75 % of all layers). convert_hf_to_gguf.py:_reorder_v_heads permutes the V-row layout of those tensors so that the ggml CUDA kernels can repeat-broadcast them efficiently. The base model itself is invariant to this permutation under a particular binary-op pattern, but the LoRA delta is not. After the permutation the delta is applied at misaligned head positions, and the tokens collapse. Every merge → GGUF path inherits this structural defect.

The deployment path that does work is unsloth.FastVisionModel + peft.PeftModel.from_pretrained (runtime LoRA, no merge, no GGUF conversion). The recipe below is the canonical setup.
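To make the misalignment concrete, here is a toy sketch in plain Python. It is only an illustration under invented assumptions: the 4x2 weight, the update that touches a single row, and the permutation `[2, 3, 0, 1]` are all made up, and none of it reproduces the real tensor shapes or the actual reordering in `_reorder_v_heads`. It shows that permuting rows is harmless when every term is permuted consistently, and that an update added in the old row order lands on the wrong row.

```python
def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def add(m1, m2):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def permute_rows(m, perm):
    """Row i of the result is row perm[i] of m."""
    return [m[i] for i in perm]

def unpermute(y, perm):
    """Undo permute_rows on an output vector."""
    out = [0] * len(y)
    for new_pos, old_pos in enumerate(perm):
        out[old_pos] = y[new_pos]
    return out

W = [[1, 0], [0, 1], [2, 0], [0, 2]]       # toy base weight (4 rows)
delta = [[1, 1], [0, 0], [0, 0], [0, 0]]   # toy low-rank update, row 0 only
perm = [2, 3, 0, 1]                        # hypothetical row reordering
x = [1, 2]

# Reference: base + delta in the original layout.
reference = matvec(add(W, delta), x)
# Consistent: base and delta permuted together, then un-permuted downstream.
consistent = unpermute(matvec(permute_rows(add(W, delta), perm), x), perm)
# Mismatched: base permuted, delta still in the original row order.
mismatched = unpermute(matvec(add(permute_rows(W, perm), delta), x), perm)

print(reference, consistent, mismatched)
```

In this toy case the consistent path reproduces the reference exactly, while the mismatched path moves the update from row 0 to row 2.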


Quick start (FP16, about 20 GB VRAM)

from unsloth import FastVisionModel
import torch
from peft import PeftModel
from PIL import Image

# SYSTEM_MSG and letterbox() are defined in the sections further down.

# 1. Load the base with Unsloth (not transformers.AutoModelForImageTextToText:
#    Unsloth's monkey-patch must be applied so the linear_attn forward path
#    matches the one used during training).
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3.5-9B",
    load_in_4bit=False,  # set True for about 8.7 GB VRAM; matched FP16 on the 10-sample probe
)

# 2. Attach the LoRA as a runtime hook. Do NOT call model.merge_and_unload():
#    merging silently corrupts the linear_attn deltas and the output
#    degenerates into "adgeadge".
model = PeftModel.from_pretrained(model, "pfox1995/pest-detector-final")

# 3. Critical: switch the internal mode to inference. Without this call the
#    output breaks even when every other setting is correct.
FastVisionModel.for_inference(model)
model.eval()

# 4. Inference
image = Image.open("pest.jpg").convert("RGB")
image = letterbox(image, 512)  # gray-padded 512×512 letterbox, see below

messages = [
    {"role": "system", "content": [{"type": "text", "text": SYSTEM_MSG}]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text",  "text": "이 사진에 있는 해충의 이름을 알려주세요."},
    ]},
]
text = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # training used non-thinking mode (pitfall 6)
)
inputs = tokenizer(image, text, add_special_tokens=False, return_tensors="pt").to("cuda")

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=10,             # keep below 16: the model emits "<class>\n"; forcing more yields garbage
        use_cache=True,
        stop_strings=["\n"],           # the model's natural stop signal
        tokenizer=tokenizer.tokenizer, # required for stop_strings to take effect
    )

prediction = tokenizer.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
).strip()
print(prediction)  # one of PEST_CLASSES (in Korean)

For the full runnable script (image preprocessing, batch evaluation, metrics), see inference.py.


Serving over HTTP (server.py)

A FastAPI server that wraps the inference recipe above unchanged. All of the validated safeguards (for_inference, enable_thinking=False, stop_strings=["\n"], letterbox 512, max_new_tokens=10) are built in.

Endpoints:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /health | {"status":"ok","model_loaded":true} |
| GET | /classes | List of the 19 classes |
| GET | / | Browser upload page (Korean UI) |
| POST | /classify | multipart file upload |
| POST | /classify_b64 | JSON {"image":"<base64>"} |

Start:

pip install fastapi uvicorn python-multipart
HF_TOKEN=... ADAPTER=pfox1995/pest-detector-deploy LOAD_IN_4BIT=true PORT=8080 \
  python3 server.py

Client usage examples:

curl -F file=@pest.jpg http://localhost:8080/classify
# → {"pred":"검거세미밤나방","raw":"검거세미밤나방","elapsed_s":2.3}

import requests

with open("pest.jpg", "rb") as f:  # close the file handle once the upload finishes
    r = requests.post(
        "http://localhost:8080/classify",
        files={"file": f},
        timeout=60,
    )
print(r.json()["pred"])
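The /classify_b64 endpoint takes a JSON body of the form {"image":"<base64>"}. The sketch below is one possible client using only the standard library. The helper names `build_b64_payload` and `classify_b64` are hypothetical, and it assumes the server expects plain base64 of the raw file bytes (no data-URI prefix) and returns the same JSON shape as /classify.

```python
import base64
import json
import urllib.request

def build_b64_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as the JSON body {"image": "<base64>"}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")

def classify_b64(image_bytes: bytes,
                 base_url: str = "http://localhost:8080",
                 timeout: float = 60.0) -> dict:
    """POST the image to /classify_b64 and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/classify_b64",
        data=build_b64_payload(image_bytes),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `classify_b64(open("pest.jpg", "rb").read())["pred"]`, assuming the server is running locally on port 8080.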

One-shot launch on RunPod (restart_server.sh)

When a RunPod container restarts, the container disk is reset and the installed pip packages disappear (the /workspace volume is preserved). This script recovers from that in a single step:

# Assumes this repository has been downloaded to /workspace
bash /workspace/restart_server.sh

Steps the script handles automatically:

  1. Check the dependency stack (unsloth, peft, fastapi, bitsandbytes, flash-linear-attention) → install whatever is missing
  2. Check causal_conv1d → install a prebuilt wheel if missing (a source build runs out of memory because it compiles for 9 GPU architectures)
  3. Kill the existing pest tmux session and start a fresh one
  4. Wait until /health returns 200 (about 90 to 100 seconds)
  5. Print the public proxy URL, inferred automatically from RunPod's $RUNPOD_POD_ID environment variable
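Step 4 is a plain readiness poll. A minimal Python sketch of the same idea follows; it is not the script's actual implementation. The function names, the 180-second ceiling, and the 3-second interval are assumptions chosen to leave headroom over the 90 to 100 seconds quoted above.

```python
import time
import urllib.error
import urllib.request

def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code          # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0                 # connection refused, timeout, DNS failure

def wait_for_health(url, probe=http_status, max_wait_s=180.0,
                    interval_s=3.0, sleep=time.sleep, clock=time.monotonic):
    """Poll url until it answers 200; return False once max_wait_s has passed."""
    deadline = clock() + max_wait_s
    while True:
        if probe(url) == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `probe`, `sleep`, and `clock` parameters exist only so the loop can be exercised without a running server.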

ํ™˜๊ฒฝ๋ณ€์ˆ˜๋กœ ๋™์ž‘ ๋ณ€๊ฒฝ ๊ฐ€๋Šฅ:

LOAD_IN_4BIT=false   bash /workspace/restart_server.sh    # FP16, ์•ฝ 19.5 GB VRAM, ์ถ”๋ก  ์†๋„ ์•ฝ 2๋ฐฐ
PORT=9000            bash /workspace/restart_server.sh    # ๋‹ค๋ฅธ ํฌํŠธ๋กœ
ADAPTER=...          bash /workspace/restart_server.sh    # ๋‹ค๋ฅธ ์–ด๋Œ‘ํ„ฐ๋กœ
PUBLIC_URL=...       bash /workspace/restart_server.sh    # ์ž๋™ ๊ฐ์ง€ ๊ฒฐ๊ณผ ๋ฎ์–ด์“ฐ๊ธฐ

To expose port 8080 externally on RunPod:

  1. RunPod dashboard → select the Pod → Edit Pod → add 8080 under HTTP Ports
  2. Save → https://<POD_ID>-8080.proxy.runpod.net/ becomes active automatically
  3. The first request takes about 12 seconds (Triton JIT compilation); steady state afterwards is about 2 to 3 seconds per image

⚠ Edit Pod restarts the container internally. Because the container disk is reset, run restart_server.sh again. Files and the model cache on the volume (/workspace) are preserved. The SSH publicPort also changes, so query the new port again through GraphQL.
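The proxy URL pattern above can be derived from the environment. The following is a small sketch of that derivation, not the script's own code: the function name is hypothetical, and the precedence of PUBLIC_URL over RUNPOD_POD_ID is inferred from the "override" description in the environment-variable list.

```python
import os

def runpod_proxy_url(port: int = 8080, env=os.environ):
    """Return the public proxy URL, or None when it cannot be determined."""
    override = env.get("PUBLIC_URL")
    if override:
        return override                      # explicit override wins
    pod_id = env.get("RUNPOD_POD_ID")
    if not pod_id:
        return None                          # not running on RunPod
    return f"https://{pod_id}-{port}.proxy.runpod.net/"
```

Passing a plain dict as `env` makes the function easy to check without touching the real environment.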


Nine pitfalls that break the output (each one hit firsthand)

If the output looks wrong, check the items below in order. Every one of them actually caused adgeadge or a repeating <think>...assistant<think>... loop.

1. Wrong loader

Use unsloth.FastVisionModel.from_pretrained. Do not use transformers.AutoModelForImageTextToText.from_pretrained. Both load the same weights, but the latter does not apply the Unsloth Gated DeltaNet patch that was active during training.

2. Merging the LoRA

Do not call model.merge_and_unload() or model.save_pretrained_merged(...). For the linear_attn modules of this architecture, PEFT's merge silently produces wrong weights. Keep the LoRA as a runtime hook through PeftModel.from_pretrained.

3. Missing FastVisionModel.for_inference(model)

This call switches Unsloth's internal cache/inference mode. If you leave it out, the first token can be correct while the following tokens fall into the adge attractor.

4. max_new_tokens too large

The model emits <class>\n (for example 검거세미밤나방\n) and stops naturally. If you request 16 tokens or more, it keeps generating past the \n and falls into the adge attractor. Use max_new_tokens=10 for class-name output. Do not set min_new_tokens (it forces generation past EOS).

5. Missing stop_strings=["\n"]

Even with max_new_tokens=10 the model can emit <class>\nassistant\n<think>. Pass stop_strings=["\n"] to model.generate together with tokenizer=tokenizer.tokenizer. Without the tokenizer= argument, stop_strings has no effect.
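As a defensive complement to stop_strings (not a replacement for it), the decoded text can be cut at the first newline and checked against the known class list before it is trusted. The helper below is a hypothetical sketch; `normalize_prediction` is not part of this repository, and returning None for unknown output is a design choice made here.

```python
PEST_CLASSES = [
    "검거세미밤나방", "꽃노랑총채벌레", "담배가루이", "담배거세미나방",
    "담배나방", "도둑나방", "먹노린재", "목화바둑명나방", "무잎벌",
    "배추좀나방", "배추흰나비", "벼룩잎벌레", "비단노린재", "썩덩나무노린재",
    "알락수염노린재", "정상", "큰28점박이무당벌레", "톱다리개미허리노린재",
    "파밤나방",
]

def normalize_prediction(raw: str):
    """Keep the text before the first newline; return None if it is not a known class."""
    first_line = raw.split("\n", 1)[0].strip()
    return first_line if first_line in PEST_CLASSES else None
```

A None result is a useful signal that one of the pitfalls in this section is in play, since a healthy deployment should only ever produce one of the 19 names.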

6. Passing enable_thinking the wrong way

The chat template branches on the enable_thinking variable, and training ran with thinking disabled. Pass enable_thinking=False directly as a keyword argument to tokenizer.apply_chat_template(...). Wrapping it as chat_template_kwargs={"enable_thinking": False} is silently ignored by the VLM processor in transformers ≥ 5.0.

7. Wrong system prompt

Use the training-time system message exactly as written (see SYSTEM_MSG below). We tried a longer version that listed all 19 classes, and it biased the output.

8. Wrong image size

The model was trained on 512×512 images letterboxed with gray (RGB 128, 128, 128) padding. Any other resolution or padding color lowers accuracy.

9. Missing dependencies for the fast linear-attn path

Run pip install flash-linear-attention causal-conv1d so that transformers can use the fast Triton kernels for Gated DeltaNet. Without them it takes the torch fallback path, whose FP16 accumulation order differs, so the output can drift with the adapted weights. (Prebuilt wheels exist for torch 2.8 + cu128.)


System prompt and constants (identical to training)

# Keep these strings in Korean exactly as shown; they must match training.
# Gloss of SYSTEM_MSG: "You are a crop pest identification expert. Look at the
# photo and answer with only the pest's name in Korean. If there is no pest,
# answer only '정상'. Output only the name, with no extra explanation."
SYSTEM_MSG = (
    "당신은 작물 해충 식별 전문가입니다. "
    "사진을 보고 해충의 이름만 한국어로 답하세요. "
    '해충이 없으면 "정상"이라고만 답하세요. '
    "부가 설명 없이 이름만 출력하세요."
)
# Gloss: "Please tell me the name of the pest in this photo."
USER_PROMPT = "이 사진에 있는 해충의 이름을 알려주세요."

PEST_CLASSES = [
    "검거세미밤나방", "꽃노랑총채벌레", "담배가루이", "담배거세미나방",
    "담배나방", "도둑나방", "먹노린재", "목화바둑명나방", "무잎벌",
    "배추좀나방", "배추흰나비", "벼룩잎벌레", "비단노린재", "썩덩나무노린재",
    "알락수염노린재", "정상", "큰28점박이무당벌레", "톱다리개미허리노린재",
    "파밤나방",
]

The 19 classes consist of 18 pest species plus 정상 (normal). The training data is roughly balanced across classes.


Image preprocessing (letterbox)

from PIL import Image

def letterbox(img: Image.Image, size: int = 512) -> Image.Image:
    """Resize while preserving aspect ratio, then pad with gray to a square."""
    img = img.convert("RGB")
    w, h = img.size
    scale = size / max(w, h)
    nw, nh = int(round(w * scale)), int(round(h * scale))
    resized = img.resize((nw, nh), Image.Resampling.LANCZOS)
    canvas = Image.new("RGB", (size, size), (128, 128, 128))
    canvas.paste(resized, ((size - nw) // 2, (size - nh) // 2))
    return canvas
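To see what the function above does to a particular photo without loading any pixels, the size arithmetic can be pulled out on its own. The helper below is a hypothetical companion (`letterbox_geometry` is not part of the repository) that repeats the same scale, rounding, and centering steps and returns the resized width and height plus the paste offsets.

```python
def letterbox_geometry(w: int, h: int, size: int = 512):
    """Return (new_w, new_h, left, top) as computed by letterbox()."""
    scale = size / max(w, h)                      # longest side maps to `size`
    nw, nh = int(round(w * scale)), int(round(h * scale))
    left, top = (size - nw) // 2, (size - nh) // 2  # center on the gray canvas
    return nw, nh, left, top

print(letterbox_geometry(1024, 768))  # (512, 384, 0, 64)
print(letterbox_geometry(300, 600))   # (256, 512, 128, 0)
```

A 1024×768 landscape photo is therefore shrunk to 512×384 and gets 64-pixel gray bands above and below; small images are scaled up by the same rule.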

Pinned dependency versions

The training environment matters. The versions below are the ones that give reproducible output:

torch==2.8.0+cu128
transformers>=5.2,<6.0    # 5.5.0 verified
peft==0.19.1              # must match the version in adapter_config.json
unsloth==2026.4.8         # earlier versions use the cu128-torch280 extra
xformers>=0.0.32          # sufficient as the FA2 fallback
Pillow>=10.0
flash-linear-attention    # fast path for Gated DeltaNet
causal-conv1d             # used together with fla

Install commands:

pip install "unsloth[cu128-torch280]" "transformers>=5.2,<6.0" "peft==0.19.1"
pip install flash-linear-attention causal-conv1d --no-build-isolation

If causal-conv1d fails to build against your CUDA version, Unsloth marks it as broken and carries on: accuracy is unaffected and only throughput drops slightly.
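Before loading the model it can help to confirm that the exact pins are actually installed. The sketch below uses the standard-library importlib.metadata. The `PINS` table covers only the three exactly pinned packages from the list above; the helper names and the choice to ignore local version suffixes such as +cu128 are assumptions of this example.

```python
from importlib import metadata

# Exact pins taken from the version list above.
PINS = {"torch": "2.8.0", "peft": "0.19.1", "unsloth": "2026.4.8"}

def installed_version(name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

def check_pins(pins=PINS, lookup=installed_version):
    """Return a list of human-readable mismatches (empty when all pins match)."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif found.split("+")[0] != wanted:   # drop local suffix like "+cu128"
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems
```

Calling `check_pins()` with no arguments inspects the current environment; an empty list means the three pins match.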


Reducing VRAM with bnb 4-bit (NF4)

To load the base in 4-bit, change a single line:

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3.5-9B",
    load_in_4bit=True,    # previously False
)
# everything else stays the same

Measured results:

  • VRAM: 19.5 GB → 8.7 GB (55 % reduction)
  • Disk: ~18 GB → ~5 GB
  • Accuracy: bit-identical outputs; both settings score 8/10 on the 10-sample probe (the 2 misses are the same hard case in both settings, a 담배가루이 → 정상 confusion)

Runtime bitsandbytes quantization is the only quantization route that works with this LoRA, because the LoRA is never baked into the quantized weights; it stays attached at runtime through PEFT hooks.

load_in_8bit=True (LLM.int8) also works, but on this task it uses about 13 GB of VRAM with no quality gain over NF4.


Paths that do not work (do not try them)

  • model.save_pretrained_gguf(...): accuracy 0 %
  • model.save_pretrained_merged(...) followed by convert_hf_to_gguf.py: 0 %
  • Pre-applying _reorder_v_heads before conversion: 5.3 % (collapse)
  • Zeroing the linear_attn LoRA before merge → GGUF: 35.1 % (adaptation lost)
  • AutoAWQ on the merged model: same merge bug
  • peft.merge_and_unload() followed by transformers.AutoModelForImageTextToText.from_pretrained on the merged directory: adgeadge
  • vLLM with the merged model: same as above

The pattern is clear: every attempt that touches the linear_attn LoRA deltas by any route other than the FastVisionModel runtime path produces garbage output. The structural defect is in convert_hf_to_gguf.py:_reorder_v_heads, and the upstream issue is llama.cpp#21125.


Known misclassifications

The cases below are genuine model misclassifications, not deployment bugs: the model outputs a valid Korean class name, but it is not the correct one.

| True → predicted | Rate (57-sample bench) | Reason |
| --- | --- | --- |
| 도둑나방 → 검거세미밤나방 | 2/3 | Both are dark moths, visually similar |
| 비단노린재 → 목화바둑명나방 | 2/3 | Small, mottled insects |
| 담배가루이 → 정상 | 2/3 | Tiny white pest, easy to miss |
| 알락수염노린재 → 정상 | 1/3 | Similar to the above |
| 벼룩잎벌레 → 목화바둑명나방 / 배추좀나방 | 1/3 each | Small, dark insects |

14 of the 19 classes score 100 % (3/3) on the bench. To improve the weak classes, add samples of those species or retrain with more discriminative augmentation.
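The per-class miss counts above account for the headline 84.2 % figure. The short check below redoes that arithmetic. It assumes three samples per class, which is inferred from the "3/3" notation and from 19 × 3 = 57 rather than stated outright, and it counts the two separate 벼룩잎벌레 confusions as two misses.

```python
SAMPLES_PER_CLASS = 3   # assumption: inferred from "3/3" and 19 * 3 = 57
NUM_CLASSES = 19

# Misses per true class, read off the table above.
misses = {
    "도둑나방": 2,
    "비단노린재": 2,
    "담배가루이": 2,
    "알락수염노린재": 1,
    "벼룩잎벌레": 2,   # one miss to each of two different classes
}

def bench_accuracy(misses, num_classes=NUM_CLASSES, per_class=SAMPLES_PER_CLASS):
    """Return (correct, total, accuracy) for the bench."""
    total = num_classes * per_class
    correct = total - sum(misses.values())
    return correct, total, correct / total

correct, total, acc = bench_accuracy(misses)
print(f"{correct}/{total} = {acc:.1%}")          # 48/57 = 84.2%
print(NUM_CLASSES - len(misses), "classes at 3/3")  # 14 classes at 3/3
```

Nine misses out of 57 leave 48 correct, which matches both the 84.2 % bench figure and the count of 14 classes at 100 %.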


Citation / References

  • Base model: unsloth/Qwen3.5-9B
  • Dataset: Himedia-AI-01/pest-detection-korean
  • LoRA training: Unsloth FastVisionModel, rank=64, alpha=128, RS-LoRA, with a target_modules regex covering q/k/v/o_proj, gate/up/down_proj, and in_proj_qkv/z/a/b/out_proj (the Gated DeltaNet projections, which is why GGUF deployment is not possible)
  • Final eval loss: 0.023164 (step 850)
  • Validation accuracy on 1595 samples: 91.36 %
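For readers unfamiliar with RS-LoRA: rank-stabilized LoRA scales the low-rank update by alpha / sqrt(rank) instead of the classic alpha / rank. The lines below simply evaluate both formulas for the rank and alpha listed above. The function names are illustrative and not taken from any library.

```python
import math

def lora_scale(alpha: float, rank: int) -> float:
    """Classic LoRA scaling factor: alpha / rank."""
    return alpha / rank

def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(rank)."""
    return alpha / math.sqrt(rank)

print(lora_scale(128, 64))    # 2.0
print(rslora_scale(128, 64))  # 16.0
```

With rank=64 and alpha=128, the RS-LoRA factor is therefore 8 times the classic one.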

License

This adapter follows the licenses of the base model and the dataset. For the detailed terms, see the unsloth/Qwen3.5-9B and Himedia-AI-01/pest-detection-korean pages.