Korean Crop Pest Classifier: RunPod Deployment Bundle

This repository bundles the WizWix/kor-pest-detector LoRA adapter together with every deployment script needed to bring a server up quickly inside a RunPod container. It is designed for an operating pattern in which pods are created and terminated frequently.

| Item | Value |
|---|---|
| Validation accuracy (1,535 samples) | 99.48% (source: WizWix evaluation) |
| Number of classes | 20 (normal + 19 pest species) |
| VRAM (4-bit) | about 8.7 GB |
| Disk (base + adapter) | about 19 GB |
| Cold boot (no model cache) | about 8 to 10 minutes |
| Warm boot (HF cache preserved) | about 90 seconds |

Design intent

This bundle was built for the following scenario:

RunPod pods are switched on and off frequently to save cost. The pod either has no volume, or its container disk is reset on every start.

๋”ฐ๋ผ์„œ ๋ชจ๋“  ์˜์กด์„ฑยท์Šคํฌ๋ฆฝํŠธยท์–ด๋Œ‘ํ„ฐ ๊ฐ€์ค‘์น˜๋ฅผ ์ด ์ €์žฅ์†Œ ํ•˜๋‚˜๋กœ ํ’€์–ด์„œ, git clone โ†’ bash restart_server.sh ๋‘ ์ค„์ด๋ฉด ์–ด๋””์„œ๋“  ์„œ๋ฒ„๊ฐ€ ๋œจ๋„๋ก ํ–ˆ์Šต๋‹ˆ๋‹ค.


Quick start (RunPod)

1. Pod specification

| Item | Recommended value |
|---|---|
| GPU | VRAM 8 GB or more (RTX 3060 12GB, 4070, RTX 2000 Ada, etc.) |
| Container disk | 30 GB or more (model cache about 19 GB + packages about 4 GB) |
| Base image | PyTorch 2.8 + CUDA 12.8 + Python 3.12 (the official RunPod PyTorch template is recommended) |
| HTTP port | 8888 (same as RunPod's default JupyterLab port, so no extra port needs to be exposed) |

⚠ In RunPod containers, port 8888 is occupied by JupyterLab by default. This bundle's restart_server.sh stops JupyterLab at startup and takes over port 8888. If you need JupyterLab, expose it on a different port.

2. Install and launch (in one go)

From SSH or the web terminal, run a single command:

mkdir -p /workspace/deploy && cd /workspace/deploy && \
  wget -q https://huggingface.co/pfox1995/pest-detector-runpod/resolve/main/restart_server.sh && \
  bash restart_server.sh

This one command handles the following automatically:

  • Downloads only restart_server.sh first (5 KB)
  • The script then uses huggingface_hub.snapshot_download to fetch the rest of the bundle (adapter weights ~693 MB, tokenizer ~20 MB, server.py, etc.) into the same directory
  • pip install → base model download → server launch

⚠ git clone is not recommended. Large files on HuggingFace are stored with Git LFS, so a plain git clone fetches only 134-byte pointer files. The script detects this case automatically and re-downloads the files through huggingface_hub, but that takes twice as long. The wget one-liner above is the cleanest route.
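The pointer-file check described above can be sketched as follows. This is a minimal illustration, not the logic in restart_server.sh: the helper names (is_lfs_pointer, find_lfs_pointers, refetch_if_needed) and the 1024-byte size cutoff are assumptions. The sketch relies only on the fact that a Git LFS pointer is a small text file whose first line is the LFS spec URL, and on huggingface_hub.snapshot_download for the re-download.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path, max_pointer_bytes: int = 1024) -> bool:
    """Return True if `path` looks like a Git LFS pointer rather than real data."""
    # Pointer files are tiny; the cutoff is an assumed safety margin.
    if not path.is_file() or path.stat().st_size > max_pointer_bytes:
        return False
    with path.open("rb") as fh:
        return fh.read(len(LFS_MAGIC)) == LFS_MAGIC


def find_lfs_pointers(bundle_dir: Path) -> list[Path]:
    """List every file under `bundle_dir` that is still an LFS pointer."""
    return sorted(p for p in bundle_dir.rglob("*") if is_lfs_pointer(p))


def refetch_if_needed(bundle_dir: Path, repo_id: str) -> bool:
    """Re-download the bundle through huggingface_hub when pointers are found."""
    if not find_lfs_pointers(bundle_dir):
        return False
    # Imported lazily so the detection helpers work without huggingface_hub.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(bundle_dir))
    return True
```

Calling refetch_if_needed(Path("/workspace/deploy"), "pfox1995/pest-detector-runpod") after an accidental git clone would replace the pointers with the real weights, at the cost of the second download mentioned above.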

Steps that restart_server.sh handles automatically:

| Step | Time | Details |
|---|---|---|
| 1. Install Python dependencies | ~3 min | unsloth, peft, fastapi, bitsandbytes, flash-linear-attention, etc. |
| 2. Install the prebuilt causal_conv1d wheel | ~30 s | for torch 2.8 + cu12 + py312 |
| 3. Stop JupyterLab + start the tmux session pest | ~5 s | takes over port 8888 |
| 4. Download the base model | ~4 to 5 min | unsloth/Qwen3.5-9B, about 19 GB (when not in the HF cache) |
| 5. Attach the LoRA adapter | ~5 s | uses the files inside the bundle directly (no re-download) |
| 6. Triton JIT warm-up | (~10 s on the first call) | occurs only on the first /classify call |
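As a quick sanity check, the per-step timings in the table can be added up. The sketch below only sums the figures listed for steps 1 to 5 and leaves out the bundle download and the first-call Triton warm-up, so it is a lower bound rather than a measurement.

```python
# (step, low_seconds, high_seconds), copied from the table above.
STEPS = [
    ("pip dependencies", 180, 180),
    ("causal_conv1d wheel", 30, 30),
    ("JupyterLab stop + tmux start", 5, 5),
    ("base model download", 240, 300),
    ("LoRA adapter attach", 5, 5),
]


def total_minutes(steps):
    """Sum the low and high estimates and convert seconds to minutes."""
    low = sum(step[1] for step in steps)
    high = sum(step[2] for step in steps)
    return low / 60, high / 60


low, high = total_minutes(STEPS)
print(f"{low:.1f} to {high:.1f} min")  # 7.7 to 8.7 min
```

The result sits just under the cold-boot figure of about 8 to 10 minutes once the bundle download is added on top.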

3. Verify that it works

Once installation finishes, RunPod automatically assigns a public proxy URL:

curl https://<POD_ID>-8888.proxy.runpod.net/health
# {"status":"ok","model_loaded":true}

๋ธŒ๋ผ์šฐ์ €๋กœ https://<POD_ID>-8888.proxy.runpod.net/ ์— ์ ‘์†ํ•˜๋ฉด ํ•œ๊ตญ์–ด ์—…๋กœ๋“œ UI ๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.


Operating guide for pods that are stopped and started every time

When the container of a RunPod pod is restarted or recreated:

  • ❌ The container disk is reset (/workspace disappears too unless it is a volume)
  • ❌ All installed pip packages are lost
  • ❌ The model cache (~/.cache/huggingface) is lost ← this is the largest cost of a cold boot

๋”ฐ๋ผ์„œ ๋งค๋ฒˆ ์œ„ ๋น ๋ฅธ ์‹œ์ž‘ 2๋ฒˆ ์ ˆ์ฐจ๋ฅผ ๋‹ค์‹œ ์‹คํ–‰ํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค (git clone ๋ถ€ํ„ฐ).

Reducing cold-boot cost (optional)

| Method | Saving | Additional cost |
|---|---|---|
| Mount a RunPod network volume at /workspace | cold boot → about 90 seconds | volume rental (about $0.07 per GB per month, 30 GB ≈ $2.1/month) |
| Use a prebuilt Docker image | skips the pip step | burden of building and maintaining the image |
| Use unsloth/Qwen3.5-9B-bnb-4bit | base download 19 GB → 5 GB | can be switched with the BASE_MODEL environment variable, accuracy identical |

The most cost-effective option is to switch the base model to the pre-quantized 4-bit version. Cold-boot time drops by roughly half (8 minutes → 4 minutes) at no additional cost.


API

The server is built on FastAPI and provides five endpoints.

| Method | Path | Description |
|---|---|---|
| GET | /health | {"status":"ok","model_loaded":true} |
| GET | /classes | list of the 20 classes and their count |
| GET | / | Korean-language upload page for browsers |
| POST | /classify | multipart file=... upload → prediction result |
| POST | /classify_b64 | JSON {"image":"<base64>"} → prediction result |

Response example

curl -F file=@pest.jpg https://<POD_ID>-8888.proxy.runpod.net/classify
# {"pred":"검거세미밤나방","raw":"검거세미밤나방","elapsed_s":2.3}

ํŒŒ์ด์ฌ ํด๋ผ์ด์–ธํŠธ ์˜ˆ์‹œ

import requests

resp = requests.post(
    "https://<POD_ID>-8888.proxy.runpod.net/classify",
    files={"file": open("pest.jpg", "rb")},
    timeout=60,
)
print(resp.json()["pred"])  # ํ•œ๊ตญ์–ด ํด๋ž˜์Šค๋ช…
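The /classify_b64 endpoint takes the same image as a JSON body instead of a multipart upload. A minimal standard-library sketch follows. It assumes the image field carries plain standard base64 with no data: URI prefix and that the response has the same shape as /classify; neither detail is confirmed here. The helper names are illustrative.

```python
import base64
import json
import urllib.request


def build_b64_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as the JSON body {"image": "<base64>"}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")


def classify_b64(base_url: str, image_path: str, timeout: float = 60.0) -> dict:
    """POST an image to /classify_b64 and return the decoded JSON response."""
    with open(image_path, "rb") as fh:
        body = build_b64_payload(fh.read())
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/classify_b64",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

This form is convenient when the caller already holds the image in memory or has to pass it through a JSON-only channel.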

Environment variables

The behavior of bash restart_server.sh can be changed through environment variables.

| Variable | Default | Description |
|---|---|---|
| PORT | 8888 | Port the server listens on (must be exposed externally in RunPod) |
| ADAPTER | (auto-detected) | The bundle directory if it contains adapter_config.json, otherwise the pfox1995/pest-detector-runpod HF repository |
| BASE_MODEL | unsloth/Qwen3.5-9B | Base VLM. To switch to the pre-quantized 4-bit variant, use unsloth/Qwen3.5-9B-bnb-4bit |
| LOAD_IN_4BIT | true | Load in bnb NF4 4-bit. If false, loads in FP16 (requires about 19.5 GB of VRAM) |
| HF_TOKEN | (none) | Needed only when using a private repository |
| PUBLIC_URL | auto-detected | Public URL used in output. Inferred automatically from RunPod's RUNPOD_POD_ID environment variable |

# Example: launch in FP16 (GPU with 20 GB of VRAM or more)
LOAD_IN_4BIT=false bash restart_server.sh

# Example: use a different port
PORT=9000 bash restart_server.sh

# Example: use a different adapter (only if compatible)
ADAPTER=other-account/other-adapter bash restart_server.sh
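The defaults in the table above can be expressed as a small resolver. This is a hypothetical sketch of how such variables might be read, not the code in restart_server.sh or server.py. ServerConfig, parse_bool, resolve_config, and the set of accepted truthy spellings are all assumptions. The variable names, the defaults, and the proxy URL shape are taken from this README.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_ADAPTER_REPO = "pfox1995/pest-detector-runpod"


@dataclass(frozen=True)
class ServerConfig:
    port: int
    adapter: str
    base_model: str
    load_in_4bit: bool
    hf_token: Optional[str]
    public_url: Optional[str]


def parse_bool(value: str) -> bool:
    """Treat common truthy spellings as True; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def resolve_config(
    env: Mapping[str, str] = os.environ,
    bundle_dir: Path = Path("."),
) -> ServerConfig:
    """Apply the documented defaults to whichever variables are set."""
    port = int(env.get("PORT", "8888"))

    # An explicit ADAPTER wins; otherwise prefer a local adapter_config.json.
    adapter = env.get("ADAPTER")
    if not adapter:
        has_local = (bundle_dir / "adapter_config.json").is_file()
        adapter = str(bundle_dir) if has_local else DEFAULT_ADAPTER_REPO

    # Fall back to the RunPod proxy URL shape when only the pod ID is known.
    public_url = env.get("PUBLIC_URL")
    pod_id = env.get("RUNPOD_POD_ID")
    if not public_url and pod_id:
        public_url = f"https://{pod_id}-{port}.proxy.runpod.net"

    return ServerConfig(
        port=port,
        adapter=adapter,
        base_model=env.get("BASE_MODEL", "unsloth/Qwen3.5-9B"),
        load_in_4bit=parse_bool(env.get("LOAD_IN_4BIT", "true")),
        hf_token=env.get("HF_TOKEN") or None,
        public_url=public_url,
    )
```

Passing a plain dict as env makes it easy to check how a given combination of variables would resolve before starting a pod.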

Class list (20)

The 20 Korean class names, including 정상 (normal):

정상, 검거세미밤나방, 꽃노랑총채벌레, 담배가루이, 담배거세미나방,
담배나방, 도둑나방, 먹노린재, 목화바둑명나방, 무잎벌,
배추좀나방, 배추흰나비, 벼룩잎벌레, 복숭아혹진딧물, 썩덩나무노린재,
열대거세미나방, 큰28점박이무당벌레, 톱다리개미허리노린재, 파밤나방, 홍비단노린재

The /classes endpoint always returns the canonically sorted list (sorted in Korean dictionary order, with 정상 placed at its dictionary-order position within the class vocabulary).
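The sketch below illustrates what dictionary-order sorting does to this class list. It assumes the server orders names the way Python's built-in sorted() does, which is not confirmed by the source. For precomposed Hangul syllables, code-point order matches the usual South Korean dictionary order of initial consonant, vowel, and final consonant, so sorted() is a reasonable stand-in.

```python
CLASSES = [
    "정상", "검거세미밤나방", "꽃노랑총채벌레", "담배가루이", "담배거세미나방",
    "담배나방", "도둑나방", "먹노린재", "목화바둑명나방", "무잎벌",
    "배추좀나방", "배추흰나비", "벼룩잎벌레", "복숭아혹진딧물", "썩덩나무노린재",
    "열대거세미나방", "큰28점박이무당벌레", "톱다리개미허리노린재", "파밤나방", "홍비단노린재",
]


def canonical_classes(names):
    """Return the class names in code-point (Hangul dictionary) order."""
    return sorted(names)


ordered = canonical_classes(CLASSES)
print(len(ordered), ordered.index("정상"))  # 20 15
```

Under this ordering 정상 is no longer first: it lands between 열대거세미나방 and 큰28점박이무당벌레, so clients that map indices to labels should take the order from /classes rather than from the list as printed here.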


Troubleshooting

The server does not come up / /health does not return 200

# Check the tmux session log
tmux capture-pane -t pest -p | tail -50
# or
tail -100 /workspace/pest_server.log

/classify returns garbage text such as adgeadge...

This happens when merge_and_unload() has been called or when the base model was loaded with transformers.AutoModelForImageTextToText. Always load through the unsloth.FastVisionModel.from_pretrained + peft.PeftModel.from_pretrained runtime hook and nothing else. The server.py in this bundle is already set up correctly.
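The loading order described above can be sketched as follows. This is an illustration under stated assumptions, not a copy of server.py. The function name load_classifier and its defaults are hypothetical, and the real server may pass extra arguments. The two from_pretrained calls and FastVisionModel.for_inference are existing Unsloth and PEFT entry points.

```python
def load_classifier(
    base_model: str = "unsloth/Qwen3.5-9B",
    adapter: str = "pfox1995/pest-detector-runpod",
    load_in_4bit: bool = True,
):
    """Load the base VLM with Unsloth, then attach the LoRA adapter unmerged."""
    # Heavy imports stay inside the function. unsloth is imported before peft
    # so that its patches are applied first.
    from unsloth import FastVisionModel
    from peft import PeftModel

    model, processor = FastVisionModel.from_pretrained(
        base_model,
        load_in_4bit=load_in_4bit,
    )
    # Attach the adapter as runtime LoRA layers. Do not call merge_and_unload().
    model = PeftModel.from_pretrained(model, adapter)
    FastVisionModel.for_inference(model)  # switch Unsloth to inference mode
    return model, processor
```

Keeping the adapter as separate LoRA layers on top of the Unsloth-loaded base is the point of the sketch. Merging the weights or swapping in a different loader is what the note above warns against.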

CUDA OOM

Relaunch with LOAD_IN_4BIT=true (the default). If memory is still insufficient, you need to move to a larger GPU (FP16 needs about 20 GB, 4-bit needs about 9 GB).

The first call takes too long (10 seconds or more)

This is normal. Triton JIT is compiling the Gated DeltaNet kernels. From the second call onward, each call takes about 2 seconds. The JIT cache is lost when the container restarts.

causal_conv1d build failure

A prebuilt wheel is available, and restart_server.sh installs it automatically. If you build it yourself, compiling for 9 GPU architectures causes an OOM.


Cannot be deployed with GGUF / Ollama / llama.cpp

This adapter includes Qwen3.5's Gated DeltaNet modules (linear_attn.in_proj_*) among its training targets. The V-row permutation in convert_hf_to_gguf.py is not compatible with the LoRA deltas, so after GGUF conversion the output collapses into repeated tokens (adgeadge...).

See llama.cpp#21125 for a detailed root-cause analysis. This bundle uses only the PEFT runtime-hook approach.


Sources and licenses

| Component | Source | License |
|---|---|---|
| LoRA adapter weights | WizWix/kor-pest-detector | Apache-2.0 (following the base model) |
| Base VLM | unsloth/Qwen3.5-9B | Apache-2.0 |
| Training dataset | Himedia-AI-01/pest-detection-korean | subject to the terms on the dataset page |
| Deployment scripts (server.py, inference.py, restart_server.sh, requirements.txt) | author of this repository (@pfox1995) | MIT |

This repository is a bundle that repackages an adapter trained by WizWix, optimized for deployment on RunPod. For training details, evaluation metrics, the confusion matrix, and other information about the adapter itself, see the evaluation directory of WizWix/kor-pest-detector.


Changelog

  • v1.0 (initial): WizWix/kor-pest-detector v(time of bundling) + RunPod deployment script bundle