---
title: Multi-Model Korean LLM Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---

# 🤖 Multi-Model Korean LLM Chatbot

A multi-model chatbot that lets you choose among 13 Korean-capable LLMs. It automatically detects whether it is running **locally (CPU/GPU)** or on **Hugging Face Spaces (CPU Basic/Upgrade, ZeroGPU)** and applies the optimal settings for that environment.

## ✨ Key Features

- 🎯 13 selectable models: LLMs of various sizes and strengths
- 🇰🇷 Korean-optimized: built around models with strong Korean performance
- 🖥️ Multi-environment support: auto-detects local (CPU/GPU) and HF Spaces (CPU Basic/Upgrade, ZeroGPU)
- 💾 Cache system: avoids re-downloading models, enabling fast loading
- 🔄 Lazy loading: only the selected model is loaded, saving resources
- 🛡️ Robustness: supports recent GPUs such as the RTX 5080, with an automatic CUDA compatibility test

## 🎯 Supported Models (13)

### 🌟 Recommended Korean Models

| Model | Size | Highlights | Status |
|-------|------|------------|--------|
| EXAONE 3.5 7.8B | 7.3GB | ⭐ Best efficiency for its parameter count | Public |
| EXAONE 3.5 2.4B | 2.2GB | ⚡ Ultra-light, fast responses | Public |
| Llama-3 Open-Ko 8B | 7.5GB | 🔥 Llama 3 ecosystem | Public |

### 📚 Full Model List

#### Public models (10)

1. LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
2. LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
3. beomi/Llama-3-Open-Ko-8B
4. Qwen/Qwen2.5-7B-Instruct
5. Qwen/Qwen2.5-14B-Instruct
6. 01-ai/Yi-1.5-9B-Chat
7. 01-ai/Yi-1.5-34B-Chat
8. mistralai/Mistral-7B-Instruct-v0.3
9. upstage/SOLAR-10.7B-Instruct-v1.0
10. EleutherAI/polyglot-ko-5.8b

#### Gated models (3) 🔒

1. meta-llama/Llama-3.1-8B-Instruct
2. meta-llama/Llama-3.1-70B-Instruct
3. CohereForAI/aya-23-8B

Note: gated models require separate approval on Hugging Face.

## 🚀 Supported Environments

### Local (development / personal use)

#### 1. Local GPU (recommended)

- Pros:
  - ⚡ Fast responses (5-10s with GPU acceleration)
  - 🔓 Unlimited usage
  - 💰 No cost
- Supported GPUs:
  - NVIDIA CUDA GPUs (RTX series, A100, etc.)
  - Apple Silicon GPUs (M1/M2/M3, via MPS acceleration)
  - Recent Blackwell GPUs such as the RTX 5080 (requires a PyTorch nightly build)
- Requirements: CUDA 12.0+ or Apple Silicon

#### 2. Local CPU

- Pros:
  - 🖥️ Runs without a GPU
  - 🔧 Simple setup
- Constraints:
  - ⏳ Slow responses (1-3 minutes)
  - 🔒 Lightweight models recommended (EXAONE 2.4B, Mistral 7B)

### Hugging Face Spaces (cloud deployment)

#### 1. ZeroGPU (recommended)

- Pros:
  - ⚡ Fast responses (3-10s, NVIDIA H200 GPU)
  - 💰 Low cost ($9/month)
  - 🔋 Automatic GPU allocation/release
- Constraints:
  - 25 minutes of GPU time per day (requires a PRO subscription)
  - Possible queueing when many users are active
- Cost: $9/month (PRO subscription)

#### 2. CPU Upgrade

- Pros:
  - ⏰ Unlimited usage
  - 📊 Predictable performance
  - 🔧 Simple setup
- Constraints:
  - 🐢 Slow responses (30s-1min)
  - 💵 Relatively expensive
- Cost: $0.03/hour (about $22/month)

#### 3. CPU Basic (free)

- Pros:
  - 💡 Free tier
  - 🧪 Good for testing and learning
- Constraints:
  - ⏳ Very slow responses (1-2 minutes)
  - 🔒 Only lightweight models recommended
  - ⚠️ Limited usage

## ⚙️ Setup per Environment

### Local run (auto-detected)

The app automatically detects the local environment and applies the optimal settings:

```bash
python app.py
```

Detection logic:

1. GPU detection: checks whether CUDA/MPS is available
2. CUDA compatibility test: runs a tensor operation to verify the GPU actually works
3. CPU fallback: switches to CPU mode automatically on GPU errors
4. Environment report: prints the detected environment at startup
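The tensor-operation check in step 2 is not shown in this README; a minimal sketch of what such a `test_cuda_compatibility` helper could look like (an illustration under the assumption that PyTorch is installed, not the app's actual code):

```python
def test_cuda_compatibility() -> bool:
    """Verify the GPU actually works by running a tiny tensor operation.

    Returns False on any failure, so the caller can fall back to CPU.
    """
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        x = torch.ones(2, 2, device="cuda")
        # A 2x2 all-ones matrix squared sums to 8 if the kernel ran correctly.
        return bool((x @ x).sum().item() == 8.0)
    except Exception:
        # Covers missing torch, driver/kernel mismatches, OOM, etc.
        return False
```

Catching a broad `Exception` is deliberate here: on GPUs newer than the installed CUDA build (e.g. Blackwell cards with PyTorch 2.2.0), `cuda.is_available()` can report True while the first real kernel launch fails.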

### HF Spaces deployment (auto-detected)

When you change the hardware in the Space Settings, the app detects it automatically.

Switching to ZeroGPU:

1. Space Settings → Hardware
2. Select ZeroGPU
3. Confirm → wait for the build to finish (1-2 minutes)
4. Check that the UI shows "🚀 HF Spaces - ZeroGPU"

Switching to CPU Upgrade:

1. Space Settings → Hardware
2. Select CPU Upgrade (8 vCPU / 32 GB)
3. Confirm → wait for the build to finish (1-2 minutes)
4. Check that the UI shows "⚙️ HF Spaces - CPU Upgrade"

CPU Basic (free):

- Default setting, no change needed
- The UI shows "💻 HF Spaces - CPU Basic"

## 📊 Performance Comparison

| | Local GPU | Local CPU | ZeroGPU | CPU Upgrade | CPU Basic |
|---|---|---|---|---|---|
| First response | 10-20s | 2-5min | 10-20s | 1-2min | 2-3min |
| Later responses | 5-10s | 1-3min | 3-10s | 30s-1min | 1-2min |
| Daily limit | Unlimited | Unlimited | 25min | Unlimited | Limited |
| Monthly cost | $0 | $0 | $9 | ~$22 | $0 |
| GPU | User's GPU | None | H200 (70GB) | None | None |
| Recommended models | All | Lightweight | All | All | Lightweight |

## 🔧 Technical Design

### Multi-environment auto-detection

```python
import os

# 1. Import spaces first to avoid premature CUDA initialization
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

# 2. Only then import CUDA-related packages
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 3. Detect the hardware environment
def detect_hardware_environment():
    """Return the hardware tier:
    'zerogpu' | 'cpu_upgrade' | 'cpu_basic' | 'local_gpu' | 'local_cpu'
    """
    # HF Spaces detection
    if os.environ.get('SPACE_ID'):
        if ZEROGPU_AVAILABLE:
            return 'zerogpu'
        elif os.cpu_count() >= 8:
            return 'cpu_upgrade'
        else:
            return 'cpu_basic'

    # Local detection
    if torch.cuda.is_available():
        # CUDA compatibility test (supports recent GPUs such as the RTX 5080)
        if test_cuda_compatibility():
            return 'local_gpu'
        else:
            return 'local_cpu'  # CUDA error → CPU fallback
    elif torch.backends.mps.is_available():
        return 'local_gpu'  # Apple Silicon
    else:
        return 'local_cpu'

# 4. Apply the GPU decorator conditionally
if ZEROGPU_AVAILABLE:
    @spaces.GPU(duration=120)
    def generate_response(message, history):
        return generate_response_impl(message, history)
else:
    def generate_response(message, history):
        return generate_response_impl(message, history)
```

### Lazy loading & cache system

Smart model loading:

```python
def load_model_once(model_index=None):
    """Load a model only when the selection changes (lazy loading)."""
    global model, tokenizer, loaded_model_name

    model_name = MODEL_CONFIGS[model_index]["MODEL_NAME"]

    # 1. Reuse the model if it is already loaded
    if loaded_model_name == model_name:
        print(f"ℹ️ Model {model_name} already loaded, reusing...")
        return model, tokenizer

    # 2. Check the cache → show "downloading" vs "loading" in the UI
    is_cached = check_model_cached(model_name)
    if is_cached:
        print("✅ Model found in cache, loading from disk...")
    else:
        print("📥 Model not in cache, downloading (~4-14GB)...")

    # 3. Free the previous model's memory
    if model is not None:
        del model, tokenizer
        if HW_ENV['cuda_compatible']:
            torch.cuda.empty_cache()

    # 4. Load the new model (optimized per environment)
    device = "cuda" if HW_ENV['gpu_available'] and HW_ENV['cuda_compatible'] else "cpu"

    if device == "cuda":
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float16,  # GPU: float16
            device_map="auto",
        )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float32,  # CPU: float32
        )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    loaded_model_name = model_name
    return model, tokenizer
```

์บ์‹œ ์ƒํƒœ ํ™•์ธ:

  • ์‚ฌ์šฉ์ž์—๊ฒŒ "๐Ÿ’พ ์บ์‹œ๋œ ๋ชจ๋ธ ๋กœ๋”ฉ ์ค‘" vs "๐Ÿ“ฅ ๋ชจ๋ธ ๋‹ค์šด๋กœ๋“œ ์ค‘" ์‹ค์‹œ๊ฐ„ ํ‘œ์‹œ
  • ๋‹ค์šด๋กœ๋“œ ์‹œ๊ฐ„ ์˜ˆ์ธก ์ •๋ณด ์ œ๊ณต (์ฒซ ์‚ฌ์šฉ ์‹œ 5-20๋ถ„)

## 📝 How to Use

### 1. Open the Space

https://huggingface.co/spaces/catchitplay/simple-chat

### 2. Select a model

- Pick a model from the dropdown
- Check its cache status (💾 cached / 📥 download required)
- On first use, the model is downloaded (2-14GB, 5-20 minutes)

### 3. Start chatting

Example prompts (in Korean):

```
안녕하세요                      (Hello)
인공지능에 대해 설명해주세요    (Explain artificial intelligence)
한국의 수도는 어디인가요?       (What is the capital of Korea?)
```
๐Ÿ’ก ๋ชจ๋ธ ์„ ํƒ ๊ฐ€์ด๋“œ

๋น ๋ฅธ ์‘๋‹ต์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ

  • EXAONE 3.5 2.4B โšก (2.2GB) - ๊ฐ€์žฅ ๋น ๋ฆ„
  • Mistral 7B (7GB) - ๊ฒฝ๋Ÿ‰ ๋ชจ๋ธ

ํ’ˆ์งˆ ์ค‘์‹œ

  • EXAONE 3.5 7.8B โญ (7.3GB) - ํšจ์œจ์„ฑ ์ตœ๊ณ 
  • Qwen2.5 14B (14GB) - ๋‹ค๊ตญ์–ด ๊ฐ•์ 
  • SOLAR 10.7B (10GB) - ํ•œ๊ตญ์–ด ํŠนํ™”

์ตœ๊ณ  ์„ฑ๋Šฅ (๋А๋ฆผ)

  • Llama 3.1 70B ๐Ÿ”’ (70GB) - ์ตœ๊ณ  ํ’ˆ์งˆ
  • Yi 1.5 34B (34GB) - ๊ธด ๋ฌธ๋งฅ

Llama ์ƒํƒœ๊ณ„

  • Llama-3 Open-Ko 8B ๐Ÿ”ฅ (7.5GB)
  • Llama 3.1 8B ๐Ÿ”’ (8GB)

## 📦 Running Locally

### Installation

```bash
# Clone the repository
git clone https://github.com/catchitplay/simple-chatbot-gradio.git
cd simple-chatbot-gradio

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

For recent GPUs such as the RTX 5080:

```bash
# Install PyTorch nightly (CUDA 12.8+ support)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```

### .env file

```bash
# Create the .env file
echo "HF_TOKEN=your_hugging_face_token" > .env
```

To issue an HF_TOKEN:

1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Select the "Read" permission
4. Copy the generated token

### Run

```bash
python app.py
```

Open http://localhost:7860 in your browser.

On startup, the app prints the detected environment, e.g.:

```
Hardware Environment Detection

Platform: local
Hardware: local_gpu
GPU Available: True
GPU Name: NVIDIA GeForce RTX 5080
CPU Cores: 16
OS: Linux
Description: 🖥️ Local - GPU (NVIDIA GeForce RTX 5080)
```

**Notes**:
- Local environment auto-detection: CPU / GPU / Apple Silicon MPS
- Automatic CUDA compatibility test (CPU fallback on GPU errors)
- Models are downloaded on first run (4-14GB, takes 5-20 minutes)
- A GPU is recommended (RTX series, A100, Apple Silicon, etc.)

### Installing as a Linux systemd service (auto-start)

To start the chatbot automatically when the server boots, you can install it as a systemd service.

#### 1. Run the install script

```bash
# Run from the project directory
sudo ./install-service.sh
```

The install script automatically:

- Detects the current user and directory path
- Installs the systemd unit file at /etc/systemd/system/chatbot.service
- Creates the log files (/var/log/chatbot.log, /var/log/chatbot-error.log)
- Enables auto-start at boot
- Asks whether to start the service immediately

#### 2. Service management commands

```bash
# Start the service
sudo systemctl start chatbot

# Stop the service
sudo systemctl stop chatbot

# Restart the service
sudo systemctl restart chatbot

# Check service status
sudo systemctl status chatbot

# Follow live logs
sudo journalctl -u chatbot -f

# Follow the application log
tail -f /var/log/chatbot.log

# Follow the error log
tail -f /var/log/chatbot-error.log

# Enable auto-start at boot
sudo systemctl enable chatbot

# Disable auto-start at boot
sudo systemctl disable chatbot
```

#### 3. Removing the service

To remove the service completely:

```bash
# Stop and disable the service
sudo systemctl stop chatbot
sudo systemctl disable chatbot

# Delete the unit file
sudo rm /etc/systemd/system/chatbot.service

# Reload the systemd daemon
sudo systemctl daemon-reload

# Delete the log files (optional)
sudo rm /var/log/chatbot.log /var/log/chatbot-error.log
```

4. ์ฃผ์˜์‚ฌํ•ญ

  • ๊ฐ€์ƒํ™˜๊ฒฝ ํ•„์ˆ˜: ์„œ๋น„์Šค ์„ค์น˜ ์ „์— venv ๋””๋ ‰ํ† ๋ฆฌ๊ฐ€ ์กด์žฌํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค
  • ํฌํŠธ ์ถฉ๋Œ: ๊ธฐ์กด ํ”„๋กœ์„ธ์Šค๊ฐ€ 7860 ํฌํŠธ๋ฅผ ์‚ฌ์šฉ ์ค‘์ด๋ฉด ์„œ๋น„์Šค๊ฐ€ ์‹œ์ž‘๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค
  • ๊ถŒํ•œ: ์„ค์น˜ ์Šคํฌ๋ฆฝํŠธ๋Š” ๋ฐ˜๋“œ์‹œ sudo๋กœ ์‹คํ–‰ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค
  • ์žฌ์‹œ์ž‘: ์•ฑ ์ฝ”๋“œ ๋ณ€๊ฒฝ ํ›„์—๋Š” sudo systemctl restart chatbot ์‹คํ–‰ ํ•„์š”
  • ๋กœ๊ทธ ํ™•์ธ: ๋ฌธ์ œ ๋ฐœ์ƒ ์‹œ ๋กœ๊ทธ ํŒŒ์ผ์„ ๋จผ์ € ํ™•์ธํ•˜์„ธ์š”

5. ์ˆ˜๋™ ์„œ๋น„์Šค ์„ค์ • (๊ณ ๊ธ‰)

์ž๋™ ์„ค์น˜ ์Šคํฌ๋ฆฝํŠธ ๋Œ€์‹  ์ˆ˜๋™์œผ๋กœ ์„ค์ •ํ•˜๋ ค๋ฉด:

# 1. chatbot.service ํŒŒ์ผ ํŽธ์ง‘
sudo nano /etc/systemd/system/chatbot.service

# 2. ๋‹ค์Œ ๋‚ด์šฉ ์ž…๋ ฅ (๊ฒฝ๋กœ์™€ ์‚ฌ์šฉ์ž๋ช… ์ˆ˜์ • ํ•„์š”)
[Unit]
Description=Multi-Model Chatbot Gradio Service
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/simple-chatbot-gradio
Environment="PATH=/path/to/simple-chatbot-gradio/venv/bin:/usr/bin:/bin"
ExecStart=/path/to/simple-chatbot-gradio/venv/bin/python app.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/chatbot.log
StandardError=append:/var/log/chatbot-error.log

[Install]
WantedBy=multi-user.target

# 3. ๋กœ๊ทธ ํŒŒ์ผ ์ƒ์„ฑ
sudo touch /var/log/chatbot.log /var/log/chatbot-error.log
sudo chown YOUR_USERNAME:YOUR_USERNAME /var/log/chatbot.log /var/log/chatbot-error.log

# 4. systemd ๋ฐ๋ชฌ ์žฌ๋กœ๋“œ ๋ฐ ์„œ๋น„์Šค ํ™œ์„ฑํ™”
sudo systemctl daemon-reload
sudo systemctl enable chatbot
sudo systemctl start chatbot

#### 6. Troubleshooting

If the service does not start:

```bash
# Check service status
sudo systemctl status chatbot

# Check recent log entries
sudo journalctl -u chatbot -n 50

# Run manually to see the error
cd /path/to/simple-chatbot-gradio
source venv/bin/activate
python app.py
```

If the port is already in use:

```bash
# Find the process using port 7860
sudo lsof -i :7860

# Kill it (after checking the PID)
sudo kill -9 <PID>
```

Virtual environment path problems:

```bash
# Recreate the virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements-local.txt
```

## 🛠️ Tech Stack

- Framework: Gradio 5.49.1
- ML libraries: Transformers 4.57.1, PyTorch 2.2.0+
- GPU support:
  - HF Spaces: ZeroGPU (NVIDIA H200)
  - Local: CUDA 12.0+, Apple Silicon MPS
  - Recent GPUs: PyTorch nightly (CUDA 12.8+)
- Language: Python 3.10+

## 📚 Dependencies

```text
# Core
gradio==5.49.1
transformers==4.57.1
torch>=2.2.0  # HF Spaces: 2.2.0 (ZeroGPU), Local: 2.2.0+ or nightly
safetensors==0.6.2
accelerate==0.26.1
sentencepiece==0.2.0
protobuf==4.25.1
huggingface-hub>=0.19.0
python-dotenv==1.0.0
spaces  # ZeroGPU support (HF Spaces only)
```

PyTorch version per environment:

- HF Spaces: PyTorch 2.2.0 (ZeroGPU-compatible)
- Local, common GPUs: PyTorch 2.2.0+ (CUDA 12.0+)
- Local, recent GPUs (RTX 5080, etc.): PyTorch nightly (CUDA 12.8+)
- Local CPU: PyTorch 2.2.0+ (CPU-only build)

## 🔒 Using Gated Models

### 1. Request model access

Click "Request Access" on each gated model's page.

### 2. Set HF_TOKEN

After approval, put your HF_TOKEN in the .env file (see above).

### 3. Set a Space secret (HF Spaces)

Space Settings → Repository secrets:

- Name: HF_TOKEN
- Value: your_token_here

## ⚠️ Limitations and Known Issues

### Common

- Model size: 2-70GB (loading takes time)
- Context: conversation history is kept (last 3 turns)
- Memory: large models need a GPU or plenty of RAM

### Per-environment constraints

HF Spaces - ZeroGPU:

- Daily limit: 25 minutes (requires a PRO subscription)
- Queueing: possible waits when many users are active
- Cost: $9/month

HF Spaces - CPU Upgrade:

- Speed: 10-30x slower than GPU
- Cost: $0.03/hour (~$22/month)
- Memory: 32GB RAM (limits large models)

HF Spaces - CPU Basic:

- Very slow: 1-2 minute responses
- Limited usage
- Lightweight models recommended

Local:

- GPU memory: large models may exceed available VRAM
- Recent GPUs: PyTorch nightly required (RTX 5080, etc.)
- CPU mode: very slow (1-3 minute responses)

### Known issues and fixes

"CUDA has been initialized" error (ZeroGPU):

- Cause: spaces must be imported before torch
- Fix: app.py imports spaces first (already applied)

CUDA errors on Blackwell GPUs such as the RTX 5080:

- Cause: CUDA 12.8+ is required (not supported by PyTorch 2.2.0)
- Fix: install PyTorch nightly (see the installation section above)

GPU detected but running in CPU mode:

- Cause: the CUDA compatibility test failed
- Fix: check your PyTorch version and update the CUDA driver

## 🔗 Related Resources

### Model cards

### Documentation

## 📄 License

MIT License

## 🙋‍♂️ Contact

For issues or questions, please reach out via GitHub Issues.


💡 TIP:

- Quick testing: EXAONE 2.4B ⚡
- Balanced performance: EXAONE 7.8B ⭐
- Best quality: Llama 3.1 70B 🔒 (slow)