---
title: Multi-Model Korean LLM Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---
# 🤖 Multi-Model Korean LLM Chatbot
A multi-model chatbot that lets you pick any of 13 Korean-capable LLMs to chat with. It automatically detects whether it is running **locally (CPU/GPU)** or on **Hugging Face Spaces (CPU Basic/Upgrade, ZeroGPU)** and applies the optimal settings for that environment.
## ✨ Key Features
- **🎯 13 selectable models**: LLMs of various sizes and strengths
- **🇰🇷 Korean-optimized**: curated models with strong Korean-language performance
- **🖥️ Multi-environment support**: auto-detects local (CPU/GPU) and HF Spaces (CPU Basic/Upgrade, ZeroGPU)
- **💾 Cache system**: avoids re-downloading models, for fast loading
- **🔄 Lazy loading**: only the selected model is loaded, saving resources
- **🛡️ Robustness**: supports recent GPUs such as the RTX 5080, with an automatic CUDA compatibility test
## 🎯 Supported Models (13)
### 🌟 Recommended Korean Models
| Model | Size | Highlights | Access |
|------|------|------|------|
| **EXAONE 3.5 7.8B** | 7.3GB | ⭐ Best efficiency per parameter | Public |
| **EXAONE 3.5 2.4B** | 2.2GB | ⚡ Ultra-light, fast responses | Public |
| **Llama-3 Open-Ko 8B** | 7.5GB | 🔥 Llama 3 ecosystem | Public |
### 📚 Full Model List
#### Public models (10)
1. LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
2. LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
3. beomi/Llama-3-Open-Ko-8B
4. Qwen/Qwen2.5-7B-Instruct
5. Qwen/Qwen2.5-14B-Instruct
6. 01-ai/Yi-1.5-9B-Chat
7. 01-ai/Yi-1.5-34B-Chat
8. mistralai/Mistral-7B-Instruct-v0.3
9. upstage/SOLAR-10.7B-Instruct-v1.0
10. EleutherAI/polyglot-ko-5.8b
#### Gated models (3) 🔒
11. meta-llama/Llama-3.1-8B-Instruct
12. meta-llama/Llama-3.1-70B-Instruct
13. CohereForAI/aya-23-8B
> **Note**: Gated models require separate access approval on Hugging Face.
## 🚀 Supported Environments
### Local (development / personal use)
**1. Local GPU (recommended)**
- **Pros**:
  - ⚡ Fast responses (5-10 s with GPU acceleration)
  - 🔓 Unlimited use
  - 💰 No cost
- **Supported GPUs**:
  - NVIDIA CUDA-capable GPUs (RTX series, A100, etc.)
  - Apple Silicon GPUs (M1/M2/M3, via MPS acceleration)
  - Recent Blackwell GPUs such as the RTX 5080 (requires PyTorch nightly)
- **Requirements**: CUDA 12.0+ or Apple Silicon
**2. Local CPU**
- **Pros**:
  - 🖥️ Runs without a GPU
  - 🔧 Simple setup
- **Constraints**:
  - ⏳ Slow responses (1-3 min)
  - 🔒 Lightweight models recommended (EXAONE 2.4B, Mistral 7B)
### Hugging Face Spaces (cloud deployment)
**1. ZeroGPU (recommended)**
- **Pros**:
  - ⚡ Fast responses (3-10 s on an NVIDIA H200 GPU)
  - 💰 Low cost ($9/month)
  - 🔋 Automatic GPU allocation and release
- **Constraints**:
  - 25 minutes of GPU time per day (PRO subscription required)
  - Possible queueing under heavy load
- **Cost**: $9/month (PRO subscription)
**2. CPU Upgrade**
- **Pros**:
  - ⏰ Unlimited use
  - 📊 Predictable performance
  - 🔧 Simple setup
- **Constraints**:
  - 🐢 Slow responses (30 s-1 min)
  - 💵 Comparatively expensive
- **Cost**: $0.03/hour (roughly $22/month)
**3. CPU Basic (free)**
- **Pros**:
  - 💡 Free tier
  - 🧪 Good for testing and learning
- **Constraints**:
  - ⏳ Very slow responses (1-2 min)
  - 🔒 Only lightweight models recommended
  - ⚠️ Limited use
## ⚙️ Per-Environment Setup
### Running locally (auto-detected)
The app detects the local environment automatically and applies the optimal settings:
```bash
python app.py
```
**Auto-detection logic**:
1. **GPU detection**: checks whether CUDA/MPS is available
2. **CUDA compatibility test**: verifies the GPU actually works by running a tensor operation
3. **CPU fallback**: switches to CPU mode automatically on GPU errors
4. **Environment report**: prints the detected environment at startup
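The compatibility test in step 2 can be sketched as follows. `test_cuda_compatibility` is the app's helper; the body shown here is an assumption, not the app's verbatim code — a minimal tensor round-trip that converts kernel/driver failures (e.g. a Blackwell GPU on a PyTorch build without CUDA 12.8 support) into a clean `False`:

```python
import torch

def test_cuda_compatibility():
    """Sketch (assumed implementation): run a tiny tensor op on the GPU
    and treat any RuntimeError (e.g. missing kernels for a new compute
    capability) as 'not compatible', triggering the CPU fallback."""
    if not torch.cuda.is_available():
        return False
    try:
        x = torch.ones(2, 2, device="cuda")
        y = (x @ x).sum().item()  # forces a real kernel launch
        return y == 8.0
    except RuntimeError:
        return False
```

Running the op (rather than just checking `torch.cuda.is_available()`) matters because availability checks can pass even when the installed PyTorch build cannot actually execute kernels on the GPU.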
### Deploying to HF Spaces (auto-detected)
When you change the hardware in Space Settings, the app picks it up automatically:
**Switching to ZeroGPU**:
1. Space Settings → Hardware
2. Select **ZeroGPU**
3. Confirm → wait for the build to finish (1-2 min)
4. Check that the UI shows "🚀 HF Spaces - ZeroGPU"
**Switching to CPU Upgrade**:
1. Space Settings → Hardware
2. Select **CPU Upgrade (8 vCPU / 32 GB)**
3. Confirm → wait for the build to finish (1-2 min)
4. Check that the UI shows "⚙️ HF Spaces - CPU Upgrade"
**CPU Basic (free)**:
- Default setting; no change needed
- The UI shows "💻 HF Spaces - CPU Basic"
## 📊 Performance Comparison
| Metric | Local GPU | Local CPU | ZeroGPU | CPU Upgrade | CPU Basic |
|------|-----------|-----------|---------|-------------|-----------|
| **First response** | 10-20 s | 2-5 min | 10-20 s | 1-2 min | 2-3 min |
| **Later responses** | 5-10 s | 1-3 min | 3-10 s | 30 s-1 min | 1-2 min |
| **Daily limit** | Unlimited | Unlimited | 25 min | Unlimited | Limited |
| **Monthly cost** | $0 | $0 | $9 | $22 | $0 |
| **GPU** | Your GPU | None | H200 (70GB) | None | None |
| **Recommended models** | All | Lightweight | All | All | Lightweight |
## 🔧 Architecture
### Multi-environment auto-detection
```python
# 1. Avoid CUDA initialization errors on ZeroGPU: import spaces first
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

# 2. CUDA-related packages are imported only afterwards
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 3. Hardware environment detection (simplified here to return just the
#    'hardware' value; the full version returns the dict described below)
def detect_hardware_environment():
    """
    Full version returns: {
        'platform': 'hf_spaces' | 'local',
        'hardware': 'zerogpu' | 'cpu_upgrade' | 'cpu_basic' | 'local_gpu' | 'local_cpu',
        'gpu_available': bool,
        'gpu_name': str or None,
        'cuda_compatible': bool
    }
    """
    # HF Spaces detection
    if os.environ.get('SPACE_ID'):
        if ZEROGPU_AVAILABLE:
            return 'zerogpu'
        elif os.cpu_count() >= 8:
            return 'cpu_upgrade'
        else:
            return 'cpu_basic'
    # Local environment detection
    if torch.cuda.is_available():
        # CUDA compatibility test (supports recent GPUs such as the RTX 5080)
        if test_cuda_compatibility():
            return 'local_gpu'
        else:
            return 'local_cpu'  # CUDA error → CPU fallback
    elif torch.backends.mps.is_available():
        return 'local_gpu'  # Apple Silicon
    else:
        return 'local_cpu'

# 4. Conditional GPU decorator
if ZEROGPU_AVAILABLE:
    @spaces.GPU(duration=120)
    def generate_response(message, history):
        return generate_response_impl(message, history)
else:
    def generate_response(message, history):
        return generate_response_impl(message, history)
```
### Lazy Loading & Cache System
**Smart model loading**:
```python
def load_model_once(model_index=None):
    """Load a model only when the selection changes (lazy loading)."""
    global model, tokenizer, loaded_model_name
    model_name = MODEL_CONFIGS[model_index]["MODEL_NAME"]
    # 1. Reuse the model if it is already loaded
    if loaded_model_name == model_name:
        print(f"ℹ️ Model {model_name} already loaded, reusing...")
        return model, tokenizer
    # 2. Check the cache → show a "downloading" vs "loading" message in the UI
    is_cached = check_model_cached(model_name)
    if is_cached:
        print("✅ Model found in cache, loading from disk...")
    else:
        print("📥 Model not in cache, downloading (~4-14GB)...")
    # 3. Free the previous model's memory
    if model is not None:
        del model, tokenizer
        if HW_ENV['cuda_compatible']:
            torch.cuda.empty_cache()
    # 4. Load the new model (optimized per environment)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    device = "cuda" if HW_ENV['gpu_available'] and HW_ENV['cuda_compatible'] else "cpu"
    if device == "cuda":
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float16,  # GPU: float16
            device_map="auto",
        )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float32,  # CPU: float32
        )
    loaded_model_name = model_name
    return model, tokenizer
```
**Cache status display**:
- Shows the user "💾 Loading cached model" vs "📥 Downloading model" in real time
- Provides an estimated download time (5-20 min on first use)
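`check_model_cached` itself is not shown above. One plausible implementation (an assumption, not the app's verbatim code) scans the local Hugging Face cache with `huggingface_hub.scan_cache_dir` and looks for the repo ID:

```python
from huggingface_hub import scan_cache_dir

def check_model_cached(model_name: str) -> bool:
    """Sketch (assumed implementation): report whether a model repo
    already exists in the local Hugging Face cache directory."""
    try:
        cache_info = scan_cache_dir()
    except Exception:
        # Cache directory may not exist yet (fresh install)
        return False
    return any(repo.repo_id == model_name for repo in cache_info.repos)
```

This only checks repo presence, not completeness; a partially downloaded repo would still count as cached in this simplified version.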
## 📝 Usage
### 1. Open the Space
https://huggingface.co/spaces/catchitplay/simple-chat
### 2. Select a model
- Pick a model from the dropdown
- Check its cache status (💾 cached / 📥 download required)
- On first use the model is downloaded (2-14GB, 5-20 min)
### 3. Start chatting
```
안녕하세요 (Hello)
인공지능에 대해 설명해주세요 (Explain artificial intelligence)
한국의 수도는 어디인가요? (What is the capital of Korea?)
```
## 💡 Model Selection Guide
### If you need fast responses
- **EXAONE 3.5 2.4B** ⚡ (2.2GB) - fastest
- **Mistral 7B** (7GB) - lightweight
### If quality matters most
- **EXAONE 3.5 7.8B** ⭐ (7.3GB) - best efficiency
- **Qwen2.5 14B** (14GB) - strong multilingual performance
- **SOLAR 10.7B** (10GB) - Korean-specialized
### Top performance (slow)
- **Llama 3.1 70B** 🔒 (70GB) - best quality
- **Yi 1.5 34B** (34GB) - long context
### Llama ecosystem
- **Llama-3 Open-Ko 8B** 🔥 (7.5GB)
- **Llama 3.1 8B** 🔒 (8GB)
## 📦 Running Locally
### Installation
```bash
# Clone the repository
git clone https://github.com/catchitplay/simple-chatbot-gradio.git
cd simple-chatbot-gradio
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
**For recent GPUs such as the RTX 5080**:
```bash
# Install PyTorch nightly (CUDA 12.8+ support)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
### .env configuration
```bash
# Create the .env file
echo "HF_TOKEN=your_hugging_face_token" > .env
```
**How to get an HF_TOKEN**:
1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Select "Read" permission
4. Copy the generated token
### Run
```bash
python app.py
```
Open http://localhost:7860 in your browser.
**Environment detection output at startup**:
```
============================================================
Hardware Environment Detection
============================================================
Platform: local
Hardware: local_gpu
GPU Available: True
GPU Name: NVIDIA GeForce RTX 5080
CPU Cores: 16
OS: Linux
Description: ๐Ÿ–ฅ๏ธ Local - GPU (NVIDIA GeForce RTX 5080)
============================================================
```
**Notes**:
- The local environment is detected automatically: CPU / GPU / Apple Silicon MPS
- CUDA compatibility is tested automatically (falls back to CPU on GPU errors)
- Models are downloaded on first run (4-14GB, takes 5-20 min)
- A GPU is recommended (RTX series, A100, Apple Silicon, etc.)
### Installing as a Linux system service (auto-start)
To start the chatbot automatically when the server boots, install it as a systemd service.
#### 1. Run the install script
```bash
# Run from the project directory
sudo ./install-service.sh
```
The install script automatically:
- Detects the current user and directory path
- Installs the systemd unit at `/etc/systemd/system/chatbot.service`
- Creates the log files (`/var/log/chatbot.log`, `/var/log/chatbot-error.log`)
- Enables start on boot
- Asks whether to start the service immediately
#### 2. Service management commands
```bash
# Start the service
sudo systemctl start chatbot
# Stop the service
sudo systemctl stop chatbot
# Restart the service
sudo systemctl restart chatbot
# Check service status
sudo systemctl status chatbot
# Follow live logs
sudo journalctl -u chatbot -f
# View application logs
tail -f /var/log/chatbot.log
# View error logs
tail -f /var/log/chatbot-error.log
# Enable start on boot
sudo systemctl enable chatbot
# Disable start on boot
sudo systemctl disable chatbot
```
#### 3. Removing the service
To remove the service completely:
```bash
# Stop and disable the service
sudo systemctl stop chatbot
sudo systemctl disable chatbot
# Delete the unit file
sudo rm /etc/systemd/system/chatbot.service
# Reload the systemd daemon
sudo systemctl daemon-reload
# Delete the log files (optional)
sudo rm /var/log/chatbot.log /var/log/chatbot-error.log
```
#### 4. Notes
- **Virtual environment required**: the `venv` directory must exist before installing the service
- **Port conflicts**: the service will not start if another process is already using port 7860
- **Permissions**: the install script must be run with `sudo`
- **Restarting**: run `sudo systemctl restart chatbot` after changing the app code
- **Logs**: check the log files first when something goes wrong
#### 5. Manual service setup (advanced)
To configure the service by hand instead of using the install script:
```bash
# 1. Edit the chatbot.service file
sudo nano /etc/systemd/system/chatbot.service
# 2. Enter the following content (adjust the paths and username)
[Unit]
Description=Multi-Model Chatbot Gradio Service
After=network.target
[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/simple-chatbot-gradio
Environment="PATH=/path/to/simple-chatbot-gradio/venv/bin:/usr/bin:/bin"
ExecStart=/path/to/simple-chatbot-gradio/venv/bin/python app.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/chatbot.log
StandardError=append:/var/log/chatbot-error.log
[Install]
WantedBy=multi-user.target
# 3. Create the log files
sudo touch /var/log/chatbot.log /var/log/chatbot-error.log
sudo chown YOUR_USERNAME:YOUR_USERNAME /var/log/chatbot.log /var/log/chatbot-error.log
# 4. Reload systemd and enable the service
sudo systemctl daemon-reload
sudo systemctl enable chatbot
sudo systemctl start chatbot
```
#### 6. Troubleshooting
**If the service does not start**:
```bash
# Check service status
sudo systemctl status chatbot
# Check the error log
sudo journalctl -u chatbot -n 50
# Run manually to see the error
cd /path/to/simple-chatbot-gradio
source venv/bin/activate
python app.py
```
**If the port is already in use**:
```bash
# Find the process using port 7860
sudo lsof -i :7860
# Kill it (after checking the PID)
sudo kill -9 <PID>
```
**Virtual environment path issues**:
```bash
# Recreate the virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements-local.txt
```
## 🛠️ Tech Stack
- **Framework**: Gradio 5.49.1
- **ML libraries**: Transformers 4.57.1, PyTorch 2.2.0+
- **GPU support**:
  - HF Spaces: ZeroGPU (NVIDIA H200)
  - Local: CUDA 12.0+, Apple Silicon MPS
  - Recent GPUs: PyTorch nightly (CUDA 12.8+)
- **Language**: Python 3.10+
## 📚 Dependencies
```txt
# Core
gradio==5.49.1
transformers==4.57.1
torch>=2.2.0 # HF Spaces: 2.2.0 (ZeroGPU), Local: 2.2.0+ or nightly
safetensors==0.6.2
accelerate==0.26.1
sentencepiece==0.2.0
protobuf==4.25.1
huggingface-hub>=0.19.0
python-dotenv==1.0.0
spaces # ZeroGPU support (HF Spaces only)
```
**ํ™˜๊ฒฝ๋ณ„ PyTorch ๋ฒ„์ „**:
- **HF Spaces**: PyTorch 2.2.0 (ZeroGPU ํ˜ธํ™˜)
- **๋กœ์ปฌ ์ผ๋ฐ˜ GPU**: PyTorch 2.2.0+ (CUDA 12.0+)
- **๋กœ์ปฌ ์ตœ์‹  GPU (RTX 5080 ๋“ฑ)**: PyTorch nightly (CUDA 12.8+)
- **๋กœ์ปฌ CPU**: PyTorch 2.2.0+ (CPU-only build)
## 🔒 Using Gated Models
### 1. Request model access
Click "Request Access" on each gated model's page:
- https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
- https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
- https://huggingface.co/CohereForAI/aya-23-8B
### 2. Set HF_TOKEN
After approval, add your HF_TOKEN to the .env file (see above)
### 3. Space Secrets (HF Spaces)
Space Settings → Repository secrets:
- Name: `HF_TOKEN`
- Value: `your_token_here`
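Once the token is set, it has to reach `from_pretrained`: gated checkpoints reject anonymous downloads unless you have run `huggingface-cli login`. A minimal sketch (`load_gated` is a hypothetical helper, not the app's actual code):

```python
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_gated(model_name: str):
    """Sketch: pass the HF_TOKEN explicitly so gated repos
    (e.g. meta-llama/Llama-3.1-8B-Instruct) can be fetched."""
    token = os.environ.get("HF_TOKEN")
    tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_name, token=token)
    return model, tokenizer
```

If `token` is `None`, public models still load normally; only the gated ones fail with a 401/403 error.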
## ⚠️ Limitations & Known Issues
### Common
- **Model size**: 2-70GB (loading takes time)
- **Context**: conversation history is kept (last 3 turns)
- **Memory**: large models need a GPU or plenty of RAM
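The 3-turn context window can be enforced with a simple slice before the prompt is built. This is a sketch under assumptions: `MAX_TURNS` and the `(user, assistant)` pair format are illustrative (matching Gradio's classic history structure), not the app's actual code:

```python
MAX_TURNS = 3  # assumption: mirrors the "last 3 turns" limit above

def truncate_history(history):
    """Keep only the most recent MAX_TURNS (user, assistant) pairs
    so the prompt stays within the model's context budget."""
    return history[-MAX_TURNS:]

history = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]
print(truncate_history(history))  # the oldest turn is dropped
```

Dropping whole turns (rather than truncating mid-message) keeps every remaining exchange coherent for the chat template.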
### ํ™˜๊ฒฝ๋ณ„ ์ œ์•ฝ
**HF Spaces - ZeroGPU**:
- ์ผ์ผ ํ•œ๋„: 25๋ถ„ (PRO ๊ตฌ๋… ํ•„์š”)
- ๋Œ€๊ธฐ์—ด: ์‚ฌ์šฉ์ž ๋งŽ์„ ๊ฒฝ์šฐ ๋Œ€๊ธฐ
- ๋น„์šฉ: $9/month
**HF Spaces - CPU Upgrade**:
- ๋А๋ฆฐ ์†๋„: GPU ๋Œ€๋น„ 10-30๋ฐฐ ๋А๋ฆผ
- ๋น„์šฉ: ์‹œ๊ฐ„๋‹น $0.03 ($22/month)
- ๋ฉ”๋ชจ๋ฆฌ: 32GB RAM (๋Œ€ํ˜• ๋ชจ๋ธ ์ œ์•ฝ)
**HF Spaces - CPU Basic**:
- ๋งค์šฐ ๋А๋ฆผ: 1-2๋ถ„ ์‘๋‹ต
- ์ œํ•œ์  ์‚ฌ์šฉ
- ๊ฒฝ๋Ÿ‰ ๋ชจ๋ธ ๊ถŒ์žฅ
**๋กœ์ปฌ ํ™˜๊ฒฝ**:
- GPU ๋ฉ”๋ชจ๋ฆฌ: ํฐ ๋ชจ๋ธ์€ VRAM ๋ถ€์กฑ ๊ฐ€๋Šฅ
- ์ตœ์‹  GPU: PyTorch nightly ํ•„์š” (RTX 5080 ๋“ฑ)
- CPU ๋ชจ๋“œ: ๋งค์šฐ ๋А๋ฆผ (1-3๋ถ„ ์‘๋‹ต)
### Known issues and fixes
**"CUDA has been initialized" error (ZeroGPU)**:
- **Cause**: `spaces` must be imported before `torch`
- **Fix**: app.py imports `spaces` first (already applied)
**CUDA errors on Blackwell GPUs such as the RTX 5080**:
- **Cause**: requires CUDA 12.8+ (not supported by PyTorch 2.2.0)
- **Fix**: install PyTorch nightly (see the installation section)
**GPU detected but running in CPU mode**:
- **Cause**: the CUDA compatibility test failed
- **Fix**: check your PyTorch version and update the CUDA driver
## 🔗 Related Resources
### Model cards
- [EXAONE 3.5](https://huggingface.co/LGAI-EXAONE)
- [Llama 3 Open-Ko](https://huggingface.co/beomi/Llama-3-Open-Ko-8B)
- [Qwen2.5](https://huggingface.co/Qwen)
- [SOLAR](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)
### Documentation
- [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
- [Gradio Documentation](https://www.gradio.app/docs)
- [HF Spaces Config](https://huggingface.co/docs/hub/spaces-config-reference)
- [HF Spaces Pricing](https://huggingface.co/pricing)
## 📄 License
MIT License
## 🙋‍♂️ Contact
For issues or questions, please open a GitHub Issue.
---
**💡 TIP**:
- Quick testing: EXAONE 2.4B ⚡
- Balanced performance: EXAONE 7.8B ⭐
- Best quality: Llama 3.1 70B 🔒 (slow)