---
title: Multi-Model Korean LLM Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---
# 🤖 Multi-Model Korean LLM Chatbot

A multi-model chatbot that lets you pick from 13 Korean-capable LLMs and chat with them. It auto-detects whether it is running **locally (CPU/GPU)** or on **Hugging Face Spaces (CPU Basic/Upgrade, ZeroGPU)** and applies the optimal settings for that environment.
## ✨ Key Features

- **🎯 13 selectable models**: LLMs of various sizes and strengths
- **🇰🇷 Korean-optimized**: curated models with strong Korean-language performance
- **🖥️ Multi-environment**: auto-detects local (CPU/GPU) and HF Spaces (CPU Basic/Upgrade, ZeroGPU)
- **💾 Cache system**: avoids re-downloading models, so loading is fast
- **🚀 Lazy loading**: only the selected model is loaded, saving resources
- **🛡️ Robustness**: supports recent GPUs such as the RTX 5080, with an automatic CUDA compatibility test
## 🎯 Supported Models (13)

### 🌟 Recommended Korean Models

| Model | Size | Highlights | Access |
|-------|------|------------|--------|
| **EXAONE 3.5 7.8B** | 7.3GB | ⭐ Best efficiency per parameter | Public |
| **EXAONE 3.5 2.4B** | 2.2GB | ⚡ Ultra-light, fast responses | Public |
| **Llama-3 Open-Ko 8B** | 7.5GB | 🔥 Llama 3 ecosystem | Public |
### 📋 Full Model List

#### Public models (10)

1. LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
2. LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
3. beomi/Llama-3-Open-Ko-8B
4. Qwen/Qwen2.5-7B-Instruct
5. Qwen/Qwen2.5-14B-Instruct
6. 01-ai/Yi-1.5-9B-Chat
7. 01-ai/Yi-1.5-34B-Chat
8. mistralai/Mistral-7B-Instruct-v0.3
9. upstage/SOLAR-10.7B-Instruct-v1.0
10. EleutherAI/polyglot-ko-5.8b

#### Gated models (3) 🔒

11. meta-llama/Llama-3.1-8B-Instruct
12. meta-llama/Llama-3.1-70B-Instruct
13. CohereForAI/aya-23-8B

> **Note**: Gated models require separate access approval on Hugging Face.
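For illustration, a model list like this might be represented as a registry in code. The `MODEL_NAME` field matches the lookup used later in `load_model_once()`, but the structure shown here, including the `gated` flag, is an assumption, not the app's actual code:

```python
# Hypothetical sketch of a model registry; only MODEL_NAME is known to be
# used by the app (in load_model_once) -- the "gated" flag is an assumption.
MODEL_CONFIGS = [
    {"MODEL_NAME": "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct", "gated": False},
    {"MODEL_NAME": "EleutherAI/polyglot-ko-5.8b", "gated": False},
    {"MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct", "gated": True},
]

def gated_model_names():
    """Names that require Hugging Face access approval before download."""
    return [c["MODEL_NAME"] for c in MODEL_CONFIGS if c["gated"]]
```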
## 🌐 Supported Environments

### Local (development / personal use)

**1. Local GPU (recommended)**
- **Pros**:
  - ⚡ Fast responses (5-10 s with GPU acceleration)
  - 🔁 Unlimited usage
  - 💰 No cost
- **Supported GPUs**:
  - NVIDIA CUDA GPUs (RTX series, A100, etc.)
  - Apple Silicon GPUs (M1/M2/M3 via MPS)
  - Recent Blackwell GPUs such as the RTX 5080 (PyTorch nightly required)
- **Requirements**: CUDA 12.0+ or Apple Silicon

**2. Local CPU**
- **Pros**:
  - 🖥️ Runs without a GPU
  - 🔧 Simple setup
- **Limitations**:
  - ⏳ Slow responses (1-3 min)
  - 📉 Lightweight models recommended (EXAONE 2.4B, Mistral 7B)
### Hugging Face Spaces (cloud deployment)

**1. ZeroGPU (recommended)**
- **Pros**:
  - ⚡ Fast responses (3-10 s on NVIDIA H200 GPUs)
  - 💰 Affordable ($9/month)
  - 🔁 Automatic GPU allocation/release
- **Limitations**:
  - 25 minutes of GPU time per day (PRO subscription required)
  - Possible queueing when usage is high
- **Cost**: $9/month (PRO subscription)

**2. CPU Upgrade**
- **Pros**:
  - ⏰ Unlimited usage
  - 📊 Predictable performance
  - 🔧 Simple setup
- **Limitations**:
  - 🐢 Slow responses (30 s-1 min)
  - 💵 Relatively expensive
- **Cost**: $0.03/hour (≈$22/month)

**3. CPU Basic (free)**
- **Pros**:
  - 🆓 Free tier
  - 🧪 Good for testing/learning
- **Limitations**:
  - ⏳ Very slow responses (1-2 min)
  - 📉 Lightweight models only
  - ⚠️ Limited usage
## ⚙️ Configuration per Environment

### Running locally (auto-detected)

The app automatically detects the local environment and applies the optimal settings:

```bash
python app.py
```

**Auto-detection logic**:
1. **GPU detection**: checks whether CUDA/MPS is available
2. **CUDA compatibility test**: verifies the GPU actually works by running a tensor operation
3. **CPU fallback**: automatically switches to CPU mode on GPU errors
4. **Environment report**: prints the detected environment at startup
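Step 2 above can be sketched as follows. This is a minimal sketch, not the app's exact implementation: `torch.cuda.is_available()` only reports that a CUDA device is visible, while kernels for very new architectures (e.g. Blackwell on a stable PyTorch wheel) can still fail at run time, so a real operation is attempted:

```python
def test_cuda_compatibility() -> bool:
    """Verify the GPU actually executes kernels, not just that CUDA is visible.

    A small matmul forces a kernel launch; any runtime error (unsupported
    architecture, driver mismatch) is treated as "fall back to CPU".
    """
    try:
        import torch  # imported lazily so the check degrades gracefully
        if not torch.cuda.is_available():
            return False
        x = torch.ones(8, 8, device="cuda")
        return float((x @ x).sum()) == 512.0  # 8x8 of 8s sums to 512
    except Exception:
        return False
```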
### Deploying to HF Spaces (auto-detected)

When you change the hardware in the Space Settings, the app detects it automatically.

**Switching to ZeroGPU**:
1. Space Settings → Hardware
2. Select **ZeroGPU**
3. Confirm → wait for the build to finish (1-2 min)
4. Check that the UI shows "HF Spaces - ZeroGPU"

**Switching to CPU Upgrade**:
1. Space Settings → Hardware
2. Select **CPU Upgrade (8 vCPU / 32 GB)**
3. Confirm → wait for the build to finish (1-2 min)
4. Check that the UI shows "HF Spaces - CPU Upgrade"

**CPU Basic (free)**:
- Default setting; no change needed
- The UI shows "HF Spaces - CPU Basic"
## 📊 Performance Comparison

| Metric | Local GPU | Local CPU | ZeroGPU | CPU Upgrade | CPU Basic |
|--------|-----------|-----------|---------|-------------|-----------|
| **First response** | 10-20 s | 2-5 min | 10-20 s | 1-2 min | 2-3 min |
| **Subsequent responses** | 5-10 s | 1-3 min | 3-10 s | 30 s-1 min | 1-2 min |
| **Daily quota** | Unlimited | Unlimited | 25 min | Unlimited | Limited |
| **Monthly cost** | $0 | $0 | $9 | $22 | $0 |
| **GPU** | Your GPU | None | H200 (70GB) | None | None |
| **Recommended models** | All | Lightweight | All | All | Lightweight |
## 🔧 Technical Architecture

### Multi-Environment Auto-Detection

```python
import os

# 1. Import spaces BEFORE torch to avoid CUDA initialization errors
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

# 2. CUDA-related packages are imported afterwards
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 3. Hardware environment detection
def detect_hardware_environment() -> str:
    """Return one of: 'zerogpu', 'cpu_upgrade', 'cpu_basic',
    'local_gpu', 'local_cpu'."""
    # HF Spaces detection: the container sets SPACE_ID
    if os.environ.get('SPACE_ID'):
        if ZEROGPU_AVAILABLE:
            return 'zerogpu'
        elif (os.cpu_count() or 1) >= 8:
            return 'cpu_upgrade'
        else:
            return 'cpu_basic'

    # Local environment detection
    if torch.cuda.is_available():
        # CUDA compatibility test (covers recent GPUs such as the RTX 5080)
        if test_cuda_compatibility():
            return 'local_gpu'
        else:
            return 'local_cpu'  # CUDA error -> CPU fallback
    elif torch.backends.mps.is_available():
        return 'local_gpu'  # Apple Silicon
    else:
        return 'local_cpu'

# 4. Conditionally apply the GPU decorator
if ZEROGPU_AVAILABLE:
    @spaces.GPU(duration=120)
    def generate_response(message, history):
        return generate_response_impl(message, history)
else:
    def generate_response(message, history):
        return generate_response_impl(message, history)
```
### Lazy Loading & Cache System

**Smart model loading**:

```python
def load_model_once(model_index=None):
    """Load a model only when the selection changes (lazy loading)."""
    global model, tokenizer, loaded_model_name
    model_name = MODEL_CONFIGS[model_index]["MODEL_NAME"]

    # 1. Reuse the model if it is already loaded
    if loaded_model_name == model_name:
        print(f"ℹ️ Model {model_name} already loaded, reusing...")
        return model, tokenizer

    # 2. Check the cache -> show "downloading" vs "loading" in the UI
    if check_model_cached(model_name):
        print("✅ Model found in cache, loading from disk...")
    else:
        print("📥 Model not in cache, downloading (~4-14GB)...")

    # 3. Free the previous model's memory
    if model is not None:
        del model, tokenizer
        if HW_ENV['cuda_compatible']:
            torch.cuda.empty_cache()

    # 4. Load the new model (optimized per environment)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    device = "cuda" if HW_ENV['gpu_available'] and HW_ENV['cuda_compatible'] else "cpu"
    if device == "cuda":
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float16,  # GPU: float16
            device_map="auto",
        )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float32,  # CPU: float32
        )
    loaded_model_name = model_name
    return model, tokenizer
```
**Cache status reporting**:
- Shows the user "💾 Loading cached model" vs "📥 Downloading model" in real time
- Provides an estimated download time (5-20 min on first use)
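The cache probe behind these messages could be sketched like this. This is an assumption; the real app may use `huggingface_hub` utilities instead, but the on-disk layout shown is the Hub's standard cache structure (`models--<org>--<name>/snapshots/<revision>/`):

```python
import os

def check_model_cached(model_name, cache_dir=None):
    """Heuristic cache probe: a non-empty snapshots directory for this repo
    means the model can be loaded from disk instead of downloaded."""
    cache_dir = cache_dir or os.path.expanduser("~/.cache/huggingface/hub")
    repo_dir = "models--" + model_name.replace("/", "--")
    snapshots = os.path.join(cache_dir, repo_dir, "snapshots")
    return os.path.isdir(snapshots) and bool(os.listdir(snapshots))
```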
## 🚀 How to Use

### 1. Open the Space
https://huggingface.co/spaces/catchitplay/simple-chat

### 2. Select a model
- Pick a model from the dropdown
- Check its cache status (💾 cached / 📥 download required)
- The first use downloads the model (2-14GB, 5-20 min)

### 3. Start chatting
```
Hello
Explain artificial intelligence
What is the capital of Korea?
```
## 💡 Model Selection Guide

### When you need fast responses
- **EXAONE 3.5 2.4B** ⚡ (2.2GB) - fastest
- **Mistral 7B** (7GB) - lightweight

### When quality matters
- **EXAONE 3.5 7.8B** ⭐ (7.3GB) - best efficiency
- **Qwen2.5 14B** (14GB) - strong multilingual support
- **SOLAR 10.7B** (10GB) - Korean-specialized

### Top performance (slow)
- **Llama 3.1 70B** (70GB) - highest quality
- **Yi 1.5 34B** (34GB) - long context

### Llama ecosystem
- **Llama-3 Open-Ko 8B** 🔥 (7.5GB)
- **Llama 3.1 8B** 🔒 (8GB)
## 📦 Running Locally

### Installation

```bash
# Clone the repository
git clone https://github.com/catchitplay/simple-chatbot-gradio.git
cd simple-chatbot-gradio

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

**For recent GPUs such as the RTX 5080**:

```bash
# Install PyTorch nightly (CUDA 12.8+ support)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```
### Setting up the .env file

```bash
# Create the .env file
echo "HF_TOKEN=your_hugging_face_token" > .env
```

**How to issue an HF_TOKEN**:
1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Select the "Read" permission
4. Copy the generated token
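The app reads this file via python-dotenv (listed in the dependencies). For illustration, a minimal equivalent of what `load_dotenv()` does under the hood:

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv(): parse KEY=VALUE
    lines and export them without overwriting existing variables."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```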
### Run

```bash
python app.py
```

Open http://localhost:7860 in your browser.

**Example environment-detection output at startup**:

```
============================================================
Hardware Environment Detection
============================================================
Platform: local
Hardware: local_gpu
GPU Available: True
GPU Name: NVIDIA GeForce RTX 5080
CPU Cores: 16
OS: Linux
Description: Local - GPU (NVIDIA GeForce RTX 5080)
============================================================
```

**Notes**:
- Local environment auto-detection: CPU/GPU/Apple Silicon MPS
- Automatic CUDA compatibility test (falls back to CPU on GPU errors)
- The first run downloads the model (4-14GB, 5-20 min)
- GPU recommended (RTX series, A100, Apple Silicon, etc.)
### Installing as a Linux systemd Service (auto-start)

To start the chatbot automatically at boot, you can install it as a systemd service.

#### 1. Run the install script

```bash
# Run from the project directory
sudo ./install-service.sh
```

The install script automatically:
- Detects the current user and directory path
- Installs the service file to `/etc/systemd/system/chatbot.service`
- Creates the log files (`/var/log/chatbot.log`, `/var/log/chatbot-error.log`)
- Enables auto-start at boot
- Verifies that the service started
#### 2. Service management commands

```bash
# Start the service
sudo systemctl start chatbot

# Stop the service
sudo systemctl stop chatbot

# Restart the service
sudo systemctl restart chatbot

# Check service status
sudo systemctl status chatbot

# Follow the live logs
sudo journalctl -u chatbot -f

# View the application log
tail -f /var/log/chatbot.log

# View the error log
tail -f /var/log/chatbot-error.log

# Enable auto-start at boot
sudo systemctl enable chatbot

# Disable auto-start at boot
sudo systemctl disable chatbot
```
#### 3. Removing the service

To remove the service completely:

```bash
# Stop and disable the service
sudo systemctl stop chatbot
sudo systemctl disable chatbot

# Delete the service file
sudo rm /etc/systemd/system/chatbot.service

# Reload the systemd daemon
sudo systemctl daemon-reload

# Delete the log files (optional)
sudo rm /var/log/chatbot.log /var/log/chatbot-error.log
```
#### 4. Caveats

- **Virtual environment required**: the `venv` directory must exist before installing the service
- **Port conflicts**: the service will not start if another process is already using port 7860
- **Permissions**: the install script must be run with `sudo`
- **Restarting**: run `sudo systemctl restart chatbot` after changing the app code
- **Logs**: check the log files first when something goes wrong
#### 5. Manual service setup (advanced)

To configure the service by hand instead of using the install script:

```bash
# 1. Edit the chatbot.service file
sudo nano /etc/systemd/system/chatbot.service
```

Enter the following (adjust the paths and username):

```ini
[Unit]
Description=Multi-Model Chatbot Gradio Service
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/simple-chatbot-gradio
Environment="PATH=/path/to/simple-chatbot-gradio/venv/bin:/usr/bin:/bin"
ExecStart=/path/to/simple-chatbot-gradio/venv/bin/python app.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/chatbot.log
StandardError=append:/var/log/chatbot-error.log

[Install]
WantedBy=multi-user.target
```

```bash
# 2. Create the log files
sudo touch /var/log/chatbot.log /var/log/chatbot-error.log
sudo chown YOUR_USERNAME:YOUR_USERNAME /var/log/chatbot.log /var/log/chatbot-error.log

# 3. Reload the systemd daemon and enable the service
sudo systemctl daemon-reload
sudo systemctl enable chatbot
sudo systemctl start chatbot
```
#### 6. Troubleshooting

**If the service does not start**:

```bash
# Check service status
sudo systemctl status chatbot

# Check the error log
sudo journalctl -u chatbot -n 50

# Run manually to see the error
cd /path/to/simple-chatbot-gradio
source venv/bin/activate
python app.py
```

**If the port is already in use**:

```bash
# Find the process using port 7860
sudo lsof -i :7860

# Kill the process (after checking the PID)
sudo kill -9 <PID>
```

**Virtual environment path issues**:

```bash
# Recreate the virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements-local.txt
```
## 🛠️ Tech Stack

- **Framework**: Gradio 5.49.1
- **ML libraries**: Transformers 4.57.1, PyTorch 2.2.0+
- **GPU support**:
  - HF Spaces: ZeroGPU (NVIDIA H200)
  - Local: CUDA 12.0+, Apple Silicon MPS
  - Recent GPUs: PyTorch nightly (CUDA 12.8+)
- **Language**: Python 3.10+
## 📋 Dependencies

```txt
# Core
gradio==5.49.1
transformers==4.57.1
torch>=2.2.0  # HF Spaces: 2.2.0 (ZeroGPU), Local: 2.2.0+ or nightly
safetensors==0.6.2
accelerate==0.26.1
sentencepiece==0.2.0
protobuf==4.25.1
huggingface-hub>=0.19.0
python-dotenv==1.0.0
spaces  # ZeroGPU support (HF Spaces only)
```

**PyTorch version per environment**:
- **HF Spaces**: PyTorch 2.2.0 (ZeroGPU-compatible)
- **Local, common GPUs**: PyTorch 2.2.0+ (CUDA 12.0+)
- **Local, recent GPUs (e.g. RTX 5080)**: PyTorch nightly (CUDA 12.8+)
- **Local CPU**: PyTorch 2.2.0+ (CPU-only build)
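A simple gate for the nightly-wheel case could look like this. The GPU-name substrings are an illustrative assumption about Blackwell-generation cards, not the app's actual check:

```python
def needs_nightly(gpu_name):
    """Assumption for illustration: Blackwell-generation cards (RTX 50xx,
    B100/B200) need a CUDA 12.8+ nightly wheel; stable 2.2.0 is fine otherwise."""
    return any(tag in gpu_name for tag in ("RTX 50", "B100", "B200"))
```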
## 🔒 Using Gated Models

### 1. Request model access
Click "Request Access" on each gated model's page:
- https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
- https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
- https://huggingface.co/CohereForAI/aya-23-8B

### 2. Set HF_TOKEN
After approval, set HF_TOKEN in the .env file (see above)

### 3. Set Space Secrets (HF Spaces)
Space Settings → Repository secrets:
- Name: `HF_TOKEN`
- Value: `your_token_here`
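Both paths converge at run time: the .env file locally and the Space secret on HF Spaces each surface the token as an environment variable, so the app can read it the same way everywhere (a sketch; the helper name is hypothetical):

```python
import os

def get_hf_token():
    """HF_TOKEN comes from the .env file locally or from Space secrets on
    HF Spaces; both appear as an environment variable at run time."""
    return os.environ.get("HF_TOKEN")
```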
## ⚠️ Limitations & Known Issues

### Common
- **Model size**: 2-70GB (loading takes time)
- **Context**: conversation history is kept (last 3 turns)
- **Memory**: large models need a GPU or a lot of RAM
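Keeping only the last three turns bounds the prompt length. A sketch of how that truncation might look (the function itself is an assumption; the app's actual implementation may differ):

```python
def truncate_history(history, max_turns=3):
    """Keep the most recent (user, assistant) pairs and flatten them into
    chat-template messages suitable for tokenizer.apply_chat_template()."""
    messages = []
    for user_msg, bot_msg in history[-max_turns:]:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    return messages
```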
### Per-environment constraints

**HF Spaces - ZeroGPU**:
- Daily quota: 25 min (PRO subscription required)
- Queueing: you may have to wait when usage is high
- Cost: $9/month

**HF Spaces - CPU Upgrade**:
- Slow: 10-30x slower than GPU
- Cost: $0.03/hour (≈$22/month)
- Memory: 32GB RAM (limits large models)

**HF Spaces - CPU Basic**:
- Very slow: 1-2 min responses
- Limited usage
- Lightweight models recommended

**Local**:
- GPU memory: large models can run out of VRAM
- Recent GPUs: PyTorch nightly required (RTX 5080, etc.)
- CPU mode: very slow (1-3 min responses)
### Known issues and fixes

**"CUDA has been initialized" error (ZeroGPU)**:
- **Cause**: spaces must be imported before torch
- **Fix**: import spaces first in app.py (already applied)

**CUDA errors on Blackwell GPUs such as the RTX 5080**:
- **Cause**: CUDA 12.8+ is required (not supported by PyTorch 2.2.0)
- **Fix**: install PyTorch nightly (see the install section above)

**GPU detected but running in CPU mode**:
- **Cause**: the CUDA compatibility test failed
- **Fix**: check your PyTorch version and update your CUDA driver
## 🔗 Related Resources

### Model cards
- [EXAONE 3.5](https://huggingface.co/LGAI-EXAONE)
- [Llama 3 Open-Ko](https://huggingface.co/beomi/Llama-3-Open-Ko-8B)
- [Qwen2.5](https://huggingface.co/Qwen)
- [SOLAR](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)

### Documentation
- [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
- [Gradio Documentation](https://www.gradio.app/docs)
- [HF Spaces Config](https://huggingface.co/docs/hub/spaces-config-reference)
- [HF Spaces Pricing](https://huggingface.co/pricing)
## 📄 License

MIT License

## 🙋 Contact

If you have issues or questions, please open a GitHub Issue.

---

**💡 TIP**:
- Quick test: EXAONE 2.4B ⚡
- Balanced performance: EXAONE 7.8B ⭐
- Best quality: Llama 3.1 70B (slow)