---
title: Multi-Model Korean LLM Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
---

# 🤖 Multi-Model Korean LLM Chatbot

A multi-model chatbot that lets you choose among 13 Korean-capable LLMs. It auto-detects the runtime environment, whether **local (CPU/GPU)** or **Hugging Face Spaces (CPU Basic/Upgrade, ZeroGPU)**, and applies the optimal settings.

## ✨ Key Features

- **🎯 13 selectable models**: LLMs of various sizes and strengths
- **🇰🇷 Korean-optimized**: built around models with strong Korean performance
- **🖥️ Multi-environment support**: auto-detects local (CPU/GPU) and HF Spaces (CPU Basic/Upgrade, ZeroGPU)
- **💾 Cache system**: avoids re-downloading models for faster loading
- **🔄 Lazy loading**: loads only the selected model to save resources
- **🛡️ Stability**: supports recent GPUs such as the RTX 5080, with an automatic CUDA compatibility test

## 🎯 Supported Models (13)

### 🌟 Recommended Korean Models

| Model | Size | Highlights | Access |
|------|------|------|------|
| **EXAONE 3.5 7.8B** | 7.3GB | ⭐ Best quality per parameter | Public |
| **EXAONE 3.5 2.4B** | 2.2GB | ⚡ Ultra-light, fast responses | Public |
| **Llama-3 Open-Ko 8B** | 7.5GB | 🔥 Llama 3 ecosystem | Public |

### 📚 Full Model List

#### Public models (10)

1. LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
2. LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
3. beomi/Llama-3-Open-Ko-8B
4. Qwen/Qwen2.5-7B-Instruct
5. Qwen/Qwen2.5-14B-Instruct
6. 01-ai/Yi-1.5-9B-Chat
7. 01-ai/Yi-1.5-34B-Chat
8. mistralai/Mistral-7B-Instruct-v0.3
9. upstage/SOLAR-10.7B-Instruct-v1.0
10. EleutherAI/polyglot-ko-5.8b

#### Gated models (3) 🔒

11. meta-llama/Llama-3.1-8B-Instruct
12. meta-llama/Llama-3.1-70B-Instruct
13. CohereForAI/aya-23-8B

> **Note**: Gated models require separate approval on Hugging Face.

## 🚀 Supported Environments

### Local (development / personal use)

**1. Local GPU (recommended)**
- **Pros**:
  - ⚡ Fast responses (5–10 s with GPU acceleration)
  - 🔓 Unlimited usage
  - 💰 No cost
- **Supported GPUs**:
  - NVIDIA CUDA GPUs (RTX series, A100, etc.)
  - Apple Silicon GPUs (M1/M2/M3, via MPS)
  - Recent Blackwell GPUs such as the RTX 5080 (PyTorch nightly required)
- **Requirements**: CUDA 12.0+ or Apple Silicon

**2. Local CPU**
- **Pros**:
  - 🖥️ Runs without a GPU
  - 🔧 Simple setup
- **Constraints**:
  - ⏳ Slow responses (1–3 min)
  - 🔒 Lightweight models recommended (EXAONE 2.4B, Mistral 7B)

### Hugging Face Spaces (cloud deployment)

**1. ZeroGPU (recommended)**
- **Pros**:
  - ⚡ Fast responses (3–10 s on an NVIDIA H200 GPU)
  - 💰 Low cost ($9/month)
  - 🔋 Automatic GPU allocation/release
- **Constraints**:
  - 25 minutes of GPU time per day (PRO subscription required)
  - Possible queueing when many users are active
- **Cost**: $9/month (PRO subscription)

**2. CPU Upgrade**
- **Pros**:
  - ⏰ Unlimited usage
  - 📊 Predictable performance
  - 🔧 Simple setup
- **Constraints**:
  - 🐢 Slow responses (30 s–1 min)
  - 💵 Relatively expensive
- **Cost**: $0.03/hour (about $22/month)

**3. CPU Basic (free)**
- **Pros**:
  - 💡 Free tier
  - 🧪 Good for testing and learning
- **Constraints**:
  - ⏳ Very slow responses (1–2 min)
  - 🔒 Only lightweight models recommended
  - ⚠️ Limited usage

## ⚙️ Setup per Environment

### Local run (auto-detected)

The app automatically detects the local environment and applies the optimal settings:

```bash
python app.py
```

**Detection logic**:
1. **GPU detection**: checks whether CUDA/MPS is available
2. **CUDA compatibility test**: verifies the GPU actually works with a real tensor operation
3. **CPU fallback**: switches to CPU mode automatically on GPU errors
4. **Environment report**: prints the detected environment at startup

### HF Spaces deployment (auto-detected)

Change the hardware in the Space Settings and the app detects it automatically.

**Switching to ZeroGPU**:
1. Space Settings → Hardware
2. Select **ZeroGPU**
3. Confirm → wait for the build to finish (1–2 min)
4. Check that the UI shows "🚀 HF Spaces - ZeroGPU"

**Switching to CPU Upgrade**:
1. Space Settings → Hardware
2. Select **CPU Upgrade (8 vCPU / 32 GB)**
3. Confirm → wait for the build to finish (1–2 min)
4. Check that the UI shows "⚙️ HF Spaces - CPU Upgrade"

**CPU Basic (free)**:
- Default setting; no change required
- The UI shows "💻 HF Spaces - CPU Basic"

## 📊 Performance Comparison

| Item | Local GPU | Local CPU | ZeroGPU | CPU Upgrade | CPU Basic |
|------|-----------|-----------|---------|-------------|-----------|
| **First response** | 10–20 s | 2–5 min | 10–20 s | 1–2 min | 2–3 min |
| **Subsequent responses** | 5–10 s | 1–3 min | 3–10 s | 30 s–1 min | 1–2 min |
| **Daily limit** | Unlimited | Unlimited | 25 min | Unlimited | Limited |
| **Monthly cost** | $0 | $0 | $9 | $22 | $0 |
| **GPU** | Your GPU | None | H200 (70GB) | None | None |
| **Recommended models** | All | Lightweight | All | All | Lightweight |

## 🔧 Architecture

### Multi-environment auto-detection

```python
import os

# 1. Prevent CUDA-initialization errors on ZeroGPU: import spaces first
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

# 2. Only then import CUDA-related packages
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 3. Hardware environment detection
def detect_hardware_environment():
    """Return a hardware label:
    'zerogpu' | 'cpu_upgrade' | 'cpu_basic' | 'local_gpu' | 'local_cpu'
    """
    # HF Spaces detection
    if os.environ.get('SPACE_ID'):
        if ZEROGPU_AVAILABLE:
            return 'zerogpu'
        elif os.cpu_count() >= 8:
            return 'cpu_upgrade'
        else:
            return 'cpu_basic'

    # Local environment detection
    if torch.cuda.is_available():
        # CUDA compatibility test (supports recent GPUs such as the RTX 5080)
        if test_cuda_compatibility():
            return 'local_gpu'
        else:
            return 'local_cpu'  # CUDA error → CPU fallback
    elif torch.backends.mps.is_available():
        return 'local_gpu'  # Apple Silicon
    else:
        return 'local_cpu'

# 4. Apply the GPU decorator conditionally
if ZEROGPU_AVAILABLE:
    @spaces.GPU(duration=120)
    def generate_response(message, history):
        return generate_response_impl(message, history)
else:
    def generate_response(message, history):
        return generate_response_impl(message, history)
```

### Lazy loading & cache system

**Smart model loading**:

```python
def load_model_once(model_index=None):
    """Load a model only when the selection changes (lazy loading)."""
    global model, tokenizer, loaded_model_name
    model_name = MODEL_CONFIGS[model_index]["MODEL_NAME"]

    # 1. Reuse the model if it is already loaded
    if loaded_model_name == model_name:
        print(f"ℹ️ Model {model_name} already loaded, reusing...")
        return model, tokenizer

    # 2. Check the cache → show "download" vs "load" in the UI
    is_cached = check_model_cached(model_name)
    if is_cached:
        print("✅ Model found in cache, loading from disk...")
    else:
        print("📥 Model not in cache, downloading (~4-14GB)...")

    # 3. Free the previous model's memory
    if model is not None:
        del model, tokenizer
        if HW_ENV['cuda_compatible']:
            torch.cuda.empty_cache()

    # 4. Load the new model (per-environment optimization)
    device = "cuda" if HW_ENV['gpu_available'] and HW_ENV['cuda_compatible'] else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if device == "cuda":
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float16,  # GPU: float16
            device_map="auto",
        )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            dtype=torch.float32,  # CPU: float32
        )

    loaded_model_name = model_name
    return model, tokenizer
```

**Cache status display**:
- Shows the user "💾 Loading cached model" vs "📥 Downloading model" in real time
- Provides an estimated download time (5–20 min on first use)

## 📝 How to Use

### 1. Open the Space

https://huggingface.co/spaces/catchitplay/simple-chat

### 2. Select a model

- Pick the model you want from the dropdown
- Check its cache status (💾 cached / 📥 download required)
- On first use the model is downloaded (2–14GB, 5–20 min)

### 3. Start chatting

```
안녕하세요
인공지능에 대해 설명해주세요
한국의 수도는 어디인가요?
```

## 💡 Model Selection Guide

### When you need fast responses
- **EXAONE 3.5 2.4B** ⚡ (2.2GB) - fastest
- **Mistral 7B** (7GB) - lightweight

### Quality first
- **EXAONE 3.5 7.8B** ⭐ (7.3GB) - best efficiency
- **Qwen2.5 14B** (14GB) - strong multilingual support
- **SOLAR 10.7B** (10GB) - Korean-specialized

### Best quality (slow)
- **Llama 3.1 70B** 🔒 (70GB) - highest quality
- **Yi 1.5 34B** (34GB) - long context

### Llama ecosystem
- **Llama-3 Open-Ko 8B** 🔥 (7.5GB)
- **Llama 3.1 8B** 🔒 (8GB)

## 📦 Running Locally

### Installation

```bash
# Clone the repository
git clone https://github.com/catchitplay/simple-chatbot-gradio.git
cd simple-chatbot-gradio

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

**For recent GPUs such as the RTX 5080**:

```bash
# Install PyTorch nightly (CUDA 12.8+ support)
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
```

### .env file setup

```bash
# Create the .env file
echo "HF_TOKEN=your_hugging_face_token" > .env
```

**How to get an HF_TOKEN**:
1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Select the "Read" permission
4. Copy the generated token

### Run

```bash
python app.py
```

Open http://localhost:7860 in your browser.

**Environment detection printed at startup**:

```
============================================================
Hardware Environment Detection
============================================================
Platform: local
Hardware: local_gpu
GPU Available: True
GPU Name: NVIDIA GeForce RTX 5080
CPU Cores: 16
OS: Linux
Description: 🖥️ Local - GPU (NVIDIA GeForce RTX 5080)
============================================================
```

**Notes**:
- Local environment auto-detection: CPU / GPU / Apple Silicon MPS
- Automatic CUDA compatibility test (CPU fallback on GPU errors)
- First run downloads the model (4–14GB, 5–20 min)
- GPU recommended (RTX series, A100, Apple Silicon, etc.)

### Installing as a Linux system service (auto-start)

To launch the chatbot automatically when the server boots, you can install it as a systemd service.

#### 1. Run the install script

```bash
# Run from the project directory
sudo ./install-service.sh
```

The install script automatically:
- Detects the current user and directory path
- Installs the systemd unit file to `/etc/systemd/system/chatbot.service`
- Creates the log files (`/var/log/chatbot.log`, `/var/log/chatbot-error.log`)
- Enables auto-start at boot
- Asks whether to start the service immediately

#### 2. Service management commands

```bash
# Start the service
sudo systemctl start chatbot

# Stop the service
sudo systemctl stop chatbot

# Restart the service
sudo systemctl restart chatbot

# Check service status
sudo systemctl status chatbot

# Follow logs in real time
sudo journalctl -u chatbot -f

# View the application log
tail -f /var/log/chatbot.log

# View the error log
tail -f /var/log/chatbot-error.log

# Enable auto-start at boot
sudo systemctl enable chatbot

# Disable auto-start at boot
sudo systemctl disable chatbot
```

#### 3. Removing the service

To remove the service completely:

```bash
# Stop and disable the service
sudo systemctl stop chatbot
sudo systemctl disable chatbot

# Delete the unit file
sudo rm /etc/systemd/system/chatbot.service

# Reload the systemd daemon
sudo systemctl daemon-reload

# Delete the log files (optional)
sudo rm /var/log/chatbot.log /var/log/chatbot-error.log
```

#### 4. Caveats

- **Virtual environment required**: the `venv` directory must exist before installing the service
- **Port conflicts**: the service will not start if another process is already using port 7860
- **Permissions**: the install script must be run with `sudo`
- **Restarts**: run `sudo systemctl restart chatbot` after changing the app code
- **Logs**: check the log files first when something goes wrong

#### 5. Manual service setup (advanced)

To configure the service by hand instead of using the install script:

```bash
# 1. Edit the chatbot.service file
sudo nano /etc/systemd/system/chatbot.service
```

```ini
# 2. Enter the following (adjust the paths and username)
[Unit]
Description=Multi-Model Chatbot Gradio Service
After=network.target

[Service]
Type=simple
User=YOUR_USERNAME
WorkingDirectory=/path/to/simple-chatbot-gradio
Environment="PATH=/path/to/simple-chatbot-gradio/venv/bin:/usr/bin:/bin"
ExecStart=/path/to/simple-chatbot-gradio/venv/bin/python app.py
Restart=on-failure
RestartSec=10
StandardOutput=append:/var/log/chatbot.log
StandardError=append:/var/log/chatbot-error.log

[Install]
WantedBy=multi-user.target
```

```bash
# 3. Create the log files
sudo touch /var/log/chatbot.log /var/log/chatbot-error.log
sudo chown YOUR_USERNAME:YOUR_USERNAME /var/log/chatbot.log /var/log/chatbot-error.log

# 4. Reload the systemd daemon and enable the service
sudo systemctl daemon-reload
sudo systemctl enable chatbot
sudo systemctl start chatbot
```

#### 6. Troubleshooting

**If the service does not start**:

```bash
# Check service status
sudo systemctl status chatbot

# Check recent logs
sudo journalctl -u chatbot -n 50

# Run manually to see the error
cd /path/to/simple-chatbot-gradio
source venv/bin/activate
python app.py
```

**If the port is already in use**:

```bash
# Find the process using port 7860
sudo lsof -i :7860

# Kill it (after checking the PID)
sudo kill -9 <PID>
```

**Virtual environment path problems**:

```bash
# Recreate the virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements-local.txt
```

## 🛠️ Tech Stack

- **Framework**: Gradio 5.49.1
- **ML libraries**: Transformers 4.57.1, PyTorch 2.2.0+
- **GPU support**:
  - HF Spaces: ZeroGPU (NVIDIA H200)
  - Local: CUDA 12.0+, Apple Silicon MPS
  - Recent GPUs: PyTorch nightly (CUDA 12.8+)
- **Language**: Python 3.10+

## 📚 Dependencies

```txt
# Core
gradio==5.49.1
transformers==4.57.1
torch>=2.2.0  # HF Spaces: 2.2.0 (ZeroGPU), Local: 2.2.0+ or nightly
safetensors==0.6.2
accelerate==0.26.1
sentencepiece==0.2.0
protobuf==4.25.1
huggingface-hub>=0.19.0
python-dotenv==1.0.0
spaces  # ZeroGPU support (HF Spaces only)
```

**PyTorch version per environment**:
- **HF Spaces**: PyTorch 2.2.0 (ZeroGPU-compatible)
- **Local, typical GPU**: PyTorch 2.2.0+ (CUDA 12.0+)
- **Local, recent GPU (e.g. RTX 5080)**: PyTorch nightly (CUDA 12.8+)
- **Local CPU**: PyTorch 2.2.0+ (CPU-only build)

## 🔒 Using Gated Models

### 1. Request access

Click "Request Access" on each gated model page:
- https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
- https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
- https://huggingface.co/CohereForAI/aya-23-8B

### 2. Set HF_TOKEN

After approval, set HF_TOKEN in the .env file (see above).

### 3. Space secrets (HF Spaces)

Space Settings → Repository secrets:
- Name: `HF_TOKEN`
- Value: `your_token_here`

## ⚠️ Limitations and Known Issues

### General

- **Model size**: 2–70GB (loading takes time)
- **Context**: conversation history is kept (last 3 turns)
- **Memory**: large models need a GPU or a lot of RAM

### Per-environment constraints

**HF Spaces - ZeroGPU**:
- Daily limit: 25 minutes (PRO subscription required)
- Queueing when many users are active
- Cost: $9/month

**HF Spaces - CPU Upgrade**:
- Slow: 10–30× slower than GPU
- Cost: $0.03/hour ($22/month)
- Memory: 32GB RAM (limits large models)

**HF Spaces - CPU Basic**:
- Very slow: 1–2 min per response
- Limited usage
- Lightweight models recommended

**Local**:
- GPU memory: large models may exceed available VRAM
- Recent GPUs: PyTorch nightly required (e.g. RTX 5080)
- CPU mode: very slow (1–3 min per response)

### Known issues and fixes

**"CUDA has been initialized" error (ZeroGPU)**:
- **Cause**: `spaces` must be imported before `torch`
- **Fix**: app.py imports `spaces` first (already applied)

**CUDA errors on Blackwell GPUs such as the RTX 5080**:
- **Cause**: CUDA 12.8+ is required (not supported by PyTorch 2.2.0)
- **Fix**: install PyTorch nightly (see the installation section above)

**GPU detected but running in CPU mode**:
- **Cause**: the CUDA compatibility test failed
- **Fix**: check the PyTorch version, update the CUDA driver

## 🔗 Related Resources

### Model cards

- [EXAONE 3.5](https://huggingface.co/LGAI-EXAONE)
- [Llama 3 Open-Ko](https://huggingface.co/beomi/Llama-3-Open-Ko-8B)
- [Qwen2.5](https://huggingface.co/Qwen)
- [SOLAR](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0)

### Documentation

- [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
- [Gradio Documentation](https://www.gradio.app/docs)
- [HF Spaces Config](https://huggingface.co/docs/hub/spaces-config-reference)
- [HF Spaces Pricing](https://huggingface.co/pricing)

## 📄 License

MIT License

## 🙋‍♂️ Contact

For questions or problems, please open a GitHub issue.

---

**💡 TIP**:
- Quick testing: EXAONE 2.4B ⚡
- Balanced performance: EXAONE 7.8B ⭐
- Best quality: Llama 3.1 70B 🔒 (slow)
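
**Appendix: context trimming sketch**

The Limitations section notes that conversation context keeps only the last 3 turns. A minimal sketch of that trimming policy, assuming Gradio-style `(user, assistant)` history pairs; the function name and details here are illustrative, not the app's actual API:

```python
def build_messages(history, message, max_turns=3):
    """Build a chat-message list from (user, assistant) turn pairs,
    keeping only the most recent `max_turns` turns to bound prompt size."""
    messages = []
    # history[-max_turns:] drops everything older than the last 3 turns
    for user_msg, assistant_msg in history[-max_turns:]:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    # The new user message always goes last
    messages.append({"role": "user", "content": message})
    return messages
```

The resulting list can then be fed to `tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")` before generation.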