simple-chat / README.md
alex4cip's picture
feat: Add flexible PyTorch installation for local vs HF Spaces
51c066f
|
raw
history blame
8.43 kB
metadata
title: Multi-Model Korean LLM Chatbot
emoji: ๐Ÿค–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit

๐Ÿค– Multi-Model Korean LLM Chatbot

13๊ฐœ์˜ ๋‹ค์–‘ํ•œ ํ•œ๊ตญ์–ด LLM ๋ชจ๋ธ์„ ์„ ํƒํ•˜์—ฌ ๋Œ€ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฉ€ํ‹ฐ๋ชจ๋ธ ์ฑ—๋ด‡์ž…๋‹ˆ๋‹ค. ZeroGPU์™€ CPU Upgrade ํ•˜๋“œ์›จ์–ด๋ฅผ ๋ชจ๋‘ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค.

โœจ ์ฃผ์š” ํŠน์ง•

  • ๐ŸŽฏ 13๊ฐœ ๋ชจ๋ธ ์„ ํƒ: ๋‹ค์–‘ํ•œ ํฌ๊ธฐ์™€ ํŠน์„ฑ์˜ LLM ๋ชจ๋ธ ์ง€์›
  • ๐Ÿ‡ฐ๐Ÿ‡ท ํ•œ๊ธ€ ์ตœ์ ํ™”: ํ•œ๊ตญ์–ด ์„ฑ๋Šฅ์ด ์šฐ์ˆ˜ํ•œ ๋ชจ๋ธ๋“ค๋กœ ๊ตฌ์„ฑ
  • โšก ์œ ์—ฐํ•œ ํ•˜๋“œ์›จ์–ด: ZeroGPU/CPU Upgrade ์ž๋™ ๊ฐ์ง€
  • ๐Ÿ’พ ์บ์‹œ ์‹œ์Šคํ…œ: ๋ชจ๋ธ ์žฌ๋‹ค์šด๋กœ๋“œ ๋ฐฉ์ง€, ๋น ๋ฅธ ๋กœ๋”ฉ
  • ๐Ÿ”„ Lazy Loading: ์„ ํƒํ•œ ๋ชจ๋ธ๋งŒ ๋กœ๋“œํ•˜์—ฌ ๋ฆฌ์†Œ์Šค ์ ˆ์•ฝ

๐ŸŽฏ ์ง€์› ๋ชจ๋ธ (13๊ฐœ)

๐ŸŒŸ ์ถ”์ฒœ ํ•œ๊ตญ์–ด ๋ชจ๋ธ

๋ชจ๋ธ ํฌ๊ธฐ ํŠน์ง• ์ƒํƒœ
EXAONE 3.5 7.8B 7.3GB โญ ํŒŒ๋ผ๋ฏธํ„ฐ ๋Œ€๋น„ ์ตœ๊ณ  ํšจ์œจ Public
EXAONE 3.5 2.4B 2.2GB โšก ์ดˆ๊ฒฝ๋Ÿ‰, ๋น ๋ฅธ ์‘๋‹ต Public
Llama-3 Open-Ko 8B 7.5GB ๐Ÿ”ฅ Llama 3 ์ƒํƒœ๊ณ„ Public

๐Ÿ“š ์ „์ฒด ๋ชจ๋ธ ๋ชฉ๋ก

Public ๋ชจ๋ธ (10๊ฐœ)

  1. LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
  2. LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
  3. beomi/Llama-3-Open-Ko-8B
  4. Qwen/Qwen2.5-7B-Instruct
  5. Qwen/Qwen2.5-14B-Instruct
  6. 01-ai/Yi-1.5-9B-Chat
  7. 01-ai/Yi-1.5-34B-Chat
  8. mistralai/Mistral-7B-Instruct-v0.3
  9. upstage/SOLAR-10.7B-Instruct-v1.0
  10. EleutherAI/polyglot-ko-5.8b

Gated ๋ชจ๋ธ (3๊ฐœ) ๐Ÿ”’

  1. meta-llama/Llama-3.1-8B-Instruct
  2. meta-llama/Llama-3.1-70B-Instruct
  3. CohereForAI/aya-23-8B

์ฐธ๊ณ : Gated ๋ชจ๋ธ์€ Hugging Face์—์„œ ๋ณ„๋„ ์Šน์ธ ํ•„์š”

๐Ÿš€ ํ•˜๋“œ์›จ์–ด ์˜ต์…˜

Option 1: ZeroGPU (์ถ”์ฒœ)

์žฅ์ :

  • โšก ๋น ๋ฅธ ์‘๋‹ต (3-10์ดˆ)
  • ๐Ÿ’ฐ ์ €๋ ดํ•œ ๋น„์šฉ ($9/month)
  • ๐Ÿ”‹ ์ž๋™ GPU ํ• ๋‹น/ํ•ด์ œ

์ œ์•ฝ:

  • ํ•˜๋ฃจ 25๋ถ„ ๋ฌด๋ฃŒ ์‚ฌ์šฉ (PRO ๊ตฌ๋… ํ•„์š”)
  • ๋Œ€๊ธฐ์—ด ๊ฐ€๋Šฅ (์‚ฌ์šฉ์ž ๋งŽ์„ ๊ฒฝ์šฐ)

๋น„์šฉ: $9/month (PRO ๊ตฌ๋…)

Option 2: CPU Upgrade

์žฅ์ :

  • โฐ ๋ฌด์ œํ•œ ์‚ฌ์šฉ
  • ๐Ÿ“Š ์˜ˆ์ธก ๊ฐ€๋Šฅํ•œ ์„ฑ๋Šฅ
  • ๐Ÿ”ง ๊ฐ„๋‹จํ•œ ์„ค์ •

์ œ์•ฝ:

  • ๐Ÿข ๋А๋ฆฐ ์‘๋‹ต (15์ดˆ~2๋ถ„)
  • ๐Ÿ’ต ์ƒ๋Œ€์ ์œผ๋กœ ๋น„์‹ผ ๋น„์šฉ

๋น„์šฉ: $0.03/hour (์›” ์•ฝ $22)

โš™๏ธ ํ•˜๋“œ์›จ์–ด ์„ค์ • ๋ฐฉ๋ฒ•

ZeroGPU๋กœ ๋ณ€๊ฒฝ

  1. Space Settings โ†’ Hardware
  2. ZeroGPU ์„ ํƒ
  3. Confirm
  4. ๋นŒ๋“œ ์™„๋ฃŒ ๋Œ€๊ธฐ (1-2๋ถ„)

โ†’ UI์— "ZeroGPU" ํ‘œ์‹œ ํ™•์ธ

CPU Upgrade๋กœ ๋ณ€๊ฒฝ

  1. Space Settings โ†’ Hardware
  2. CPU Upgrade (8 vCPU / 32 GB) ์„ ํƒ
  3. Confirm
  4. ๋นŒ๋“œ ์™„๋ฃŒ ๋Œ€๊ธฐ (1-2๋ถ„)

โ†’ UI์— "CPU Upgrade" ํ‘œ์‹œ ํ™•์ธ

๐Ÿ“Š ์„ฑ๋Šฅ ๋น„๊ต

ํ•ญ๋ชฉ ZeroGPU CPU Upgrade
์ฒซ ์‘๋‹ต 10-20์ดˆ 1-3๋ถ„
์ดํ›„ ์‘๋‹ต 3-10์ดˆ 15์ดˆ~2๋ถ„
์ผ์ผ ํ•œ๋„ 25๋ถ„ ๋ฌด์ œํ•œ
์›” ๋น„์šฉ $9 $22
GPU H200 (70GB) ์—†์Œ
RAM - 32GB

๐Ÿ”ง ๊ธฐ์ˆ  ๊ตฌ์กฐ

์ž๋™ ํ•˜๋“œ์›จ์–ด ๊ฐ์ง€

# ZeroGPU ์‚ฌ์šฉ ๊ฐ€๋Šฅ ์—ฌ๋ถ€ ์ž๋™ ๊ฐ์ง€
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

# ์กฐ๊ฑด๋ถ€ decorator ์ ์šฉ
if ZEROGPU_AVAILABLE:
    @spaces.GPU(duration=120)
    def generate_response(messages):
        return generate_response_impl(messages)
else:
    def generate_response(messages):
        return generate_response_impl(messages)

Lazy Loading ์‹œ์Šคํ…œ

  • ์„ ํƒํ•œ ๋ชจ๋ธ๋งŒ ๋ฉ”๋ชจ๋ฆฌ์— ๋กœ๋“œ
  • ๋ชจ๋ธ ์ „ํ™˜ ์‹œ ์ด์ „ ๋ชจ๋ธ ์ž๋™ ์–ธ๋กœ๋“œ
  • ์บ์‹œ ํ™•์ธ์œผ๋กœ ์žฌ๋‹ค์šด๋กœ๋“œ ๋ฐฉ์ง€
  • ๋””์Šคํฌ์—์„œ ๋น ๋ฅธ ๋กœ๋”ฉ (์บ์‹œ๋œ ๊ฒฝ์šฐ)

์บ์‹œ ๊ด€๋ฆฌ

def check_model_cached(model_name):
    """Check if model is already downloaded in HF cache"""
    from huggingface_hub import scan_cache_dir
    cache_info = scan_cache_dir()

    for repo in cache_info.repos:
        if repo.repo_id == model_name:
            return True
    return False

๐Ÿ“ ์‚ฌ์šฉ ๋ฐฉ๋ฒ•

1. Space ์ ‘์†

https://huggingface.co/spaces/catchitplay/simple-chatbot-gradio

2. ๋ชจ๋ธ ์„ ํƒ

  • ๋“œ๋กญ๋‹ค์šด์—์„œ ์›ํ•˜๋Š” ๋ชจ๋ธ ์„ ํƒ
  • ์บ์‹œ ์ƒํƒœ ํ™•์ธ (๐Ÿ’พ ์บ์‹œ๋จ / ๐Ÿ“ฅ ๋‹ค์šด๋กœ๋“œ ํ•„์š”)
  • ์ฒซ ์‚ฌ์šฉ ์‹œ ๋ชจ๋ธ ๋‹ค์šด๋กœ๋“œ (2-14GB, 5-20๋ถ„)

3. ๋Œ€ํ™” ์‹œ์ž‘

์•ˆ๋…•ํ•˜์„ธ์š”
์ธ๊ณต์ง€๋Šฅ์— ๋Œ€ํ•ด ์„ค๋ช…ํ•ด์ฃผ์„ธ์š”
ํ•œ๊ตญ์˜ ์ˆ˜๋„๋Š” ์–ด๋””์ธ๊ฐ€์š”?

๐Ÿ’ก ๋ชจ๋ธ ์„ ํƒ ๊ฐ€์ด๋“œ

๋น ๋ฅธ ์‘๋‹ต์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ

  • EXAONE 3.5 2.4B โšก (2.2GB) - ๊ฐ€์žฅ ๋น ๋ฆ„
  • Mistral 7B (7GB) - ๊ฒฝ๋Ÿ‰ ๋ชจ๋ธ

ํ’ˆ์งˆ ์ค‘์‹œ

  • EXAONE 3.5 7.8B โญ (7.3GB) - ํšจ์œจ์„ฑ ์ตœ๊ณ 
  • Qwen2.5 14B (14GB) - ๋‹ค๊ตญ์–ด ๊ฐ•์ 
  • SOLAR 10.7B (10GB) - ํ•œ๊ตญ์–ด ํŠนํ™”

์ตœ๊ณ  ์„ฑ๋Šฅ (๋А๋ฆผ)

  • Llama 3.1 70B ๐Ÿ”’ (70GB) - ์ตœ๊ณ  ํ’ˆ์งˆ
  • Yi 1.5 34B (34GB) - ๊ธด ๋ฌธ๋งฅ

Llama ์ƒํƒœ๊ณ„

  • Llama-3 Open-Ko 8B ๐Ÿ”ฅ (7.5GB)
  • Llama 3.1 8B ๐Ÿ”’ (8GB)

๐Ÿ“ฆ ๋กœ์ปฌ ์‹คํ–‰

์„ค์น˜

# ์ €์žฅ์†Œ ํด๋ก 
git clone https://github.com/catchitplay/simple-chatbot-gradio.git
cd simple-chatbot-gradio

# ๊ฐ€์ƒํ™˜๊ฒฝ ์ƒ์„ฑ (๊ถŒ์žฅ)
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# ์˜์กด์„ฑ ์„ค์น˜ (3๊ฐ€์ง€ ๋ฐฉ๋ฒ•)

๋ฐฉ๋ฒ• 1: ๋กœ์ปฌ ์ „์šฉ requirements (๊ถŒ์žฅ)

pip install -r requirements-local.txt
# ์ตœ์‹  PyTorch ๋ฒ„์ „ ์‚ฌ์šฉ (ZeroGPU ์ œ์•ฝ ์—†์Œ)

๋ฐฉ๋ฒ• 2: ์ž๋™ ํ™˜๊ฒฝ ๊ฐ์ง€ ์„ค์น˜

python setup.py
# ํ™˜๊ฒฝ์„ ์ž๋™ ๊ฐ์ง€ํ•˜๊ณ  ์ ์ ˆํ•œ ๋ฒ„์ „ ์„ค์น˜

๋ฐฉ๋ฒ• 3: HF Spaces์šฉ requirements

pip install -r requirements.txt
# PyTorch 2.2.0 (ZeroGPU ํ˜ธํ™˜)

.env ํŒŒ์ผ ์„ค์ •

# .env ํŒŒ์ผ ์ƒ์„ฑ
echo "HF_TOKEN=your_hugging_face_token" > .env

HF_TOKEN ๋ฐœ๊ธ‰ ๋ฐฉ๋ฒ•:

  1. https://huggingface.co/settings/tokens ์ ‘์†
  2. "New token" ํด๋ฆญ
  3. "Read" ๊ถŒํ•œ ์„ ํƒ
  4. ์ƒ์„ฑ๋œ ํ† ํฐ ๋ณต์‚ฌ

์‹คํ–‰

python app.py

๋ธŒ๋ผ์šฐ์ €์—์„œ http://localhost:7860 ์ ‘์†

์ฐธ๊ณ :

  • ๋กœ์ปฌ์€ CPU/GPU ์ž๋™ ๊ฐ์ง€
  • GPU ๊ถŒ์žฅ (CUDA ํ•„์š”)
  • ์ฒซ ์‹คํ–‰ ์‹œ ๋ชจ๋ธ ๋‹ค์šด๋กœ๋“œ (์‹œ๊ฐ„ ์†Œ์š”)

๐Ÿ› ๏ธ ๊ธฐ์ˆ  ์Šคํƒ

  • ํ”„๋ ˆ์ž„์›Œํฌ: Gradio 5.49.1
  • ML ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ: Transformers 4.57.1, PyTorch 2.2.0 (ZeroGPU ํ˜ธํ™˜)
  • GPU ์ธํ”„๋ผ: Hugging Face ZeroGPU (์„ ํƒ์ )
  • ์–ธ์–ด: Python 3.10+

๐Ÿ“š Dependencies

gradio==5.49.1
transformers==4.57.1
torch==2.2.0  # ZeroGPU compatible (supports 2.0.0-2.2.0)
safetensors==0.6.2
accelerate==0.26.1
sentencepiece==0.2.0
protobuf==4.25.1
huggingface-hub>=0.19.0
python-dotenv==1.0.0
spaces  # ZeroGPU support

๐Ÿ”’ Gated ๋ชจ๋ธ ์‚ฌ์šฉ๋ฒ•

1. ๋ชจ๋ธ ์Šน์ธ ์š”์ฒญ

๊ฐ Gated ๋ชจ๋ธ ํŽ˜์ด์ง€์—์„œ "Request Access" ํด๋ฆญ:

2. HF_TOKEN ์„ค์ •

์Šน์ธ ํ›„ HF_TOKEN์„ .env ํŒŒ์ผ์— ์„ค์ • (์œ„ ์ฐธ์กฐ)

3. Space Secrets ์„ค์ • (HF Spaces)

Space Settings โ†’ Repository secrets:

  • Name: HF_TOKEN
  • Value: your_token_here

โš ๏ธ ์ œํ•œ์‚ฌํ•ญ

๊ณตํ†ต

  • ๋ชจ๋ธ ํฌ๊ธฐ: 2-70GB (๋กœ๋”ฉ ์‹œ๊ฐ„ ํ•„์š”)
  • ์ปจํ…์ŠคํŠธ: ๋Œ€ํ™” ํžˆ์Šคํ† ๋ฆฌ ์œ ์ง€
  • ๋ฉ”๋ชจ๋ฆฌ: ํฐ ๋ชจ๋ธ์€ GPU/๊ณ ์šฉ๋Ÿ‰ RAM ํ•„์š”

ZeroGPU ์ „์šฉ

  • ์ผ์ผ ํ•œ๋„: 25๋ถ„ (PRO ๊ตฌ๋…)
  • ๋Œ€๊ธฐ์—ด: ์‚ฌ์šฉ์ž ๋งŽ์„ ๊ฒฝ์šฐ ๋Œ€๊ธฐ
  • PRO ํ•„์š”: $9/month ๊ตฌ๋… ํ•„์š”

CPU Upgrade ์ „์šฉ

  • ๋А๋ฆฐ ์†๋„: GPU ๋Œ€๋น„ 10-30๋ฐฐ ๋А๋ฆผ
  • ๋น„์šฉ: ์‹œ๊ฐ„๋‹น $0.03 ($22/month)
  • ๋ฉ”๋ชจ๋ฆฌ ์ œ์•ฝ: 32GB RAM (๋Œ€ํ˜• ๋ชจ๋ธ ์ œ์•ฝ)

๐Ÿ”— ๊ด€๋ จ ๋ฆฌ์†Œ์Šค

๋ชจ๋ธ ์นด๋“œ

๋ฌธ์„œ

๐Ÿ“„ ๋ผ์ด์„ ์Šค

MIT License

๐Ÿ™‹โ€โ™‚๏ธ ๋ฌธ์˜

์ด์Šˆ๋‚˜ ์งˆ๋ฌธ์ด ์žˆ์œผ์‹œ๋ฉด GitHub Issues๋ฅผ ํ†ตํ•ด ๋ฌธ์˜ํ•ด์ฃผ์„ธ์š”.


๐Ÿ’ก TIP:

  • ๋น ๋ฅธ ํ…Œ์ŠคํŠธ: EXAONE 2.4B โšก
  • ๊ท ํ˜•์žกํžŒ ์„ฑ๋Šฅ: EXAONE 7.8B โญ
  • ์ตœ๊ณ  ํ’ˆ์งˆ: Llama 3.1 70B ๐Ÿ”’ (๋А๋ฆผ)