alex4cip Claude committed
Commit c9ef1fe · 0 Parent(s)

feat: Hugging Face LLM chatbot with multi-language support


- Implement local model execution using transformers
- Add 5 models: 3 English (DialoGPT, GPT-2) + 2 Korean (KoGPT-2, KoAlpaca)
- Support both English and Korean conversations
- No API rate limits, fully offline-capable after initial download
- Built with Gradio 5.x for web interface

Features:
- Multiple model selection with automatic chat reset
- Local model caching for improved performance
- Detailed error handling and user feedback
- Comprehensive documentation in README and CLAUDE.md

Technical stack:
- Gradio 5.x for web UI
- Transformers + PyTorch for model inference
- CPU/GPU support with automatic device detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (6)
  1. .claude/settings.local.json +13 -0
  2. .gitignore +46 -0
  3. CLAUDE.md +180 -0
  4. README.md +164 -0
  5. app.py +270 -0
  6. requirements.txt +4 -0
.claude/settings.local.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "permissions": {
+     "allow": [
+       "Bash(python app.py)",
+       "Bash(curl -s http://localhost:7860)",
+       "Bash(curl -X POST \"https://api-inference.huggingface.co/models/gpt2\" )",
+       "Bash(git init)",
+       "Bash(git add .)"
+     ],
+     "deny": [],
+     "ask": []
+   }
+ }
.gitignore ADDED
@@ -0,0 +1,46 @@
+ # Environment variables
+ .env
+ .env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Gradio
+ gradio_cached_examples/
+ flagged/
+
+ # OS
+ .DS_Store
+ Thumbs.db
CLAUDE.md ADDED
@@ -0,0 +1,180 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ Multi-model LLM chatbot using Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and have conversations with them. Model changes automatically reset the conversation.
+
+ ## Tech Stack
+
+ - **Python**: 3.10+
+ - **Framework**: Gradio 5.x (ChatInterface + Blocks)
+ - **API**: Hugging Face Serverless Inference API (free tier)
+ - **Deployment**: Hugging Face Spaces (free CPU instance)
+
+ ## Project Structure
+
+ ```
+ ├── app.py              # Main application
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Spaces configuration + documentation
+ ├── .env                # HF_TOKEN (git ignored)
+ └── CLAUDE.md           # This file
+ ```
+
+ ## Development Commands
+
+ ### Local Development
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run locally (requires HF_TOKEN in .env)
+ python app.py
+
+ # Access at http://localhost:7860
+ ```
+
+ ### Deployment to Hugging Face Spaces
+
+ **Method 1: Web UI**
+ 1. Create Space at https://huggingface.co/spaces
+ 2. Select Gradio SDK
+ 3. Upload `app.py`, `requirements.txt`, `README.md`
+ 4. Add `HF_TOKEN` to Settings → Repository secrets
+
+ **Method 2: Git Push**
+ ```bash
+ git remote add space https://huggingface.co/spaces/<username>/<space-name>
+ git push space main
+ ```
+
+ ## Architecture
+
+ ### Core Components
+
+ **`app.py` Structure**:
+ - `MODELS` dict: Model configurations (ID, display name, parameters)
+ - `chat_response()`: Main inference function handling multiple model types
+ - `on_model_change()`: Clears chat when model selection changes
+ - Gradio Blocks: UI composition with model dropdown + ChatInterface
+
+ **Model Handling Patterns**:
+ - **DialoGPT**: Text continuation with conversation history formatting
+ - **BlenderBot**: Conversational API with single-turn context
+ - **Flan-T5**: Instruction-based text generation with prompt engineering
+ - **Zephyr**: Chat completion API with message history formatting
+
+ **State Management**:
+ - Global `current_model` tracks selected model
+ - Model change triggers chat history reset via Gradio event handlers
+ - Each model type uses appropriate API method from `InferenceClient`
+
+ ### API Integration
+
+ **Hugging Face InferenceClient Usage**:
+ ```python
+ client = InferenceClient(token=HF_TOKEN)
+
+ # Different methods for different model types
+ client.text_generation()   # DialoGPT, Flan-T5
+ client.conversational()    # BlenderBot
+ client.chat_completion()   # Zephyr (chat models)
+ ```
+
+ **Rate Limiting & Error Handling**:
+ - Free tier: ~100-300 requests/hour
+ - Graceful degradation with user-friendly error messages
+ - Timeout and rate limit detection in exception handling
+
+ ## Environment Setup
+
+ **Required Environment Variable**:
+ ```bash
+ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ ```
+
+ **Obtaining HF_TOKEN**:
+ 1. Login to https://huggingface.co
+ 2. Settings → Access Tokens
+ 3. Create new token with "Read" permissions
+ 4. Copy to `.env` file (local) or Space secrets (deployment)
+
+ ## Adding New Models
+
+ 1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
+    ```python
+    "model-org/model-name": {
+        "name": "Display Name",
+        "max_length": 512,
+        "temperature": 0.7,
+    }
+    ```
+
+ 2. **Update chat_response()** if model requires special handling:
+    - Check model name in conditional logic
+    - Use appropriate InferenceClient method
+    - Format prompt/messages according to model requirements
+
+ 3. **Verify free tier compatibility**:
+    - Test model availability via Inference API
+    - Check rate limits and response times
+    - Update README.md model list
+
+ ## UI Customization
+
+ **Changing Language**:
+ - UI strings live inline in app.py; modify the markdown strings and button labels in [app.py:140-220](app.py#L140-L220)
+
+ **Theme & Styling**:
+ ```python
+ gr.Blocks(theme=gr.themes.Soft())  # Change theme here
+ ```
+
+ **Chat Examples**:
+ - Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)
+
+ ## Common Issues
+
+ **"Rate limit exceeded"**:
+ - Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)
+
+ **Model timeout/unavailable**:
+ - High demand on free tier, try a different model or retry later
+
+ **Space sleeping**:
+ - Spaces sleep after inactivity, first load may be slow
+
+ ## Testing Locally
+
+ ```bash
+ # Ensure .env exists with HF_TOKEN
+ python app.py
+
+ # Test each model:
+ # 1. Select model from dropdown
+ # 2. Send test message
+ # 3. Verify response generation
+ # 4. Change model and verify chat resets
+ ```
+
+ ## Deployment Notes
+
+ **README.md YAML Header**:
+ - Required for Spaces configuration
+ - Specifies SDK, Python version, app file
+ - Auto-detected by Hugging Face
+
+ **Environment Variables in Spaces**:
+ - Set via Settings → Repository secrets
+ - Name must match exactly: `HF_TOKEN`
+ - Never commit tokens to repository
+
+ **Free Tier Constraints**:
+ - CPU only (no GPU)
+ - Auto-sleep after inactivity
+ - Rate limits on API calls
+ - May experience slower inference
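The per-model handling CLAUDE.md outlines (each model family routed to a different inference method) reduces to a small dispatch table. Below is a minimal sketch in plain Python; the model IDs and the method mapping are illustrative assumptions, and no real API call is made:

```python
# Illustrative dispatch table: model family -> inference method name.
# These pairings mirror the patterns described in CLAUDE.md; the exact
# model IDs are examples, not pinned by the repository.
MODEL_METHODS = {
    "microsoft/DialoGPT-medium": "text_generation",      # text continuation
    "facebook/blenderbot-400M-distill": "conversational",
    "google/flan-t5-large": "text_generation",           # instruction prompts
    "HuggingFaceH4/zephyr-7b-beta": "chat_completion",   # chat-message API
}

def pick_method(model_id: str) -> str:
    """Return the inference method a model should be called with."""
    # Unknown models fall back to plain text generation.
    return MODEL_METHODS.get(model_id, "text_generation")

print(pick_method("HuggingFaceH4/zephyr-7b-beta"))  # chat_completion
print(pick_method("some-org/unknown-model"))        # text_generation
```

Keeping the routing in a dict rather than a chain of `if` checks makes "Adding New Models" a one-line change for most models.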
README.md ADDED
@@ -0,0 +1,164 @@
+ ---
+ title: LLM Chatbot
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.9.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🤖 Hugging Face LLM Chatbot
+
+ A web-based chatbot application for conversing with a variety of open-source LLM models.
+
+ ## ✨ Key Features
+
+ - **Multi-model support**: 5 models (3 English, 2 Korean)
+ - **Local execution**: models run locally via the Transformers library
+ - **No API limits**: works without an internet connection (after the initial download)
+ - **Automatic session management**: the conversation resets automatically when the model changes
+ - **Completely free**: no API costs, open source
+
+ ## 🎯 Supported Models
+
+ ### English Models
+ 1. **DialoGPT Small** - fast conversational model (~350MB)
+ 2. **DialoGPT Medium** - higher-quality conversational model (~800MB)
+ 3. **GPT-2** - general-purpose text generation model (~500MB)
+
+ ### Korean Models
+ 4. **KoGPT-2** - SKT's Korean-specialized model (~500MB)
+ 5. **KoAlpaca 5.8B** - conversational Korean model, requires high-end hardware (~12GB)
+
+ ## 🚀 Running Locally
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone <repository-url>
+ cd simple-chatbot-gradio
+ ```
+
+ ### 2. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Configure environment variables
+
+ Create a `.env` file and add your Hugging Face token:
+
+ ```
+ HF_TOKEN=your_hugging_face_token_here
+ ```
+
+ **How to get a Hugging Face token:**
+ 1. Log in to [Hugging Face](https://huggingface.co)
+ 2. Go to Settings → Access Tokens
+ 3. Click "New token" to create a token
+ 4. Copy the token into your `.env` file
+
+ ### 4. Run the application
+
+ ```bash
+ python app.py
+ ```
+
+ Open `http://localhost:7860` in your browser.
+
+ ## 🌐 Deploying to Hugging Face Spaces
+
+ ### Method 1: Web UI
+
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Click "Create new Space"
+ 3. Select "Gradio" as the SDK
+ 4. Upload the files:
+    - `app.py`
+    - `requirements.txt`
+    - `README.md`
+ 5. Add `HF_TOKEN` under Settings → Repository secrets
+ 6. Wait for the automatic build and deployment
+
+ ### Method 2: Git
+
+ ```bash
+ # Add the Hugging Face Space repository as a remote
+ git remote add space https://huggingface.co/spaces/<username>/<space-name>
+
+ # Push the files
+ git add .
+ git commit -m "Initial commit"
+ git push space main
+ ```
+
+ ## ⚙️ Tech Stack
+
+ - **Framework**: Gradio 5.x
+ - **ML libraries**: Transformers, PyTorch
+ - **Language**: Python 3.10+
+ - **Key libraries**:
+   - `gradio` - web interface
+   - `transformers` - model loading and inference
+   - `torch` - deep learning framework
+   - `python-dotenv` - environment variable management
+
+ ## 📁 Project Structure
+
+ ```
+ simple-chatbot-gradio/
+ ├── app.py              # Main application
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Project documentation
+ ├── .env                # Environment variables (git ignored)
+ └── CLAUDE.md           # Development guide
+ ```
+
+ ## ⚠️ Limitations and Caveats
+
+ ### Performance
+ - **CPU execution**: responses can be slow (5-10 seconds) without a GPU
+ - **Memory**: 1-8GB of RAM required depending on model size
+ - **First run**: downloading models takes time (350MB-12GB)
+
+ ### Per-model characteristics
+ - **English models**: unnatural responses to Korean input
+ - **Korean models**: degraded performance on English input
+ - **KoAlpaca 5.8B**: requires 8GB+ RAM, very slow on CPU
+
+ ### Hugging Face Spaces deployment
+ - **Free tier**: CPU instances only
+ - **Space sleep**: auto-sleeps when inactive, so the first load is slow
+ - **Disk limits**: large models such as KoAlpaca may not be deployable
+
+ ## 🔧 Development and Customization
+
+ ### Adding a model
+
+ Add a new model to the `MODELS` dictionary in `app.py`:
+
+ ```python
+ MODELS = {
+     "your-model-id": {
+         "name": "Model display name",
+         "max_length": 512,
+         "temperature": 0.7,
+     },
+ }
+ ```
+
+ ### UI customization
+
+ Modify the Gradio Blocks and ChatInterface to change the UI. See the [Gradio docs](https://www.gradio.app/docs) for details.
+
+ ## 📄 License
+
+ MIT License
+
+ ## 🙋‍♂️ Support
+
+ If you have issues or questions, please reach out via GitHub Issues.
app.py ADDED
@@ -0,0 +1,270 @@
+ """
+ Hugging Face LLM Chatbot with Gradio
+ Using transformers library to run models locally
+ """
+
+ import os
+ import gradio as gr
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+ HF_TOKEN = os.getenv("HF_TOKEN")
+
+ # Check device
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ print(f"Using device: {device}")
+
+ # Available models (optimized for local execution)
+ MODELS = {
+     "microsoft/DialoGPT-small": {
+         "name": "DialoGPT Small (English, fast)",
+         "max_length": 80,
+         "language": "en",
+     },
+     "microsoft/DialoGPT-medium": {
+         "name": "DialoGPT Medium (English, high quality)",
+         "max_length": 100,
+         "language": "en",
+     },
+     "gpt2": {
+         "name": "GPT-2 (English, general purpose)",
+         "max_length": 80,
+         "language": "en",
+     },
+     "skt/kogpt2-base-v2": {
+         "name": "KoGPT-2 (Korean-specialized)",
+         "max_length": 100,
+         "language": "ko",
+     },
+     "beomi/KoAlpaca-Polyglot-5.8B": {
+         "name": "KoAlpaca 5.8B (Korean conversational, slow)",
+         "max_length": 150,
+         "language": "ko",
+     },
+ }
+
+ # Model cache
+ loaded_models = {}
+ loaded_tokenizers = {}
+
+
+ def load_model(model_name):
+     """Load model and tokenizer, caching them after the first load"""
+     if model_name not in loaded_models:
+         try:
+             print(f"Loading model: {model_name}")
+
+             # Load tokenizer
+             tokenizer = AutoTokenizer.from_pretrained(
+                 model_name,
+                 token=HF_TOKEN,
+                 padding_side="left",
+             )
+
+             # Add pad token if missing
+             if tokenizer.pad_token is None:
+                 tokenizer.pad_token = tokenizer.eos_token
+
+             # Load model
+             model = AutoModelForCausalLM.from_pretrained(
+                 model_name,
+                 token=HF_TOKEN,
+                 torch_dtype=torch.float32,
+             )
+             model.to(device)
+             model.eval()
+
+             loaded_models[model_name] = model
+             loaded_tokenizers[model_name] = tokenizer
+
+             print(f"✅ Model {model_name} loaded successfully")
+
+         except Exception as e:
+             print(f"❌ Failed to load model {model_name}: {e}")
+             return None, None
+
+     return loaded_models.get(model_name), loaded_tokenizers.get(model_name)
+
+
+ def chat_response(message, history, model_name):
+     """
+     Generate chatbot response
+
+     Args:
+         message: User input
+         history: Chat history in Gradio messages format
+         model_name: Selected model
+
+     Returns:
+         Response text
+     """
+     try:
+         # Load model and tokenizer
+         model, tokenizer = load_model(model_name)
+
+         if model is None or tokenizer is None:
+             return f"❌ Could not load model '{model_name}'. Please select a different model."
+
+         model_config = MODELS[model_name]
+
+         # Build conversation context from prior turns
+         conversation = ""
+         for msg in history:
+             if msg["role"] in ("user", "assistant"):
+                 conversation += f"{msg['content']}\n"
+
+         # Add current message
+         conversation += f"{message}\n"
+
+         # Tokenize
+         inputs = tokenizer.encode(conversation, return_tensors="pt").to(device)
+
+         # Generate response
+         with torch.no_grad():
+             outputs = model.generate(
+                 inputs,
+                 max_new_tokens=model_config["max_length"],
+                 temperature=0.9,
+                 do_sample=True,
+                 pad_token_id=tokenizer.pad_token_id,
+                 eos_token_id=tokenizer.eos_token_id,
+             )
+
+         # Decode only the newly generated tokens (the output echoes the prompt,
+         # and slicing by token index is safer than slicing the decoded string)
+         response = tokenizer.decode(
+             outputs[0][inputs.shape[-1]:], skip_special_tokens=True
+         ).strip()
+
+         # If empty, return a default message
+         if not response:
+             response = "I understand. Could you tell me more?"
+
+         return response
+
+     except Exception as e:
+         import traceback
+
+         error_msg = str(e)
+         error_type = type(e).__name__
+
+         print("=" * 50)
+         print(f"Error Type: {error_type}")
+         print(f"Error Message: {error_msg}")
+         print(f"Traceback:\n{traceback.format_exc()}")
+         print("=" * 50)
+
+         if "out of memory" in error_msg.lower() or "oom" in error_msg.lower():
+             return "❌ Out of memory. Select a smaller model or restart the app."
+         elif "cuda" in error_msg.lower() and device == "cpu":
+             return "⚠️ Running on CPU without a GPU. Responses may be slow."
+         else:
+             return f"❌ Error: {error_type}\n{error_msg[:200]}\n\nCheck the terminal for the full log."
+
+
+ # Global state
+ current_model = "microsoft/DialoGPT-small"
+
+ # Preload default model
+ print("Preloading default model...")
+ load_model(current_model)
+
+ # Create Gradio interface
+ with gr.Blocks(
+     title="🤖 Hugging Face Chatbot",
+     theme=gr.themes.Soft(),
+ ) as demo:
+     gr.Markdown(
+         """
+         # 🤖 Hugging Face LLM Chatbot
+
+         **Local model execution** - no API limits!
+
+         **How to use:**
+         1. Select a model (the first load takes a while)
+         2. Type a message and start chatting
+         3. Responses can be slow since models run on CPU
+
+         **Recommended models by language:**
+         - 🇬🇧 English: DialoGPT, GPT-2
+         - 🇰🇷 Korean: KoGPT-2, KoAlpaca (5.8B is a large, slow model)
+
+         **Advantages:** no API limits, completely free, can run offline
+         """
+     )
+
+     # Model selector
+     model_dropdown = gr.Dropdown(
+         choices=[(config["name"], model_id) for model_id, config in MODELS.items()],
+         value="microsoft/DialoGPT-small",
+         label="🎯 Model",
+         info="Changing the model downloads it (first time only)",
+     )
+
+     # Chat interface
+     chatbot = gr.ChatInterface(
+         fn=chat_response,
+         type="messages",
+         additional_inputs=[model_dropdown],
+         chatbot=gr.Chatbot(
+             height=500,
+             placeholder="Type a message...",
+             type="messages",
+         ),
+         textbox=gr.Textbox(
+             placeholder="Type a message (English recommended)...",
+             container=False,
+             scale=7,
+         ),
+         examples=[
+             ["Hello! How are you?", "microsoft/DialoGPT-small"],
+             ["Tell me a joke", "microsoft/DialoGPT-medium"],
+             ["안녕하세요! 오늘 날씨가 좋네요.", "skt/kogpt2-base-v2"],
+             ["인공지능에 대해 설명해주세요.", "skt/kogpt2-base-v2"],
+         ],
+     )
+
+     # Clear chat when model changes
+     def on_model_change(new_model):
+         global current_model
+         current_model = new_model
+         # Preload new model
+         load_model(new_model)
+         return None
+
+     model_dropdown.change(
+         fn=on_model_change,
+         inputs=[model_dropdown],
+         outputs=[chatbot.chatbot],
+     )
+
+     gr.Markdown(
+         """
+         ---
+
+         **⚠️ Notes:**
+         - Models run locally (downloaded on first run)
+         - CPU execution is slower than GPU
+         - Each model is optimized for a specific language
+
+         **💾 Disk usage:**
+         - DialoGPT-small: ~350MB
+         - DialoGPT-medium: ~800MB
+         - GPT-2: ~500MB
+         - KoGPT-2: ~500MB
+         - KoAlpaca-5.8B: ~12GB (large model, needs 8GB+ RAM)
+
+         **💡 Tips:**
+         - DialoGPT is recommended for English conversation
+         - KoGPT-2 is recommended for Korean (KoAlpaca only with sufficient resources)
+         - Short sentences tend to give better results
+         - Once loaded, a model is not downloaded again
+         """
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
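The prompt-assembly and echo-stripping steps inside `chat_response` can be exercised without loading any model. A hedged sketch of just that logic, with a stub string in place of real model output (the helper names below are illustrative, not part of app.py):

```python
def build_prompt(history: list[dict], message: str) -> str:
    """Flatten Gradio messages-format history plus the new message into a prompt."""
    parts = [m["content"] for m in history if m["role"] in ("user", "assistant")]
    parts.append(message)
    return "\n".join(parts) + "\n"


def strip_echo(full_output: str, prompt: str) -> str:
    """Causal LMs echo the prompt; keep only the newly generated tail."""
    if full_output.startswith(prompt):
        reply = full_output[len(prompt):].strip()
    else:
        reply = full_output.strip()
    # Same fallback app.py uses when generation comes back empty.
    return reply or "I understand. Could you tell me more?"


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
prompt = build_prompt(history, "How are you?")
# Stub: a real model would return prompt + generated text.
generated = prompt + "I'm doing well, thanks!"
print(strip_echo(generated, prompt))  # I'm doing well, thanks!
```

String-based stripping like this is fragile when decode/re-encode round-trips are not exact, which is why slicing the generated token IDs (everything past the input length) is the more robust variant.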
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio>=5.0.0
+ transformers>=4.30.0
+ torch>=2.0.0
+ python-dotenv>=1.0.0