alex4cip Claude committed
Commit c9ef1fe · 0 Parent(s)

feat: Hugging Face LLM chatbot with multi-language support


- Implement local model execution using transformers
- Add 5 models: 3 English (DialoGPT, GPT-2) + 2 Korean (KoGPT-2, KoAlpaca)
- Support both English and Korean conversations
- No API rate limits, fully offline-capable after initial download
- Built with Gradio 5.x for web interface

Features:
- Multiple model selection with automatic chat reset
- Local model caching for improved performance
- Detailed error handling and user feedback
- Comprehensive documentation in README and CLAUDE.md

Technical stack:
- Gradio 5.x for web UI
- Transformers + PyTorch for model inference
- CPU/GPU support with automatic device detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Files changed (6)
  1. .claude/settings.local.json +13 -0
  2. .gitignore +46 -0
  3. CLAUDE.md +180 -0
  4. README.md +164 -0
  5. app.py +270 -0
  6. requirements.txt +4 -0
.claude/settings.local.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "permissions": {
+     "allow": [
+       "Bash(python app.py)",
+       "Bash(curl -s http://localhost:7860)",
+       "Bash(curl -X POST \"https://api-inference.huggingface.co/models/gpt2\" )",
+       "Bash(git init)",
+       "Bash(git add .)"
+     ],
+     "deny": [],
+     "ask": []
+   }
+ }
.gitignore ADDED
@@ -0,0 +1,46 @@
+ # Environment variables
+ .env
+ .env.local
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+ .venv
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # Gradio
+ gradio_cached_examples/
+ flagged/
+
+ # OS
+ .DS_Store
+ Thumbs.db
CLAUDE.md ADDED
@@ -0,0 +1,180 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ Multi-model LLM chatbot using Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and have conversations with them. Model changes automatically reset the conversation.
+
+ ## Tech Stack
+
+ - **Python**: 3.10+
+ - **Framework**: Gradio 5.x (ChatInterface + Blocks)
+ - **API**: Hugging Face Serverless Inference API (free tier)
+ - **Deployment**: Hugging Face Spaces (free CPU instance)
+
+ ## Project Structure
+
+ ```
+ ├── app.py              # Main application
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Spaces configuration + documentation
+ ├── .env                # HF_TOKEN (git ignored)
+ └── CLAUDE.md           # This file
+ ```
+
+ ## Development Commands
+
+ ### Local Development
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run locally (requires HF_TOKEN in .env)
+ python app.py
+
+ # Access at http://localhost:7860
+ ```
+
+ ### Deployment to Hugging Face Spaces
+
+ **Method 1: Web UI**
+ 1. Create Space at https://huggingface.co/spaces
+ 2. Select Gradio SDK
+ 3. Upload `app.py`, `requirements.txt`, `README.md`
+ 4. Add `HF_TOKEN` to Settings → Repository secrets
+
+ **Method 2: Git Push**
+ ```bash
+ git remote add space https://huggingface.co/spaces/<username>/<space-name>
+ git push space main
+ ```
+
+ ## Architecture
+
+ ### Core Components
+
+ **`app.py` Structure**:
+ - `MODELS` dict: Model configurations (ID, display name, parameters)
+ - `chat_response()`: Main inference function handling multiple model types
+ - `on_model_change()`: Clears chat when model selection changes
+ - Gradio Blocks: UI composition with model dropdown + ChatInterface
+
+ **Model Handling Patterns**:
+ - **DialoGPT**: Text continuation with conversation history formatting
+ - **BlenderBot**: Conversational API with single-turn context
+ - **Flan-T5**: Instruction-based text generation with prompt engineering
+ - **Zephyr**: Chat completion API with message history formatting
+
+ **State Management**:
+ - Global `current_model` tracks selected model
+ - Model change triggers chat history reset via Gradio event handlers
+ - Each model type uses appropriate API method from `InferenceClient`
+
+ ### API Integration
+
+ **Hugging Face InferenceClient Usage**:
+ ```python
+ client = InferenceClient(token=HF_TOKEN)
+
+ # Different methods for different model types
+ client.text_generation()   # DialoGPT, Flan-T5
+ client.conversational()    # BlenderBot
+ client.chat_completion()   # Zephyr (chat models)
+ ```
+
+ **Rate Limiting & Error Handling**:
+ - Free tier: ~100-300 requests/hour
+ - Graceful degradation with user-friendly error messages
+ - Timeout and rate limit detection in exception handling
+
+ ## Environment Setup
+
+ **Required Environment Variable**:
+ ```bash
+ HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
+ ```
+
+ **Obtaining HF_TOKEN**:
+ 1. Login to https://huggingface.co
+ 2. Settings → Access Tokens
+ 3. Create new token with "Read" permissions
+ 4. Copy to `.env` file (local) or Space secrets (deployment)
+
+ ## Adding New Models
+
+ 1. **Add to MODELS dict** in [app.py:23-45](app.py#L23-L45):
+    ```python
+    "model-org/model-name": {
+        "name": "Display Name",
+        "max_length": 512,
+        "temperature": 0.7,
+    }
+    ```
+
+ 2. **Update chat_response()** if model requires special handling:
+    - Check model name in conditional logic
+    - Use appropriate InferenceClient method
+    - Format prompt/messages according to model requirements
+
+ 3. **Verify free tier compatibility**:
+    - Test model availability via Inference API
+    - Check rate limits and response times
+    - Update README.md model list
+
+ ## UI Customization
+
+ **Changing Language**:
+ - UI strings live inline in app.py; modify the markdown strings and button labels in [app.py:140-220](app.py#L140-L220)
+
+ **Theme & Styling**:
+ ```python
+ gr.Blocks(theme=gr.themes.Soft())  # Change theme here
+ ```
+
+ **Chat Examples**:
+ - Modify `examples` parameter in ChatInterface [app.py:187-192](app.py#L187-L192)
+
+ ## Common Issues
+
+ **"Rate limit exceeded"**:
+ - Free tier limitation, wait ~1 hour or upgrade to PRO ($9/month)
+
+ **Model timeout/unavailable**:
+ - High demand on free tier, try a different model or retry later
+
+ **Space sleeping**:
+ - Spaces sleep after inactivity, first load may be slow
+
+ ## Testing Locally
+
+ ```bash
+ # Ensure .env exists with HF_TOKEN
+ python app.py
+
+ # Test each model:
+ # 1. Select model from dropdown
+ # 2. Send test message
+ # 3. Verify response generation
+ # 4. Change model and verify chat resets
+ ```
+
+ ## Deployment Notes
+
+ **README.md YAML Header**:
+ - Required for Spaces configuration
+ - Specifies SDK, Python version, app file
+ - Auto-detected by Hugging Face
+
+ **Environment Variables in Spaces**:
+ - Set via Settings → Repository secrets
+ - Name must match exactly: `HF_TOKEN`
+ - Never commit tokens to repository
+
+ **Free Tier Constraints**:
+ - CPU only (no GPU)
+ - Auto-sleep after inactivity
+ - Rate limits on API calls
+ - May experience slower inference
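The per-model handling CLAUDE.md outlines (each model family routed to a different inference method) reduces to a small dispatch table. Below is a minimal sketch in plain Python; the model IDs and the method mapping are illustrative assumptions, and no real API call is made:

```python
# Illustrative dispatch table: model family -> inference method name.
# These pairings mirror the patterns described in CLAUDE.md; the exact
# model IDs are examples, not pinned by the repository.
MODEL_METHODS = {
    "microsoft/DialoGPT-medium": "text_generation",      # text continuation
    "facebook/blenderbot-400M-distill": "conversational",
    "google/flan-t5-large": "text_generation",           # instruction prompts
    "HuggingFaceH4/zephyr-7b-beta": "chat_completion",   # chat-message API
}

def pick_method(model_id: str) -> str:
    """Return the inference method a model should be called with."""
    # Unknown models fall back to plain text generation.
    return MODEL_METHODS.get(model_id, "text_generation")

print(pick_method("HuggingFaceH4/zephyr-7b-beta"))  # chat_completion
print(pick_method("some-org/unknown-model"))        # text_generation
```

Keeping the routing in a dict rather than a chain of `if` checks makes "Adding New Models" a one-line change for most models.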
README.md ADDED
@@ -0,0 +1,164 @@
+ ---
+ title: LLM Chatbot
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 5.9.1
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🤖 Hugging Face LLM Chatbot
+
+ A web-based chatbot application for conversing with a variety of open-source LLM models.
+
+ ## ✨ Key Features
+
+ - **Multi-model support**: 5 models (3 English, 2 Korean)
+ - **Local execution**: models run locally via the Transformers library
+ - **No API limits**: works without an internet connection (after the initial download)
+ - **Automatic session management**: the conversation resets automatically when the model changes
+ - **Completely free**: no API costs, open source
+
+ ## 🎯 Supported Models
+
+ ### English Models
+ 1. **DialoGPT Small** - fast conversational model (~350MB)
+ 2. **DialoGPT Medium** - higher-quality conversational model (~800MB)
+ 3. **GPT-2** - general-purpose text generation model (~500MB)
+
+ ### Korean Models
+ 4. **KoGPT-2** - SKT's Korean-specialized model (~500MB)
+ 5. **KoAlpaca 5.8B** - conversational Korean model, requires high-end hardware (~12GB)
+
+ ## 🚀 Running Locally
+
+ ### 1. Clone the repository
+
+ ```bash
+ git clone <repository-url>
+ cd simple-chatbot-gradio
+ ```
+
+ ### 2. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Configure environment variables
+
+ Create a `.env` file and add your Hugging Face token:
+
+ ```
+ HF_TOKEN=your_hugging_face_token_here
+ ```
+
+ **How to get a Hugging Face token:**
+ 1. Log in to [Hugging Face](https://huggingface.co)
+ 2. Go to Settings → Access Tokens
+ 3. Click "New token" to create a token
+ 4. Copy the token into your `.env` file
+
+ ### 4. Run the application
+
+ ```bash
+ python app.py
+ ```
+
+ Open `http://localhost:7860` in your browser.
+
+ ## 🌐 Deploying to Hugging Face Spaces
+
+ ### Method 1: Web UI
+
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+ 2. Click "Create new Space"
+ 3. Select "Gradio" as the SDK
+ 4. Upload the files:
+    - `app.py`
+    - `requirements.txt`
+    - `README.md`
+ 5. Add `HF_TOKEN` under Settings → Repository secrets
+ 6. Wait for the automatic build and deployment
+
+ ### Method 2: Git
+
+ ```bash
+ # Add the Hugging Face Space repository as a remote
+ git remote add space https://huggingface.co/spaces/<username>/<space-name>
+
+ # Push the files
+ git add .
+ git commit -m "Initial commit"
+ git push space main
+ ```
+
+ ## ⚙️ Tech Stack
+
+ - **Framework**: Gradio 5.x
+ - **ML libraries**: Transformers, PyTorch
+ - **Language**: Python 3.10+
+ - **Key libraries**:
+   - `gradio` - web interface
+   - `transformers` - model loading and inference
+   - `torch` - deep learning framework
+   - `python-dotenv` - environment variable management
+
+ ## 📁 Project Structure
+
+ ```
+ simple-chatbot-gradio/
+ ├── app.py              # Main application
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Project documentation
+ ├── .env                # Environment variables (git ignored)
+ └── CLAUDE.md           # Development guide
+ ```
+
+ ## ⚠️ Limitations and Caveats
+
+ ### Performance
+ - **CPU execution**: responses can be slow (5-10 seconds) without a GPU
+ - **Memory**: 1-8GB of RAM required depending on model size
+ - **First run**: downloading models takes time (350MB-12GB)
+
+ ### Per-model characteristics
+ - **English models**: unnatural responses to Korean input
+ - **Korean models**: degraded performance on English input
+ - **KoAlpaca 5.8B**: requires 8GB+ RAM, very slow on CPU
+
+ ### Hugging Face Spaces deployment
+ - **Free tier**: CPU instances only
+ - **Space sleep**: auto-sleeps when inactive, so the first load is slow
+ - **Disk limits**: large models such as KoAlpaca may not be deployable
+
+ ## 🔧 Development and Customization
+
+ ### Adding a model
+
+ Add a new model to the `MODELS` dictionary in `app.py`:
+
+ ```python
+ MODELS = {
+     "your-model-id": {
+         "name": "Model display name",
+         "max_length": 512,
+         "temperature": 0.7,
+     },
+ }
+ ```
+
+ ### UI customization
+
+ Modify the Gradio Blocks and ChatInterface to change the UI. See the [Gradio docs](https://www.gradio.app/docs) for details.
+
+ ## 📄 License
+
+ MIT License
+
+ ## 🙋‍♂️ Support
+
+ If you have issues or questions, please reach out via GitHub Issues.
app.py ADDED
@@ -0,0 +1,270 @@
+ """
+ Hugging Face LLM Chatbot with Gradio
+ Using transformers library to run models locally
+ """
+
+ import os
+ import gradio as gr
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+ HF_TOKEN = os.getenv("HF_TOKEN")
+
+ # Check device
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ print(f"Using device: {device}")
+
+ # Available models (optimized for local execution)
+ MODELS = {
+     "microsoft/DialoGPT-small": {
+         "name": "DialoGPT Small (English, fast)",
+         "max_length": 80,
+         "language": "en",
+     },
+     "microsoft/DialoGPT-medium": {
+         "name": "DialoGPT Medium (English, high quality)",
+         "max_length": 100,
+         "language": "en",
+     },
+     "gpt2": {
+         "name": "GPT-2 (English, general purpose)",
+         "max_length": 80,
+         "language": "en",
+     },
+     "skt/kogpt2-base-v2": {
+         "name": "KoGPT-2 (Korean-specialized)",
+         "max_length": 100,
+         "language": "ko",
+     },
+     "beomi/KoAlpaca-Polyglot-5.8B": {
+         "name": "KoAlpaca 5.8B (Korean conversational, slow)",
+         "max_length": 150,
+         "language": "ko",
+     },
+ }
+
+ # Model cache
+ loaded_models = {}
+ loaded_tokenizers = {}
+
+
+ def load_model(model_name):
+     """Load model and tokenizer, caching them after the first load"""
+     if model_name not in loaded_models:
+         try:
+             print(f"Loading model: {model_name}")
+
+             # Load tokenizer
+             tokenizer = AutoTokenizer.from_pretrained(
+                 model_name,
+                 token=HF_TOKEN,
+                 padding_side="left",
+             )
+
+             # Add pad token if missing
+             if tokenizer.pad_token is None:
+                 tokenizer.pad_token = tokenizer.eos_token
+
+             # Load model
+             model = AutoModelForCausalLM.from_pretrained(
+                 model_name,
+                 token=HF_TOKEN,
+                 torch_dtype=torch.float32,
+             )
+             model.to(device)
+             model.eval()
+
+             loaded_models[model_name] = model
+             loaded_tokenizers[model_name] = tokenizer
+
+             print(f"✅ Model {model_name} loaded successfully")
+
+         except Exception as e:
+             print(f"❌ Failed to load model {model_name}: {e}")
+             return None, None
+
+     return loaded_models.get(model_name), loaded_tokenizers.get(model_name)
+
+
+ def chat_response(message, history, model_name):
+     """
+     Generate chatbot response
+
+     Args:
+         message: User input
+         history: Chat history in Gradio messages format
+         model_name: Selected model
+
+     Returns:
+         Response text
+     """
+     try:
+         # Load model and tokenizer
+         model, tokenizer = load_model(model_name)
+
+         if model is None or tokenizer is None:
+             return f"❌ Could not load model '{model_name}'. Please select a different model."
+
+         model_config = MODELS[model_name]
+
+         # Build conversation context from prior turns
+         conversation = ""
+         for msg in history:
+             if msg["role"] in ("user", "assistant"):
+                 conversation += f"{msg['content']}\n"
+
+         # Add current message
+         conversation += f"{message}\n"
+
+         # Tokenize
+         inputs = tokenizer.encode(conversation, return_tensors="pt").to(device)
+
+         # Generate response
+         with torch.no_grad():
+             outputs = model.generate(
+                 inputs,
+                 max_new_tokens=model_config["max_length"],
+                 temperature=0.9,
+                 do_sample=True,
+                 pad_token_id=tokenizer.pad_token_id,
+                 eos_token_id=tokenizer.eos_token_id,
+             )
+
+         # Decode only the newly generated tokens (the output echoes the prompt,
+         # and slicing by token index is safer than slicing the decoded string)
+         response = tokenizer.decode(
+             outputs[0][inputs.shape[-1]:], skip_special_tokens=True
+         ).strip()
+
+         # If empty, return a default message
+         if not response:
+             response = "I understand. Could you tell me more?"
+
+         return response
+
+     except Exception as e:
+         import traceback
+
+         error_msg = str(e)
+         error_type = type(e).__name__
+
+         print("=" * 50)
+         print(f"Error Type: {error_type}")
+         print(f"Error Message: {error_msg}")
+         print(f"Traceback:\n{traceback.format_exc()}")
+         print("=" * 50)
+
+         if "out of memory" in error_msg.lower() or "oom" in error_msg.lower():
+             return "❌ Out of memory. Select a smaller model or restart the app."
+         elif "cuda" in error_msg.lower() and device == "cpu":
+             return "⚠️ Running on CPU without a GPU. Responses may be slow."
+         else:
+             return f"❌ Error: {error_type}\n{error_msg[:200]}\n\nCheck the terminal for the full log."
+
+
+ # Global state
+ current_model = "microsoft/DialoGPT-small"
+
+ # Preload default model
+ print("Preloading default model...")
+ load_model(current_model)
+
+ # Create Gradio interface
+ with gr.Blocks(
+     title="🤖 Hugging Face Chatbot",
+     theme=gr.themes.Soft(),
+ ) as demo:
+     gr.Markdown(
+         """
+         # 🤖 Hugging Face LLM Chatbot
+
+         **Local model execution** - no API limits!
+
+         **How to use:**
+         1. Select a model (the first load takes a while)
+         2. Type a message and start chatting
+         3. Responses can be slow since models run on CPU
+
+         **Recommended models by language:**
+         - 🇬🇧 English: DialoGPT, GPT-2
+         - 🇰🇷 Korean: KoGPT-2, KoAlpaca (5.8B is a large, slow model)
+
+         **Advantages:** no API limits, completely free, can run offline
+         """
+     )
+
+     # Model selector
+     model_dropdown = gr.Dropdown(
+         choices=[(config["name"], model_id) for model_id, config in MODELS.items()],
+         value="microsoft/DialoGPT-small",
+         label="🎯 Model",
+         info="Changing the model downloads it (first time only)",
+     )
+
+     # Chat interface
+     chatbot = gr.ChatInterface(
+         fn=chat_response,
+         type="messages",
+         additional_inputs=[model_dropdown],
+         chatbot=gr.Chatbot(
+             height=500,
+             placeholder="Type a message...",
+             type="messages",
+         ),
+         textbox=gr.Textbox(
+             placeholder="Type a message (English recommended)...",
+             container=False,
+             scale=7,
+         ),
+         examples=[
+             ["Hello! How are you?", "microsoft/DialoGPT-small"],
+             ["Tell me a joke", "microsoft/DialoGPT-medium"],
+             ["안녕하세요! 오늘 날씨가 좋네요.", "skt/kogpt2-base-v2"],
+             ["인공지능에 대해 설명해주세요.", "skt/kogpt2-base-v2"],
+         ],
+     )
+
+     # Clear chat when model changes
+     def on_model_change(new_model):
+         global current_model
+         current_model = new_model
+         # Preload new model
+         load_model(new_model)
+         return None
+
+     model_dropdown.change(
+         fn=on_model_change,
+         inputs=[model_dropdown],
+         outputs=[chatbot.chatbot],
+     )
+
+     gr.Markdown(
+         """
+         ---
+
+         **⚠️ Notes:**
+         - Models run locally (downloaded on first run)
+         - CPU execution is slower than GPU
+         - Each model is optimized for a specific language
+
+         **💾 Disk usage:**
+         - DialoGPT-small: ~350MB
+         - DialoGPT-medium: ~800MB
+         - GPT-2: ~500MB
+         - KoGPT-2: ~500MB
+         - KoAlpaca-5.8B: ~12GB (large model, needs 8GB+ RAM)
+
+         **💡 Tips:**
+         - DialoGPT is recommended for English conversation
+         - KoGPT-2 is recommended for Korean (KoAlpaca only with sufficient resources)
+         - Short sentences tend to give better results
+         - Once loaded, a model is not downloaded again
+         """
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
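The prompt-assembly and echo-stripping steps inside `chat_response` can be exercised without loading any model. A hedged sketch of just that logic, with a stub string in place of real model output (the helper names below are illustrative, not part of app.py):

```python
def build_prompt(history: list[dict], message: str) -> str:
    """Flatten Gradio messages-format history plus the new message into a prompt."""
    parts = [m["content"] for m in history if m["role"] in ("user", "assistant")]
    parts.append(message)
    return "\n".join(parts) + "\n"


def strip_echo(full_output: str, prompt: str) -> str:
    """Causal LMs echo the prompt; keep only the newly generated tail."""
    if full_output.startswith(prompt):
        reply = full_output[len(prompt):].strip()
    else:
        reply = full_output.strip()
    # Same fallback app.py uses when generation comes back empty.
    return reply or "I understand. Could you tell me more?"


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
prompt = build_prompt(history, "How are you?")
# Stub: a real model would return prompt + generated text.
generated = prompt + "I'm doing well, thanks!"
print(strip_echo(generated, prompt))  # I'm doing well, thanks!
```

String-based stripping like this is fragile when decode/re-encode round-trips are not exact, which is why slicing the generated token IDs (everything past the input length) is the more robust variant.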
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio>=5.0.0
+ transformers>=4.30.0
+ torch>=2.0.0
+ python-dotenv>=1.0.0