CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Multi-model LLM chatbot using the Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and converse with them; changing the model automatically resets the conversation.

Tech Stack

  • Python: 3.10+
  • Framework: Gradio 5.x (ChatInterface + Blocks)
  • API: Hugging Face Serverless Inference API (free tier)
  • Deployment: Hugging Face Spaces (free CPU instance)

Project Structure

├── app.py              # Main application
├── requirements.txt    # Python dependencies
├── README.md           # Spaces configuration + documentation
├── .env                # HF_TOKEN (git-ignored)
└── CLAUDE.md           # This file

Development Commands

Local Development

# Install dependencies
pip install -r requirements.txt

# Run locally (requires HF_TOKEN in .env)
python app.py

# Access at http://localhost:7860

Deployment to Hugging Face Spaces

Method 1: Web UI

  1. Create Space at https://huggingface.co/spaces
  2. Select Gradio SDK
  3. Upload app.py, requirements.txt, README.md
  4. Add HF_TOKEN to Settings → Repository secrets

Method 2: Git Push

git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main

Architecture

Core Components

app.py Structure:

  • MODELS dict: Model configurations (ID, display name, parameters)
  • chat_response(): Main inference function handling multiple model types
  • on_model_change(): Clears chat when model selection changes
  • Gradio Blocks: UI composition with model dropdown + ChatInterface
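
A minimal sketch of how these pieces fit together (component names are illustrative; chat_response and on_model_change are sketched in the subsections below):

import gradio as gr

# Sketch only: MODELS, chat_response, and on_model_change live in app.py
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    model_dropdown = gr.Dropdown(
        choices=list(MODELS.keys()),
        value=next(iter(MODELS)),
        label="Model",
    )
    chat = gr.ChatInterface(fn=chat_response, additional_inputs=[model_dropdown])
    # Switching models clears the conversation (see State Management below)
    model_dropdown.change(on_model_change, inputs=model_dropdown, outputs=chat.chatbot)

demo.launch()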

Model Handling Patterns:

  • DialoGPT: Text continuation with conversation history formatting
  • BlenderBot: Conversational API with single-turn context
  • Flan-T5: Instruction-based text generation with prompt engineering
  • Zephyr: Chat completion API with message history formatting
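
A hedged sketch of that dispatch; the exact conditions and prompt templates in chat_response() may differ:

# Illustrative dispatch; assumes Gradio's (user, assistant) tuple history
# format and a module-level client = InferenceClient(token=HF_TOKEN)
def chat_response(message, history, model_id):
    if "zephyr" in model_id.lower():
        # Chat models take structured message history
        out = client.chat_completion(
            messages=[{"role": "user", "content": message}],
            model=model_id,
            max_tokens=512,
        )
        return out.choices[0].message.content
    if "flan-t5" in model_id.lower():
        # Instruction models take a single engineered prompt
        prompt = f"Answer the following question:\n{message}"
        return client.text_generation(prompt, model=model_id, max_new_tokens=256)
    # DialoGPT-style continuation: flatten the history into plain text
    turns = [turn for pair in history for turn in pair if turn]
    return client.text_generation("\n".join(turns + [message]), model=model_id, max_new_tokens=256)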

State Management:

  • Global current_model tracks selected model
  • Model change triggers chat history reset via Gradio event handlers
  • Each model type uses appropriate API method from InferenceClient
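
For example (a sketch; the actual handler in app.py may differ):

current_model = DEFAULT_MODEL  # DEFAULT_MODEL is illustrative

def on_model_change(model_id):
    """Track the new selection and clear the chat history."""
    global current_model
    current_model = model_id
    return []  # an empty list resets the Chatbot component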

API Integration

Hugging Face InferenceClient Usage:

from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)

# Different methods for different model types
client.text_generation()    # DialoGPT, Flan-T5
client.conversational()     # BlenderBot (deprecated in recent huggingface_hub releases)
client.chat_completion()    # Zephyr (chat models)

Rate Limiting & Error Handling:

  • Free tier: ~100-300 requests/hour
  • Graceful degradation with user-friendly error messages
  • Timeout and rate limit detection in exception handling
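
A sketch of what that exception handling can look like (the messages and status-code check are illustrative):

from huggingface_hub.utils import HfHubHTTPError

# Inside chat_response(): client, prompt, and model_id as defined above
try:
    reply = client.text_generation(prompt, model=model_id, max_new_tokens=256)
except HfHubHTTPError as e:
    if e.response is not None and e.response.status_code == 429:
        reply = "Rate limit reached. Please wait a while and try again."
    else:
        reply = f"Model unavailable right now ({e}). Try another model."
except Exception as e:
    reply = f"Unexpected error: {e}"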

Environment Setup

Required Environment Variable:

HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
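
A minimal loading sketch, assuming python-dotenv is listed in requirements.txt:

import os
from dotenv import load_dotenv  # assumption: python-dotenv is a dependency

load_dotenv()  # reads .env locally; on Spaces, secrets arrive as env vars
HF_TOKEN = os.getenv("HF_TOKEN")
if not HF_TOKEN:
    raise RuntimeError("HF_TOKEN is not set; see 'Obtaining HF_TOKEN' below.")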

Obtaining HF_TOKEN:

  1. Login to https://huggingface.co
  2. Settings → Access Tokens
  3. Create new token with "Read" permissions
  4. Copy to .env file (local) or Space secrets (deployment)

Adding New Models

  1. Add to MODELS dict in app.py:23-45:
"model-org/model-name": {
    "name": "Display Name",
    "max_length": 512,
    "temperature": 0.7,
}
  2. Update chat_response() if the model requires special handling:

    • Check model name in conditional logic
    • Use the appropriate InferenceClient method
    • Format prompt/messages according to the model's requirements
  3. Verify free tier compatibility (see the smoke-test sketch below):

    • Test model availability via the Inference API
    • Check rate limits and response times
    • Update the README.md model list
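
A hypothetical smoke test for step 3 (the model ID is a placeholder; pick the InferenceClient method that matches the model family):

from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)  # HF_TOKEN loaded as in Environment Setup above
try:
    # Placeholder model ID; substitute the one you just added
    print(client.text_generation("Hello!", model="model-org/model-name", max_new_tokens=20))
except Exception as e:
    print(f"Not usable on the free tier: {e}")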

UI Customization

Changing Language:

  • All UI strings are in Korean by default
  • Modify markdown strings and button labels in app.py:140-220

Theme & Styling:

gr.Blocks(theme=gr.themes.Soft())  # Change theme here

Chat Examples:
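
Starter prompts can be passed to ChatInterface via its examples parameter; a minimal sketch (the strings are illustrative, and app.py's are in Korean):

chat = gr.ChatInterface(
    fn=chat_response,
    examples=["Hello!", "Tell me a fun fact.", "Summarize this text: ..."],
)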

Common Issues

"Rate limit exceeded":

  • Free tier limitation; wait ~1 hour or upgrade to PRO ($9/month)

Model timeout/unavailable:

  • High demand on the free tier; try a different model or retry later

Space sleeping:

  • Spaces sleep after inactivity, so the first load may be slow

Testing Locally

# Ensure .env exists with HF_TOKEN
python app.py

# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets

Deployment Notes

README.md YAML Header:

  • Required for Spaces configuration
  • Specifies SDK, Python version, app file
  • Auto-detected by Hugging Face
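
A typical header looks like this (values are illustrative; sdk_version should match the pinned Gradio release):

---
title: Simple Chat
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.0.0
python_version: "3.10"
app_file: app.py
pinned: false
---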

Environment Variables in Spaces:

  • Set via Settings → Repository secrets
  • Name must match exactly: HF_TOKEN
  • Never commit tokens to repository

Free Tier Constraints:

  • CPU only (no GPU)
  • Auto-sleep after inactivity
  • Rate limits on API calls
  • May experience slower inference