CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Multi-model LLM chatbot using the Hugging Face Inference API and Gradio. Users can select from multiple pre-configured models and converse with them; changing the model automatically resets the conversation.

Tech Stack

  • Python: 3.10+
  • Framework: Gradio 5.x (ChatInterface + Blocks)
  • API: Hugging Face Serverless Inference API (free tier)
  • Deployment: Hugging Face Spaces (free CPU instance)

Project Structure

├── app.py              # Main application
├── requirements.txt    # Python dependencies
├── README.md           # Spaces configuration + documentation
├── .env                # HF_TOKEN (git-ignored)
└── CLAUDE.md           # This file

Development Commands

Local Development

# Install dependencies
pip install -r requirements.txt

# Run locally (requires HF_TOKEN in .env)
python app.py

# Access at http://localhost:7860

Deployment to Hugging Face Spaces

Method 1: Web UI

  1. Create Space at https://huggingface.co/spaces
  2. Select Gradio SDK
  3. Upload app.py, requirements.txt, README.md
  4. Add HF_TOKEN to Settings → Repository secrets

Method 2: Git Push

git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main

Architecture

Core Components

app.py Structure:

  • MODELS dict: Model configurations (ID, display name, parameters)
  • chat_response(): Main inference function handling multiple model types
  • on_model_change(): Clears chat when model selection changes
  • Gradio Blocks: UI composition with model dropdown + ChatInterface
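
A minimal sketch of how these pieces fit together (component names are illustrative; chat_response and on_model_change are sketched in the subsections below):

import gradio as gr

# Sketch only: MODELS, chat_response, and on_model_change live in app.py
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    model_dropdown = gr.Dropdown(
        choices=list(MODELS.keys()),
        value=next(iter(MODELS)),
        label="Model",
    )
    chat = gr.ChatInterface(fn=chat_response, additional_inputs=[model_dropdown])
    # Switching models clears the conversation (see State Management below)
    model_dropdown.change(on_model_change, inputs=model_dropdown, outputs=chat.chatbot)

demo.launch()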

Model Handling Patterns:

  • DialoGPT: Text continuation with conversation history formatting
  • BlenderBot: Conversational API with single-turn context
  • Flan-T5: Instruction-based text generation with prompt engineering
  • Zephyr: Chat completion API with message history formatting
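
A hedged sketch of that dispatch; the exact conditions and prompt templates in chat_response() may differ:

# Illustrative dispatch; assumes Gradio's (user, assistant) tuple history
# format and a module-level client = InferenceClient(token=HF_TOKEN)
def chat_response(message, history, model_id):
    if "zephyr" in model_id.lower():
        # Chat models take structured message history
        out = client.chat_completion(
            messages=[{"role": "user", "content": message}],
            model=model_id,
            max_tokens=512,
        )
        return out.choices[0].message.content
    if "flan-t5" in model_id.lower():
        # Instruction models take a single engineered prompt
        prompt = f"Answer the following question:\n{message}"
        return client.text_generation(prompt, model=model_id, max_new_tokens=256)
    # DialoGPT-style continuation: flatten the history into plain text
    turns = [turn for pair in history for turn in pair if turn]
    return client.text_generation("\n".join(turns + [message]), model=model_id, max_new_tokens=256)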

State Management:

  • Global current_model tracks selected model
  • Model change triggers chat history reset via Gradio event handlers
  • Each model type uses appropriate API method from InferenceClient
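
For example (a sketch; the actual handler in app.py may differ):

current_model = DEFAULT_MODEL  # DEFAULT_MODEL is illustrative

def on_model_change(model_id):
    """Track the new selection and clear the chat history."""
    global current_model
    current_model = model_id
    return []  # an empty list resets the Chatbot component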

API Integration

Hugging Face InferenceClient Usage:

from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)

# Different methods for different model types
client.text_generation()    # DialoGPT, Flan-T5
client.conversational()     # BlenderBot (deprecated in recent huggingface_hub releases)
client.chat_completion()    # Zephyr (chat models)

Rate Limiting & Error Handling:

  • Free tier: ~100-300 requests/hour
  • Graceful degradation with user-friendly error messages
  • Timeout and rate limit detection in exception handling
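
A sketch of what that exception handling can look like (the messages and status-code check are illustrative):

from huggingface_hub.utils import HfHubHTTPError

# Inside chat_response(): client, prompt, and model_id as defined above
try:
    reply = client.text_generation(prompt, model=model_id, max_new_tokens=256)
except HfHubHTTPError as e:
    if e.response is not None and e.response.status_code == 429:
        reply = "Rate limit reached. Please wait a while and try again."
    else:
        reply = f"Model unavailable right now ({e}). Try another model."
except Exception as e:
    reply = f"Unexpected error: {e}"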

Environment Setup

Required Environment Variable:

HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
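
A minimal loading sketch, assuming python-dotenv is listed in requirements.txt:

import os
from dotenv import load_dotenv  # assumption: python-dotenv is a dependency

load_dotenv()  # reads .env locally; on Spaces, secrets arrive as env vars
HF_TOKEN = os.getenv("HF_TOKEN")
if not HF_TOKEN:
    raise RuntimeError("HF_TOKEN is not set; see 'Obtaining HF_TOKEN' below.")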

Obtaining HF_TOKEN:

  1. Login to https://huggingface.co
  2. Settings → Access Tokens
  3. Create new token with "Read" permissions
  4. Copy to .env file (local) or Space secrets (deployment)

Adding New Models

  1. Add to MODELS dict in app.py:23-45:
"model-org/model-name": {
    "name": "Display Name",
    "max_length": 512,
    "temperature": 0.7,
}
  2. Update chat_response() if the model requires special handling:

    • Check model name in conditional logic
    • Use the appropriate InferenceClient method
    • Format prompt/messages according to the model's requirements
  3. Verify free tier compatibility (see the smoke-test sketch below):

    • Test model availability via the Inference API
    • Check rate limits and response times
    • Update the README.md model list
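
A hypothetical smoke test for step 3 (the model ID is a placeholder; pick the InferenceClient method that matches the model family):

from huggingface_hub import InferenceClient

client = InferenceClient(token=HF_TOKEN)  # HF_TOKEN loaded as in Environment Setup above
try:
    # Placeholder model ID; substitute the one you just added
    print(client.text_generation("Hello!", model="model-org/model-name", max_new_tokens=20))
except Exception as e:
    print(f"Not usable on the free tier: {e}")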

UI Customization

Changing Language:

  • All UI strings are in Korean by default
  • Modify markdown strings and button labels in app.py:140-220

Theme & Styling:

gr.Blocks(theme=gr.themes.Soft())  # Change theme here

Chat Examples:
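
Starter prompts can be passed to ChatInterface via its examples parameter; a minimal sketch (the strings are illustrative, and app.py's are in Korean):

chat = gr.ChatInterface(
    fn=chat_response,
    examples=["Hello!", "Tell me a fun fact.", "Summarize this text: ..."],
)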

Common Issues

"Rate limit exceeded":

  • Free tier limitation; wait ~1 hour or upgrade to PRO ($9/month)

Model timeout/unavailable:

  • High demand on the free tier; try a different model or retry later

Space sleeping:

  • Spaces sleep after inactivity, so the first load may be slow

Testing Locally

# Ensure .env exists with HF_TOKEN
python app.py

# Test each model:
# 1. Select model from dropdown
# 2. Send test message
# 3. Verify response generation
# 4. Change model and verify chat resets

Deployment Notes

README.md YAML Header:

  • Required for Spaces configuration
  • Specifies SDK, Python version, app file
  • Auto-detected by Hugging Face
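
A typical header looks like this (values are illustrative; sdk_version should match the pinned Gradio release):

---
title: Simple Chat
emoji: 💬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.0.0
python_version: "3.10"
app_file: app.py
pinned: false
---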

Environment Variables in Spaces:

  • Set via Settings → Repository secrets
  • Name must match exactly: HF_TOKEN
  • Never commit tokens to repository

Free Tier Constraints:

  • CPU only (no GPU)
  • Auto-sleep after inactivity
  • Rate limits on API calls
  • May experience slower inference