---
title: Nanbeige2.5 Chat
emoji: 🦙
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "3.50.0"
python_version: "3.10"
app_file: app.py
pinned: false
---
# Nanbeige2.5 — Gradio Chat Space
A lightweight Gradio chat UI for `PioTio/Nanbeige2.5`, suitable for deployment on Hugging Face Spaces.
## Features
- Streaming and non-streaming generation ✅
- Tokenizer ↔ model sanity fixes (avoids SentencePiece `piece id` errors) ✅
- 4-bit BitsAndBytes load when GPU + `bitsandbytes` are available ✅
- Optional LoRA adapter application (requires `peft`) ✅
- Controls: temperature, top-p, top-k, max tokens, max-history ✅
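The GPU/CPU decision behind the 4-bit feature can be sketched as a small helper. This is a minimal sketch, not the app's actual code: `pick_load_kwargs` is a hypothetical name, and the returned dictionaries are one plausible set of `from_pretrained` keyword arguments.

```python
import importlib.util


def pick_load_kwargs(cuda_available: bool) -> dict:
    """Choose model-loading kwargs: 4-bit quantization when a GPU and
    bitsandbytes are both present, a plain CPU load otherwise."""
    has_bnb = importlib.util.find_spec("bitsandbytes") is not None
    if cuda_available and has_bnb:
        # 4-bit load on GPU via bitsandbytes
        return {"load_in_4bit": True, "device_map": "auto"}
    # Fallback: full-precision load pinned to CPU
    return {"device_map": {"": "cpu"}}
```

The resulting kwargs would then be passed to `AutoModelForCausalLM.from_pretrained(model_id, **kwargs)`.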
**Quick CPU tip:** This Space **may be CPU-only**. Full `PioTio/Nanbeige2.5` on CPU is extremely slow — use the **Load fast CPU demo (distilgpt2)** button for quick responses, enable GPU in Space settings for production use, or check **Force CPU generation** to run Nanbeige on CPU (very slow and not recommended).
## Deployment (Hugging Face Spaces)
1. Create a new Space (Gradio runtime).
2. Upload these files (`app.py`, `requirements.txt`, `README.md`).
3. In Space settings choose **Hardware accelerator: GPU** (recommended).
After pushing these files, the Space will build automatically — open the Space page and monitor the logs for errors. If you prefer, you can create `app.py` directly in the web UI instead of pushing from Git.
**Tip:** keep `bitsandbytes` in `requirements.txt` if you plan to enable 4-bit loading on GPU; remove or pin it if the build log shows dependency issues.
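For reference, a plausible `requirements.txt` covering the dependencies mentioned in this README (the unpinned entries are illustrative assumptions, not tested pins; the `gradio` version matches the frontmatter above):

```text
gradio==3.50.0
transformers
torch
accelerate
sentencepiece
bitsandbytes   # optional: 4-bit loading on GPU
peft           # optional: LoRA adapters
```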
- If you see `piece id is out of range` errors, the app will attempt to auto-fix tokenizer/model alignment.
- To apply a LoRA adapter after starting the app, paste the adapter's HF repo ID into the LoRA field and click **Apply LoRA adapter**.
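The tokenizer/model auto-fix mentioned above amounts to keeping the tokenizer vocabulary and the model's embedding table the same size. A minimal sketch, assuming a hypothetical `fix_vocab_mismatch` helper (the app's real fix may differ):

```python
def fix_vocab_mismatch(model, tokenizer):
    """Resize the model's embedding table to match the tokenizer size,
    which avoids SentencePiece `piece id is out of range` errors when
    the model emits token ids the tokenizer cannot decode (or vice versa).

    Works with any pair exposing the Hugging Face interface:
    `len(tokenizer)`, `get_input_embeddings`, `resize_token_embeddings`.
    """
    tok_size = len(tokenizer)
    emb_size = model.get_input_embeddings().num_embeddings
    if tok_size != emb_size:
        model.resize_token_embeddings(tok_size)
    return model
```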
## Recommended hardware
- GPU (T4 / A10 / A100) for real-time streaming; CPU-only inference will be slow.
## Troubleshooting
- If the model fails to load on Spaces, check the logs for out-of-memory (OOM) errors; switch to GPU hardware or enable 4-bit loading via `bitsandbytes`.
- For adapter load failures, ensure adapter repo exists and `peft` is present in `requirements.txt`.
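With `peft` installed, adapter application typically reduces to a single wrap call. A hedged sketch — `apply_lora` and its error message are illustrative rather than the app's exact code, and the `peft_cls` parameter exists only to make the missing-dependency fallback easy to exercise:

```python
def apply_lora(model, adapter_repo: str, peft_cls=None):
    """Wrap `model` with a LoRA adapter loaded from `adapter_repo`.

    Raises a clear error when `peft` is missing, so the Space log points
    straight at the requirements.txt fix.
    """
    if peft_cls is None:
        try:
            from peft import PeftModel as peft_cls
        except ImportError as exc:
            raise RuntimeError(
                "LoRA requested but `peft` is not installed; "
                "add `peft` to requirements.txt"
            ) from exc
    return peft_cls.from_pretrained(model, adapter_repo)
```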
---