---
title: Nanbeige2.5 Chat
emoji: 🦙
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 3.50.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Nanbeige2.5 — Gradio Chat Space

A lightweight Gradio chat UI for PioTio/Nanbeige2.5, suitable for deployment on Hugging Face Spaces.

## Features

- Streaming and non-streaming generation ✅
- Tokenizer ↔ model sanity fixes (avoids SentencePiece "piece id is out of range" errors) ✅
- 4-bit BitsAndBytes loading when a GPU and `bitsandbytes` are available ✅
- Optional LoRA adapter application (requires `peft`) ✅
- Controls: temperature, top-p, top-k, max tokens, max history ✅
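The streaming path can be sketched as below. This is a minimal illustration, not the Space's actual code: `stream_reply` and its default values are assumptions, and a loaded `transformers` model and tokenizer are taken as given.

```python
def stream_reply(model, tokenizer, prompt,
                 temperature=0.7, top_p=0.9, top_k=50, max_new_tokens=256):
    """Yield the reply incrementally; the sampling knobs mirror the UI controls."""
    from threading import Thread
    from transformers import TextIteratorStreamer

    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True,
                                    skip_special_tokens=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    generate_kwargs = dict(**inputs, streamer=streamer, do_sample=True,
                           temperature=temperature, top_p=top_p,
                           top_k=top_k, max_new_tokens=max_new_tokens)
    # generate() blocks, so it runs on a worker thread while we consume pieces
    Thread(target=model.generate, kwargs=generate_kwargs, daemon=True).start()
    for piece in streamer:
        yield piece
```

Non-streaming generation is the same call without the streamer, returning the decoded output in one piece.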

**Quick CPU tip:** this Space may be CPU-only, and running the full PioTio/Nanbeige2.5 model on CPU is extremely slow. Use the **Load fast CPU demo (distilgpt2)** button for quick responses, enable GPU hardware in the Space settings for production use, or check **Force CPU generation** to run Nanbeige on CPU (very slow; not recommended).
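The choice between the full model and the fast demo can be expressed as a small helper. This is a sketch of the logic only; `choose_model_id` and its flag names are illustrative, not the Space's actual API.

```python
NANBEIGE = "PioTio/Nanbeige2.5"
FAST_DEMO = "distilgpt2"

def choose_model_id(has_gpu, force_cpu_generation=False, fast_cpu_demo=False):
    """Mirror the UI logic: prefer the fast demo on CPU unless forced otherwise."""
    if fast_cpu_demo:
        return FAST_DEMO             # quick responses on CPU-only Spaces
    if has_gpu or force_cpu_generation:
        return NANBEIGE              # full model (very slow when forced onto CPU)
    return FAST_DEMO                 # sensible default without a GPU
```

In the app itself, `has_gpu` would come from something like `torch.cuda.is_available()`.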

## Deployment (Hugging Face Spaces)

1. Create a new Space (Gradio runtime).
2. Upload these files: `app.py`, `requirements.txt`, `README.md`.
3. In the Space settings, choose **Hardware accelerator: GPU** (recommended).
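Step 2 can also be scripted with the `huggingface_hub` client. This is a sketch: it assumes `huggingface_hub` is installed and a write token is available, and the repo id passed in is a placeholder for your own Space.

```python
SPACE_FILES = ["app.py", "requirements.txt", "README.md"]

def push_space_files(repo_id, files=SPACE_FILES, token=None):
    """Upload each local file into the Space repo (note repo_type='space')."""
    from huggingface_hub import HfApi  # assumes huggingface_hub is installed

    api = HfApi(token=token)
    for path in files:
        api.upload_file(path_or_fileobj=path, path_in_repo=path,
                        repo_id=repo_id, repo_type="space")
```

For example, `push_space_files("your-username/your-space")` (a placeholder repo id) uploads all three files in one go.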

After pushing these files, the Space will build automatically. Open the Space page and monitor the build logs for errors. If you prefer, you can create `app.py` directly in the web UI instead of pushing from Git.

Tip: keep `bitsandbytes` in `requirements.txt` if you plan to enable 4-bit loading on GPU; remove or pin it if the build log shows dependency issues.
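4-bit loading amounts to passing a `BitsAndBytesConfig` to `from_pretrained`. The sketch below assumes a CUDA GPU with `torch`, `transformers`, `accelerate`, and `bitsandbytes` installed; the nf4/float16 settings are common defaults, not necessarily what this Space uses.

```python
def load_model_4bit(model_id="PioTio/Nanbeige2.5"):
    """Load tokenizer and 4-bit quantized model (requires GPU + bitsandbytes)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # dtype used for compute
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",                     # placement handled by accelerate
        trust_remote_code=True,
    )
    return tokenizer, model
```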

- If you see "piece id is out of range" errors, the app will attempt to auto-fix tokenizer/model alignment.
- To apply a LoRA adapter after the app has started, paste the adapter's Hugging Face repo id into the LoRA field and click **Apply LoRA adapter**.
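Both behaviors can be sketched in a few lines. These are illustrative helpers, not the app's actual functions: resizing the embedding table is one common fix for the out-of-range error, and the adapter case assumes `peft` is installed.

```python
def align_tokenizer_and_model(model, tokenizer):
    """Resize the embedding table so every tokenizer id maps to an embedding row,
    a common fix for SentencePiece 'piece id is out of range' errors."""
    if model.get_input_embeddings().num_embeddings != len(tokenizer):
        model.resize_token_embeddings(len(tokenizer))
    return model

def apply_lora_adapter(base_model, adapter_repo_id):
    """Wrap the base model with a LoRA adapter pulled from the Hub."""
    from peft import PeftModel  # assumes peft is installed
    return PeftModel.from_pretrained(base_model, adapter_repo_id)
```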

## Recommended hardware

- GPU (T4 / A10 / A100) for real-time streaming; CPU-only inference may be slow.

## Troubleshooting

- If model loading fails on Spaces, check the logs for out-of-memory (OOM) errors; switch to GPU hardware or enable 4-bit loading via `bitsandbytes`.
- For adapter load failures, make sure the adapter repo exists and that `peft` is listed in `requirements.txt`.
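A `requirements.txt` along these lines covers the features above. The exact package set and pins are assumptions; match `gradio` to the SDK version in the Space metadata and trim anything you don't use.

```text
gradio==3.50.0
torch
transformers
sentencepiece
accelerate     # needed for device_map="auto"
bitsandbytes   # only needed for 4-bit GPU loading
peft           # only needed for LoRA adapters
```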