---
title: Nanbeige2.5 — Chat
emoji: 🦙
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "3.50.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# Nanbeige2.5 — Gradio Chat Space

A lightweight Gradio chat UI for `PioTio/Nanbeige2.5`, suitable for deployment on Hugging Face Spaces.

## Features
- Streaming and non-streaming generation ✅
- Tokenizer ↔ model sanity fixes (avoids SentencePiece `piece id` errors) ✅
- 4-bit BitsAndBytes loading when a GPU and `bitsandbytes` are available ✅
- Optional LoRA adapter application (requires `peft`) ✅
- Controls: temperature, top-p, top-k, max tokens, max history ✅
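The 4-bit path only activates when both a CUDA device and `bitsandbytes` are present; a minimal sketch of how such a conditional load might be structured (the helper name `build_load_kwargs` is hypothetical, not the app's actual code):

```python
def build_load_kwargs(cuda_available: bool, bnb_available: bool) -> dict:
    """Choose kwargs for AutoModelForCausalLM.from_pretrained.

    Hypothetical helper: the real app may differ. On GPU with
    bitsandbytes installed, request 4-bit quantization; otherwise
    fall back to a plain CPU load.
    """
    if cuda_available and bnb_available:
        return {
            "device_map": "auto",
            # In the app this would typically be expressed via
            # transformers.BitsAndBytesConfig(load_in_4bit=True,
            #                                 bnb_4bit_quant_type="nf4")
            "load_in_4bit": True,
        }
    return {"device_map": "cpu"}
```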

**Quick CPU tip:** This Space **may be CPU-only**. Running the full `PioTio/Nanbeige2.5` model on CPU is extremely slow — use the **Load fast CPU demo (distilgpt2)** button for quick responses, enable GPU in the Space settings for production use, or check **Force CPU generation** to run Nanbeige on CPU (very slow and not recommended).

## Deployment (Hugging Face Spaces)
1. Create a new Space (Gradio runtime).
2. Upload these files (`app.py`, `requirements.txt`, `README.md`).
3. In the Space settings, choose **Hardware accelerator: GPU** (recommended).

After pushing these files, the Space will build automatically — open the Space page and monitor the logs for errors. If you prefer, you can create `app.py` directly in the web UI instead of pushing from Git.

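A plausible `requirements.txt` for this setup (the pins and exact package list are illustrative assumptions, not the repo's actual file):

```
gradio==3.50.0
transformers
torch
sentencepiece
accelerate
bitsandbytes
peft
```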
**Tip:** keep `bitsandbytes` in `requirements.txt` if you plan to enable 4-bit loading on GPU; remove or pin it if the build log shows dependency issues.

- If you see `piece id is out of range` errors, the app will attempt to auto-fix tokenizer/model alignment.
- To apply a LoRA adapter after starting the app, paste the adapter's HF repo ID in the LoRA field and click **Apply LoRA adapter**.
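The `piece id` auto-fix can be sketched as resizing the model's embedding table to match the tokenizer's vocabulary; this version is duck-typed so it runs without `transformers`, and the function name and exact logic are assumptions about the app, not its actual code:

```python
def align_tokenizer_and_model(tokenizer, model):
    """If the tokenizer's vocab is larger than the model's embedding
    table, generation can produce out-of-range piece ids; resizing the
    embeddings avoids that. Hypothetical sketch of the app's auto-fix.
    """
    vocab_size = len(tokenizer)
    embed_rows = model.get_input_embeddings().num_embeddings
    if vocab_size != embed_rows:
        # transformers models expose resize_token_embeddings(new_size)
        model.resize_token_embeddings(vocab_size)
    return model
```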
## Recommended hardware
- GPU (T4 / A10 / A100) for real-time streaming; CPU-only inference may be slow.

## Troubleshooting
- If the model fails to load on Spaces, check the logs for out-of-memory (OOM) errors; switch to GPU hardware or enable 4-bit loading via `bitsandbytes`.
- For adapter load failures, ensure the adapter repo exists and `peft` is listed in `requirements.txt`.
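Because adapter application can fail for several reasons (missing repo, missing `peft`), a robust app keeps the base model on failure. A hedged sketch with the loader injected so it runs without `peft` — `apply_lora` and this signature are assumptions, not the app's actual API:

```python
def apply_lora(model, adapter_repo, loader=None):
    """Try to wrap `model` with a LoRA adapter; on any failure,
    return the unmodified base model plus a status message.

    `loader` defaults to peft.PeftModel.from_pretrained when peft is
    installed; it is injectable here so this sketch runs without peft.
    """
    if not adapter_repo:
        return model, "no adapter repo given"
    if loader is None:
        try:
            from peft import PeftModel
            loader = PeftModel.from_pretrained
        except ImportError:
            return model, "peft is not installed"
    try:
        return loader(model, adapter_repo), f"applied {adapter_repo}"
    except Exception as exc:
        return model, f"adapter load failed: {exc}"
```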