Spaces:
Running on Zero
Running on Zero
Swap in-app model to openbmb/MiniCPM4.1-8B (OpenBMB sponsor prize)
Browse filesMiniCPM4.1-8B is <=32B and a hybrid reasoning model, so it honors
apply_chat_template(enable_thinking=False) for snappy, in-character replies.
Pipeline unchanged (trust_remote_code, bfloat16, streamed, @spaces.GPU);
Qwen3-8B kept documented as the easy swap-back.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
app.py
CHANGED
|
@@ -24,16 +24,18 @@ from shards import SHARD_ORDER, SHARDS
|
|
| 24 |
|
| 25 |
# ---------------------------------------------------------------------------- #
|
| 26 |
# Model — keep the id in one easy-to-swap constant.
|
| 27 |
-
#
|
| 28 |
-
#
|
| 29 |
-
#
|
|
|
|
|
|
|
| 30 |
# ---------------------------------------------------------------------------- #
|
| 31 |
-
MODEL_ID = "
|
| 32 |
|
| 33 |
MODEL = None
|
| 34 |
TOKENIZER = None
|
| 35 |
|
| 36 |
-
#
|
| 37 |
GEN_KWARGS = dict(
|
| 38 |
max_new_tokens=512,
|
| 39 |
do_sample=True,
|
|
|
|
| 24 |
|
| 25 |
# ---------------------------------------------------------------------------- #
|
| 26 |
# Model — keep the id in one easy-to-swap constant.
|
| 27 |
+
# Active: openbmb/MiniCPM4.1-8B (<=32B; qualifies for the OpenBMB sponsor prize)
|
| 28 |
+
# Swap-in: Qwen/Qwen3-8B (strong dialogue, ZeroGPU-friendly, LoRA-ready)
|
| 29 |
+
# Step up: Qwen/Qwen3-14B (if replies feel thin and compute allows)
|
| 30 |
+
# Both MiniCPM4.1 and Qwen3 are hybrid reasoning models that honor
|
| 31 |
+
# apply_chat_template(enable_thinking=False) for snappy, non-thinking replies.
|
| 32 |
# ---------------------------------------------------------------------------- #
|
| 33 |
+
MODEL_ID = "openbmb/MiniCPM4.1-8B"
|
| 34 |
|
| 35 |
MODEL = None
|
| 36 |
TOKENIZER = None
|
| 37 |
|
| 38 |
+
# Non-thinking sampling settings (snappy, in-character replies).
|
| 39 |
GEN_KWARGS = dict(
|
| 40 |
max_new_tokens=512,
|
| 41 |
do_sample=True,
|