Spaces: build-small-hackathon/Cosmere_Codex (Running on Zero)

Maxluria Claude Opus 4.8 committed 15 days ago

Commit c756cfd (1 parent: 8027c32)

Swap in-app model to openbmb/MiniCPM4.1-8B (OpenBMB sponsor prize)

MiniCPM4.1-8B is <=32B and a hybrid reasoning model, so it honors
apply_chat_template(enable_thinking=False) for snappy, in-character replies.
Pipeline unchanged (trust_remote_code, bfloat16, streamed, @spaces.GPU);
Qwen3-8B kept documented as the easy swap-back.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
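
A minimal sketch of the non-thinking prompt path the message describes, not the actual app.py code. `build_prompt` is a hypothetical helper name; the sketch assumes the tokenizer's chat template reads the `enable_thinking` keyword that `apply_chat_template` forwards to it, as the message says both MiniCPM4.1 and Qwen3 do.

```python
from typing import Any


def build_prompt(tokenizer: Any, messages: list[dict[str, str]]) -> str:
    """Render chat messages to a prompt string with thinking disabled.

    `tokenizer` is expected to be a Hugging Face tokenizer whose chat
    template understands `enable_thinking` (an assumption of this sketch).
    """
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return text, not token ids
        add_generation_prompt=True,  # end with the assistant turn opener
        enable_thinking=False,       # forwarded to the chat template
    )
```

Because the flag lives in the template call rather than in the model id, swapping back to Qwen/Qwen3-8B should not require touching this step.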

Files changed (1)

app.py +7 -5


@@ -24,16 +24,18 @@ from shards import SHARD_ORDER, SHARDS
 # ---------------------------------------------------------------------------- #
 # Model — keep the id in one easy-to-swap constant.
-#   Primary:  Qwen/Qwen3-8B  (<=32B, strong dialogue, ZeroGPU-friendly, LoRA-ready)
-#   Step up:  Qwen/Qwen3-14B  (if replies feel thin and compute allows)
-#   Sponsor:  openbmb/MiniCPM4.1-8B  (qualifies for the OpenBMB special prize)
+#   Active:   openbmb/MiniCPM4.1-8B  (<=32B; qualifies for the OpenBMB sponsor prize)
+#   Swap-in:  Qwen/Qwen3-8B          (strong dialogue, ZeroGPU-friendly, LoRA-ready)
+#   Step up:  Qwen/Qwen3-14B         (if replies feel thin and compute allows)
+# Both MiniCPM4.1 and Qwen3 are hybrid reasoning models that honor
+# apply_chat_template(enable_thinking=False) for snappy, non-thinking replies.
 # ---------------------------------------------------------------------------- #
-MODEL_ID = "Qwen/Qwen3-8B"
+MODEL_ID = "openbmb/MiniCPM4.1-8B"
 MODEL = None
 TOKENIZER = None
-# Qwen3 non-thinking sampling settings (snappy, in-character replies).
+# Non-thinking sampling settings (snappy, in-character replies).
 GEN_KWARGS = dict(
     max_new_tokens=512,
     do_sample=True,
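
For context, a hedged sketch of how the `MODEL_ID`, `MODEL`, and `TOKENIZER` globals might be used for one-time lazy loading with the settings named in the commit message (`trust_remote_code`, bfloat16). `load_model` is a hypothetical name, the real loader in app.py is not shown in this diff, and the `@spaces.GPU` decorator and streaming are left out.

```python
MODEL_ID = "openbmb/MiniCPM4.1-8B"  # swap back: "Qwen/Qwen3-8B"
MODEL = None
TOKENIZER = None


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model once, caching them in module globals."""
    global MODEL, TOKENIZER
    if MODEL is None or TOKENIZER is None:
        # Imported lazily so importing this module stays cheap.
        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        TOKENIZER = AutoTokenizer.from_pretrained(
            model_id, trust_remote_code=True
        )
        MODEL = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.bfloat16,  # newer transformers also accept dtype=
            trust_remote_code=True,
        )
    return MODEL, TOKENIZER
```

Keeping the id in a single constant means the swap-back is a one-line change, provided the replacement model loads with the same arguments.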