Maxluria Claude Opus 4.8 commited on
Commit
c756cfd
·
1 Parent(s): 8027c32

Swap in-app model to openbmb/MiniCPM4.1-8B (OpenBMB sponsor prize)

Browse files

MiniCPM4.1-8B is <=32B and a hybrid reasoning model, so it honors
apply_chat_template(enable_thinking=False) for snappy, in-character replies.
Pipeline unchanged (trust_remote_code, bfloat16, streamed, @spaces.GPU);
Qwen3-8B kept documented as the easy swap-back.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Files changed (1) hide show
  1. app.py +7 -5
app.py CHANGED
@@ -24,16 +24,18 @@ from shards import SHARD_ORDER, SHARDS
24
 
25
  # ---------------------------------------------------------------------------- #
26
  # Model — keep the id in one easy-to-swap constant.
27
- # Primary: Qwen/Qwen3-8B (<=32B, strong dialogue, ZeroGPU-friendly, LoRA-ready)
28
- # Step up: Qwen/Qwen3-14B (if replies feel thin and compute allows)
29
- # Sponsor: openbmb/MiniCPM4.1-8B (qualifies for the OpenBMB special prize)
 
 
30
  # ---------------------------------------------------------------------------- #
31
- MODEL_ID = "Qwen/Qwen3-8B"
32
 
33
  MODEL = None
34
  TOKENIZER = None
35
 
36
- # Qwen3 non-thinking sampling settings (snappy, in-character replies).
37
  GEN_KWARGS = dict(
38
  max_new_tokens=512,
39
  do_sample=True,
 
24
 
25
  # ---------------------------------------------------------------------------- #
26
  # Model — keep the id in one easy-to-swap constant.
27
+ # Active: openbmb/MiniCPM4.1-8B (<=32B; qualifies for the OpenBMB sponsor prize)
28
+ # Swap-in: Qwen/Qwen3-8B (strong dialogue, ZeroGPU-friendly, LoRA-ready)
29
+ # Step up: Qwen/Qwen3-14B (if replies feel thin and compute allows)
30
+ # Both MiniCPM4.1 and Qwen3 are hybrid reasoning models that honor
31
+ # apply_chat_template(enable_thinking=False) for snappy, non-thinking replies.
32
  # ---------------------------------------------------------------------------- #
33
+ MODEL_ID = "openbmb/MiniCPM4.1-8B"
34
 
35
  MODEL = None
36
  TOKENIZER = None
37
 
38
+ # Non-thinking sampling settings (snappy, in-character replies).
39
  GEN_KWARGS = dict(
40
  max_new_tokens=512,
41
  do_sample=True,