Commit History

preliminary_mul_200k: replace with v200064 (vocab 200064, 128-aligned; IDs 0-199999 identical to prior 200000 build) + same special tokens + post-processor
6d07642
verified

cmeister committed on
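
The v200064 size in the commit above presumably comes from padding the prior 200000-token build up to the next multiple of 128. Because IDs 0-199999 are unchanged, the padding occupies IDs 200000-200063. The sketch below only illustrates that arithmetic; `align_vocab` is a hypothetical helper, not part of the repository.

```python
def align_vocab(size: int, multiple: int = 128) -> int:
    """Round a vocabulary size up to the next multiple of `multiple`."""
    # Ceiling division via negated floor division, then scale back up.
    return -(-size // multiple) * multiple


for size in (200000, 131072):
    padded = align_vocab(size)
    # 200000 needs 64 extra slots; 131072 (= 1024 * 128) is already aligned.
    print(size, "->", padded, "extra slots:", padded - size)
```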

preliminary_mul_200k: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
5390e17
verified

cmeister committed on

preliminary_mul_200k: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
1412e6f
verified

cmeister committed on

preliminary_mul: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
84e4a1b
verified

cmeister committed on

preliminary_mul: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
0f766ee
verified

cmeister committed on

preliminary_euh: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
ab1efc4
verified

cmeister committed on

preliminary_euh: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
2af867d
verified

cmeister committed on

preliminary_enh: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
6602562
verified

cmeister committed on

preliminary_enh: rename PII tokens to spreadsheet order <iban-pii>/<email-pii>/<ip-pii>
1b277fa
verified

cmeister committed on

preliminary_mul_200k: replace eusino_v2c with frde_kr120_gm130 (French at Apertus parity, German/Spanish below; Mandarin>Korean ordering kept)
05bebe2
verified

cmeister committed on

preliminary_mul_200k: PII special-token rename (tokenizer_config)
89faad7
verified

cmeister committed on

preliminary_mul_200k: rename reserve specials 24/25/26 to <pii-iban>/<pii-email>/<pii-ip>
40140ab
verified

cmeister committed on
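
The PII rename commits change only the strings attached to reserved special-token slots 24/25/26; the IDs stay fixed, so anything indexed by ID (such as embedding rows) is unaffected. The separate tokenizer_config commits suggest the same rename has to be mirrored there. A minimal sketch of an ID-preserving rename, assuming an id-to-token mapping; the placeholder names `<reserved_24>` etc. and the helper `rename_specials` are hypothetical.

```python
def rename_specials(id_to_token: dict[int, str], renames: dict[int, str]) -> dict[int, str]:
    """Return a copy of id_to_token with selected IDs renamed; IDs never move."""
    out = dict(id_to_token)
    for tok_id, new_name in renames.items():
        if tok_id not in out:
            raise KeyError(f"ID {tok_id} is not in the vocabulary")
        # Refuse to create a duplicate string under a different ID.
        if any(t == new_name and i != tok_id for i, t in out.items()):
            raise ValueError(f"{new_name!r} already used by another ID")
        out[tok_id] = new_name
    return out


# Hypothetical reserve names; only IDs 24/25/26 come from the commit messages.
vocab = {24: "<reserved_24>", 25: "<reserved_25>", 26: "<reserved_26>"}
vocab = rename_specials(vocab, {24: "<pii-iban>", 25: "<pii-email>", 26: "<pii-ip>"})
# Later commits switch to "spreadsheet order" names, still on the same IDs.
vocab = rename_specials(vocab, {24: "<iban-pii>", 25: "<email-pii>", 26: "<ip-pii>"})
print(vocab)
```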

preliminary_mul: PII special-token rename (tokenizer_config)
dcf6015
verified

cmeister committed on

preliminary_mul: rename reserve specials 24/25/26 to <pii-iban>/<pii-email>/<pii-ip>
41e96fe
verified

cmeister committed on

preliminary_euh: PII special-token rename (tokenizer_config)
bc3957a
verified

cmeister committed on

preliminary_euh: rename reserve specials 24/25/26 to <pii-iban>/<pii-email>/<pii-ip>
70d10a6
verified

cmeister committed on

preliminary_enh: PII special-token rename (tokenizer_config)
3ff3b43
verified

cmeister committed on

preliminary_enh: rename reserve specials 24/25/26 to <pii-iban>/<pii-email>/<pii-ip>
e910165
verified

cmeister committed on

preliminary_mul: replace with reparam variant (consv2_reparam, v131072, full vocab) + post-processor
0aace17
verified

cmeister committed on

preliminary_mul: add BOS/EOS post-processor (single <s> $A </s>, pair seg-B type_id 1) — now consistent with the other three
8f10870
verified

cmeister committed on

preliminary_euh: replace with frde2 (131k, Fr/De data-boosted, EU6 -0.2; Chinese the cost) + BOS/EOS post-processor
c516a34
verified

cmeister committed on

add preliminary_enh (engfull_eu3, 131k English-preserving) + preliminary_mul_200k (eusino_v2c, 200k), both with BOS/EOS post-processor
0d37289
verified

cmeister committed on

rename preliminary_enh -> preliminary_euh
437e7df
verified

cmeister committed on

preliminary_enh: pair post-processor type_id -> segment B = 1 (Apertus convention); single unchanged
96077fd
verified

cmeister committed on

preliminary_enh: replace with eudata6_gm90k (fixed plus2 repcap8 regex + BOS/EOS post-processor); vocab now exactly 131072
d0d283b
verified

cmeister committed on

Fix eos_token to </s> (id 2) in preliminary_enh + preliminary_mul configs (was wrongly <|assistant_end|>); special_tokens_map regenerated to match base Apertus-70B-2509
894d1ee
verified

cmeister committed on

Add tokenizer_config.json + special_tokens_map.json for preliminary_enh and preliminary_mul (eos=<|assistant_end|>, add_bos_token=false interim pending post-processor decision)
b4ea749
verified

cmeister committed on

Add preliminary_mul — consv2 baseline + eng5g, no tailcuts, vocab 131k sp124 (fixed regex)
c826d5b
verified

cmeister committed on

Replace preliminary_enh with A8 v131k sp124 (fixed regex + 124 specials at IDs 0-123)
b1c4f7a
verified

cmeister committed on

Add preliminary enhanced tokenizer (A8 repcap8 v131k sp120)
db7ff0b
verified

cmeister committed on

initial commit
a09d03f
verified

cmeister committed on