Cascade0.1 Preview1-55K


This is an early release of Cascade0.1 and does NOT represent the final state of the model.

This is the third iteration of the Cascade0.1 series. The first, back in February, went horribly; this third one is from May. It uses the same custom pipeline that built Cascade0, but heavily updated. As of writing, the fourth iteration is in training, and it's TBD whether that one will keep the final training script.

What changed vs Cascade0

The param config in C0.1 now prioritizes actual intelligence (and speed), with both configurations landing around the 170M-parameter mark:

| Parameter | Cascade 0.1 | Cascade 0 |
|---|---|---|
| Vocab Size | 32,000 | 56,000 |
| Hidden Size | 896 | 768 |
| Intermediate Size | 2,384 | 2,248 |
| Number of Layers | 16 | 16 |
| Attention Heads | 14 | 14 |
| KV Heads | 7 | - |
| Max Position Embeddings | 1,512 | 1,512 |
| Tie Word Embeddings | True | False |
| Use Cache | False | False |
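As a sanity check on the ~170M figure, here is a back-of-the-envelope parameter count from the Cascade 0.1 column above. It assumes a Llama-style decoder (GQA attention, a gated three-projection MLP, RMSNorm, tied input/output embeddings); the layer layout is an assumption based on the config fields, not confirmed internals.

```python
# Rough parameter count for the Cascade 0.1 config (assumed Llama-style layout).
vocab_size = 32_000
hidden = 896
intermediate = 2_384
n_layers = 16
n_heads = 14
n_kv_heads = 7

head_dim = hidden // n_heads         # 64
kv_dim = n_kv_heads * head_dim       # 448 (grouped-query attention)

embed = vocab_size * hidden          # shared with the LM head (tie_word_embeddings=True)
attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q/o projections + smaller k/v projections
mlp = 3 * hidden * intermediate      # gate, up, down projections
norms = 2 * hidden                   # two RMSNorms per layer
per_layer = attn + mlp + norms

total = embed + n_layers * per_layer + hidden  # + final norm
print(f"{total / 1e6:.1f}M parameters")        # ≈ 169.8M, i.e. the ~170M quoted above
```

Tying the word embeddings alone saves a full `vocab_size * hidden` (~28.7M) output matrix, which is a big chunk at this scale.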

It fixes the biggest bug of the base Cascade0: the presence of instruct datasets in raw pretraining. Cascade0.1 uses only a pure-text dataset, which is a far better base for future SFT. Another fix is the tokenizer's lowercase-only bug, which stripped all casing from the text.
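A toy sketch of why the lowercase-only bug matters (a generic illustration, not Cascade's actual tokenizer code): lowercasing in the normalizer is lossy, so the model never sees casing and can never emit it, no matter how much SFT follows.

```python
# Hypothetical normalizers illustrating the lowercase-only tokenizer bug.
def buggy_normalize(text: str) -> str:
    return text.lower()   # destroys case information irreversibly

def fixed_normalize(text: str) -> str:
    return text           # case-preserving; casing is left to the vocabulary

sample = "Cascade0.1 fixes the Tokenizer"
print(buggy_normalize(sample))   # cascade0.1 fixes the tokenizer
print(fixed_normalize(sample))   # Cascade0.1 fixes the Tokenizer
```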

As shown in the benchmarks below, this preview is very close to its 'father' (Cascade0-Base), despite having 345K fewer training steps. (C0-Base took 400K steps over 2 weeks; C0.1 took 4 hours and 55K steps, and it's all pure raw text :ppppp)

*(Benchmark comparison chart: Cascade0.1 vs Cascade0-Base)*

Because this is a preview, I won't release any GGUFs for now.
