Cascade0.1 Preview2-112K

This is an early release of Cascade0.1, so it does NOT represent the final state of the model.

This is the fourth iteration of the Cascade0.1 series. The first attempt, in February, went horribly; this one is from May. It shows promising results but still needs more training. Since the results are already good, I've decided to publish this checkpoint too.

What changed vs Cascade0

The parameter configuration in C0.1 now prioritizes actual intelligence (and speed), and both models are about 170M parameters in size.

Parameter	Cascade 0.1	Cascade 0
Vocab Size	32,000	56,000
Hidden Size	896	768
Intermediate Size	2,384	2,248
Number of Layers	16	16
Attention Heads	14	14
KV Heads	7	-
Max Position Embeddings	1,512	1,512
Tie Word Embeddings	True	False
Use Cache	False	False
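As a rough sanity check on the ~170M figure, the sketch below estimates Cascade0.1's parameter count from the table. The card does not describe the block layout, so the architecture here is an assumption: a Llama-style decoder with bias-free Q/K/V/O projections, grouped-query attention (head dim = hidden size / attention heads), a gated SwiGLU MLP with three weight matrices, two RMSNorm weight vectors per layer plus a final norm, and a single embedding matrix shared with the output head (Tie Word Embeddings = True). `DecoderConfig` and `count_params` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int
    num_heads: int
    num_kv_heads: int
    tie_word_embeddings: bool


# Values taken from the Cascade 0.1 column of the table.
CASCADE_0_1 = DecoderConfig(
    vocab_size=32_000,
    hidden_size=896,
    intermediate_size=2_384,
    num_layers=16,
    num_heads=14,
    num_kv_heads=7,
    tie_word_embeddings=True,
)


def count_params(cfg: DecoderConfig) -> int:
    """Estimate parameters for an ASSUMED Llama-style decoder (no biases)."""
    head_dim = cfg.hidden_size // cfg.num_heads  # 896 / 14 = 64
    kv_dim = cfg.num_kv_heads * head_dim         # GQA: K and V are narrower than Q

    attn = (
        cfg.hidden_size * cfg.hidden_size    # Q projection
        + 2 * cfg.hidden_size * kv_dim       # K and V projections
        + cfg.hidden_size * cfg.hidden_size  # output projection
    )
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size  # gate, up, down (SwiGLU)
    norms = 2 * cfg.hidden_size                        # two RMSNorm weights per layer
    per_layer = attn + mlp + norms

    embed = cfg.vocab_size * cfg.hidden_size
    lm_head = 0 if cfg.tie_word_embeddings else embed  # tied: output head reuses embeddings
    final_norm = cfg.hidden_size

    return cfg.num_layers * per_layer + embed + lm_head + final_norm


if __name__ == "__main__":
    total = count_params(CASCADE_0_1)
    print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions it prints `169,767,808 parameters (~169.8M)`, consistent with the stated 170M; a different MLP or bias layout would shift the number. The Cascade 0 column is not estimated here because its KV-head count is unspecified and 768 is not divisible by 14, so its head layout cannot be inferred from the table alone.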

Cascade0.1 fixes the biggest bug of the base Cascade0: instruct datasets were mixed into raw pretraining. Cascade0.1 uses only pure text data, which is a far better base for future SFT. It also fixes the tokenizer's lowercase-only bug.
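One way to guard against the lowercase-only tokenizer bug coming back is a round-trip check: encode mixed-case text, decode it, and compare with the input. The sketch below is generic and hypothetical (`case_roundtrip_failures` is not part of this repo); it accepts any `encode`/`decode` pair, and the toy byte-level tokenizers exist only to demonstrate the check.

```python
from typing import Callable, Iterable, List, Sequence

Encode = Callable[[str], Sequence[int]]
Decode = Callable[[Sequence[int]], str]

# Mixed-case probes: a lowercase-only tokenizer cannot reproduce any of these.
CASE_PROBES = ["Hello World", "NASA launched Artemis", "iPhone vs. Android", "ALL CAPS"]


def case_roundtrip_failures(encode: Encode, decode: Decode,
                            samples: Iterable[str] = CASE_PROBES) -> List[str]:
    """Return the samples whose encode -> decode round trip is not exact."""
    return [s for s in samples if decode(encode(s)) != s]


# Toy byte-level tokenizers, used only to demonstrate the check.
def good_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def good_decode(ids: Sequence[int]) -> str:
    return bytes(ids).decode("utf-8")


def lowercasing_encode(text: str) -> List[int]:
    # Simulates the old bug: case information is discarded before tokenizing.
    return list(text.lower().encode("utf-8"))


if __name__ == "__main__":
    print(case_roundtrip_failures(good_encode, good_decode))         # []
    print(case_roundtrip_failures(lowercasing_encode, good_decode))  # all four probes
```

With a Hugging Face tokenizer `tok`, one could pass `tok.encode` and `lambda ids: tok.decode(ids, skip_special_tokens=True)`; some tokenizers normalize whitespace on decode, so an exact comparison may need loosening.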

As shown in the benchmarks below, the first preview (Pre1) came very close to its 'father' (Cascade0-Base) despite far fewer training steps: C0-Base had 400K steps over 2 weeks, while C0.1 Pre1 took 4 hours and 55K steps (and it's all pure raw text :p). Pre2 took 7-8 hours and 112K steps to beat both C0-Base and Pre1.
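The training budgets quoted above can be put on a common scale as steps per hour. This is plain arithmetic on the card's own figures, taking 2 weeks as 14 × 24 = 336 hours; it ignores tokens per step, so it is only a like-for-like comparison if batch size and sequence length match across runs, which the card does not state. `RUNS` and `steps_per_hour` are illustrative names.

```python
# (steps, possible wall-clock hours) as quoted in this card.
# Pre2's "7-8 hours" is kept as a range.
RUNS = {
    "C0-Base": (400_000, [14 * 24]),  # 2 weeks
    "C0.1 Pre1": (55_000, [4]),
    "C0.1 Pre2": (112_000, [7, 8]),
}


def steps_per_hour(steps: int, hours: float) -> float:
    """Average optimizer steps completed per wall-clock hour."""
    return steps / hours


if __name__ == "__main__":
    for name, (steps, hour_options) in RUNS.items():
        rates = [steps_per_hour(steps, h) for h in hour_options]
        lo, hi = min(rates), max(rates)
        label = f"{lo:,.0f}" if lo == hi else f"{lo:,.0f}-{hi:,.0f}"
        print(f"{name}: {label} steps/hour")
```

This works out to roughly 1,190 steps/hour for C0-Base, 13,750 for Pre1 and 14,000-16,000 for Pre2.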


Format: Safetensors

Model size: 0.2B params

Tensor type: F32

Dataset used to train ARMZyany/Cascade0.1-170M-Preview2-0805-Step112K

Collection including ARMZyany/Cascade0.1-170M-Preview2-0805-Step112K: Cascade0.1-Preview Series (2 items)