Part of the **FailLabs** collection (WITH EXPLANATIONS): total failures and dead-ends. Learn from my mistakes.
Ooof, my man ain't feeling so hot; I'd pass on this one for now. Inverting and merging 20b Llama 2 models works quite well, evening out the gradients between the slices. However, these 13b Mistrals seem to HATE it, which I assume is due to the unbalanced nature of my recipe. More study is required.
```yaml
merge_method: dare_ties
base_model: athirdpath/BigMistral-13b
models:
  - model: athirdpath/NeuralHermes-Mistral-13b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/NeuralHermes-Mistral-13b-INV
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true
dtype: bfloat16
```
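If you want to reproduce the failure (or rebalance the recipe yourself), the block above is a standard mergekit config. Below is a minimal sketch of running it through mergekit's Python API; the module paths and option names follow mergekit's documented example and may differ between versions, and `config.yml` / `./merged` are hypothetical paths I'm using for illustration. The CLI equivalent is `mergekit-yaml config.yml ./merged`.

```python
# Minimal sketch: run the dare_ties recipe above with mergekit.
# Assumes mergekit is installed (pip install mergekit); the API names
# follow mergekit's documented Python example and may vary by version.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "config.yml"  # hypothetical: the recipe above saved to disk
OUTPUT_PATH = "./merged"   # hypothetical: where the merged model lands

# Parse the YAML recipe into mergekit's config object
with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Execute the merge; dare_ties drops (1 - density) of each delta at random
# and rescales the rest before the weighted TIES-style combination
run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # merge on GPU if one is present
        copy_tokenizer=True,             # copy the base model's tokenizer
    ),
)
```

Worth noting for anyone tweaking this: the weights here (0.60 / 0.40) are what I mean by an unbalanced recipe, so that's the first knob I'd turn.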