srinivasbilla/tinymix-8x1b

eastwind committed on Jan 2, 2024 · Commit 038bf33 · Parent: f4f8c0c

Create README.md

Files changed (1): README.md, added (+59 -0)

	@@ -0,0 +1,59 @@

+---
+license: apache-2.0
+language:
+- en
+---
+<div align="center">
+# TinyMix-8x1b
+</div>
+This is a Mixture-of-Experts (MoE) version of [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T), built with the [Mixtral branch of mergekit](https://github.com/cg123/mergekit). All eight experts are initialized from that same checkpoint.
+The goal was to MoE-fy the TinyLlama model and then use the result as a base model for further training. The intuition is that fine-tuning the 8x1b model should give better performance than fine-tuning the 1b model by itself.
+More work coming!
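The "8x1b" name counts experts rather than total size: in a Mixtral-style MoE only the MLP blocks are duplicated per expert, while attention, norms and embeddings stay shared. The sketch below estimates the resulting parameter counts. It assumes TinyLlama-1.1B's published config (hidden size 2048, MLP size 5632, 22 layers, 32 query heads, 4 key/value heads, 32k vocabulary, untied embeddings), one bias-free linear router per layer, and two active experts per token; none of these values are stated in this card.

```python
from typing import Optional

# Assumed TinyLlama-1.1B config values (not stated in this model card).
VOCAB = 32_000
HIDDEN = 2_048
INTERMEDIATE = 5_632
LAYERS = 22
HEADS = 32
KV_HEADS = 4  # grouped-query attention: 4 shared key/value heads


def attention_params() -> int:
    head_dim = HIDDEN // HEADS
    q_and_o = 2 * HIDDEN * HIDDEN               # q_proj and o_proj
    k_and_v = 2 * HIDDEN * KV_HEADS * head_dim  # k_proj and v_proj
    return q_and_o + k_and_v


def mlp_params() -> int:
    return 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections (SwiGLU)


def count_params(num_experts: int = 1, experts_per_token: Optional[int] = None) -> int:
    """Total parameters, or only those used per token when experts_per_token is given."""
    used = num_experts if experts_per_token is None else experts_per_token
    router = HIDDEN * num_experts if num_experts > 1 else 0  # assumed: one linear gate per layer, no bias
    norms = 2 * HIDDEN                                        # two RMSNorm weight vectors per layer
    per_layer = attention_params() + used * mlp_params() + router + norms
    embeddings = 2 * VOCAB * HIDDEN                           # untied input embeddings and lm_head
    return LAYERS * per_layer + embeddings + HIDDEN           # + final RMSNorm


dense = count_params()
moe_total = count_params(num_experts=8)
moe_active = count_params(num_experts=8, experts_per_token=2)
print(f"dense {dense:,} | MoE total {moe_total:,} | MoE active per token {moe_active:,}")
```

Under these assumptions the dense model comes to about 1.10B parameters, the 8-expert model to about 6.43B, and roughly 1.86B are used per token with two experts active.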
+# Inference Template
+This is a merge of the base model, so prompt it for plain text completion: give it the start of a passage and let it continue.
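+```python
+# `llm` stands for whatever text-generation backend you use; the model simply continues the prompt
+llm.generate('Quantum Tunneling is')
+```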
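With Hugging Face `transformers`, the same completion call might look like the sketch below. It assumes the Hub id `srinivasbilla/tinymix-8x1b` (taken from the page header), a `transformers` release that supports the Mixtral architecture, and greedy decoding with 64 new tokens; these settings are illustrative and do not come from the model card.

```python
def complete(prompt: str,
             repo_id: str = "srinivasbilla/tinymix-8x1b",  # assumed Hub id, from the page header
             max_new_tokens: int = 64) -> str:
    """Continue `prompt` as plain text completion (no chat template), using greedy decoding."""
    # Imported inside the function so that defining the helper does not
    # require transformers/torch to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("Quantum Tunneling is")` downloads the weights on first use and returns the prompt followed by the model's continuation.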
+## Mergekit Config
+```yaml
+base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
+gate_mode: hidden
+dtype: bfloat16
+experts:
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+  - source_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+    positive_prompts: [""]
+```
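The config gives all eight experts the same source checkpoint and the same empty positive prompt. A toy sketch of Mixtral-style top-2 routing shows what that implies at initialization. It assumes (our reading of mergekit's `gate_mode: hidden`) that each expert's router row is derived from hidden states of its positive prompts, so identical prompts yield identical rows. The vectors and the `toy_mlp` expert are hypothetical, and this is not mergekit's code.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_moe(h: Vector, router: List[Vector],
              experts: List[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Route h to the k highest-scoring experts, softmax over their logits, and mix their outputs."""
    logits = [dot(row, h) for row in router]
    # sorted() is stable even with reverse=True, so ties keep the lowest expert indices.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - m) for i in chosen]  # softmax restricted to the chosen experts
    total = sum(exps)
    out = [0.0] * len(h)
    for weight, i in zip(exps, chosen):
        y = experts[i](h)
        out = [o + (weight / total) * yj for o, yj in zip(out, y)]
    return out


def toy_mlp(h: Vector) -> Vector:
    """Hypothetical stand-in for one copied TinyLlama MLP."""
    return [math.tanh(2.0 * x) - 0.5 * x for x in h]


# Eight identical experts and eight identical router rows, mirroring the config above.
experts = [toy_mlp] * 8
gate_row = [0.3, -0.1, 0.7]  # hypothetical; the same for every expert
router = [list(gate_row) for _ in range(8)]

h = [0.5, -1.2, 0.8]
moe_out = top_k_moe(h, router, experts)
dense_out = toy_mlp(h)
```

Because the eight experts are the same function, any weighted mix of their outputs equals that single function's output, whatever the router picks. In this toy setting each MoE layer therefore starts out reproducing the MLP it was copied from; the experts and router can only specialize once the model is trained further, which is the plan the card describes.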
+# Eval
+Thanks to u/mhenrichsen for the HellaSwag score:
+```
+|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
+|---------|-------|------|-----:|--------|-----:|---|-----:|
+|hellaswag|Yaml   |none  |     0|acc     |0.4659|±  |0.0050|
+|         |       |none  |     0|acc_norm|0.6044|±  |0.0049|
+```
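The reported standard errors can be sanity-checked with the binomial formula and turned into 95% confidence intervals. This back-of-the-envelope sketch assumes the scores were computed on the 10,042-example HellaSwag validation split with independent examples; it is not the evaluation harness's own code.

```python
import math
from typing import Tuple

N_HELLASWAG_VAL = 10_042  # assumed size of the HellaSwag validation split


def binomial_stderr(p: float, n: int = N_HELLASWAG_VAL) -> float:
    """Standard error of an accuracy p measured on n independent examples."""
    return math.sqrt(p * (1.0 - p) / n)


def ci95(p: float, stderr: float) -> Tuple[float, float]:
    """Normal-approximation 95% confidence interval."""
    return (p - 1.96 * stderr, p + 1.96 * stderr)


scores = {"acc": (0.4659, 0.0050), "acc_norm": (0.6044, 0.0049)}
for name, (value, reported_se) in scores.items():
    low, high = ci95(value, reported_se)
    print(f"{name}: recomputed stderr {binomial_stderr(value):.4f} "
          f"(reported {reported_se:.4f}), 95% CI [{low:.4f}, {high:.4f}]")
```

The recomputed standard errors round to the reported 0.0050 and 0.0049, giving intervals of roughly [0.456, 0.476] for `acc` and [0.595, 0.614] for `acc_norm`.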