KoalaAI/Bamboo-400M

DarwinAnim8or committed on Jul 29, 2024

Commit de4059b · verified · 1 Parent(s): 754751b

Update README.md

Files changed (1)

README.md +3 -3

@@ -5,7 +5,7 @@ datasets:
 ---
 # Bamboo 400M
-This is a WIP model trained only on public domain (CC0) datasets, primarily in the English language.
+This is a WIP foundational (aka base) model trained only on public domain (CC0) datasets, primarily in the English language.
 Further training is planned & ongoing, but currently no multi-language datasets are in use or planned; though this may change in the future and the current datasets *can* contain languages other than English.
 ## License
@@ -14,7 +14,7 @@ Though the training data of this model is CC0, the model itself is not. The mode
 ## Planned updates
 As mentioned, a few updates are planned:
 * Further training on more CC0 data, this model's weights will be updated as we pretrain on more of the listed datasets.
-* Experiment with exteding the context length using YaRN to 32k tokens.
+* Experiment with extending the context length using YaRN to 32k tokens.
 * Fine-tuning the resulting model for instruct, code and storywriting. These will then be combined using MergeKit to create a MoE model.
 * Release a GGUF version and an extended context version of the base model
@@ -27,7 +27,7 @@ This table tracks the performance of our model on various tasks over time.
 | 2024-07-27        | acc      | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |
 ## Legend
-- Date: The date of each evaluation run
+- Date: The date of the model that the evaluation was run on. Pretraining is ongoing and tests are re-run with that date's model.
 - Metric: The evaluation metric used (acc = accuracy)
 - Task columns: Results for each task in the format "Percentage ± Standard Error"