Update README.md
README.md
CHANGED
@@ -5,7 +5,7 @@ datasets:
 ---

 # Bamboo 400M
-This is a WIP model trained only on public domain (CC0) datasets, primarily in the English language.
+This is a WIP foundational (aka base) model trained only on public domain (CC0) datasets, primarily in the English language.
 Further training is planned & ongoing, but currently no multi-language datasets are in use or planned; though this may change in the future and the current datasets *can* contain languages other than English.

 ## License
@@ -14,7 +14,7 @@ Though the training data of this model is CC0, the model itself is not. The mode
 ## Planned updates
 As mentioned, a few updates are planned:
 * Further training on more CC0 data, this model's weights will be updated as we pretrain on more of the listed datasets.
-* Experiment with
+* Experiment with extending the context length using YaRN to 32k tokens.
 * Fine-tuning the resulting model for instruct, code and storywriting. These will then be combined using MergeKit to create a MoE model.
 * Release a GGUF version and an extended context version of the base model

@@ -27,7 +27,7 @@ This table tracks the performance of our model on various tasks over time.
 | 2024-07-27 | acc | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |

 ## Legend
-- Date: The date of
+- Date: The date of the model that the evaluation was run on. Pretraining is ongoing and tests are re-run with that date's model.
 - Metric: The evaluation metric used (acc = accuracy)
 - Task columns: Results for each task in the format "Percentage ± Standard Error"