Update test results
README.md

As mentioned, a few updates are planned:

* Fine-tuning the resulting model for instruct, code, and storywriting. These will then be combined using MergeKit to create a MoE model.
* Release a GGUF version and an extended-context version of the base model
## Model Performance Tracking
This table tracks the performance of our model on various tasks over time.
| Date (YYYY-MM-DD) | Metric   | arc_easy       | hellaswag      | sglue_rte      | truthfulqa     | Avg    |
|-------------------|----------|----------------|----------------|----------------|----------------|--------|
| 2024-07-27        | acc      | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |
|                   | acc_norm | 27.95% ± 0.92% | 25.03% ± 0.43% | -              | -              | -      |

### Legend
- Date: The date of each evaluation run
- Metric: The evaluation metric used (acc = accuracy, acc_norm = normalized accuracy)
- Task columns: Results for each task in the format "Percentage ± Standard Error"

### Notes
- All accuracy values are presented as percentages
- Empty cells indicate that the task was not evaluated on that date or for that metric
- Standard errors are also converted to percentages for consistency
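
The Avg column and the reported standard errors follow from the per-task accuracies; here is a minimal sketch of both calculations. The standard-error function uses the usual binomial formula, and the sample size in its example is made up for illustration — this is not necessarily the exact computation the evaluation harness performs.

```python
import math

# Per-task accuracies from the 2024-07-27 "acc" row, as fractions.
scores = {
    "arc_easy": 0.2740,
    "hellaswag": 0.2552,
    "sglue_rte": 0.5271,
    "truthfulqa": 0.3952,
}

# The Avg column is the unweighted mean of the four task accuracies:
# (27.40 + 25.52 + 52.71 + 39.52) / 4 = 36.2875, reported as 36.29%.
avg = 100 * sum(scores.values()) / len(scores)

def binomial_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy p estimated from n examples."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative only (hypothetical n): 50% accuracy on 100 examples.
se = 100 * binomial_stderr(0.5, 100)
```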
# Tokenizer
Our tokenizer was trained from scratch on 500,000 samples from the OpenWebText dataset. Like Mistral, we use `LlamaTokenizerFast` as our tokenizer class, running in legacy mode.
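
For intuition, the core of training a subword tokenizer from scratch is byte-pair encoding: repeatedly merge the most frequent adjacent pair of symbols. The stdlib-only toy below, with a made-up four-word corpus, illustrates the idea; it is a sketch of the algorithm, not this project's actual training pipeline.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Learn BPE merge rules from a list of words (toy illustration)."""
    # Represent each word as a tuple of single-character symbols.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge rule everywhere it occurs.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Tiny hypothetical corpus; the real tokenizer saw 500,000 samples.
merges = train_bpe(["low", "lower", "lowest", "low"], num_merges=3)
```

With this corpus the first two merges fuse `l`+`o` and then `lo`+`w`, so the frequent word "low" quickly becomes a single token — the same pressure that shapes a production vocabulary.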