DarwinAnim8or committed · verified
Commit 6defa90 · 1 Parent(s): 19e72e8

Update test results

Files changed (1): README.md (+27 −2)

README.md CHANGED
@@ -18,8 +18,33 @@ As mentioned, a few updates are planned:
  * Fine-tuning the resulting model for instruct, code and storywriting. These will then be combined using MergeKit to create a MoE model.
  * Release a GGUF version and an extended context version of the base model

- ## Test Results
- TBD
+ ## Model Performance Tracking
+
+ This table tracks the performance of our model on various tasks over time.
+
+ | Date (YYYY-MM-DD) | Metric   | arc_easy       | hellaswag      | sglue_rte      | truthfulqa     | Avg    |
+ |-------------------|----------|----------------|----------------|----------------|----------------|--------|
+ | 2024-07-27        | acc      | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |
+ |                   | acc_norm | 27.95% ± 0.92% | 25.03% ± 0.43% | -              | -              | -      |
+
+ ### Legend
+ - Date: The date of each evaluation run
+ - Metric: The evaluation metric used (acc = accuracy, acc_norm = normalized accuracy)
+ - Task columns: Results for each task, in the format "Percentage ± Standard Error"
+
+ ### Notes
+ - All accuracy values are presented as percentages
+ - Standard errors are likewise converted to percentages for consistency
+ - Empty cells indicate that the task was not evaluated on that date or for that metric
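As a quick sanity check on the Avg column, the value can be reproduced as the unweighted mean of the four per-task `acc` scores (a minimal sketch; the dictionary keys simply mirror the table headers):

```python
# Reproduce the Avg column: unweighted mean of the four per-task "acc" scores.
acc = {
    "arc_easy": 27.40,
    "hellaswag": 25.52,
    "sglue_rte": 52.71,
    "truthfulqa": 39.52,
}

avg = sum(acc.values()) / len(acc)
print(f"Avg: {avg:.2f}%")  # Avg: 36.29%
```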

  # Tokenizer
  Our tokenizer was trained from scratch on 500,000 samples from the OpenWebText dataset. Like Mistral, we use LlamaTokenizerFast as our tokenizer class, in legacy mode.
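The from-scratch training step can be sketched with the Hugging Face `tokenizers` library. This is an illustrative BPE example on a toy corpus: the toy sentences, vocabulary size, and special tokens are assumptions for demonstration only, not the settings actually used for the 500,000 OpenWebText samples.

```python
# Illustrative sketch: training a small BPE tokenizer from scratch, as one
# might before wrapping it in a fast tokenizer class. The corpus, vocab_size,
# and special tokens below are toy assumptions, not the model's real settings.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = ["the quick brown fox", "jumps over the lazy dog"] * 100  # stand-in for OpenWebText samples

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=100, special_tokens=["[UNK]", "<s>", "</s>"])
tokenizer.train_from_iterator(corpus, trainer)

print(tokenizer.encode("the quick fox").tokens)
```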