Raymond-dev-546730 committed on
Commit 378cea0 · verified · 1 parent: 67ab1b1

Update Training/Training_Documentation.txt

Training/Training_Documentation.txt CHANGED
@@ -13,9 +13,12 @@ Training Dataset: Custom curated dataset for medical reasoning
  Dataset Specifications
  ---------------------

- Total Token Count: 38,514,400
+ Total Token Count: 31,929,580
  Total Sample Count: 29,500
- Average Tokens/Sample: 1305.57
+ Average Tokens/Sample: 1082.36
+ Max Token Count: 9,803
+ Min Token Count: 237
+ Tokens Counted Using: tiktoken (cl100k_base encoding)
  Dataset Creation: Created from a combination of public medical reasoning datasets from OpenAI o1 and DeepSeek-R1, along with additional reasoning chains created using Claude Sonnet 4 extended thinking

  Training Configuration
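The updated statistics (total, average, max, and min tokens per sample, counted with tiktoken's cl100k_base encoding) can be reproduced with a short script. The sketch below is not the repository's own tooling. It assumes the samples are available as a list of strings. The helper names `token_stats` and `tiktoken_counter` are hypothetical, and treating special-token strings as plain text (`disallowed_special=()`) is an assumption about how the original counts were taken.

```python
from typing import Callable, Iterable


def token_stats(samples: Iterable[str], count_tokens: Callable[[str], int]) -> dict:
    """Hypothetical helper: total/average/max/min token counts over a dataset."""
    counts = [count_tokens(s) for s in samples]
    if not counts:
        raise ValueError("dataset is empty")
    total = sum(counts)
    return {
        "total_tokens": total,
        "sample_count": len(counts),
        # Rounded to two decimals, matching the "Average Tokens/Sample" line format.
        "avg_tokens_per_sample": round(total / len(counts), 2),
        "max_tokens": max(counts),
        "min_tokens": min(counts),
    }


def tiktoken_counter() -> Callable[[str], int]:
    """Hypothetical helper: count tokens with tiktoken's cl100k_base encoding.

    Requires `pip install tiktoken`; the encoding file is fetched on first use.
    """
    import tiktoken  # imported lazily so token_stats works without tiktoken installed

    enc = tiktoken.get_encoding("cl100k_base")
    # Assumption: strings such as "<|endoftext|>" are counted as ordinary text
    # instead of raising an error (tiktoken's default behaviour).
    return lambda text: len(enc.encode(text, disallowed_special=()))
```

Usage would look like `token_stats(samples, tiktoken_counter())`. As a sanity check on the committed figures, 31,929,580 / 29,500 ≈ 1082.36 and 38,514,400 / 29,500 ≈ 1305.57, so both the old and the new average lines are consistent with their totals and the unchanged sample count.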