tspersian committed bb4f97c · 1 parent: be6798e
Files changed (1)
  1. README.md +14 -7
README.md CHANGED
@@ -40,6 +40,13 @@ Batch encode:
  tokenizer.batch_encode(["یک متن طولانی"])
  ```
 
+ ## Benchmark
+
+ - **Current Date and Time:** 2024-11-06 16:12:50
+ - **Mana Batch Encode Time:** 0.10711932182312012 seconds
+ - **Mana Batch Encode Memory Usage:** 13.203125 KB
+ - **Total characters in large_texts:** 131000
+
  ## Special Tokens
 
  - **user Token:** `<|user|>`
@@ -52,16 +59,16 @@ tokenizer.batch_encode(["یک متن طولانی"])
  - **Model Type:** BPE
  - **Vocabulary Size:** 265,703
  - **Character Coverage:** 99.9%
- - **Total Number of Text Samples: 1,147,036
- - **Total Number of Tokens: 1,490,338
- - **Average Token Length: 4.51
- - **Corpus Size (in bytes): 1,792,210,410
+ - **Total Number of Text Samples:** 1,147,036
+ - **Total Number of Tokens:** 1,490,338
+ - **Average Token Length:** 4.51
+ - **Corpus Size (in bytes):** 1,792,210,410
 
  ## Training Details
 
- - **Training Data: Mana Persian corpus
- - **Training Script: Mana Trainer
- - **Script Version: 1.2
+ - **Training Data:** Mana Persian corpus
+ - **Training Script:** Mana Trainer
+ - **Script Version:** 1.2
 
  ## License
 
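The Benchmark section added in this commit reports a wall-clock time and a memory figure for batch encoding, but the README does not say how they were measured. Below is a minimal sketch of one way to collect comparable numbers, assuming `time.perf_counter` for timing and the standard-library `tracemalloc` module for peak memory. The helper `benchmark_batch_encode` and the stand-in encoder are hypothetical and not part of the Mana tokenizer. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native extension code would not be counted.

```python
import time
import tracemalloc
from typing import Callable, Sequence


def benchmark_batch_encode(encode_fn: Callable[[Sequence[str]], object],
                           texts: Sequence[str]) -> dict:
    """Hypothetical helper: time one batch-encode call and record peak traced memory.

    Assumes `encode_fn` takes a list of strings, like the README's
    `tokenizer.batch_encode`. Memory is the peak of Python-level allocations
    seen by tracemalloc during the call, reported in KB (bytes / 1024).
    """
    tracemalloc.start()
    try:
        start = time.perf_counter()
        encode_fn(list(texts))
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        "batch_encode_time_s": elapsed,
        "batch_encode_memory_kb": peak / 1024,
        "total_characters": sum(len(t) for t in texts),
    }


if __name__ == "__main__":
    # Stand-in encoder for illustration only: splits each text on whitespace.
    def fake_batch_encode(batch):
        return [text.split() for text in batch]

    large_texts = ["یک متن طولانی"] * 1000  # hypothetical input
    print(benchmark_batch_encode(fake_batch_encode, large_texts))
```

With the tokenizer from the README, the call would presumably be `benchmark_batch_encode(tokenizer.batch_encode, large_texts)`. Timings and memory depend on hardware and input, so results from this sketch are not directly comparable to the figures in the commit.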
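The second hunk only closes bold labels that were left open (`**Label:` became `**Label:**`). A check like the following sketch could flag such lines before committing. It assumes that counting `**` delimiters per line is sufficient, so it ignores code spans, escaped asterisks, and bold text spanning several lines; the function name `find_unclosed_bold` is hypothetical.

```python
from typing import Iterable, List


def find_unclosed_bold(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines with an odd count of `**` delimiters.

    Simplification: code spans, escaped asterisks, and bold text that
    spans several lines are not handled.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        # An odd number of "**" means at least one bold span is not closed.
        if line.count("**") % 2 == 1:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    readme_lines = [
        "- **Training Data: Mana Persian corpus",    # before this commit
        "- **Training Data:** Mana Persian corpus",  # after this commit
    ]
    print(find_unclosed_bold(readme_lines))  # [1]
```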