jackJessada committed on
Commit 6f82f66 · verified · 1 Parent(s): 9ff0e8b

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -35,7 +35,7 @@ The training corpus consists of the following datasets:
  |----------|-------------|
  | Business & Finance | 736,071,807 |
  | News | 1,700,662,378 |
- | Education | 554,889,778 |
+ | Education | 576,489,778 |
  | Social | 211,000,000 |
  | Government | 40,492,117 |
  | Medical | 42,987,587 |
@@ -44,7 +44,6 @@ The training corpus consists of the following datasets:
  | Research Articles | 4,185,649,758 |
  | Law | 467,994,847 |
  | Travel | 6,948,290 |
- | Buddhism | 21,600,000 |
  | Others | 4,410,619 |

  *Token counts calculated using Qwen3 Tokenizer
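
The commit message does not explain the change, but the numbers suggest that the removed Buddhism row was folded into Education. This is an inference from the diff, not something the commit states. A minimal Python check of that arithmetic, with variable names chosen here purely for illustration:

```python
# Token counts copied from the diff above.
education_before = 554_889_778   # removed Education row
education_after = 576_489_778    # added Education row
buddhism_removed = 21_600_000    # removed Buddhism row

# Compare the increase in Education with the removed Buddhism count.
delta = education_after - education_before
print(f"Education increase: {delta:,}")
print("Matches removed Buddhism count:", delta == buddhism_removed)
```

If the two figures match, the corpus total is unchanged by this commit and only the category breakdown differs.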