Update README.md
README.md
@@ -6,7 +6,8 @@ datasets:
 
 # TFM-tokenizer
 
-
+TFM-tokenizer is trained based on [SmallCorpus](https://huggingface.co/datasets/SmallDoge/SmallCorpus), supporting table understanding, document retrieval, tool invocation, and reasoning.
+
 This tokenizer was trained on 2M samples from:
 - Web-EN 50%
 - Web-ZH 20%
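The mixture percentages in the README translate into approximate per-source sample counts out of the 2M total. The following is a minimal sketch that assumes each percentage is a fraction of those 2M samples. `samples_per_source` is a hypothetical helper, not part of any TFM-tokenizer tooling. Only the two sources visible in this hunk are included, so the remaining share is reported as not shown.

```python
# Hypothetical helper: converts the README's mixture percentages into
# approximate sample counts. Assumes percentages are fractions of the
# 2M total; only the sources visible in this diff hunk are listed.
TOTAL_SAMPLES = 2_000_000

mixture = {
    "Web-EN": 0.50,
    "Web-ZH": 0.20,
}


def samples_per_source(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Return round(total * weight) for each named source."""
    return {name: round(total * w) for name, w in weights.items()}


counts = samples_per_source(TOTAL_SAMPLES, mixture)
# Whatever is left over belongs to sources outside this hunk.
unlisted = TOTAL_SAMPLES - sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n:,}")
print(f"not shown in this hunk: {unlisted:,}")
```

With the two visible entries, this gives 1,000,000 Web-EN samples and 400,000 Web-ZH samples. That leaves 600,000 samples (30%) for sources that fall outside the lines shown in this diff.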