| datasets: | |
| - manu/tok_corpus | |
| language: | |
| - fr | |
| - en | |
| BPE tokenizer fitted on a custom corpus, with digit separation, byte fallback, and other features from LlamaTokenizer. | |
| Fitted on only 1,000,000 samples. |
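
To make the two named features concrete, here is a minimal pure-Python sketch of what digit separation and byte fallback do to the input. It is an illustration only, not this tokenizer's actual implementation: the helper names `split_digits` and `byte_fallback` and the toy vocabulary are hypothetical, and the `<0xNN>` token spelling is assumed to follow the convention used in Llama-style vocabularies.

```python
import re


def split_digits(text: str) -> list[str]:
    """Split text so that every ASCII digit becomes its own piece.

    Runs of non-digit characters are kept together; a real tokenizer
    would then apply BPE merges inside each piece.
    """
    return re.findall(r"[0-9]|[^0-9]+", text)


def byte_fallback(piece: str, vocab: set[str]) -> list[str]:
    """Return the piece itself if it is in the vocabulary, otherwise
    one <0xNN> token per UTF-8 byte, so no input maps to 'unknown'."""
    if piece in vocab:
        return [piece]
    return [f"<0x{b:02X}>" for b in piece.encode("utf-8")]


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"a"}

print(split_digits("en 2024"))        # digits are isolated one by one
print(byte_fallback("a", toy_vocab))  # known piece is kept as-is
print(byte_fallback("é", toy_vocab))  # unknown piece falls back to bytes
```

Separating digits keeps numbers from being merged into arbitrary multi-digit tokens, and byte fallback means any character missing from the vocabulary can still be encoded losslessly.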