	---
	title: README
	emoji: 📊
	colorFrom: purple
	colorTo: yellow
	sdk: static
	pinned: false
	license: mit
	---

	<p align="center">
	<img src="./toksuite-pipeline.png" alt="TokSuite Logo"/>
	</p>

	TokSuite is a collection of models and benchmarks designed to isolate and study the impact of tokenization on language model behavior across five languages (English, Chinese, Turkish, Italian, and Farsi), as well as STEM and mathematical text. It includes fourteen models that share the same architecture, training data, training budget, and initialization and differ only in their tokenizers, alongside a set of benchmarks that evaluate performance under real-world perturbations that affect tokenization.

	Our code is available at [https://github.com/r-three/Tokenizers](https://github.com/r-three/Tokenizers).