Releasing a new series of 8 zeroshot classifiers: better performance, fully commercially usable thanks to synthetic data, up to 8192 tokens, runs on any hardware.
Summary:
- The zeroshot-v2.0-c series replaces commercially restrictive training data with synthetic data generated with mistralai/Mixtral-8x7B-Instruct-v0.1 (Apache 2.0). All models are released under the MIT license.
- The best model performs 17 percentage points better across 28 tasks vs. facebook/bart-large-mnli (the most downloaded commercially-friendly baseline).
- The series includes a multilingual variant fine-tuned from BAAI/bge-m3 for zeroshot classification in 100+ languages, with a context window of 8192 tokens.
- The models are small (0.2-0.6B parameters), so they run on any hardware. The base-size models are more than 2x faster than bart-large-mnli while performing significantly better.
- The models are not generative LLMs; they are efficient encoder-only models specialized in zeroshot classification through the universal NLI task.
- For users where commercially restrictive training data is not an issue, I've also trained variants with even more human data for improved performance.
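Because these are encoder-only NLI models, they drop into the standard Hugging Face zero-shot-classification pipeline. A minimal sketch, using the facebook/bart-large-mnli baseline mentioned above as the model id (an assumption for illustration; substitute the checkpoint name of any zeroshot-v2.0 model from the Hub):

```python
from transformers import pipeline

# Zero-shot classification via NLI: the pipeline pairs the input text with
# each candidate label as a hypothesis ("This example is about {label}.")
# and ranks labels by the model's entailment score.
# bart-large-mnli is used here as a stand-in; swap in a zeroshot-v2.0
# checkpoint id once you've picked one from the Hub.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The new graphics card delivers excellent performance for gaming."
labels = ["technology", "politics", "sports"]

result = classifier(text, candidate_labels=labels)
# result["labels"] is sorted by score, highest first
print(result["labels"][0], result["scores"][0])
```

With the default single-label setting, the scores are softmax-normalized across the candidate labels; pass `multi_label=True` to score each label independently instead.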
Next steps:
- I'll publish a blog post with more details soon.
- There are several improvements I'm planning for v2.1; the multilingual model in particular has room for improvement.