MultiLangModel

1. Introduction

MultiLangModel excels at translation and multilingual tasks. This checkpoint was selected for having the best translation benchmark score.

Category	Benchmark	MLModel-v1	MLModel-v2	MultiLangModel
Core Reasoning Tasks	Math Reasoning	0.510	0.535	0.550
	Logical Reasoning	0.789	0.801	0.736
	Common Sense	0.716	0.702	0.700
Language Understanding	Reading Comprehension	0.671	0.685	0.644
	Question Answering	0.582	0.599	0.819
	Text Classification	0.803	0.811	0.792
	Sentiment Analysis	0.777	0.781	0.607
Generation Tasks	Code Generation	0.615	0.631	0.828
	Creative Writing	0.588	0.579	0.758
	Dialogue Generation	0.621	0.635	0.767
	Summarization	0.745	0.755	0.804
Specialized Capabilities	Translation	0.782	0.799	0.804
	Knowledge Retrieval	0.651	0.668	0.610
	Instruction Following	0.733	0.749	0.758
	Safety Evaluation	0.718	0.701	0.739

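MultiLangModel scores highest on Translation (0.804, versus 0.799 for MLModel-v2 and 0.782 for MLModel-v1) and leads on 9 of the 15 benchmarks overall. It trails both earlier models on Logical Reasoning, Common Sense, Reading Comprehension, Text Classification, Sentiment Analysis, and Knowledge Retrieval, with the largest drop on Sentiment Analysis (0.607 versus 0.781 for MLModel-v2).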
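
The per-benchmark comparison can be reproduced from the table with a few lines of Python. This is a minimal sketch, not part of any released tooling: the names `SCORES` and `split_by_lead` are illustrative, and it assumes that a higher score is better on every benchmark, which the table does not state explicitly.

```python
# Scores copied from the benchmark table above, in column order:
# (MLModel-v1, MLModel-v2, MultiLangModel).
SCORES = {
    "Math Reasoning": (0.510, 0.535, 0.550),
    "Logical Reasoning": (0.789, 0.801, 0.736),
    "Common Sense": (0.716, 0.702, 0.700),
    "Reading Comprehension": (0.671, 0.685, 0.644),
    "Question Answering": (0.582, 0.599, 0.819),
    "Text Classification": (0.803, 0.811, 0.792),
    "Sentiment Analysis": (0.777, 0.781, 0.607),
    "Code Generation": (0.615, 0.631, 0.828),
    "Creative Writing": (0.588, 0.579, 0.758),
    "Dialogue Generation": (0.621, 0.635, 0.767),
    "Summarization": (0.745, 0.755, 0.804),
    "Translation": (0.782, 0.799, 0.804),
    "Knowledge Retrieval": (0.651, 0.668, 0.610),
    "Instruction Following": (0.733, 0.749, 0.758),
    "Safety Evaluation": (0.718, 0.701, 0.739),
}


def split_by_lead(scores):
    """Split benchmarks by whether the last column beats both earlier models.

    Assumption: higher is better for every benchmark.
    """
    leads, trails = [], []
    for name, (v1, v2, current) in scores.items():
        if current > max(v1, v2):
            leads.append(name)
        else:
            trails.append(name)
    return leads, trails


leads, trails = split_by_lead(SCORES)
print(f"MultiLangModel leads on {len(leads)} of {len(SCORES)} benchmarks:")
for name in leads:
    print(f"  {name}")
print(f"It does not lead on {len(trails)}:")
for name in trails:
    print(f"  {name}")
```

Keeping the scores in one dictionary makes it easy to rerun the comparison if a new column is added to the table.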

To ask a question or report a problem, open an issue on GitHub.