Model Card for Model ID

A medium-sized ModernBERT model trained on a custom corpus written mainly in Simplified Chinese, using WordLevel tokenization (equivalently, the tokenization is determined by the corpus files). The custom corpus consists of the entire Chinese Treebank 9.0 and the first half of the "XIN_CMN" portion of Tagged Chinese Gigaword Version 2.0.
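To illustrate what word-level tokenization means in practice, here is a minimal pure-Python sketch. It is not the pipeline used to train this model: it assumes the corpus lines are already segmented into words separated by spaces, and the helper names (`build_vocab`, `encode`), the special-token list, and the toy sentences are all hypothetical. Every distinct word becomes one vocabulary entry, and any word not seen while building the vocabulary maps to `[UNK]`.

```python
from collections import Counter

# Hypothetical special tokens; the real model's list may differ.
SPECIALS = ("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")


def build_vocab(segmented_lines, specials=SPECIALS):
    """Assign one id per distinct word found in pre-segmented text."""
    counts = Counter(word for line in segmented_lines for word in line.split())
    vocab = {tok: i for i, tok in enumerate(specials)}
    # Most frequent words first; ties broken by the word itself for determinism.
    for word, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(line, vocab):
    """Map each whitespace-separated word to its id, or to [UNK] if unseen."""
    unk = vocab["[UNK]"]
    return [vocab.get(word, unk) for word in line.split()]


# Toy corpus, assumed to be segmented already (one space between words).
corpus = ["我 喜欢 自然 语言 处理", "我 喜欢 北京"]
vocab = build_vocab(corpus)
ids = encode("我 喜欢 上海", vocab)  # "上海" is not in the toy corpus
```

Because the vocabulary comes directly from the words in the corpus files, input text has to be segmented the same way as the training data before encoding; words outside the vocabulary are not split into smaller pieces and collapse to `[UNK]`.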