---
license: apache-2.0
tags:
- anima
- modernbert
base_model_relation: finetune
base_model:
- circlestone-labs/Anima
---

# Cosmos BERT

BERT for Anima/Cosmos. This is *not* an adapter model, but an early replacement for the T5/Qwen model, which means the T5, Qwen, and LLM adapter files will soon be retired. It was trained on both T5 (text) and the [AnimaTextToImagePipeline](https://huggingface.co/nightknocker/tdrussell-secret-model-diffusers) (text-image pairs).

![](images/preview.png)

## LoRA support

Character adapters created by kohya-ss/sd-scripts are compatible with the BERT text encoder. The new text encoder seemingly recognizes the [trigger words](https://huggingface.co/datasets/newtextdoc1111/danbooru-tag-csv) without issue.

## Mixing @tags

![](images/mix.png)

## What has changed

### CLIP and LongCLIP

- Read the model configuration. Note that the token length is no longer limited to 77 or [248](https://huggingface.co/nightknocker/sdxs-1b-image-to-longclip-encoder).

### SD models

- Compared to the old CLIPTextModel, it supports longer text input and has a modernized architecture.
- See the References section. None of the retrained text encoders has poorer text understanding than the CLIP models. Furthermore, they demonstrated improved understanding of [gestures, spatial relations, and colors](https://huggingface.co/nightknocker/rosaceae-t5gemma-adapter).

## Z-Image and Qwen

- LLMs have redundant knowledge (arXiv:2511.07384, arXiv:2403.03853). Thus, switching to smaller language models does not result in irrecoverable knowledge loss, as has been [demonstrated](https://huggingface.co/nightknocker/recurrent-qwen3-z-image-turbo). This is particularly true for specialized anime models.

## Subject-Focused Attention

- In an SVO sentence structure, CLIPs focus too heavily on the subject; the text encoders are undertrained for certain verbs and cannot reliably identify the object's position.

## Inference

```python
from transformers import AutoTokenizer

# Use the default ModernBertConfig.
bert = CosmosBert.from_pretrained('nightknocker/cosmos-bert').to('cuda')
tokenizer = AutoTokenizer.from_pretrained('nightknocker/cosmos-bert')

text = '1girl, solo, looking at viewer'  # example prompt
inputs = tokenizer(text, return_tensors='pt').to('cuda')
crossattn_emb = bert(**inputs, return_dict=True).last_hidden_state
```

## References

- [Recurrent Qwen](https://huggingface.co/nightknocker/recurrent-qwen3-z-image-turbo)
- [Recurrent Gemma](https://huggingface.co/nightknocker/recurrent-t5gemma-l-l-ul2-encoder)
- [Rosaceae](https://huggingface.co/nightknocker/rosaceae-t5gemma-adapter)

## Datasets

- anime-art-multicaptions (multicharacter interactions)
- danbooru2025-metadata
- danbooru wikis full
- [eyes](https://huggingface.co/datasets/nightknocker/anima-eyes-never-lie)
- [rouwei 0.8](https://huggingface.co/datasets/nightknocker/rouwei-eyes-never-lie)
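For context on how `crossattn_emb` is consumed downstream, here is a minimal, generic sketch of a cross-attention interface in which image tokens attend to text-encoder hidden states. All names and shapes are illustrative assumptions, not Anima's actual attention code; the point is only that the text context length (100 tokens below) is not capped at CLIP's 77.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Generic cross-attention: image tokens (queries) attend to
    text-encoder hidden states (keys/values)."""

    def __init__(self, dim: int, ctx_dim: int, heads: int = 8):
        super().__init__()
        # kdim/vdim let the text-context width differ from the image-token width.
        self.attn = nn.MultiheadAttention(
            dim, heads, kdim=ctx_dim, vdim=ctx_dim, batch_first=True
        )

    def forward(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, ctx, ctx)
        return out

# Hypothetical shapes: 256 image tokens of width 512, conditioned on
# 100 text tokens of width 768 (e.g. a BERT last_hidden_state).
x = torch.randn(1, 256, 512)
ctx = torch.randn(1, 100, 768)
y = CrossAttention(512, 768)(x, ctx)
print(y.shape)  # torch.Size([1, 256, 512])
```

The output keeps the image-token shape; only the attention weights depend on the (arbitrarily long) text context.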