Update README.md

README.md CHANGED

@@ -34,6 +34,12 @@ It was trained on both T5 (text) and the [AnimaTextToImagePipeline](https://hugg
 ## Z-Image and Qwen
 
 - LLMs have redundant knowledge (arXiv:2511.07384, arXiv:2403.03853). Thus, resorting to smaller language models does not result in irrecoverable knowledge loss, as has been [demonstrated](https://huggingface.co/nightknocker/recurrent-qwen3-z-image-turbo). This is particularly true for specialized anime models.
+
+## Subject-Focused Attention
+
+In an SVO sentence structure, CLIPs focus too much on the subject, and text encoders are undertrained for certain verbs, so they cannot reliably identify the object's position.
+
+This repo is an experiment to address these issues: spatial knowledge is explicitly encoded, so the attention modules are not overwhelmed by the task.
+
 ## Inference
 
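The bullet above links a checkpoint in which a smaller Qwen3 text encoder stands in for the original. A minimal loading sketch, assuming the linked repo ships custom pipeline code loadable through diffusers; the `trust_remote_code` flag and the prompt are illustrative assumptions, not confirmed by the repo:

```python
# Illustrative sketch only: assumes the linked checkpoint exposes a
# custom diffusers pipeline. Flags and the prompt are assumptions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "nightknocker/recurrent-qwen3-z-image-turbo",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # assumed: pipeline code lives in the repo
).to("cuda")

# Standard diffusers call pattern; sampler settings are left at defaults.
image = pipe("a cat chasing a mouse in a meadow").images[0]
image.save("sample.png")
```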
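To make the added Subject-Focused Attention section concrete: one way spatial knowledge could be "explicitly encoded" is as an additive attention bias that raises the logits on object tokens, so the subject no longer dominates. The sketch below is a toy illustration under that assumption, not this repo's implementation; `svo_attention_bias` and the hard-coded token span are hypothetical.

```python
# Toy illustration, not this repo's code. Assumption: object-token
# positions come from an SVO parse of the prompt (here hard-coded).
import torch
import torch.nn.functional as F

def svo_attention_bias(seq_len: int, object_span: range, boost: float = 2.0) -> torch.Tensor:
    """Additive bias that raises attention logits on object tokens."""
    bias = torch.zeros(seq_len, seq_len)
    bias[:, list(object_span)] = boost  # every query attends more to the object
    return bias

# "a cat chases a mouse": pretend tokens 3..4 are "a mouse" (the object).
seq_len, dim = 6, 8
q = torch.randn(1, 1, seq_len, dim)  # (batch, heads, tokens, channels)
k = torch.randn(1, 1, seq_len, dim)
v = torch.randn(1, 1, seq_len, dim)

bias = svo_attention_bias(seq_len, object_span=range(3, 5))
out = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)  # bias is added to the logits
print(out.shape)  # torch.Size([1, 1, 6, 8])
```

An additive bias is only one possibility; positional embeddings for parsed grammatical roles, or region masks in cross-attention, would also count as explicit spatial encoding.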