Update README.md

README.md CHANGED

@@ -34,6 +34,12 @@ It was trained on both T5 (text) and the [AnimaTextToImagePipeline](https://hugg
 ## Z-Image and Qwen
 
 - LLMs have redundant knowledge (arXiv:2511.07384, arXiv:2403.03853). Thus, resorting to smaller language models does not result in irrecoverable knowledge loss, as has been [demonstrated](https://huggingface.co/nightknocker/recurrent-qwen3-z-image-turbo). This is particularly true for specialized anime models.
+
+## Subject-Focused Attention
+
+In an SVO sentence structure, CLIPs focus too much on the subject, and text encoders are undertrained for certain verbs, so they cannot reliably identify the object's position.
+
+This repo is an experiment to address these issues: spatial knowledge is explicitly encoded, so the attention modules are not overwhelmed by the task.
+
 ## Inference
 
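The bullet above links a checkpoint in which a smaller Qwen3 text encoder stands in for the original. A minimal loading sketch, assuming the linked repo ships custom pipeline code loadable through diffusers; the `trust_remote_code` flag and the prompt are illustrative assumptions, not confirmed by the repo:

```python
# Illustrative sketch only: assumes the linked checkpoint exposes a
# custom diffusers pipeline. Flags and the prompt are assumptions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "nightknocker/recurrent-qwen3-z-image-turbo",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # assumed: pipeline code lives in the repo
).to("cuda")

# Standard diffusers call pattern; sampler settings are left at defaults.
image = pipe("a cat chasing a mouse in a meadow").images[0]
image.save("sample.png")
```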
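To make the added Subject-Focused Attention section concrete: one way spatial knowledge could be "explicitly encoded" is as an additive attention bias that raises the logits on object tokens, so the subject no longer dominates. The sketch below is a toy illustration under that assumption, not this repo's implementation; `svo_attention_bias` and the hard-coded token span are hypothetical.

```python
# Toy illustration, not this repo's code. Assumption: object-token
# positions come from an SVO parse of the prompt (here hard-coded).
import torch
import torch.nn.functional as F

def svo_attention_bias(seq_len: int, object_span: range, boost: float = 2.0) -> torch.Tensor:
    """Additive bias that raises attention logits on object tokens."""
    bias = torch.zeros(seq_len, seq_len)
    bias[:, list(object_span)] = boost  # every query attends more to the object
    return bias

# "a cat chases a mouse": pretend tokens 3..4 are "a mouse" (the object).
seq_len, dim = 6, 8
q = torch.randn(1, 1, seq_len, dim)  # (batch, heads, tokens, channels)
k = torch.randn(1, 1, seq_len, dim)
v = torch.randn(1, 1, seq_len, dim)

bias = svo_attention_bias(seq_len, object_span=range(3, 5))
out = F.scaled_dot_product_attention(q, k, v, attn_mask=bias)  # bias is added to the logits
print(out.shape)  # torch.Size([1, 1, 6, 8])
```

An additive bias is only one possibility; positional embeddings for parsed grammatical roles, or region masks in cross-attention, would also count as explicit spatial encoding.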