Update README.md
README.md CHANGED
@@ -19,6 +19,22 @@ It was trained on both T5 (text) and the [AnimaTextToImagePipeline](https://hugg

## What has changed

### CLIP and LongCLIP

- Read the model configuration; note that the token length is no longer limited to 77 or [248](https://huggingface.co/nightknocker/sdxs-1b-image-to-longclip-encoder) tokens.
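Lifting the token limit matters because stock CLIP encoders otherwise force long prompts through a fixed 77-token window, and pipelines work around that by chunking. A minimal sketch of that classic workaround (49406/49407 are CLIP's usual BOS/EOS ids; everything else here is illustrative, not this model's actual code):

```python
BOS, EOS, WINDOW = 49406, 49407, 77  # usual CLIP special ids / context size

def chunk_token_ids(ids, window=WINDOW, bos=BOS, eos=EOS):
    """Split token ids into BOS/EOS-wrapped, EOS-padded fixed-size windows."""
    body = window - 2  # two slots per window are reserved for BOS and EOS
    chunks = []
    for i in range(0, len(ids), body):
        chunk = [bos] + ids[i:i + body] + [eos]
        chunk += [eos] * (window - len(chunk))  # pad the final window
        chunks.append(chunk)
    return chunks

# A 100-token prompt needs two 77-token windows.
print([len(c) for c in chunk_token_ids(list(range(100)))])  # -> [77, 77]
```

Each window is then encoded separately and the embeddings are concatenated, which is why crossing the 77-token boundary can subtly change how a prompt is interpreted; an encoder without the limit avoids this entirely.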

### SDXL models

- Compared to the old CLIPTextModel, the retrained encoder supports longer text input and has a modernized architecture.

- See the References section: none of the retrained text encoders understands text more poorly than the CLIP models, and they demonstrated improved understanding of [gestures, spatial relations, and colors](https://huggingface.co/nightknocker/rosaceae-t5gemma-adapter).
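Swapping CLIP for a different encoder, as the linked t5gemma adapter does, generally requires projecting the replacement encoder's hidden states to the width the diffusion model's cross-attention expects. A shape-level NumPy sketch of that contract (all dimensions are hypothetical, and the real adapter's architecture is not described here):

```python
import numpy as np

# Hypothetical sizes: replacement-encoder width, cross-attention width, prompt length.
D_ENCODER, D_CROSS_ATTN, SEQ_LEN = 768, 2048, 64

rng = np.random.default_rng(0)
W = rng.standard_normal((D_ENCODER, D_CROSS_ATTN)) * 0.02  # learned in practice
b = np.zeros(D_CROSS_ATTN)

def adapt(hidden_states: np.ndarray) -> np.ndarray:
    """Map (seq_len, d_encoder) -> (seq_len, d_cross_attn) via a linear adapter."""
    return hidden_states @ W + b

hidden = rng.standard_normal((SEQ_LEN, D_ENCODER))
print(adapt(hidden).shape)  # -> (64, 2048)
```

A plain linear map is only the simplest case; adapters in the wild may add nonlinearities or small transformer blocks, but the input/output shape contract stays the same.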

## Z-Image and Qwen

- LLMs carry redundant knowledge (2511.07384, 2403.03853), so, as those papers demonstrate, switching to smaller language models does not cause irrecoverable knowledge loss. This holds especially for specialized anime models.

## Inference

```python