results: []
---

# PIXEL-base-german

`pixel-base-german` is a [PIXEL](https://arxiv.org/abs/2207.06991) model trained on the German DBMDZ BERT Corpus (`stefan-it/german-dbmdz-bert-corpus`).

We trained the model using the architecture and [codebase](https://github.com/xplip/pixel) proposed in the 2023 paper by Rust et al., [Language Modelling with Pixels](https://arxiv.org/abs/2207.06991).

This model has been used in the paper *Evaluating Pixel Language Models on Non-Standardized Languages*, accepted at COLING 2025.

## Model description

*Description from [Team-PIXEL/pixel-base](https://huggingface.co/Team-PIXEL/pixel-base)*

PIXEL consists of three major components: a text renderer, which draws text as an image; an encoder, which encodes the unmasked regions of the rendered image; and a decoder, which reconstructs the masked regions at the pixel level. It is built on ViT-MAE.

During pretraining, the renderer produces images containing the training sentences. Patches of these images are linearly projected to obtain patch embeddings (as opposed to using a token-embedding matrix as in, e.g., BERT), and 25% of the patches are masked out. The encoder, which is a Vision Transformer (ViT), then only processes the unmasked patches. The lightweight decoder, with hidden size 512 and 8 transformer layers, inserts learnable mask tokens into the encoder's output sequence and learns to reconstruct the raw pixel values at the masked positions.
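To make the masking step concrete, here is a toy NumPy sketch of the idea (illustrative only, with made-up sizes; this is not the actual PIXEL implementation): a rendered image is split into patches, each patch is linearly projected to an embedding, and 25% of the patches are hidden from the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a rendered text image: height 16, width 256, grayscale.
# Real PIXEL renders much wider images; all sizes here are illustrative.
image = rng.random((16, 256))
patch_size = 16
num_patches = image.shape[1] // patch_size      # 16 patches of 16x16 pixels

# Flatten each 16x16 patch into a 256-dim vector: (num_patches, 256)
patches = (image.reshape(16, num_patches, patch_size)
                .transpose(1, 0, 2)
                .reshape(num_patches, -1))

# Linear projection to patch embeddings (instead of a token-embedding matrix)
d_model = 32
projection = rng.normal(size=(patch_size * patch_size, d_model))
embeddings = patches @ projection               # (num_patches, d_model)

# Mask out 25% of the patches; only the visible ones reach the encoder
mask_ratio = 0.25
num_masked = int(num_patches * mask_ratio)      # 4 of 16 patches
shuffled = rng.permutation(num_patches)
masked_idx, visible_idx = shuffled[:num_masked], shuffled[num_masked:]
visible_embeddings = embeddings[visible_idx]    # (12, d_model)
```

In the real model, the decoder would then insert mask tokens at `masked_idx` and reconstruct the original pixel values of those patches.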

After pretraining, the decoder can be discarded, leaving an 86M-parameter encoder upon which task-specific classification heads can be stacked. Alternatively, the decoder can be retained and PIXEL can be used as a pixel-level generative language model (see Figures 3 and 6 in the paper for examples).
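As a schematic of what "stacking a classification head" on the retained encoder means (again illustrative NumPy with assumed sizes, not the real PIXEL code): pool the per-patch hidden states and apply a linear classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend encoder output: one hidden vector per patch. Hidden size 768
# matches a ViT-base-style encoder; the label count is made up.
num_patches, hidden_size, num_labels = 529, 768, 3
hidden_states = rng.normal(size=(num_patches, hidden_size))

# Task-specific head: mean-pool over patches, then a linear layer
pooled = hidden_states.mean(axis=0)                   # (hidden_size,)
w = rng.normal(size=(hidden_size, num_labels)) * 0.02
b = np.zeros(num_labels)
logits = pooled @ w + b                               # (num_labels,)

# Softmax to turn logits into class probabilities
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```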

For more details on how PIXEL works, please check the paper and the codebase linked above.

## Training and evaluation data

This model was trained on rendered German text from the German DBMDZ BERT Corpus, using the rendering setup of [Team-PIXEL/pixel-base](https://huggingface.co/Team-PIXEL/pixel-base).

### Training hyperparameters

- training_steps: 1500000
- mixed_precision_training: Apex, opt level O1

### How to use
|
| 48 |
+
```
|
| 49 |
+
from pixel import PIXELConfig, PIXELForPreTraining
|
| 50 |
|
| 51 |
+
config = PIXELConfig.from_pretrained("amunozo/pixel-base-german")
|
| 52 |
+
model = PIXELForPreTraining.from_pretrained("amunozo/pixel-base-german", config=config)
|
| 53 |
+
```
### Framework versions