language:
- en
- da
- de
- es
license: cc-by-4.0
tags:
- pretraining

# mELECTRA (Multilingual ELECTRA)

mELECTRA is an [ELECTRA](https://arxiv.org/abs/2003.10555)-based model pretrained on a diverse multilingual corpus. It supports **Swedish (SV), Slovenian (SL), Slovak (SK), Spanish (ES), Portuguese (PT), Polish (PL), Norwegian (NO), Italian (IT), Croatian (HR), French (FR), English (EN), Danish (DA), German (DE), and Czech (CS)**, and it can be fine-tuned for NLP tasks such as text classification, named entity recognition, and masked token prediction.

This model is released under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/), which permits commercial use. If you encounter any issues, please open an issue on our [GitHub repository](https://github.com/your-repo/mELECTRA).

## Model Details

- **Architecture:** ELECTRA-Small
- **Languages Supported:** Swedish, Slovenian, Slovak, Spanish, Portuguese, Polish, Norwegian, Italian, Croatian, French, English, Danish, German, Czech
- **Pretraining Data:** Multilingual corpus (news articles, Wikipedia, and web texts)
- **Vocabulary:** SentencePiece-based tokenizer (`m.model`)
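ELECTRA models are pretrained with a replaced-token-detection objective: a small generator corrupts some input tokens, and the discriminator learns to label each token as original or replaced. The corruption step can be sketched in plain Python. This is a toy illustration of the objective only, not mELECTRA's actual pretraining code; the function and variable names here are hypothetical.

```python
import random

def corrupt(tokens, vocab, replace_frac=0.3, seed=0):
    """Toy version of ELECTRA's corruption step: swap a fraction of
    tokens for other vocabulary items and record the per-token labels
    the discriminator is trained to predict (1 = replaced, 0 = kept)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_frac:
            # Sample a replacement that differs from the original token.
            corrupted.append(rng.choice([v for v in vocab if v != tok]))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = ["the", "cat", "dog", "sat", "ran", "on", "mat", "rug"]
corrupted, labels = corrupt(tokens, vocab)
# A position is labelled 1 exactly when its token was changed.
assert all((c != t) == bool(l) for c, t, l in zip(corrupted, tokens, labels))
```

In the real model, the discriminator's per-token binary predictions replace the masked-language-modeling loss used by BERT-style models, which is why every input position contributes to the pretraining signal.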