Update README.md
README.md
CHANGED
|
@@ -28,14 +28,6 @@ language:
|
|
| 28 |
---
|
| 29 |
# Dactory models
|
| 30 |
|
| 31 |
-
* **Model name**: Dactory models
|
| 32 |
-
* **Languages**: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Croatian, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish
|
| 33 |
-
* **Author**: Kyutai
|
| 34 |
-
* **Model type**: Classification
|
| 35 |
-
* **License**: CC-BY-SA 4.0
|
| 36 |
-
* **Version**: 1.0
|
| 37 |
-
* **Released**: April 2025
|
| 38 |
-
|
| 39 |
## Model description
|
| 40 |
|
| 41 |
This is a set of fastText-based models to evaluate the quality and domain of text, in the 24 official languages of the European Union.
|
|
@@ -48,6 +40,14 @@ Stack Exchange websites related to STEM (`stem`), Humanities (`hum`), pop cultur
|
|
| 48 |
The models were trained to distinguish lines sampled uniformly from these different sources.
|
| 49 |
To get training data for the languages other than English, we translated the English training set with MADLAD, except for the `rand` and `wiki` labels, for which data is readily available in all languages.
|
| 50 |
|
| 51 |
## Use cases
|
| 52 |
|
| 53 |
These models can be used to evaluate the quality of text by estimating how similar it is to text from high-quality sources.
|
|
|
|
| 28 |
---
|
| 29 |
# Dactory models
|
| 30 |
|
| 31 |
## Model description
|
| 32 |
|
| 33 |
This is a set of fastText-based models to evaluate the quality and domain of text, in the 24 official languages of the European Union.
|
|
|
|
| 40 |
The models were trained to distinguish lines sampled uniformly from these different sources.
|
| 41 |
To get training data for the languages other than English, we translated the English training set with MADLAD, except for the `rand` and `wiki` labels, for which data is readily available in all languages.
|
| 42 |
|
| 43 |
+
* **Model name**: Dactory models
|
| 44 |
+
* **Languages**: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Croatian, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish
|
| 45 |
+
* **Developed by**: Kyutai
|
| 46 |
+
* **Model type**: Classification
|
| 47 |
+
* **License**: CC-BY-SA 4.0
|
| 48 |
+
* **Version**: 1.0
|
| 49 |
+
* **Released**: April 2025
|
| 50 |
+
|
| 51 |
## Use cases
|
| 52 |
|
| 53 |
These models can be used to evaluate the quality of text by estimating how similar it is to text from high-quality sources.
|
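The use case above could be sketched roughly as follows. This is a minimal illustration, not the authors' documented usage. It assumes that the `rand` label is the baseline (lower-quality) class, which this section does not confirm, and that a quality score can be taken as one minus the probability of that label. The helper names `quality_score` and `score_line` are hypothetical. The `model` argument is expected to behave like a model loaded with the `fasttext` Python package, whose `predict(text, k=-1)` returns every label with its probability.

```python
from typing import Sequence

LABEL_PREFIX = "__label__"  # default label prefix used by fastText


def quality_score(labels: Sequence[str], probs: Sequence[float],
                  low_quality: str = "rand") -> float:
    """Return 1 - P(low_quality) from fastText-style (labels, probs) output.

    Assumption: `rand` is the baseline class; swap in another label if not.
    """
    for label, prob in zip(labels, probs):
        # Strip the fastText prefix so we compare bare label names.
        name = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        if name == low_quality:
            return 1.0 - float(prob)
    # The low-quality label was not returned: treat its probability as 0.
    return 1.0


def score_line(model, line: str) -> float:
    """Score a single line of text with a fastText-like model."""
    # fastText predicts on one line at a time, so newlines are replaced.
    labels, probs = model.predict(line.replace("\n", " "), k=-1)
    return quality_score(labels, probs)


# Hypothetical usage (the file name is a placeholder, not from the model card):
#   import fasttext
#   model = fasttext.load_model("path/to/model.bin")
#   print(score_line(model, "Some text to evaluate."))
```

Scoring per line mirrors the statement that the models were trained on lines sampled from the different sources; longer documents would need some aggregation over lines, which is left open here.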