Added logistic regression language classifier model
- README.md +2 -38
- config.json +6 -0
- model/language_classifier.joblib +3 -0
- model_card.md +11 -0
README.md
CHANGED
@@ -1,40 +1,4 @@
----
-license: mit
-language:
-- ru
-pipeline_tag: text-classification
-tags:
-- tuvan
-- russian
-- binary classifier
----
-# GitHub
-
-<!-- Provide a quick summary of what the model is/does. -->
-
-TuRu - Tuvan/Russian binary classifier model [GitHub](https://github.com/tarbagan/tuvalang/tree/main/turu).
-
-
-## How to use
-
-
-
-```python
-from tensorflow.keras.models import load_model
-
-model = load_model('turu.h5')
-
-text_to_predict = ["""
-Президент ооң бодалы-биле алырга, регионалдыг-даа, муниципалдыг-даа деңнелде деткиир ужурлуг регионнарда спортчу инфраструктура хөгжүлдезиниң айтырыын көрген.
-Ооң келир үеде президент программазының угланыышкыны ол апаарын Владимир Путин чугаалаан.
-"""]
-
-sequences = tokenizer.texts_to_sequences(text_to_predict)
-padded = pad_sequences(sequences, maxlen=10)
-
-prediction = model.predict(padded)
-print(prediction)
-
-```
 
+# Language Classifier
 
+This model is trained to classify text as either Russian or Tuvan language.
config.json
ADDED
@@ -0,0 +1,6 @@
+
+{
+"model_type": "logistic_regression",
+"language": ["russian", "tuvan"],
+"pipeline_tag": "text-classification"
+}
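config.json only records metadata: the model type, the two languages, and the pipeline tag. A minimal standard-library sketch of reading it and sanity-checking those fields before loading the model is shown below; `read_labels` is a hypothetical helper, and the config does not say which class index of the model corresponds to which language.

```python
import json

# Contents of config.json as added in this commit.
CONFIG_TEXT = """{
"model_type": "logistic_regression",
"language": ["russian", "tuvan"],
"pipeline_tag": "text-classification"
}"""


def read_labels(config_text):
    """Validate the config fields and return its list of language labels."""
    config = json.loads(config_text)
    if config.get("model_type") != "logistic_regression":
        raise ValueError(f"unexpected model_type: {config.get('model_type')!r}")
    if config.get("pipeline_tag") != "text-classification":
        raise ValueError(f"unexpected pipeline_tag: {config.get('pipeline_tag')!r}")
    # NOTE: nothing in the config states that this order matches the model's
    # class indices; any index-to-label mapping is an assumption.
    return list(config["language"])


labels = read_labels(CONFIG_TEXT)
```

With the file on disk, the same helper can be fed `open("config.json", encoding="utf-8").read()`.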
model/language_classifier.joblib
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:93552bc0072004f7cceece81b1ffd546743d530c55d00fe1dd7703e5a35b87b6
+size 14610753
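The joblib file is stored through Git LFS, so the diff shows a pointer rather than the model: a spec version, the SHA-256 of the real blob, and its size in bytes (14610753, roughly 14.6 MB). A minimal standard-library sketch of reading such a pointer and checking a downloaded copy against it follows; `parse_lfs_pointer` and `blob_matches_pointer` are illustrative names, not part of Git LFS or this repository.

```python
import hashlib
import os

# The pointer committed as model/language_classifier.joblib.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:93552bc0072004f7cceece81b1ffd546743d530c55d00fe1dd7703e5a35b87b6
size 14610753
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into {"oid": hex digest, "size": bytes}."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            # Each line is "<key> <value>"; split on the first space only.
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, _, digest = fields.get("oid", "").partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm!r}")
    return {"oid": digest, "size": int(fields["size"])}


def blob_matches_pointer(path, pointer):
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check before hashing the whole file
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so the file is never fully loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["oid"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

After `git lfs pull`, calling `blob_matches_pointer("model/language_classifier.joblib", pointer)` should return True if the fetched file is the one this commit references.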
model_card.md
ADDED
@@ -0,0 +1,11 @@
+
+---
+tags:
+- language-classification
+- russian
+- tuvan
+---
+
+# Language Classifier
+
+This model is trained to classify text as either Russian or Tuvan language. It is based on a logistic regression classifier.
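The model card states only that the classifier is a logistic regression. To illustrate that technique (not to reconstruct language_classifier.joblib, whose features and training data are not documented in this commit), here is a minimal standard-library sketch: bag-of-words presence features, no intercept term, and batch gradient descent on the mean log-loss over a handful of hypothetical toy sentences. The Tuvan sentences reuse text from the removed README example; the Russian ones are invented for this sketch, and the label convention (1 = Tuvan) is arbitrary.

```python
import math
import re
from collections import defaultdict

# Hypothetical toy training data (label 1 = Tuvan, 0 = Russian).
# NOT the data behind model/language_classifier.joblib.
TRAIN = [
    ("Ооң келир үеде президент программазының угланыышкыны ол апаарын чугаалаан", 1),
    ("Регионнарда спортчу инфраструктура хөгжүлдезиниң айтырыын көрген", 1),
    ("Он рассмотрел вопрос развития спортивной инфраструктуры в регионах", 0),
    ("Это направление станет главным в будущей программе", 0),
]


def features(text):
    """Bag-of-words presence features: the set of lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, lr=0.5, epochs=200):
    """Fit weights by batch gradient descent on the mean log-loss (no intercept)."""
    data = [(features(text), label) for text, label in samples]
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, label in data:
            p = sigmoid(sum(weights[f] for f in feats))
            for f in feats:
                # d(log-loss)/dw_f = (p - y) * x_f, with x_f = 1 for present words
                grad[f] += (p - label) / len(data)
        for f, g in grad.items():
            weights[f] -= lr * g
    return dict(weights)


def predict_proba(weights, text):
    """Probability that `text` is Tuvan; unseen words contribute nothing."""
    return sigmoid(sum(weights.get(f, 0.0) for f in features(text)))


def predict(weights, text, labels=("russian", "tuvan")):
    return labels[int(predict_proba(weights, text) >= 0.5)]


model = train(TRAIN)
print(predict(model, "Ооң келир үеде"))                       # tuvan
print(predict(model, "развития спортивной инфраструктуры"))  # russian
```

Word-level features keep the sketch short, but they ignore unseen words entirely; character n-grams are a common alternative for language identification. For the real artifact, loading would presumably go through `joblib.load("model/language_classifier.joblib")`, but what object that returns (for example, a scikit-learn `Pipeline` that accepts raw strings) is not stated in this commit, so inspect its type before calling `predict` on it.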