Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/savasy/bert-turkish-text-classification/README.md
README.md
ADDED
@@ -0,0 +1,102 @@
---
language: tr
---

# Turkish Text Classification

This model is a fine-tuned version of https://github.com/stefan-it/turkish-bert, trained on text classification data with the following 7 categories:

```
code_to_label={
    'LABEL_0': 'dunya ',
    'LABEL_1': 'ekonomi ',
    'LABEL_2': 'kultur ',
    'LABEL_3': 'saglik ',
    'LABEL_4': 'siyaset ',
    'LABEL_5': 'spor ',
    'LABEL_6': 'teknoloji '}
```

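The label ids above follow the alphabetical order of the category names, which is also the order `pd.Categorical` uses by default when it assigns codes. The sketch below assumes that ordering (the model card does not state how the ids were assigned) and rebuilds the mapping from the category names alone. Unlike the dictionary above, the names here carry no trailing space.

```python
# Category names from the label map above, listed in arbitrary order
# and without the trailing spaces.
categories = ["spor", "dunya", "teknoloji", "ekonomi", "saglik", "kultur", "siyaset"]

# Assumption: ids were assigned in alphabetical order of the category names.
code_to_label = {f"LABEL_{i}": name for i, name in enumerate(sorted(categories))}

print(code_to_label["LABEL_2"])  # kultur
```
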
## Data
The following Turkish benchmark dataset was used for fine-tuning:

https://www.kaggle.com/savasy/ttc4900

## Quick Start

Begin by installing transformers as follows:
> pip install transformers

```
# Code:
# import libraries
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("savasy/bert-turkish-text-classification")

# build and load the model; this takes time depending on your internet connection
model = AutoModelForSequenceClassification.from_pretrained("savasy/bert-turkish-text-classification")

# make pipeline ("sentiment-analysis" is the pipeline alias for text classification)
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# apply model
nlp("bla bla")
# [{'label': 'LABEL_2', 'score': 0.4753005802631378}]

code_to_label={
    'LABEL_0': 'dunya ',
    'LABEL_1': 'ekonomi ',
    'LABEL_2': 'kultur ',
    'LABEL_3': 'saglik ',
    'LABEL_4': 'siyaset ',
    'LABEL_5': 'spor ',
    'LABEL_6': 'teknoloji '}

code_to_label[nlp("bla bla")[0]['label']]
# > 'kultur '
```

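The lookup at the end of the Quick Start code handles a single prediction. As a small sketch, the hypothetical helper `to_categories` below (not part of transformers or of the original model card) applies the same lookup to a whole list of predictions and strips the trailing space from each name. It runs on the sample prediction shown above, so no model download is needed.

```python
from typing import Dict, List

# Same mapping as in the Quick Start code.
code_to_label = {
    "LABEL_0": "dunya ",
    "LABEL_1": "ekonomi ",
    "LABEL_2": "kultur ",
    "LABEL_3": "saglik ",
    "LABEL_4": "siyaset ",
    "LABEL_5": "spor ",
    "LABEL_6": "teknoloji ",
}


def to_categories(predictions: List[Dict], mapping: Dict[str, str] = code_to_label) -> List[str]:
    """Hypothetical helper: map pipeline predictions to clean category names."""
    return [mapping[p["label"]].strip() for p in predictions]


# The prediction shown in the Quick Start output.
sample = [{"label": "LABEL_2", "score": 0.4753005802631378}]
print(to_categories(sample))  # ['kultur']
```

With the pipeline loaded, the same call would be `to_categories(nlp("bla bla"))`.
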
## How the model was trained

```
## loading data for Turkish text classification
import pandas as pd
# https://www.kaggle.com/savasy/ttc4900
df = pd.read_csv("7allV03.csv")
df.columns = ["labels", "text"]
df.labels = pd.Categorical(df.labels)

train_df = ...
eval_df = ...

# model
from simpletransformers.classification import ClassificationModel
import torch
import sklearn.metrics

# use the GPU only if one is available
cuda_available = torch.cuda.is_available()

model_args = {
    "use_early_stopping": True,
    "early_stopping_delta": 0.01,
    "early_stopping_metric": "mcc",
    "early_stopping_metric_minimize": False,
    "early_stopping_patience": 5,
    "evaluate_during_training_steps": 1000,
    "fp16": False,
    "num_train_epochs": 3
}

model = ClassificationModel(
    "bert",
    "dbmdz/bert-base-turkish-cased",
    use_cuda=cuda_available,
    args=model_args,
    num_labels=7
)
model.train_model(train_df, acc=sklearn.metrics.accuracy_score)
```
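The training code leaves the construction of `train_df` and `eval_df` elided. One common option, shown here only as an illustrative sketch and not as the author's actual split, is a stratified split that keeps each category's share roughly equal in both parts. The function name `stratified_split`, the 20% evaluation fraction, and the seed are all assumptions, and the rows are toy placeholders rather than real data.

```python
import random
from collections import defaultdict


def stratified_split(rows, eval_fraction=0.2, seed=42):
    """Split (label, text) rows so every label appears in both parts."""
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)
    train, evaluation = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one row per label for evaluation.
        n_eval = max(1, round(len(group) * eval_fraction))
        evaluation.extend(group[:n_eval])
        train.extend(group[n_eval:])
    return train, evaluation


# Toy rows standing in for the dataset (not real data).
rows = [("spor", f"spor text {i}") for i in range(10)]
rows += [("ekonomi", f"ekonomi text {i}") for i in range(10)]

train_rows, eval_rows = stratified_split(rows)
print(len(train_rows), len(eval_rows))  # 16 4
```

The resulting lists can be turned back into DataFrames with `pd.DataFrame(train_rows, columns=["labels", "text"])`.
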
For other training models, please check https://simpletransformers.ai/

For detailed usage of Turkish Text Classification, please check the [Python notebook](https://github.com/savasy/TurkishTextClassification/blob/master/Bert_base_Text_Classification_for_Turkish.ipynb)