---
datasets:
- scoup123/AffixChecker
language:
- tr
metrics:
- accuracy
pipeline_tag: text-classification
---
|
|
# Model Card for Model ID |
|
|
|
|
|
### Model Description |
|
|
Given two Turkish words, the model predicts whether they share an affix. It is fine-tuned from dbmdz/bert-base-turkish-cased on a task similar to NLI, but at the word level and with two labels. It was created as a final project for one of my classes.
|
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Scoup123 |
|
|
- **Model type:** BERT |
|
|
- **Language(s) (NLP):** Turkish |
|
|
- **Finetuned from model [optional]:** dbmdz/bert-base-turkish-cased |
|
|
|
|
|
### Model Sources [optional] |
|
|
|
|
|
|
|
|
|
|
- **Repository:** [More Information Needed] |
|
|
- **Paper [optional]:** in progress
|
|
|
|
|
|
|
## Uses |
|
|
|
|
|
The model can be used in morphological analysis tasks.
|
|
### Direct Use |
|
|
|
|
|
The model can probably be used on Turkish text without additional fine-tuning.
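A minimal usage sketch, assuming the model is loaded through the Hugging Face `transformers` text-classification pipeline and that the two words are passed as a sentence pair (`text` / `text_pair`), as is common for NLI-style models. The repository ID is a placeholder, and the helper names `load_classifier` and `classify_pair` are hypothetical. The meaning of each output label is not documented in this card, so the raw label is returned unchanged.

```python
MODEL_ID = "your-username/your-model-id"  # placeholder: replace with the actual repository ID


def load_classifier(model_id=MODEL_ID):
    """Load a text-classification pipeline (needs `transformers` and a backend such as PyTorch)."""
    from transformers import pipeline  # imported lazily so classify_pair works without it

    return pipeline("text-classification", model=model_id)


def classify_pair(classifier, word_a, word_b):
    """Run the classifier on a word pair and return (label, score).

    Assumption: the pair is encoded like an NLI premise/hypothesis pair.
    """
    result = classifier({"text": word_a, "text_pair": word_b})
    # Depending on the transformers version, a single input may come back
    # as a dict or as a one-element list of dicts.
    if isinstance(result, list):
        result = result[0]
    return result["label"], result["score"]
```

For example, `classify_pair(load_classifier(), "evler", "kitaplar")` would score a pair in which both words carry the plural suffix. Check how the labels map to "shared affix" and "no shared affix" before interpreting the output.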
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
scoup123/affixfinder |
|
|
|
|
|
The dataset was derived from a generated dataset described in the paper *Turkish language resources: Morphological parser, morphological disambiguator and web corpus* (Sak, Güngör, & Saraçlar, 2008).
|
|
|
|
|
|
|
|
## Evaluation |
|
|
|
|
|
Test Accuracy: 0.9874 |
|
|
Precision: 0.9874 |
|
|
Recall: 0.9874 |
|
|
F1 Score: 0.9874 |
|
|
|
|
|
**The model should be used with caution, as these scores may be unrealistically high.**
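For reference, the four reported metrics can be computed from gold labels and predictions as sketched below. This is a generic illustration for a two-label task, not the evaluation script used for this model. The function name `binary_metrics`, the label values, and the choice of positive label are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a two-label task."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Identical values for all four metrics are what micro-averaging always produces in single-label classification, because micro-averaged precision, recall and F1 all equal accuracy. Whether that averaging was used here is not stated.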
|
|
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
|
|
#### Testing Data |
|
|
|
|
|
The test split was created from the training data.
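A held-out split of this kind can be sketched as follows. The 10% test fraction, the fixed seed, the `(word_a, word_b, label)` row layout, and the function names are illustrative assumptions, not the settings used for this model. The overlap check is one simple way to see whether the same word pair ended up in both splits, which would inflate test scores.

```python
import random


def split_pairs(rows, test_fraction=0.1, seed=42):
    """Shuffle rows deterministically and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (train, test)


def overlapping_pairs(train, test):
    """Return word pairs (ignoring word order) that occur in both splits."""
    def key(row):
        return frozenset((row[0], row[1]))

    train_keys = {key(row) for row in train}
    return {key(row) for row in test if key(row) in train_keys}
```

An empty result from `overlapping_pairs(train, test)` rules out exact duplicates only. Pairs that share individual words across the splits can still make the test set easier than unseen data.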
|
|
|
|
|
#### Summary |
|
|
|
|
|
This model is intended to serve as an affix identifier for Turkish.
|
|
|
|
|
## Model Examination [optional] |
|
|
|
|
|
I have only just created the model, so further testing is needed to check whether it actually works. Verify that it works for your use case before relying on it.
|
|
|
|
|
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
|
|
|
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
|
|
- **Hardware Type:** Free Colab T4 GPU |
|
|
- **Hours used:** ~2.5 hours |
|
|
- **Cloud Provider:** Google |
|
|
- **Compute Region:** Europe |
|
|
- **Carbon Emitted:** [More Information Needed] |
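A rough estimate of this kind is the product of power draw, runtime, and grid carbon intensity. The sketch below assumes the T4 ran at its rated 70 W board power for the full 2.5 hours, which is an upper bound on GPU draw and ignores the rest of the machine. The carbon-intensity value is a placeholder, not a figure for the actual data center, so the result should not be read as this model's measured footprint.

```python
def estimate_emissions_kg(power_watts, hours, intensity_kg_per_kwh):
    """Estimate CO2-equivalent emissions as energy (kWh) times grid carbon intensity."""
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh * intensity_kg_per_kwh


T4_POWER_WATTS = 70.0        # rated board power of an NVIDIA T4
TRAINING_HOURS = 2.5         # approximate figure from this model card
PLACEHOLDER_INTENSITY = 0.3  # kg CO2eq per kWh; hypothetical, replace with the real regional value

energy_kwh = T4_POWER_WATTS * TRAINING_HOURS / 1000.0
estimate = estimate_emissions_kg(T4_POWER_WATTS, TRAINING_HOURS, PLACEHOLDER_INTENSITY)
print(f"Energy: {energy_kwh:.3f} kWh, estimated emissions: {estimate:.4f} kg CO2eq")
```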
|
|
|
|
|
|
|
|
## Citation [optional] |
|
|
|
|
|
**APA:** |
|
|
|
|
|
Sak, H., Güngör, T., & Saraçlar, M. (2008). Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In *Advances in Natural Language Processing* (pp. 417-427). Springer Berlin Heidelberg.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Model Card Authors [optional] |
|
|
|
|
|
Kaan Bayar |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
kaan.bayar13@gmail.com |