scoup123/affixIdentifier

	---
	datasets:
	- scoup123/AffixChecker
	language:
	- tr
	metrics:
	- accuracy
	pipeline_tag: text-classification
	---
	# Model Card for affixIdentifier

	### Model Description
	Given two Turkish words, the model predicts whether they share an affix. It was fine-tuned from dbmdz/bert-base-turkish-cased
	on a task that resembles NLI, but at the word level and with two labels. It was created as a final project for one of my classes.



	- Developed by: Scoup123
	- Model type: BERT
	- Language(s) (NLP): Turkish
	- Finetuned from model: dbmdz/bert-base-turkish-cased

	### Model Sources

	- Repository: [More Information Needed]
	- Paper: in progress

	## Uses

	It can be used in morphological analysis tasks.

	### Direct Use

	It can probably be used on Turkish text without additional fine-tuning.
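
A minimal sketch of direct use with the `transformers` library and PyTorch. It assumes the checkpoint is published as `scoup123/affixIdentifier`, that the repository ships tokenizer files, and that the two words are encoded as a sequence pair (in line with the NLI-like setup); none of these details are confirmed by this card. The function names are illustrative, and the label names come from the model config, so check which label means "shared affix" before relying on the output.

```python
from typing import Dict


def predict_shared_affix(word_a: str, word_b: str,
                         model_id: str = "scoup123/affixIdentifier") -> Dict[str, float]:
    """Return a {label: probability} mapping for a pair of Turkish words.

    The model id and the pair encoding are assumptions, not documented facts.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Encode the two words as one sequence pair, as is usual for NLI-style inputs.
    inputs = tokenizer(word_a, word_b, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Convert logits to probabilities and attach the label names from the config.
    probs = torch.softmax(logits, dim=-1)[0]
    return {model.config.id2label[i]: float(p) for i, p in enumerate(probs)}


def best_label(scores: Dict[str, float]) -> str:
    """Pick the label with the highest probability."""
    return max(scores, key=scores.get)


# Example call (downloads the model on first use):
# scores = predict_shared_affix("evler", "kitaplar")
# print(best_label(scores), scores)
```

Loading the tokenizer and model once and reusing them would be preferable when scoring many pairs; the sketch reloads them on each call only to stay short.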

	## Training Details

	### Training Data

	scoup123/affixfinder

	The training data was derived from a generated dataset described in the paper "Turkish language resources: Morphological parser, morphological disambiguator and web corpus" (Sak et al., 2008).


	## Evaluation

	- Test Accuracy: 0.9874
	- Precision: 0.9874
	- Recall: 0.9874
	- F1 Score: 0.9874

	**The model should be used with caution, as these scores are suspiciously high.**
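
The four values above are identical. One possible explanation, offered here as an assumption because the averaging mode is not stated in this card, is that precision, recall, and F1 were micro-averaged: in single-label classification, micro-averaged precision, recall, and F1 all equal accuracy. The sketch below demonstrates this on made-up labels; the function name is illustrative.

```python
from typing import Dict, Sequence


def micro_averaged_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy and micro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Pool true positives, false positives, and false negatives over all labels.
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label and t != label:
                fp += 1
            elif p != label and t == label:
                fn += 1

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only; they are not taken from the test set.
example = micro_averaged_metrics([1, 1, 0, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 1, 1, 0])
print(example)
```

Because every wrong prediction counts once as a false positive (for the predicted label) and once as a false negative (for the true label), the pooled counts make all four numbers coincide. Per-class or macro-averaged scores would be more informative if the two labels are imbalanced.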

	### Testing Data, Factors & Metrics

	#### Testing Data

	The test split was held out from the training data.
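
As a rough illustration of holding out a test split, the sketch below shuffles word-pair examples with a fixed seed and then reports which words occur on both sides of the split. The example layout `(word_a, word_b, label)`, the 20% test fraction, and the function names are all assumptions; the actual procedure used for this model is not documented here. Word overlap between the splits is one thing worth checking when test scores look unusually high.

```python
import random
from typing import List, Sequence, Set, Tuple

# Hypothetical example layout: (first word, second word, label).
Example = Tuple[str, str, int]


def split_pairs(examples: Sequence[Example], test_fraction: float = 0.2,
                seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle with a fixed seed and hold out a fraction of the examples for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def word_overlap(train: Sequence[Example], test: Sequence[Example]) -> Set[str]:
    """Return the words that appear in both the training and the test split."""
    train_words = {w for a, b, _ in train for w in (a, b)}
    test_words = {w for a, b, _ in test for w in (a, b)}
    return train_words & test_words
```

If the overlap is large, splitting by word rather than by pair would give a stricter estimate of how the model handles unseen vocabulary.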

	#### Summary

	This model aims to serve as an affix identifier for Turkish.

	## Model Examination

	The model was only recently created, so further testing is needed to confirm that it works as intended. Verify its behavior on your own data before relying on it.

	## Environmental Impact



	Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

	- Hardware Type: Free Colab T4 GPU
	- Hours used: ~2.5 hours
	- Cloud Provider: Google
	- Compute Region: Europe
	- Carbon Emitted: [More Information Needed]
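
A rough back-of-the-envelope sketch of how such an estimate can be formed: energy is power times time, and emissions are energy times the carbon intensity of the grid. The 70 W figure is the T4's rated board power, used here as an upper bound on the draw; the carbon intensity for the actual compute region is not given in this card, so it is left as an input rather than filled in. The function names and the optional `pue` (data-center overhead) factor are illustrative.

```python
def energy_kwh(power_watts: float, hours: float, pue: float = 1.0) -> float:
    """Energy used in kWh, optionally scaled by a data-center overhead factor."""
    if power_watts < 0 or hours < 0 or pue < 1.0:
        raise ValueError("power and hours must be non-negative, and pue at least 1.0")
    return power_watts / 1000.0 * hours * pue


def emissions_kg(power_watts: float, hours: float,
                 intensity_kg_per_kwh: float, pue: float = 1.0) -> float:
    """Estimated emissions in kg CO2eq for a given grid carbon intensity."""
    if intensity_kg_per_kwh < 0:
        raise ValueError("carbon intensity must be non-negative")
    return energy_kwh(power_watts, hours, pue) * intensity_kg_per_kwh


# Upper-bound energy for the run described above: 70 W board power for 2.5 hours.
print(f"{energy_kwh(70, 2.5):.3f} kWh")
# emissions_kg(70, 2.5, intensity_kg_per_kwh=...)  # supply the region's real intensity
```

Under these assumptions the run used at most about 0.175 kWh; the emissions figure depends entirely on the carbon intensity supplied.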


	## Citation

	APA:

	Sak, H., Güngör, T., & Saraçlar, M. (2008). Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In Advances in Natural Language Processing (pp. 417-427). Springer Berlin Heidelberg.




	## Model Card Authors

	Kaan Bayar

	## Model Card Contact

	kaan.bayar13@gmail.com