How to use mys/electra-base-turkish-cased-ner with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("token-classification", model="mys/electra-base-turkish-cased-ner")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("mys/electra-base-turkish-cased-ner")
    model = AutoModelForTokenClassification.from_pretrained("mys/electra-base-turkish-cased-ner")
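As a minimal inference sketch, the pipeline can be asked to merge subword predictions into entity spans via the standard `aggregation_strategy="simple"` argument. The helper names `format_entities` and `run_ner` and the 0.5 score threshold are illustrative assumptions, not part of this model card.

```python
def format_entities(entities, min_score=0.5):
    """Return (text, label, score) tuples for predictions at or above min_score."""
    return [
        (e["word"], e["entity_group"], round(float(e["score"]), 3))
        for e in entities
        if e["score"] >= min_score
    ]


def run_ner(text, model_id="mys/electra-base-turkish-cased-ner"):
    """Run the token-classification pipeline and return filtered entity spans."""
    # Imported lazily so format_entities can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge subword pieces into whole entities
    )
    return format_entities(pipe(text))
```

Calling `run_ner("Mustafa Kemal Atatürk 1881 yılında Selanik'te doğdu.")` downloads the model on first use and returns one tuple per detected entity; the label names depend on the model's configuration.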
What is this
An NER model for Turkish with 48 categories, trained on the Shrinked TWNERTC Turkish NER Data dataset by Behçet Şentürk, which is itself a filtered and cleaned version of the following automatically labeled dataset:
Sahin, H. Bahadir; Eren, Mustafa Tolga; Tirkaz, Caglar; Sonmez, Ozan; Yildiz, Eray (2017), “English/Turkish Wikipedia Named-Entity Recognition and Text Categorization Dataset”, Mendeley Data, v1 http://dx.doi.org/10.17632/cdcztymf4k.1
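To see which categories the model predicts, the label mapping stored in the model configuration can be inspected. The sketch below is hedged: it does not assume a particular tagging scheme, so the hypothetical helper `unique_categories` strips `B-`/`I-` prefixes and skips an `O` label only if they happen to be present.

```python
def unique_categories(id2label):
    """Collapse an id -> label mapping into sorted category names.

    Drops a leading "B-" or "I-" prefix and the "O" (outside) label if present.
    """
    categories = set()
    for label in id2label.values():
        if label == "O":
            continue
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        categories.add(label)
    return sorted(categories)


def load_categories(model_id="mys/electra-base-turkish-cased-ner"):
    """Fetch the model config from the Hub and list its entity categories."""
    # Imported lazily so unique_categories works without transformers installed.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(model_id)
    return unique_categories(config.id2label)
```

`load_categories()` only downloads the configuration file, not the model weights, so it is a cheap way to check the label set before running inference.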
Backbone model
The backbone model is electra-base-turkish-cased-discriminator, which I fine-tuned for token classification.
I'm still investigating whether accuracy can be improved with this dataset, but the model is already usable for non-critical applications. You can reach out to me on Twitter for discussions and issues. I will also release a notebook for fine-tuning NER models with Shrinked TWNERTC, as well as sample inference code to demonstrate what's possible with this model.
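Until that notebook is available, one step that fine-tuning for token classification generally involves is aligning word-level labels with subword tokens. The following is a sketch of a common approach, not the author's method: only the first subword of each word keeps its label, and everything else receives `-100`, which PyTorch's cross-entropy loss ignores by default. The function names are hypothetical, and `encode_example` assumes a fast tokenizer (needed for `word_ids()`).

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Map word-level label ids onto subword tokens.

    word_ids: per-token word index, or None for special tokens.
    word_labels: one label id per word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)          # special token such as [CLS] or [SEP]
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(ignore_index)          # continuation subword
        previous = word_id
    return aligned


def encode_example(words, word_labels, tokenizer):
    """Tokenize a pre-split sentence and attach aligned labels."""
    encoding = tokenizer(words, is_split_into_words=True, truncation=True)
    encoding["labels"] = align_labels(encoding.word_ids(), word_labels)
    return encoding
```

Labeling only the first subword is a design choice; labeling every subword with its word's label is an equally common alternative.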