IsmatS committed on
Commit
93f6a50
·
verified ·
1 Parent(s): a2cc2e8

Upload folder using huggingface_hub

  1. README.md +138 -3
README.md CHANGED
@@ -1,3 +1,138 @@
1
- ---
2
- license: mit
3
- ---
1
+ # mBERT Azerbaijani NER Model
2
+
3
+ [![Hugging Face Model](https://img.shields.io/badge/Hugging%20Face-Model-blue)](https://huggingface.co/IsmatS/mbert-az-ner)
4
+
5
+ This model is a fine-tuned version of **mBERT** (Multilingual BERT) for Named Entity Recognition (NER) in Azerbaijani. It extracts entities such as personal names, locations, organizations, and dates from Azerbaijani text, reaching an F1 score of about 0.68 on the evaluation set after three epochs.
6
+
7
+ ## Model Details
8
+
9
+ - **Base Model**: `bert-base-multilingual-cased`
10
+ - **Fine-tuned on**: [Azerbaijani Named Entity Recognition Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset)
11
+ - **Task**: Named Entity Recognition (NER)
12
+ - **Language**: Azerbaijani (az)
13
+ - **Dataset**: Custom Azerbaijani NER dataset with entity tags such as `PERSON`, `LOCATION`, `ORGANISATION`, `DATE`, etc.
14
+
15
+ ### Data Source
16
+
17
+ The model was trained on the [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset), which provides Azerbaijani text annotated with 25 distinct entity types.
18
+
19
+ ### Entity Types
20
+ The model recognizes the following entities:
21
+ - **PERSON**: Names of people
22
+ - **LOCATION**: Geographical locations
23
+ - **ORGANISATION**: Companies, institutions
24
+ - **DATE**: Dates and periods
25
+ - **MONEY**: Monetary values
26
+ - **TIME**: Time expressions
27
+ - **GPE**: Countries, cities, states
28
+ - **FACILITY**: Buildings, landmarks, etc.
29
+ - **EVENT**: Events and occurrences
30
+ - **...and more**
31
+
32
+ For the full list of entities, please refer to the dataset description.
33
+
34
+ ## Performance Metrics
35
+
36
+ ### Epoch-wise Performance
37
+
38
+ | Epoch | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
39
+ |-------|---------------|-----------------|-----------|--------|--------|----------|
40
+ | 1 | 0.295200 | 0.265711 | 0.715424 | 0.622853 | 0.665937 | 0.919136 |
41
+ | 2 | 0.248600 | 0.252083 | 0.721036 | 0.637979 | 0.676970 | 0.921439 |
42
+ | 3 | 0.206800 | 0.253372 | 0.704872 | 0.650684 | 0.676695 | 0.920898 |
43
+
44
+ ### Evaluation Summary (Epoch 3)
45
+
46
+ - **Evaluation Loss**: 0.253372
47
+ - **Evaluation Precision**: 0.704872
48
+ - **Evaluation Recall**: 0.650684
49
+ - **Evaluation F1**: 0.676695
50
+ - **Evaluation Accuracy**: 0.920898
51
+
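As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall values; the helper name `f1_score` is illustrative and is not part of the model's training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) per epoch, copied from the table above
reported = {
    1: (0.715424, 0.622853, 0.665937),
    2: (0.721036, 0.637979, 0.676970),
    3: (0.704872, 0.650684, 0.676695),
}

for epoch, (p, r, f1) in reported.items():
    print(f"epoch {epoch}: recomputed F1 = {f1_score(p, r):.6f}, reported = {f1:.6f}")
```

The recomputed values agree with the reported ones to within rounding.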
52
+ ## Usage
53
+
54
+ You can use this model with the Hugging Face `transformers` library to perform NER on Azerbaijani text. The steps below cover installation and inference.
55
+
56
+ ### Installation
57
+
58
+ Make sure you have the `transformers` library installed, along with a backend such as PyTorch:
59
+
60
+ ```bash
61
+ pip install transformers torch
62
+ ```
63
+
64
+ ### Inference Example
65
+
66
+ Load the model and tokenizer, then run the NER pipeline on Azerbaijani text:
67
+
68
+ ```python
69
+ from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
70
+
71
+ # Load the model and tokenizer
72
+ model_name = "IsmatS/mbert-az-ner"
73
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
74
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
75
+
76
+ # Set up the NER pipeline
77
+ nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
78
+
79
+ # Example sentence
80
+ sentence = "Bakı şəhərində Azərbaycan Respublikasının prezidenti İlham Əliyev."
81
+ entities = nlp_ner(sentence)
82
+
83
+ # Display entities
84
+ for entity in entities:
85
+     print(f"Entity: {entity['word']}, Label: {entity['entity_group']}, Score: {entity['score']}")
86
+ ```
87
+
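With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with `entity_group`, `score`, `word`, `start`, and `end` keys. The following is a minimal post-processing sketch, assuming you want to drop low-confidence predictions and group the rest by label. The `group_entities` helper, the 0.9 threshold, and the hand-written input are all illustrative and are not model output.

```python
from collections import defaultdict

def group_entities(entities, min_score=0.9):
    """Group entity words by label, keeping only predictions at or above min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)

# Hand-written stand-in for pipeline output (hypothetical values, not real predictions)
example = [
    {"entity_group": "LOCATION", "score": 0.95, "word": "Bakı", "start": 0, "end": 4},
    {"entity_group": "PERSON", "score": 0.97, "word": "İlham Əliyev", "start": 53, "end": 65},
    {"entity_group": "DATE", "score": 0.42, "word": "prezidenti", "start": 42, "end": 52},
]

print(group_entities(example))
```

A threshold like this trades recall for precision, so it is worth tuning on held-out data rather than fixing it in advance.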
88
+ ### Sample Output
89
+ ```json
90
+ [
91
+ {
92
+ "entity_group": "PERSON",
93
+ "score": 0.97,
94
+ "word": "İlham Əliyev",
95
+ "start": 53,
96
+ "end": 65
97
+ },
98
+ {
99
+ "entity_group": "LOCATION",
100
+ "score": 0.95,
101
+ "word": "Bakı",
102
+ "start": 0,
103
+ "end": 4
104
+ }
105
+ ]
106
+ ```
107
+
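The `start` and `end` fields are character offsets into the input string, with `end` exclusive, so `sentence[start:end]` recovers the entity's surface text. The sketch below illustrates this without loading the model: it locates each span with `str.find`, so the offsets come from a plain string search rather than from model predictions. The `char_span` helper is hypothetical.

```python
def char_span(text: str, phrase: str) -> tuple[int, int]:
    """Return (start, end) character offsets of the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return start, start + len(phrase)

sentence = "Bakı şəhərində Azərbaycan Respublikasının prezidenti İlham Əliyev."

for phrase in ("Bakı", "İlham Əliyev"):
    start, end = char_span(sentence, phrase)
    # Slicing the sentence with the offsets gives back the original phrase
    print(phrase, start, end, sentence[start:end] == phrase)
```

The same slicing works on real pipeline output, which makes it a cheap sanity check that reported offsets line up with the `word` field.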
108
+ ## Training Details
109
+
110
+ - **Training Data**: This model was fine-tuned on the [Azerbaijani NER Dataset](https://huggingface.co/datasets/LocalDoc/azerbaijani-ner-dataset) with 25 entity types.
111
+ - **Training Framework**: Hugging Face `transformers`
112
+ - **Optimizer**: AdamW
113
+ - **Epochs**: 3
114
+ - **Batch Size**: 64
115
+ - **Evaluation Metric**: F1-score
116
+
117
+ ## Limitations
118
+
119
+ - The model is trained specifically for the Azerbaijani language and may not generalize well to other languages.
120
+ - Certain rare entities may be misclassified due to limited training data in those categories.
121
+
122
+ ## Citation
123
+
124
+ If you use this model in your research or application, please consider citing:
125
+
126
+ ```
127
+ @misc{ismats_mbert_az_ner_2024,
128
+ title={mBERT Azerbaijani NER Model},
129
+ author={Ismat Samadov},
130
+ year={2024},
131
+ publisher={Hugging Face},
132
+ url={https://huggingface.co/IsmatS/mbert-az-ner}
133
+ }
134
+ ```
135
+
136
+ ## License
137
+
138
+ This model is available under the [MIT License](https://opensource.org/licenses/MIT).