Update README.md
README.md
CHANGED
@@ -1,6 +1,38 @@
---
-
---
# MWirelabs/NortheastNER
**NortheastNER** is a Named Entity Recognition (NER) model fine-tuned by [MWirelabs](https://huggingface.co/MWirelabs) to recognize entities specific to **Northeast India**.
@@ -10,8 +42,6 @@ It is based on [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) and t
## What it can recognize

-The model is trained to extract the following entity types:
-
* **PLACES**: States, districts, villages, regions (e.g., *Shillong*, *Tura*, *Ri-Bhoi*)
* **TRIBES**: Indigenous tribes & sub-tribes (e.g., *Khasi*, *Nyishi*, *Wancho*)
* **FESTIVALS**: Local festivals (e.g., *Wangala*, *Losar*, *Nyokum Yullo*)
@@ -39,6 +69,28 @@ Evaluated on a 5k-sentence dev set:
---
## Usage

```python
@@ -65,11 +117,27 @@ Output:
---
-##

-
-
-
---
---
+language: en
+license: cc-by-nc-4.0
+tags:
+- token-classification
+- ner
+- northeast-india
+- low-resource
+- xlm-roberta
+metrics:
+- f1
+- precision
+- recall
+model-index:
+- name: MWirelabs/NortheastNER
+  results:
+  - task:
+      type: token-classification
+      name: Named Entity Recognition
+    dataset:
+      name: Custom Northeast India Gazetteers + News Corpus
+      type: custom
+      split: dev
+    metrics:
+    - name: Overall F1
+      type: f1
+      value: 0.964
+    - name: Precision
+      type: precision
+      value: 0.962
+    - name: Recall
+      type: recall
+      value: 0.967
---
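
The three scores in the `model-index` metadata can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below assumes all three values are aggregated the same way (for example, micro-averaged over the dev split); the function name `f1_from_pr` is illustrative and not part of the model's code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the model-index metadata.
reported_precision = 0.962
reported_recall = 0.967

print(round(f1_from_pr(reported_precision, reported_recall), 3))  # 0.964
```

The result matches the reported overall F1 of 0.964 to three decimal places.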
+
# MWirelabs/NortheastNER
**NortheastNER** is a Named Entity Recognition (NER) model fine-tuned by [MWirelabs](https://huggingface.co/MWirelabs) to recognize entities specific to **Northeast India**.
## What it can recognize

* **PLACES**: States, districts, villages, regions (e.g., *Shillong*, *Tura*, *Ri-Bhoi*)
* **TRIBES**: Indigenous tribes & sub-tribes (e.g., *Khasi*, *Nyishi*, *Wancho*)
* **FESTIVALS**: Local festivals (e.g., *Wangala*, *Losar*, *Nyokum Yullo*)
---

+## Training Setup
+
+* **Base model**: `xlm-roberta-base`
+* **Max sequence length**: 256
+* **Batch size**: 16
+* **Learning rate**: 3e-5
+* **Epochs**: 3
+* **Weight decay**: 0.01
+* **Optimizer**: AdamW
+* **Framework**: HuggingFace Transformers Trainer API
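
The hyperparameters listed above could map onto the Trainer API roughly as follows. This is a minimal sketch, not the authors' training script: the `HYPERPARAMS` dict, the `build_training_args` helper, and the output directory are hypothetical names, and it assumes the batch size of 16 is per device with no gradient accumulation and that "AdamW" means the PyTorch implementation (`optim="adamw_torch"`).

```python
# Hyperparameters copied from the Training Setup list.
HYPERPARAMS = {
    "base_model": "xlm-roberta-base",
    "max_seq_length": 256,
    "batch_size": 16,
    "learning_rate": 3e-5,
    "epochs": 3,
    "weight_decay": 0.01,
}


def build_training_args(output_dir: str = "northeastner-out"):
    """Map the listed hyperparameters onto transformers.TrainingArguments."""
    # Imported lazily so the dict above can be inspected without
    # transformers (and its torch/accelerate dependencies) installed.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,  # hypothetical path
        per_device_train_batch_size=HYPERPARAMS["batch_size"],
        learning_rate=HYPERPARAMS["learning_rate"],
        num_train_epochs=HYPERPARAMS["epochs"],
        weight_decay=HYPERPARAMS["weight_decay"],
        optim="adamw_torch",  # assumption: PyTorch AdamW
    )
```

The maximum sequence length is not a `TrainingArguments` field; it would normally be applied at tokenization time, for example `tokenizer(..., truncation=True, max_length=256)`.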
+
+### Environment
+
+* **Transformers**: 4.44.2
+* **Datasets**: 2.20.0
+* **Evaluate**: 0.4.2
+* **PyTorch**: 2.3.0+cu121
+* **Python**: 3.11
+* **Hardware**: Single NVIDIA A4500 GPU (20 GB VRAM), 62 GB RAM, 12 vCPU
+
+---
+
## Usage
```python
---

+## License

+This model is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.
+
+You are free to use, share, and adapt the model for non-commercial purposes with attribution.
+
+---
+
+## Citation
+
+If you use this model in your research, please cite:
+
+```bibtex
+@misc{mwirelabs2025northeastner,
+  title        = {NortheastNER: A Domain-Specific Named Entity Recognition Model for Northeast India},
+  author       = {MWirelabs},
+  year         = {2025},
+  publisher    = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/MWirelabs/NortheastNER}},
+}
+```
---