Badnyal committed on
Commit 572aeff · verified · 1 Parent(s): fa78085

Update README.md

Files changed (1)
  1. README.md +75 -7
README.md CHANGED
@@ -1,6 +1,38 @@
  ---
- library_name: transformers
  ---
  # MWirelabs/NortheastNER

  **NortheastNER** is a Named Entity Recognition (NER) model fine-tuned by [MWirelabs](https://huggingface.co/MWirelabs) to recognize entities specific to **Northeast India**.
@@ -10,8 +42,6 @@ It is based on [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) and t

  ## 🔎 What it can recognize

- The model is trained to extract the following entity types:
-
  * **PLACES** → States, districts, villages, regions (e.g., *Shillong*, *Tura*, *Ri-Bhoi*)
  * **TRIBES** → Indigenous tribes & sub-tribes (e.g., *Khasi*, *Nyishi*, *Wancho*)
  * **FESTIVALS** → Local festivals (e.g., *Wangala*, *Losar*, *Nyokum Yullo*)
@@ -39,6 +69,28 @@ Evaluated on a 5k-sentence dev set:

  ---

  ## 🚀 Usage

  ```python
@@ -65,11 +117,27 @@ Output:

  ---

- ## 📌 Notes

- * Optimized for **Northeast India domain texts**: news, culture, tourism, indigenous knowledge.
- * Best performance on **PLACES** and **TRIBES**.
- * Model will be continuously improved with more curated datasets (flora, fauna, festivals).

  ---

 
  ---
+ language: en
+ license: cc-by-nc-4.0
+ tags:
+ - token-classification
+ - ner
+ - northeast-india
+ - low-resource
+ - xlm-roberta
+ metrics:
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: MWirelabs/NortheastNER
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     dataset:
+       name: Custom Northeast India Gazetteers + News Corpus
+       type: custom
+       split: dev
+     metrics:
+     - name: Overall F1
+       type: f1
+       value: 0.964
+     - name: Precision
+       type: precision
+       value: 0.962
+     - name: Recall
+       type: recall
+       value: 0.967
  ---
+
  # MWirelabs/NortheastNER

  **NortheastNER** is a Named Entity Recognition (NER) model fine-tuned by [MWirelabs](https://huggingface.co/MWirelabs) to recognize entities specific to **Northeast India**.
 

  ## 🔎 What it can recognize

  * **PLACES** → States, districts, villages, regions (e.g., *Shillong*, *Tura*, *Ri-Bhoi*)
  * **TRIBES** → Indigenous tribes & sub-tribes (e.g., *Khasi*, *Nyishi*, *Wancho*)
  * **FESTIVALS** → Local festivals (e.g., *Wangala*, *Losar*, *Nyokum Yullo*)
 

  ---

+ ## ⚙️ Training Setup
+
+ * **Base model**: `xlm-roberta-base`
+ * **Max sequence length**: 256
+ * **Batch size**: 16
+ * **Learning rate**: 3e-5
+ * **Epochs**: 3
+ * **Weight decay**: 0.01
+ * **Optimizer**: AdamW
+ * **Framework**: HuggingFace Transformers Trainer API
+
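As a rough illustration, the values in the Training Setup list above could be passed to the Transformers `Trainer` as sketched below. This is not the authors' training script. The names `HYPERPARAMS`, `MAX_SEQ_LENGTH`, `BASE_MODEL`, `build_training_args`, and the output directory are hypothetical, and reading "Batch size: 16" as a per-device batch size is an assumption based on the single-GPU setup.

```python
# Hyperparameters copied from the Training Setup list; the key names are
# the matching transformers.TrainingArguments fields.
HYPERPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,  # assumption: "Batch size: 16" is per device
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}

# Not TrainingArguments fields: these apply when loading and tokenizing.
MAX_SEQ_LENGTH = 256
BASE_MODEL = "xlm-roberta-base"


def build_training_args(output_dir="northeastner-out"):
    """Build TrainingArguments from HYPERPARAMS (output_dir is a placeholder)."""
    # Imported lazily so the constants above can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    # No optimizer is set explicitly: the Trainer defaults to AdamW,
    # which matches the optimizer named in the list.
    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

`MAX_SEQ_LENGTH` would typically be passed as `max_length` with `truncation=True` when tokenizing the corpus, before the `Trainer` is constructed.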
+ ### 🔧 Environment
+
+ * **Transformers**: 4.44.2
+ * **Datasets**: 2.20.0
+ * **Evaluate**: 0.4.2
+ * **PyTorch**: 2.3.0+cu121
+ * **Python**: 3.11
+ * **Hardware**: Single NVIDIA A4500 GPU (20 GB VRAM), 62 GB RAM, 12 vCPU
+
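To compare a local installation against the library versions listed above, a small standard-library check along these lines may help. It is a sketch only: `EXPECTED` and `check_environment` are hypothetical names, and the keys are assumed to be the PyPI distribution names (`torch` for PyTorch). A PyTorch build from a different wheel index may report a different local suffix than `+cu121`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Environment list; the keys are assumed to be
# the PyPI distribution names.
EXPECTED = {
    "transformers": "4.44.2",
    "datasets": "2.20.0",
    "evaluate": "0.4.2",
    "torch": "2.3.0+cu121",
}


def check_environment(expected=EXPECTED):
    """Map each package to (expected, installed, matches).

    `installed` is None when the package is not installed.
    """
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        report[name] = (want, have, have == want)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_environment().items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {want}, installed {have} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only reports where the local setup differs from the one described here.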
+ ---
+
  ## 🚀 Usage

  ```python
 

  ---

+ ## 📜 License

+ This model is licensed under the **Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)** license.
+
+ You are free to use, share, and adapt the model for non-commercial purposes with attribution.
+
+ ---
+
+ ## 📖 Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{mwirelabs2025northeastner,
+   title = {NortheastNER: A Domain-Specific Named Entity Recognition Model for Northeast India},
+   author = {MWirelabs},
+   year = {2025},
+   publisher = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/MWirelabs/NortheastNER}},
+ }
+ ```

  ---