Update README.md
README.md
CHANGED
|
@@ -23,8 +23,8 @@ tags:
|
|
| 23 |
- indicnlp
|
| 24 |
---
|
| 25 |
|
| 26 |
-
# IndicNER
|
| 27 |
-
IndicNER is a model trained to complete the task of identifying named entities from sentences in Indian languages. Our model is specifically fine-tuned to the 11 Indian languages mentioned above over millions of sentences. The model is then benchmarked over a human annotated testset and multiple other publicly available Indian NER datasets.
|
| 28 |
The 11 languages covered by IndicNER are: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
|
| 29 |
|
| 30 |
## Training Corpus
|
|
@@ -39,48 +39,9 @@ Update 20 Dec 2022: We released a new paper documenting IndicNER and Naamapadam.
|
|
| 39 |
|
| 40 |
You can use [this Colab notebook](https://colab.research.google.com/drive/1sYa-PDdZQ_c9SzUgnhyb3Fl7j96QBCS8?usp=sharing) for samples on using IndicNER or for finetuning a pre-trained model on the Naamapadam dataset to build your own NER models.
|
| 41 |
|
| 42 |
-
<!-- citing information -->
|
| 43 |
-
## Citing
|
| 44 |
-
|
| 45 |
-
If you are using IndicNER, please cite the following article:
|
| 46 |
-
```
|
| 47 |
-
@misc{mhaske2022naamapadam,
|
| 48 |
-
doi = {10.48550/ARXIV.2212.10168},
|
| 49 |
-
url = {https://arxiv.org/abs/2212.10168},
|
| 50 |
-
author = {Mhaske, Arnav and Kedia, Harshit and Doddapaneni, Sumanth and Khapra, Mitesh M. and Kumar, Pratyush and Murthy, Rudra and Kunchukuttan, Anoop},
|
| 51 |
-
title = {Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages}
|
| 52 |
-
publisher = {arXiv},
|
| 53 |
-
year = {2022},
|
| 54 |
-
copyright = {arXiv.org perpetual, non-exclusive license}
|
| 55 |
-
}
|
| 56 |
-
|
| 57 |
-
```
|
| 58 |
-
We would like to hear from you if:
|
| 59 |
-
|
| 60 |
-
- You are using our resources. Please let us know how you are putting these resources to use.
|
| 61 |
-
- You have any feedback on these resources.
|
| 62 |
|
| 63 |
|
| 64 |
<!-- License -->
|
| 65 |
## License
|
| 66 |
|
| 67 |
-
The IndicNER code (and models) are released under the MIT License.
|
| 68 |
-
|
| 69 |
-
<!-- Contributors -->
|
| 70 |
-
## Contributors
|
| 71 |
-
- Arnav Mhaske <sub> ([AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in)) </sub>
|
| 72 |
-
- Harshit Kedia <sub> ([AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in)) </sub>
|
| 73 |
-
- Sumanth Doddapaneni <sub> ([AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in)) </sub>
|
| 74 |
-
- Mitesh M. Khapra <sub> ([AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in)) </sub>
|
| 75 |
-
- Pratyush Kumar <sub> ([AI4Bharat](https://ai4bharat.org), [Microsoft](https://www.microsoft.com/en-in/), [IITM](https://www.iitm.ac.in)) </sub>
|
| 76 |
-
- Rudra Murthy <sub> ([AI4Bharat](https://ai4bharat.org), [IBM](https://www.ibm.com))</sub>
|
| 77 |
-
- Anoop Kunchukuttan <sub> ([AI4Bharat](https://ai4bharat.org), [Microsoft](https://www.microsoft.com/en-in/), [IITM](https://www.iitm.ac.in)) </sub>
|
| 78 |
-
|
| 79 |
-
This work is the outcome of a volunteer effort as part of the [AI4Bharat initiative](https://ai4bharat.iitm.ac.in).
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
<!-- Contact -->
|
| 83 |
-
## Contact
|
| 84 |
-
- Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](mailto:anoop.kunchukuttan@gmail.com))
|
| 85 |
-
- Rudra Murthy V ([rmurthyv@in.ibm.com](mailto:rmurthyv@in.ibm.com))
|
| 86 |
-
|
|
| 23 |
- indicnlp
|
| 24 |
---
|
| 25 |
|
| 26 |
+
# fine-tuned IndicNER
|
| 27 |
+
fine-tuned IndicNER is a model trained to complete the task of identifying named entities from sentences in Indian languages. Our model is specifically fine-tuned to the 11 Indian languages mentioned above over millions of sentences. The model is then benchmarked over a human annotated testset and multiple other publicly available Indian NER datasets.
|
| 28 |
The 11 languages covered by IndicNER are: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
|
| 29 |
|
| 30 |
## Training Corpus
|
|
|
|
| 39 |
|
| 40 |
You can use [this Colab notebook](https://colab.research.google.com/drive/1sYa-PDdZQ_c9SzUgnhyb3Fl7j96QBCS8?usp=sharing) for samples on using IndicNER or for finetuning a pre-trained model on the Naamapadam dataset to build your own NER models.
|
| 41 |
|
| 42 |
|
| 43 |
|
| 44 |
<!-- License -->
|
| 45 |
## License
|
| 46 |
|
| 47 |
+
The fine-tuned-IndicNER code (and models) are released under the MIT License.
|