Update README.md
README.md
CHANGED
|
@@ -1,3 +1,42 @@
|
|
| 1 |
---
|
| 2 |
-
license:
|
| 3 |
---
|
| 1 |
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
tags:
|
| 6 |
+
- Token Classification
|
| 7 |
+
widget:
|
| 8 |
+
- text: "Monitored Natural Attenuation (MNA) and, if necessary as a contingency, In Situ Chemical Oxidation (ISCO) to address ISCO involves the injection of a strong chemical oxidant to chemically treat the before the ISCO contingency can be implemented at the spill site."
|
| 9 |
+
example_title: "example 1"
|
| 10 |
+
- text: "Site was identified as a potential source of groundwater contamination after the City performed Assessments were investigated further for potential contamination."
|
| 11 |
+
example_title: "example 2"
|
| 12 |
+
- text: "TCE releases from the UST is probably a major contributor to groundwater contamination in this area."
|
| 13 |
+
example_title: "example 3"
|
| 14 |
---
|
| 15 |
+
## About the Model
|
| 16 |
+
An environmental named entity recognition (NER) model, trained on a dataset from the USEPA, that recognizes 7 environmental due diligence entity types in a given text corpus (remediation reports, records of decision, 5-year records, etc.). The model was built on top of distilbert-base-uncased.
|
| 17 |
+
|
| 18 |
+
- Dataset: https://data.mendeley.com/datasets/tx6vmd4g9p/4
|
| 19 |
+
- Dataset Research Paper: https://doi.org/10.1016/j.dib.2022.108579
|
| 20 |
+
|
| 21 |
+
## Usage
|
| 22 |
+
The easiest way to try the model is through the Hugging Face Inference API. The second method is through the `pipeline` object provided by the transformers library.
|
| 23 |
+
```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="d4data/EnviDueDiligence_NER")

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("d4data/EnviDueDiligence_NER")
model = AutoModelForTokenClassification.from_pretrained("d4data/EnviDueDiligence_NER")
```
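Once the pipeline returns predictions, they usually need to be grouped by entity type. The snippet below is a minimal sketch of that post-processing step, not part of the model itself. The helper `group_entities`, the `min_score` threshold, and the labels `LABEL_A` / `LABEL_B` are hypothetical placeholders (this card does not list the 7 entity names), and the sample predictions are hand-written to mimic the shape a token-classification pipeline returns when created with `aggregation_strategy="simple"`.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.5):
    """Group pipeline predictions by label, keeping those scoring at least min_score."""
    grouped = defaultdict(list)
    for pred in predictions:
        # Aggregated output uses the key "entity_group"; raw per-token output uses "entity".
        label = pred.get("entity_group", pred.get("entity"))
        if pred["score"] >= min_score:
            grouped[label].append(pred["word"])
    return dict(grouped)


# Hand-written, hypothetical predictions (placeholder labels and scores),
# shaped like aggregated token-classification pipeline output.
sample_predictions = [
    {"entity_group": "LABEL_A", "score": 0.98, "word": "tce", "start": 0, "end": 3},
    {"entity_group": "LABEL_B", "score": 0.91, "word": "ust", "start": 22, "end": 25},
    {"entity_group": "LABEL_A", "score": 0.31, "word": "groundwater", "start": 61, "end": 72},
]

result = group_entities(sample_predictions)
print(result)

# With the real model (downloads the weights on first use), the same helper
# could be applied to actual predictions:
# from transformers import pipeline
# pipe = pipeline(
#     "token-classification",
#     model="d4data/EnviDueDiligence_NER",
#     aggregation_strategy="simple",
# )
# print(group_entities(pipe("TCE releases from the UST is probably a major contributor to groundwater contamination in this area.")))
```

The threshold of 0.5 is an arbitrary starting point; a suitable value depends on how many false positives the downstream use can tolerate.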
|
| 35 |
+
|
| 36 |
+
## Author
|
| 37 |
+
This model is part of the research topic "Environmental Due Diligence" conducted by Deepak John Reji and Afreen Aman. If you use this work (code, model, or dataset), please cite:
|
| 38 |
+
> Aman, A. and Reji, D.J., 2022. EnvBert: An NLP model for Environmental Due Diligence data classification. Software Impacts, 14, p.100427.
|
| 39 |
+
|
| 40 |
+
## You can support me here :)
|
| 41 |
+
<a href="https://www.buymeacoffee.com/deepakjohnreji" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>
|
| 42 |
+
|