Update README.md
README.md CHANGED

@@ -1,124 +1,32 @@
-<<<<<<< HEAD
----
-license: apache-2.0
-language:
-- en
-tags:
-- biology
-datasets:
-- nasa-impact/nasa-smd-IR-benchmark
-- nasa-impact/nasa-smd-qa-benchmark
-- ibm/Climate-Change-NER
----
-
-#
-
-## Model Details
-- **Base Model**: RoBERTa
-- **Tokenizer**: Custom
-- **Parameters**: 125M
-- **Distilled Version**: You can download a distilled version of the model (30 million parameters) here: https://huggingface.co/nasa-impact/nasa-smd-ibm-distil-v0.1
-
-## Training Data
-- AMS Publications
-- Scientific papers from Astrophysics Data Systems (ADS)
-- PubMed abstracts
-- PubMedCentral (PMC) (commercial license subset)
-
-## Training Procedure
-- **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
-- **transformers Version**: 4.2.0
-- **Strategy**: Masked Language Modeling (MLM)
-
-## Evaluation
-- BLURB Benchmark
-- Pruned SQuAD2.0 (SQ2) Benchmark (Amazon Rainforest, Oxygen, Geology, and NASA ES QAs)
-- NASA SMD Expert QA Benchmark (WIP)
-
-Please refer to the following dataset cards for further benchmarks and evaluation:
-- NASA IR Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-IR-benchmark
-- NASA SMD Expert QA Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark
-- Climate Change Benchmark - https://huggingface.co/datasets/ibm/Climate-Change-NER
-
-## Uses
-- Named Entity Recognition (NER)
-- Information Retrieval
-- Sentence Transformers
-- Extractive QA
-
-For NASA SMD-related scientific use cases.
-
-## Note
-
-The accompanying paper can be found here: https://arxiv.org/abs/2405.10725
-
-## Citation
-If you find this work useful, please cite using the following bibtex citation:
-
-```bibtex
-@misc{nasa-impact_2023,
-  author = {Masayasu Muraoka and Bishwaranjan Bhattacharjee and Muthukumaran Ramasubramanian and Iksha Gurung and Rahul Ramachandran and Manil Maskey and Kaylin Bugbee and Rong Zhang and Yousef El Kurdi and Bharath Dandala and Mike Little and Elizabeth Fancher and Lauren Sanders and Sylvain Costes and Sergi Blanco-Cuaresma and Kelly Lockhart and Thomas Allen and Felix Grazes and Megan Ansdell and Alberto Accomazzi and Sanaz Vahidinia and Ryan McGranaghan and Armin Mehrabian and Tsendgar Lee},
-  title = {nasa-smd-ibm-v0.1 (Revision f01d42f)},
-  year = 2023,
-  url = {https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1},
-  doi = {10.57967/hf/1429},
-  publisher = {Hugging Face}
-}
-```
-
-## Attribution
-
-IBM Research
-- Masayasu Muraoka
-- Bishwaranjan Bhattacharjee
-- Rong Zhang
-- Yousef El Kurdi
-- Bharath Dandala
-
-- Muthukumaran Ramasubramanian
-- Iksha Gurung
-- Rahul Ramachandran
-- Manil Maskey
-- Kaylin Bugbee
-- Mike Little
-- Elizabeth Fancher
-- Lauren Sanders
-- Sylvain Costes
-- Sergi Blanco-Cuaresma
-- Kelly Lockhart
-- Thomas Allen
-- Felix Grazes
-- Megan Ansdell
-- Alberto Accomazzi
-- Sanaz Vahidinia
-- Ryan McGranaghan
-- Armin Mehrabian
-- Tsendgar Lee
-
-##
-
-=======
----
-license: mit
----
->>>>>>> 7a770c80b4a3414639536260229365f67ac0ea54
 ---
 license: apache-2.0
 language:
 - en
+base_model:
+- nasa-impact/nasa-smd-ibm-v0.1
+pipeline_tag: token-classification
 tags:
+- astronomy
+- uat
 ---
 
+# INDUS - UAT Labeler
+Indus-UAT-Labeler (nasa-smd-ibm-v0.1_UAT_Labeler) is a RoBERTa-based, encoder-only transformer model domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural-language technologies such as information retrieval and intelligent search.
+This specific fork was fine-tuned on SciX Digital Library (https://scixplorer.org/, formerly NASA ADS) proprietary data to label text with concepts from the Unified Astronomy Thesaurus (UAT, https://astrothesaurus.org/).
 
 ## Model Details
 - **Base Model**: RoBERTa
 - **Tokenizer**: Custom
 - **Parameters**: 125M
 
 ## Training Data
+- 18K titles, abstracts, bodies, and acknowledgments from recent, high-quality astronomy papers
+- approximately 217M tokens
 
+<!-- ## Note -->
 
+<!-- ## Citation -->
+<!-- If you find this work useful, please cite using the following bibtex citation: -->
 
+<!-- ## Disclaimer -->
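The new card registers the model as a `token-classification` pipeline, so inference yields one UAT tag per token, which downstream code typically merges into labeled concept spans. The sketch below shows that merging step under the assumption of a BIO tag scheme; the tag names, example tokens, and any pipeline call shown in comments are illustrative, not taken from the model card (consult the model's `id2label` config for its actual label set).

```python
# Sketch: merge per-token BIO tags into labeled spans.
# With transformers, the per-token predictions could come from e.g.:
#   from transformers import pipeline
#   nlp = pipeline("token-classification", model="<this model's repo id>")
# (repo id and label set are assumptions; see the card's metadata.)

def merge_bio_spans(tokens, tags):
    """Group (token, tag) pairs into (label, text) spans.

    Tags use the BIO scheme: "B-X" begins a span of label X,
    "I-X" continues it, and "O" is outside any span.
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush any span in progress.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open span.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# Hypothetical per-token output for "dark matter halos in dwarf galaxies":
tokens = ["dark", "matter", "halos", "in", "dwarf", "galaxies"]
tags = ["B-Dark_matter", "I-Dark_matter", "O", "O",
        "B-Dwarf_galaxies", "I-Dwarf_galaxies"]
print(merge_bio_spans(tokens, tags))
# → [('Dark_matter', 'dark matter'), ('Dwarf_galaxies', 'dwarf galaxies')]
```

Note that `transformers` pipelines can perform similar grouping via the `aggregation_strategy` argument; the explicit version above only illustrates what that post-processing does.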