fgrezes committed · verified
Commit 15c2649 · 1 Parent(s): 926616a

Update README.md

Files changed (1):
  1. README.md +15 -107
README.md CHANGED
@@ -1,124 +1,32 @@
- <<<<<<< HEAD
  ---
  license: apache-2.0
  language:
  - en
- library_name: transformers
- pipeline_tag: fill-mask
  tags:
- - earth science
- - climate
- - biology
- datasets:
- - nasa-impact/nasa-smd-IR-benchmark
- - nasa-impact/nasa-smd-qa-benchmark
- - ibm/Climate-Change-NER
  ---

- # Model Card for nasa-smd-ibm-v0.1 (Indus)
-
- nasa-smd-ibm-v0.1 (Currently named as Indus) is a RoBERTa-based, Encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It's fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural language technologies like information retrieval and intelligent search.

  ## Model Details
  - **Base Model**: RoBERTa
  - **Tokenizer**: Custom
  - **Parameters**: 125M
- - **Pretraining Strategy**: Masked Language Modeling (MLM)
- - **Distilled Version**: You can download a distilled version of the model (30 Million Parameters) here: https://huggingface.co/nasa-impact/nasa-smd-ibm-distil-v0.1
-
  ## Training Data
- - Wikipedia English (Feb 1, 2020)
- - AGU Publications
- - AMS Publications
- - Scientific papers from Astrophysics Data Systems (ADS)
- - PubMed abstracts
- - PubMedCentral (PMC) (commercial license subset)
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/H0-q9N7IwXQqLdEaCCgm-.png)
-
- ## Training Procedure
- - **Framework**: fairseq 0.12.1 with PyTorch 1.9.1
- - **transformers Version**: 4.2.0
- - **Strategy**: Masked Language Modeling (MLM)
-
- ## Evaluation
- - BLURB Benchmark
- - Pruned SQuAD2.0 (SQ2) Benchmark (Amazon Rainforest, Oxygen, Geology and NASA ES QAs)
- - NASA SMD Expert QA Benchmark (WIP)
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/EtCC3U_tMCv3bfLqQdqQm.png)
-
- ![Pruned SQ2 Benchmark](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/ruh6-IyiNlUiK21Ej4lDM.png)
-
- Please refer to the following dataset cards for further benchmarks and evaluation
- - NASA IR Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-IR-benchmark
- - NASA SMD Expert QA Benchmark - https://huggingface.co/datasets/nasa-impact/nasa-smd-qa-benchmark
- - Climate CHange Benchmark - https://huggingface.co/datasets/ibm/Climate-Change-NER
-
- ## Uses
- - Named Entity Recognition (NER)
- - Information Retrieval
- - Sentence Transformers
- - Extractive QA
-
- For NASA SMD related, scientific usecases.
-
- ## Note
-
- Accompanying paper can be found here: https://arxiv.org/abs/2405.10725
-
- ## Citation
- If you find this work useful, please cite using the following bibtex citation:
-
- ```bibtex
- @misc {nasa-impact_2023,
-     author = { Masayasu Maraoka and Bishwaranjan Bhattacharjee and Muthukumaran Ramasubramanian and Ikhsa Gurung and Rahul Ramachandran and Manil Maskey and Kaylin Bugbee and Rong Zhang and Yousef El Kurdi and Bharath Dandala and Mike Little and Elizabeth Fancher and Lauren Sanders and Sylvain Costes and Sergi Blanco-Cuaresma and Kelly Lockhart and Thomas Allen and Felix Grazes and Megan Ansdell and Alberto Accomazzi and Sanaz Vahidinia and Ryan McGranaghan and Armin Mehrabian and Tsendgar Lee },
-     title = { nasa-smd-ibm-v0.1 (Revision f01d42f) },
-     year = 2023,
-     url = { https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1 },
-     doi = { 10.57967/hf/1429 },
-     publisher = { Hugging Face }
- }
- ```
-
- ## Attribution

- IBM Research
- - Masayasu Muraoka
- - Bishwaranjan Bhattacharjee
- - Rong Zhang
- - Yousef El Kurdi
- - Bharath Dandala

- NASA SMD
- - Muthukumaran Ramasubramanian
- - Iksha Gurung
- - Rahul Ramachandran
- - Manil Maskey
- - Kaylin Bugbee
- - Mike Little
- - Elizabeth Fancher
- - Lauren Sanders
- - Sylvain Costes
- - Sergi Blanco-Cuaresma
- - Kelly Lockhart
- - Thomas Allen
- - Felix Grazes
- - Megan Ansdell
- - Alberto Accomazzi
- - Sanaz Vahidinia
- - Ryan McGranaghan
- - Armin Mehrabian
- - Tsendgar Lee

- ## Disclaimer

- This Encoder-only model is currently in an experimental phase. We are working to improve the model's capabilities and performance, and as we progress, we invite the community to engage with this model, provide feedback, and contribute to its evolution.
- =======
- ---
- license: mit
- ---
- >>>>>>> 7a770c80b4a3414639536260229365f67ac0ea54
 
 
  ---
  license: apache-2.0
  language:
  - en
+ base_model:
+ - nasa-impact/nasa-smd-ibm-v0.1
+ pipeline_tag: token-classification
  tags:
+ - astronomy
+ - uat
  ---

+ # INDUS - UAT Labeler
+ Indus-UAT-Labeler (nasa-smd-ibm-v0.1_UAT_Labeler) is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, aiming to enhance natural-language technologies such as information retrieval and intelligent search.
+ This specific fork was fine-tuned on proprietary data from the SciX Digital Library (https://scixplorer.org/, formerly NASA ADS) to label text with UAT labels (https://astrothesaurus.org/).

  ## Model Details
  - **Base Model**: RoBERTa
  - **Tokenizer**: Custom
  - **Parameters**: 125M

  ## Training Data
+ - 18K titles, abstracts, bodies, and acknowledgments from recent, high-quality astronomy papers
+ - approximately 217M tokens

+ <!-- ## Note -->

+ <!-- ## Citation -->
+ <!-- If you find this work useful, please cite using the following bibtex citation: -->

+ <!-- ## Disclaimer -->
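
The updated card describes a labeler that assigns UAT (Unified Astronomy Thesaurus) concepts to text. As a rough illustration of the multi-label post-processing such a labeler typically needs, the sketch below applies a sigmoid to per-label raw scores and keeps labels above a confidence threshold. Everything here is a hypothetical stand-in: the function names, the threshold value, and the example UAT concepts are assumptions, not taken from this model card.

```python
import math

def sigmoid(x: float) -> float:
    # Map a raw score (logit) to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def extract_labels(logits, label_names, threshold=0.5):
    # Hypothetical post-processing: pair each label with its sigmoid score,
    # keep those at or above the threshold, and sort by confidence.
    pairs = [(name, sigmoid(score)) for name, score in zip(label_names, logits)]
    kept = [(name, p) for name, p in pairs if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy logits for three illustrative UAT concepts (made-up values).
labels = ["Exoplanets", "Stellar photometry", "Galaxy evolution"]
logits = [2.3, -1.1, 0.4]
print(extract_labels(logits, labels))
```

In a real multi-label setup the threshold (and whether to cap the number of returned concepts) would be tuned on a validation set rather than fixed at 0.5.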