Update README.md
README.md
@@ -32,7 +32,6 @@ tags:
 ## Model Details
 
 ### Model Description
-
 `netrias/cancer_harmonization_gpt2large` is a fine-tuned GPT2-Large model for harmonizing cancer-related biomedical terms. It converts variant input terms, such as synonyms, abbreviations, misspellings, and rephrasings, into standardized terms for consistent metadata curation.
 
 - **Developed by:** [Netrias, LLC](https://www.netrias.com/)
@@ -42,7 +41,6 @@ tags:
 - **Fine-tuned from model:** [openai-community/gpt2-large](https://huggingface.co/openai-community/gpt2-large)
 
 ### Model Sources
-
 - **Repository:** [netrias/cancer_harmonization_gpt2large](https://huggingface.co/netrias/cancer_harmonization_gpt2large)
 - **Paper:** [Metadata Harmonization from Biological Datasets with Language Models](https://doi.org/10.1101/2025.01.15.633281)
 
@@ -64,7 +62,6 @@ This model was trained on a narrow domain, limiting its applicability to other b
 Limit use to the supported domain, and validate outputs before applying them in downstream applications.
 
 ## How to Get Started with the Model
-
 Prompt the model using a structured sentence of the form: `The standardized form of "your input term" is "`. It returns the most likely standardized term, followed by a closing quote. The example below uses Hugging Face's `pipeline` to generate the top 5 completions using beam search:
 
 ```python
@@ -151,7 +148,6 @@ Results are disaggregated by dictionary inclusion: whether the gold standard app
 High performance on the validation and ID test sets indicates effective learning of known representations. Lower performance on OOD terms suggests reduced generalization to unseen standards and highlights the importance of human review for unfamiliar inputs.
 
 ## Environmental Impact
-
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 
 - **Hardware Type:** NVIDIA A10G
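The "How to Get Started with the Model" hunk describes a prompt of the form `The standardized form of "your input term" is "` and says the completion ends with a closing quote. The README's own example is cut off by the diff context, so the following is only a minimal sketch of that string handling, not the authors' code. The helper names `build_prompt` and `extract_standard_term` are hypothetical, and the sample completion is a made-up string rather than real model output.

```python
# Prompt format quoted from the README; everything else here is illustrative.
PROMPT_TEMPLATE = 'The standardized form of "{term}" is "'


def build_prompt(term: str) -> str:
    """Wrap a raw input term in the structured prompt sentence."""
    return PROMPT_TEMPLATE.format(term=term)


def extract_standard_term(generated_text: str, prompt: str) -> str:
    """Return the text between the end of the prompt and the next double quote.

    Assumption: the generated text may or may not start with the prompt,
    so the prompt is stripped only when it is present.
    """
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Keep everything before the first closing quote; drop stray whitespace.
    return completion.split('"', 1)[0].strip()


if __name__ == "__main__":
    prompt = build_prompt("lung ca")
    # Hypothetical completion, used only to exercise the parsing logic.
    fake_output = prompt + 'Lung Cancer" and trailing text'
    print(extract_standard_term(fake_output, prompt))
```

The sketch assumes the standardized term itself contains no double quote; a term that does would be truncated at that quote and should be reviewed by hand.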