---
language:
- en
license: apache-2.0
tags:
- biomedical
- clinical
- ul2
- t5
- encoder-decoder
- pretraining
- text2text-generation
- medical
---
|
|
|
|
|
# PubMedUL2 & MedUL2 |
|
|
|
|
|
## Model Description |
|
|
|
|
|
**PubMedUL2** and **MedUL2** are a family of **domain-specific UL2/T5-style encoder–decoder language models** pretrained on large-scale biomedical and medical corpora using the **UL2 (Mixture-of-Denoisers)** objective. |
|
|
|
|
|
- **PubMedUL2** models are pretrained on **25 million PubMed abstracts** |
|
|
- **MedUL2** models are pretrained on **PubMed abstracts + clinical notes + additional medical documents** |
|
|
- All models use the **T5-efficient architecture**, based on Google’s efficient T5 variants
|
|
|
|
|
These checkpoints are **pretraining-only models** and **must be fine-tuned** before use on downstream tasks. |
|
|
|
|
|
--- |
|
|
|
|
|
## Pretraining Objective: UL2 (Mixture-of-Denoisers) |
|
|
|
|
|
These models were pretrained using **UL2**, a unified framework that formulates language modeling objectives as **denoising tasks**. |
|
|
|
|
|
UL2 introduces a **Mixture-of-Denoisers (MoD)** approach that samples from multiple denoising paradigms during pretraining. |
|
|
|
|
|
### Denoising Tasks |
|
|
|
|
|
UL2 pretraining uses a mixture of three denoising tasks: |
|
|
|
|
|
1. **R-denoising (Regular Span Corruption)**
   - Equivalent to standard T5 span corruption
   - Optimized for language understanding tasks
2. **X-denoising (Extreme Span Corruption)**
   - Uses very large masked spans
   - Encourages long-form generation and abstraction
3. **S-denoising (Sequential / PrefixLM)**
   - Prefix language modeling, similar to causal LM
   - Suitable for sequence-to-sequence and generative tasks
|
|
|
|
|
### Paradigm Tokens (Mode Switching) |
|
|
|
|
|
During pretraining, a **paradigm token** is inserted at the beginning of each input: |
|
|
|
|
|
| Token | Mode | Recommended Use |
|-------|------|-----------------|
| `[NLU]` | R-denoising | Classification, QA, retrieval |
| `[NLG]` | X-denoising | Mixed understanding & generation |
| `[S2S]` | S-denoising | Generative / causal tasks |
|
|
|
|
|
**Important:** For best performance, the same paradigm token should be **prepended during fine-tuning and inference**.
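
As a minimal sketch of this convention (the helper name is ours, not part of any released tooling), prepending a paradigm token before tokenization might look like:

```python
# Paradigm tokens used during UL2 pretraining.
PARADIGM_TOKENS = {"nlu": "[NLU]", "nlg": "[NLG]", "s2s": "[S2S]"}

def with_paradigm_token(text: str, mode: str = "nlu") -> str:
    """Prepend the UL2 paradigm token matching the downstream task."""
    return f"{PARADIGM_TOKENS[mode]} {text}"

# Later, pass the result to the tokenizer, e.g.:
# inputs = tokenizer(with_paradigm_token("Aspirin inhibits ...", mode="nlu"),
#                    return_tensors="pt")
```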
|
|
|
|
|
--- |
|
|
|
|
|
## Architecture |
|
|
|
|
|
- Encoder–decoder Transformer (T5-style) |
|
|
- Uses **T5-efficient architecture** |
|
|
- Compatible with Hugging Face `T5ForConditionalGeneration` |
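
Because the checkpoints are T5-compatible, they can be loaded with the standard Hugging Face classes. A sketch, where `repo_id` is a placeholder for the actual Hub path of a released checkpoint:

```python
def load_checkpoint(repo_id: str):
    """Load tokenizer and model for a PubMedUL2/MedUL2 checkpoint.

    repo_id is a placeholder, e.g. the Hub path of `pubmedul2-base`.
    """
    # Imported lazily so the sketch carries no import-time dependency.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = T5ForConditionalGeneration.from_pretrained(repo_id)
    return tokenizer, model
```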
|
|
|
|
|
--- |
|
|
|
|
|
## Intended Uses |
|
|
|
|
|
These models are intended to be **fine-tuned** for: |
|
|
|
|
|
- Biomedical and clinical **text classification** |
|
|
- **Question answering** |
|
|
- **Summarization** of medical literature or clinical notes |
|
|
- **Text generation** in medical contexts |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- ❌ Not instruction-tuned
- ❌ No supervised fine-tuning
- ❌ Not suitable for zero-shot use
|
|
|
|
|
These checkpoints are **self-supervised pretraining models only** and require task-specific fine-tuning. |
|
|
|
|
|
--- |
|
|
|
|
|
## Fine-Tuning Recommendations |
|
|
|
|
|
- **Avoid mixed precision** (fp16 / bf16) initially
- Fine-tuning is more stable in **fp32**
- Always prepend one of `[NLU]`, `[NLG]`, or `[S2S]` to input text
- Suggested defaults:
  - Classification / QA → `[NLU]`
  - Causal or generative tasks → `[S2S]`
  - Mixed tasks → `[NLG]`
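
The recommendations above could translate into training arguments roughly like the following (field names are from Hugging Face `Seq2SeqTrainingArguments`; the learning rate is purely illustrative, not a tuned value):

```python
# Keep mixed precision off initially; fine-tuning is more stable in fp32.
training_kwargs = dict(
    output_dir="ul2-finetune",
    learning_rate=1e-4,  # illustrative starting point; tune per task
    fp16=False,          # avoid fp16 mixed precision at first
    bf16=False,          # avoid bf16 as well; train in full fp32
)
# Usage (sketch): Seq2SeqTrainingArguments(**training_kwargs)
```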
|
|
|
|
|
--- |
|
|
|
|
|
## Model Parameter Summary |
|
|
|
|
|
| Model Name | Parameter Count | Description | Access |
|------------|-----------------|-------------|--------|
| `pubmedul2-tiny-nl6` | **19.26M** | Tiny UL2-style model with 6 layers | Open |
| `pubmedul2-mini-nl8` | **50.12M** | Mini UL2 with 8 layers | Open |
| `pubmedul2-small` | **60.52M** | Small UL2 variant | Open |
| `pubmedul2-small-nl24` | **192.73M** | Small UL2 with 24 layers | Open |
| `medul2-base` | **222.93M** | Base UL2/T5-style model | Open |
| `pubmedul2-base` | **222.93M** | Base UL2/T5-style model | Open |
| `medul2-base-nl36` | **619.44M** | Base UL2 with 36 layers | Gated commercial |
| `pubmedul2-base-nl36` | **619.44M** | Base UL2 with 36 layers | Gated commercial |
| `medul2-large` | **737.72M** | Large UL2/T5-style model | Gated non-commercial |
| `pubmedul2-large` | **737.72M** | Large UL2/T5-style model | Gated non-commercial |
| `medul2-large-nl36` | **1090.14M** | Very large UL2 with 36 layers | Access on request |
|
|
|
|
|
--- |
|
|
|
|
|
## Named Entity Recognition (NER) Evaluation |
|
|
|
|
|
We evaluate PubMedUL2 and MedUL2 models on a biomedical **Named Entity Recognition (NER)** task using multiple matching criteria to better capture boundary-level performance. |
|
|
|
|
|
The evaluation reports **entity-level F1 scores** across different biomedical entity types and model sizes. |
|
|
|
|
|
### Exact Match F1 |
|
|
|
|
|
An entity prediction is considered correct only if both the **entity span and label exactly match** the gold annotation. |
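
Entity-level exact-match F1 can be sketched as follows, assuming gold and predicted entities are given as `(start, end, label)` tuples (this is our illustration, not the exact evaluation script used here):

```python
def entity_f1_exact(gold: set, pred: set) -> float:
    """Entity-level F1 where (start, end, label) must match exactly.

    gold/pred: sets of (start, end, label) tuples over the corpus.
    """
    tp = len(gold & pred)  # true positives: exact span + label matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```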
|
|
|
|
|
| entity_type | medul2-base | pubmedul2-base | pubmedul2-mini-nl8 | pubmedul2-small | pubmedul2-tiny-nl6 |
|:------------|------------:|---------------:|-------------------:|----------------:|-------------------:|
| cell_line | 0.42 | 0.43 | 0.44 | 0.43 | 0.35 |
| cell_type | 0.59 | 0.58 | 0.59 | 0.58 | 0.52 |
| chemical | 0.76 | 0.75 | 0.72 | 0.72 | 0.56 |
| disease | 0.70 | 0.73 | 0.70 | 0.68 | 0.63 |
| dna | 0.59 | 0.55 | 0.54 | 0.55 | 0.45 |
| gene | 0.62 | 0.59 | 0.60 | 0.59 | 0.55 |
| protein | 0.59 | 0.58 | 0.58 | 0.59 | 0.55 |
| rna | 0.60 | 0.56 | 0.55 | 0.60 | 0.56 |
| species | 0.66 | 0.67 | 0.58 | 0.63 | 0.54 |
|
|
|
|
|
--- |
|
|
|
|
|
### Partial Match F1 |
|
|
|
|
|
A prediction is counted as correct if it **partially overlaps** with a gold entity of the same type. |
|
|
|
|
|
| entity_type | medul2-base | pubmedul2-base | pubmedul2-mini-nl8 | pubmedul2-small | pubmedul2-tiny-nl6 |
|:------------|------------:|---------------:|-------------------:|----------------:|-------------------:|
| cell_line | 0.48 | 0.49 | 0.48 | 0.48 | 0.41 |
| cell_type | 0.66 | 0.64 | 0.66 | 0.65 | 0.59 |
| chemical | 0.79 | 0.78 | 0.76 | 0.75 | 0.60 |
| disease | 0.82 | 0.84 | 0.80 | 0.79 | 0.74 |
| dna | 0.65 | 0.61 | 0.60 | 0.61 | 0.53 |
| gene | 0.76 | 0.74 | 0.74 | 0.73 | 0.68 |
| protein | 0.66 | 0.66 | 0.66 | 0.67 | 0.64 |
| rna | 0.68 | 0.63 | 0.64 | 0.66 | 0.65 |
| species | 0.68 | 0.70 | 0.61 | 0.65 | 0.56 |
|
|
|
|
|
--- |
|
|
|
|
|
### IoU Match F1 |
|
|
|
|
|
Predictions are evaluated using **Intersection-over-Union (IoU)** overlap between predicted and gold spans, providing a softer boundary-based metric. |
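
A minimal sketch of span IoU over token offsets (end-exclusive spans; the 0.5 threshold is our assumption for illustration, not necessarily the one used in this evaluation):

```python
def span_iou(a: tuple, b: tuple) -> float:
    """Intersection-over-Union of two (start, end) spans, end exclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def iou_match(pred: tuple, gold: tuple, threshold: float = 0.5) -> bool:
    """Count a prediction as correct when its span IoU reaches the threshold."""
    return span_iou(pred, gold) >= threshold
```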
|
|
|
|
|
| entity_type | medul2-base | pubmedul2-base | pubmedul2-mini-nl8 | pubmedul2-small | pubmedul2-tiny-nl6 |
|:------------|------------:|---------------:|-------------------:|----------------:|-------------------:|
| cell_line | 0.50 | 0.50 | 0.50 | 0.50 | 0.42 |
| cell_type | 0.67 | 0.66 | 0.68 | 0.67 | 0.62 |
| chemical | 0.83 | 0.83 | 0.82 | 0.82 | 0.72 |
| disease | 0.85 | 0.86 | 0.86 | 0.85 | 0.82 |
| dna | 0.65 | 0.62 | 0.62 | 0.62 | 0.55 |
| gene | 0.76 | 0.75 | 0.75 | 0.74 | 0.71 |
| protein | 0.67 | 0.66 | 0.67 | 0.67 | 0.66 |
| rna | 0.68 | 0.65 | 0.66 | 0.67 | 0.67 |
| species | 0.72 | 0.74 | 0.65 | 0.69 | 0.58 |
|
|
|
|
|
--- |
|
|
|
|
|
### Observations |
|
|
|
|
|
- **MedUL2 models** generally outperform PubMedUL2 on clinical-heavy entity types such as *disease* and *chemical* |
|
|
- Performance improves consistently from **tiny → base models** |
|
|
- Boundary-sensitive metrics (Partial / IoU) show significantly higher scores than Exact Match, highlighting boundary ambiguity in biomedical NER |
|
|
|
|
|
--- |
|
|
|
|
|
## Acknowledgements |
|
|
|
|
|
This project would not have been possible without compute generously provided by **Google TPU Research Cloud**. |
|
|
|
|
|
Thanks to: |
|
|
- The **Finnish-NLP** authors for releasing the UL2 objective code, task definitions, and guidance |
|
|
- **Yeb Havinga** for help getting started with the **t5x** framework |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
Please refer to the individual model repositories for **license and access details**, which may vary depending on training data sources. |
|
|
|