---
language:
- en
license: apache-2.0
tags:
- biomedical
- clinical
- ul2
- t5
- encoder-decoder
- pretraining
- text2text-generation
- medical
---

# PubMedUL2 & MedUL2

## Model Description

**PubMedUL2** and **MedUL2** are a family of **domain-specific UL2/T5-style encoder–decoder language models** pretrained on large-scale biomedical and medical corpora using the **UL2 (Mixture-of-Denoisers)** objective.

- **PubMedUL2** models are pretrained on **25 million PubMed abstracts**
- **MedUL2** models are pretrained on **PubMed abstracts + clinical notes + additional medical documents**
- All models use a **T5-efficient architecture**, inspired by Google’s efficient T5 variants

These checkpoints are **pretraining-only models** and **must be fine-tuned** before use on downstream tasks.

---

## Pretraining Objective: UL2 (Mixture-of-Denoisers)

These models were pretrained using **UL2**, a unified framework that formulates language modeling objectives as **denoising tasks**.

UL2 introduces a **Mixture-of-Denoisers (MoD)** approach that samples from multiple denoising paradigms during pretraining.

### Denoising Tasks

UL2 pretraining uses a mixture of three denoising tasks:

1. **R-denoising (Regular Span Corruption)**  
   - Equivalent to standard T5 span corruption  
   - Optimized for language understanding tasks

2. **X-denoising (Extreme Span Corruption)**  
   - Uses very large masked spans  
   - Encourages long-form generation and abstraction

3. **S-denoising (Sequential / PrefixLM)**  
   - Prefix language modeling similar to causal LM  
   - Suitable for sequence-to-sequence and generative tasks
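The three denoisers differ mainly in how each (input, target) pair is built. The Python sketch below shows this on a pre-tokenized sentence. It is a simplified, assumed reconstruction, not the Finnish-NLP/t5x code used to pretrain these checkpoints. The corruption rates and mean span lengths in the hypothetical `DENOISERS` table are illustrative values, and sentinel names follow the T5 `<extra_id_N>` convention.

```python
import random

# Illustrative settings (assumed, NOT the documented pretraining mixture).
DENOISERS = {
    "[NLU]": {"kind": "span", "rate": 0.15, "mean_span": 3.0},   # R-denoising
    "[NLG]": {"kind": "span", "rate": 0.50, "mean_span": 32.0},  # X-denoising
    "[S2S]": {"kind": "prefix"},                                 # S-denoising
}


def sentinel(i: int) -> str:
    # T5-style sentinel names: <extra_id_0>, <extra_id_1>, ...
    return f"<extra_id_{i}>"


def random_partition(total: int, parts: int, rng: random.Random) -> list:
    """Split `total` items into `parts` segments, each of length >= 1."""
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    bounds = [0, *cuts, total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def span_corrupt(tokens, rate, mean_span, rng):
    """Replace random spans with sentinels; the target lists the removed spans."""
    n = len(tokens)
    num_noise = min(max(1, round(n * rate)), n - 1)  # keep >= 1 visible token
    num_spans = max(1, round(num_noise / mean_span))
    num_spans = min(num_spans, num_noise, n - num_noise)
    noise_lens = random_partition(num_noise, num_spans, rng)
    keep_lens = random_partition(n - num_noise, num_spans, rng)
    inputs, targets, pos = [], [], 0
    for i, (keep, noise) in enumerate(zip(keep_lens, noise_lens)):
        inputs += tokens[pos:pos + keep]    # visible segment
        pos += keep
        inputs.append(sentinel(i))          # placeholder for the masked span
        targets.append(sentinel(i))
        targets += tokens[pos:pos + noise]  # masked span moves to the target
        pos += noise
    return inputs, targets


def prefix_lm(tokens, rng):
    """S-denoising: the encoder sees a prefix, the decoder predicts the rest."""
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split], tokens[split:]


def make_example(tokens, paradigm, rng):
    """Build one (inputs, targets) pair with the paradigm token prepended."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    cfg = DENOISERS[paradigm]
    if cfg["kind"] == "span":
        inputs, targets = span_corrupt(tokens, cfg["rate"], cfg["mean_span"], rng)
    else:
        inputs, targets = prefix_lm(tokens, rng)
    return [paradigm] + inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    toks = "the patient was given aspirin for chest pain".split()
    for mode in DENOISERS:
        print(make_example(toks, mode, rng))
```

As in T5 span corruption, the target only contains the masked spans, each introduced by the sentinel that replaced it in the input, so decoding cost scales with the corruption rate rather than the full sequence length.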

### Paradigm Tokens (Mode Switching)

During pretraining, a **paradigm token** identifying the denoising mode is inserted at the beginning of each input:

| Token  | Mode | Recommended Use |
|------|------|------------------|
| `[NLU]` | R-denoising | Classification, QA, retrieval |
| `[NLG]` | X-denoising | Mixed understanding & generation |
| `[S2S]` | S-denoising | Generative / causal tasks |

**Important:**  
For best performance, prepend the paradigm token that matches your task during fine-tuning, and use the same token at inference.

---

## Architecture

- Encoder–decoder Transformer (T5-style)
- Uses **T5-efficient architecture**
- Compatible with Hugging Face `T5ForConditionalGeneration`

---

## Intended Uses

These models are intended to be **fine-tuned** for:

- Biomedical and clinical **text classification**
- **Question answering**
- **Summarization** of medical literature or clinical notes
- **Text generation** in medical contexts

---

## Limitations

- ❌ Not instruction-tuned  
- ❌ No supervised training  
- ❌ Not suitable for zero-shot use  

These checkpoints are **self-supervised pretraining models only** and require task-specific fine-tuning.

---

## Fine-Tuning Recommendations

- **Avoid mixed precision** (fp16 / bf16) initially  
  - Fine-tuning is more stable in **fp32**
- Always prepend one of `[NLU]`, `[NLG]`, or `[S2S]` to input text
- Suggested defaults:
  - Classification / QA → `[NLU]`
  - Causal or generative tasks → `[S2S]`
  - Mixed tasks → `[NLG]`
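The defaults above amount to a small lookup applied to every encoder input, both when building the fine-tuning set and at inference. A minimal sketch, assuming hypothetical task names (`classification`, `qa`, `generative`, `mixed`) and a single space between the token and the text; check how the checkpoint's tokenizer splits the paradigm tokens before relying on that separator.

```python
# Hypothetical task categories; the mapping follows the suggested defaults above.
PARADIGM_FOR_TASK = {
    "classification": "[NLU]",
    "qa": "[NLU]",
    "generative": "[S2S]",
    "mixed": "[NLG]",
}
ALL_TOKENS = set(PARADIGM_FOR_TASK.values())


def add_paradigm_token(text: str, task: str) -> str:
    """Prepend the paradigm token for `task` to an encoder input."""
    if task not in PARADIGM_FOR_TASK:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PARADIGM_FOR_TASK)}")
    token = PARADIGM_FOR_TASK[task]
    if text.startswith(token):
        return text  # already prefixed; avoid doubling the token
    if any(text.startswith(t) for t in ALL_TOKENS):
        # A different token would break train/inference consistency.
        raise ValueError(f"text already starts with a paradigm token other than {token}")
    return f"{token} {text}"  # single-space separator is an assumption


def prepare_pair(source: str, target: str, task: str) -> tuple:
    """Fine-tuning pair: the token goes on the input side only."""
    return add_paradigm_token(source, task), target


if __name__ == "__main__":
    print(add_paradigm_token("Is aspirin an NSAID?", "qa"))
```

Calling the same helper on inference inputs guarantees the token matches the one used during fine-tuning. With the Hugging Face `Seq2SeqTrainer`, training in fp32 as recommended above means leaving `fp16` and `bf16` at their default `False` in `Seq2SeqTrainingArguments`.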

---

## Model Parameter Summary

| Model Name | Parameter Count | Description | Access |
|-----------|----------------|------------|------------|
| `pubmedul2-tiny-nl6` | **19.26M** | Tiny UL2-style model with 6 layers | Open |
| `pubmedul2-mini-nl8` | **50.12M** | Mini UL2 with 8 layers | Open |
| `pubmedul2-small` | **60.52M** | Small UL2 variant | Open |
| `pubmedul2-small-nl24` | **192.73M** | Small UL2 with 24 layers | Open |
| `medul2-base` | **222.93M** | Base UL2/T5-style model | Open |
| `pubmedul2-base` | **222.93M** | Base UL2/T5-style model | Open |
| `medul2-base-nl36` | **619.44M** | Base UL2 with 36 layers | Gated commercial |
| `pubmedul2-base-nl36` | **619.44M** | Base UL2 with 36 layers | Gated commercial |
| `medul2-large` | **737.72M** | Large UL2/T5-style model | Gated non-commercial |
| `pubmedul2-large` | **737.72M** | Large UL2/T5-style model | Gated non-commercial |
| `medul2-large-nl36` | **1090.14M** | Very large UL2 with 36 layers | Access on request |
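Parameter counts like these follow from a checkpoint's shape. The sketch below estimates the count from a hypothetical `T5Shape` record whose fields mirror keys of a Hugging Face T5 `config.json` (`vocab_size`, `d_model`, `d_ff`, `num_heads`, `d_kv`, `num_layers`, `num_decoder_layers`, `relative_attention_num_buckets`). It assumes the standard T5 layout: bias-free projections, scale-only layer norms, and relative position biases only in the first layer of each stack. It is an estimate for sanity checks, not the tool used to produce the table.

```python
from dataclasses import dataclass


@dataclass
class T5Shape:
    vocab_size: int
    d_model: int
    d_ff: int
    num_heads: int
    d_kv: int
    num_layers: int          # encoder layers
    num_decoder_layers: int
    gated_ffn: bool          # gated-GELU (T5 v1.1 style) adds a second input projection
    tied_embeddings: bool    # original T5 reuses the input embedding as the LM head
    num_buckets: int = 32    # relative-position buckets


def estimate_t5_params(c: T5Shape) -> int:
    inner = c.num_heads * c.d_kv
    attn = 4 * c.d_model * inner                  # q, k, v, o projections, no biases
    ffn = (3 if c.gated_ffn else 2) * c.d_model * c.d_ff
    enc_layer = attn + ffn + 2 * c.d_model        # two scale-only layer norms
    dec_layer = 2 * attn + ffn + 3 * c.d_model    # self- and cross-attention, three norms
    rel_bias = 2 * c.num_buckets * c.num_heads    # first encoder and first decoder layer
    embeddings = c.vocab_size * c.d_model * (1 if c.tied_embeddings else 2)
    final_norms = 2 * c.d_model                   # encoder and decoder final norms
    return (embeddings + c.num_layers * enc_layer
            + c.num_decoder_layers * dec_layer + rel_bias + final_norms)


if __name__ == "__main__":
    t5_base = T5Shape(32128, 768, 3072, 12, 64, 12, 12,
                      gated_ffn=False, tied_embeddings=True)
    print(f"{estimate_t5_params(t5_base) / 1e6:.2f}M")
```

With the shape of Google's original `t5-base` (vocabulary 32,128, `d_model` 768, `d_ff` 3,072, 12 heads of size 64, 12 encoder and 12 decoder layers, ReLU FFN, tied embeddings) it returns 222,903,552. Because embeddings scale with `vocab_size`, read the actual value from each checkpoint's config before comparing.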

---

## Named Entity Recognition (NER) Evaluation

We evaluate PubMedUL2 and MedUL2 models on a biomedical **Named Entity Recognition (NER)** task using multiple matching criteria to better capture boundary-level performance.

The evaluation reports **entity-level F1 scores** across different biomedical entity types and model sizes.

### Exact Match F1

An entity prediction is considered correct only if both the **entity span and label exactly match** the gold annotation.

| entity_type   |   medul2-base |   pubmedul2-base |   pubmedul2-mini-nl8 |   pubmedul2-small |   pubmedul2-tiny-nl6 |
|:--------------|--------------:|-----------------:|---------------------:|------------------:|---------------------:|
| cell_line     |          0.42 |             0.43 |                 0.44 |              0.43 |                 0.35 |
| cell_type     |          0.59 |             0.58 |                 0.59 |              0.58 |                 0.52 |
| chemical      |          0.76 |             0.75 |                 0.72 |              0.72 |                 0.56 |
| disease       |          0.7  |             0.73 |                 0.7  |              0.68 |                 0.63 |
| dna           |          0.59 |             0.55 |                 0.54 |              0.55 |                 0.45 |
| gene          |          0.62 |             0.59 |                 0.6  |              0.59 |                 0.55 |
| protein       |          0.59 |             0.58 |                 0.58 |              0.59 |                 0.55 |
| rna           |          0.6  |             0.56 |                 0.55 |              0.6  |                 0.56 |
| species       |          0.66 |             0.67 |                 0.58 |              0.63 |                 0.54 |
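Exact-match entity F1 reduces to counting identical tuples. A minimal sketch, assuming entities are hypothetical `(doc_id, start, end, label)` tuples with end-exclusive character offsets. The card does not name the dataset or scoring script, so this illustrates the criterion rather than reproducing the table.

```python
from collections import Counter


def exact_match_f1(gold, pred, label=None):
    """Entity-level F1: a prediction is correct only if doc, span and label all match.

    gold, pred: lists of (doc_id, start, end, label) tuples.
    label: restrict scoring to one entity type, as in the per-type rows above.
    """
    if label is not None:
        gold = [e for e in gold if e[3] == label]
        pred = [e for e in pred if e[3] == label]
    tp = sum((Counter(gold) & Counter(pred)).values())  # multiset intersection
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0


if __name__ == "__main__":
    gold = [(0, 0, 7, "chemical"), (0, 20, 30, "disease")]
    pred = [(0, 0, 7, "chemical"), (0, 20, 28, "disease")]
    print(exact_match_f1(gold, pred))
```

A prediction that is off by a single character (the *disease* span above) counts as both a false positive and a false negative, which is why Exact Match is the strictest of the three criteria.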

---

### Partial Match F1

A prediction is counted as correct if it **partially overlaps** with a gold entity of the same type.

| entity_type   |   medul2-base |   pubmedul2-base |   pubmedul2-mini-nl8 |   pubmedul2-small |   pubmedul2-tiny-nl6 |
|:--------------|--------------:|-----------------:|---------------------:|------------------:|---------------------:|
| cell_line     |          0.48 |             0.49 |                 0.48 |              0.48 |                 0.41 |
| cell_type     |          0.66 |             0.64 |                 0.66 |              0.65 |                 0.59 |
| chemical      |          0.79 |             0.78 |                 0.76 |              0.75 |                 0.6  |
| disease       |          0.82 |             0.84 |                 0.8  |              0.79 |                 0.74 |
| dna           |          0.65 |             0.61 |                 0.6  |              0.61 |                 0.53 |
| gene          |          0.76 |             0.74 |                 0.74 |              0.73 |                 0.68 |
| protein       |          0.66 |             0.66 |                 0.66 |              0.67 |                 0.64 |
| rna           |          0.68 |             0.63 |                 0.64 |              0.66 |                 0.65 |
| species       |          0.68 |             0.7  |                 0.61 |              0.65 |                 0.56 |

---

### IoU Match F1

Predictions are evaluated using **Intersection-over-Union (IoU)** overlap between predicted and gold spans, providing a softer boundary-based metric.

| entity_type   |   medul2-base |   pubmedul2-base |   pubmedul2-mini-nl8 |   pubmedul2-small |   pubmedul2-tiny-nl6 |
|:--------------|--------------:|-----------------:|---------------------:|------------------:|---------------------:|
| cell_line     |          0.5  |             0.5  |                 0.5  |              0.5  |                 0.42 |
| cell_type     |          0.67 |             0.66 |                 0.68 |              0.67 |                 0.62 |
| chemical      |          0.83 |             0.83 |                 0.82 |              0.82 |                 0.72 |
| disease       |          0.85 |             0.86 |                 0.86 |              0.85 |                 0.82 |
| dna           |          0.65 |             0.62 |                 0.62 |              0.62 |                 0.55 |
| gene          |          0.76 |             0.75 |                 0.75 |              0.74 |                 0.71 |
| protein       |          0.67 |             0.66 |                 0.67 |              0.67 |                 0.66 |
| rna           |          0.68 |             0.65 |                 0.66 |              0.67 |                 0.67 |
| species       |          0.72 |             0.74 |                 0.65 |              0.69 |                 0.58 |
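The two relaxed criteria can be sketched with the same hypothetical `(doc_id, start, end, label)` tuples. The card does not say how overlaps are scored (for example, the IoU threshold, or whether a partial overlap earns full or half credit). This sketch therefore assumes greedy one-to-one matching with full credit per matched pair and a caller-chosen `min_iou`; it illustrates the criteria rather than reproducing the tables.

```python
def span_iou(a, b):
    """Intersection-over-Union of two end-exclusive (start, end) spans."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def overlap_f1(gold, pred, label=None, min_iou=None):
    """Relaxed entity-level F1 with greedy one-to-one matching.

    min_iou=None -> partial match: any overlap with a same-type gold entity counts.
    min_iou=t    -> IoU match: a pair counts only if IoU >= t.
    """
    if label is not None:
        gold = [e for e in gold if e[3] == label]
        pred = [e for e in pred if e[3] == label]
    candidates = []
    for i, (g_doc, g_start, g_end, g_label) in enumerate(gold):
        for j, (p_doc, p_start, p_end, p_label) in enumerate(pred):
            if g_doc != p_doc or g_label != p_label:
                continue
            iou = span_iou((g_start, g_end), (p_start, p_end))
            if iou > 0 and (min_iou is None or iou >= min_iou):
                candidates.append((iou, i, j))
    # Take the best-overlapping pairs first; each entity is matched at most once.
    matched_gold, matched_pred = set(), set()
    for _, i, j in sorted(candidates, reverse=True):
        if i not in matched_gold and j not in matched_pred:
            matched_gold.add(i)
            matched_pred.add(j)
    tp = len(matched_gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0


if __name__ == "__main__":
    gold = [(0, 20, 30, "disease")]
    pred = [(0, 20, 28, "disease")]
    print(overlap_f1(gold, pred), overlap_f1(gold, pred, min_iou=0.9))
```

The one-to-one constraint keeps a single long prediction from being credited against several gold entities; greedy matching by IoU is a simple approximation of an optimal assignment.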

---

### Observations

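- **`medul2-base`** (the only MedUL2 checkpoint evaluated) outperforms **`pubmedul2-base`** on six of the nine entity types under Exact Match, most clearly on *dna* and *rna* (+0.04 F1) and *gene* (+0.03); `pubmedul2-base` is ahead on *disease* (0.73 vs. 0.70), *species*, and *cell_line*
- `pubmedul2-tiny-nl6` scores lowest on nearly every entity type, but gains beyond the mini model are modest and not uniform: among `pubmedul2-mini-nl8`, `pubmedul2-small`, and the base models, Exact Match F1 differs by at most 0.05 per entity type except *species* (0.58 for mini-nl8 vs. 0.67 for pubmedul2-base), and mini-nl8 has the best *cell_line* score
- The relaxed criteria (Partial / IoU) give higher scores than Exact Match for every model and entity type; the Exact → Partial gap is largest for *gene* (+0.13 to +0.15) and *disease* (+0.10 to +0.12) and smallest for *species* (+0.02 to +0.03), suggesting that boundary ambiguity accounts for much of the error on those types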

--- 

## Acknowledgements

This project would not have been possible without compute generously provided by **Google TPU Research Cloud**.

Thanks to:
- The **Finnish-NLP** authors for releasing the UL2 objective code, task definitions, and guidance
- **Yeb Havinga** for help getting started with the **t5x** framework

---

## License

Please refer to the individual model repositories for **license and access details**, which may vary depending on training data sources.