---
language: "en"
tags:
- biomedical
- text-generation
- BioGPT
- fine-tuning
license: "cc-by-4.0"
datasets:
- custom
metrics:
- perplexity
- loss
---
# TissueGPT: Fine-Tuned BioGPT for Tissue Engineering Text Generation
## Model Description
**TissueGPT** is a fine-tuned version of [BioGPT](https://huggingface.co/microsoft/BioGPT), specifically tailored for tissue engineering text generation tasks. By leveraging a dataset of biomedical research articles (titles, abstracts, and full texts), TissueGPT is designed to perform tasks such as:
- Summarizing biomedical literature
- Generating coherent biomedical text
- Assisting with scientific writing in life sciences
- Supporting research in tissue engineering, extracellular matrix (ECM) analysis, and related fields
---
## Training Details
### First Round of Training
The initial model was fine-tuned for **3 epochs**, focusing on general adaptation to the biomedical dataset.
#### Hyperparameters
- **Learning Rate**: 5e-5
- **Batch Size**: 8
- **Warmup Steps**: 500
- **Precision**: Mixed precision (`fp16`)
- **Weight Decay**: 0.01
- **Number of Epochs**: 3
- **Save Checkpoints**: Every 10,000 steps, keeping the last 3 checkpoints
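As a rough sketch, these settings map onto Hugging Face `TrainingArguments` as follows; the output directory and evaluation settings here are illustrative assumptions, not the actual training script:
```python
from transformers import TrainingArguments

# Hypothetical first-round configuration; paths and evaluation settings
# are placeholders, not the published training script.
training_args = TrainingArguments(
    output_dir="./tissuegpt-round1",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    warmup_steps=500,
    fp16=True,                        # mixed precision
    weight_decay=0.01,
    num_train_epochs=3,
    save_steps=10_000,                # checkpoint every 10,000 steps
    save_total_limit=3,               # keep only the last 3 checkpoints
    evaluation_strategy="epoch",      # per-epoch validation, matching the table below
)
```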
#### Training and Validation Metrics
| Epoch | Training Loss | Validation Loss | Perplexity |
|-------|---------------|-----------------|------------|
| 1 | 2.4752 | 2.4286 | 11.34 |
| 2 | 2.3680 | 2.3708 | 10.70 |
| 3 | 2.2954 | 2.3410 | 10.39 |
---
### Second Round of Training
To further improve performance, the model was fine-tuned for **2 additional epochs** with adjusted hyperparameters.
#### Adjusted Hyperparameters
- **Learning Rate**: 3e-5 (reduced for finer updates)
- **Batch Size**: 64 (to utilize the GPU’s full memory)
- **Precision**: `bf16` (optimized for NVIDIA A100)
- **Save Checkpoints**: Every 20,000 steps
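A hypothetical sketch of the corresponding adjustments, resuming from the round-one checkpoint (names and paths are again illustrative):
```python
from transformers import TrainingArguments

# Hypothetical round-two settings; not the published training script.
training_args = TrainingArguments(
    output_dir="./tissuegpt-round2",  # placeholder path
    learning_rate=3e-5,               # reduced for finer updates
    per_device_train_batch_size=64,   # larger batches on the A100 80GB
    bf16=True,                        # bfloat16, well supported on A100
    num_train_epochs=2,
    save_steps=20_000,                # checkpoint every 20,000 steps
)
```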
#### Training and Validation Metrics
| Epoch | Training Loss | Validation Loss | Perplexity |
|-------|---------------|-----------------|------------|
| 4 | 2.2396 | 2.2395 | 9.43 |
| 5 | 2.2328 | 2.2328 | 9.32 |
### Hardware Used
- **GPU**: NVIDIA A100 80GB
- **Framework**: PyTorch with Hugging Face Transformers library
---
## Evaluation Metrics
### Perplexity
Perplexity measures how well a language model predicts held-out text and is computed as the exponential of the cross-entropy loss; lower values indicate more fluent and coherent generations.
- **First Round of Training**: Final perplexity = **10.39**
- **Second Round of Training**: Final perplexity = **9.32**
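Because perplexity is the exponential of the cross-entropy loss, the reported values can be sanity-checked directly from the validation losses in the tables above:
```python
import math

# Perplexity = exp(cross-entropy loss)
print(math.exp(2.3410))  # ~10.39, end of first round
print(math.exp(2.2328))  # ~9.33, end of second round (reported as 9.32,
                         # presumably computed from the unrounded loss)
```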
### Gradient Norms
- Gradient norms were tracked throughout training to monitor stability.
- Observed range: **1.05–1.32**, indicating stable optimization without exploding or vanishing gradients.
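For illustration, the quantity being tracked is the L2 norm over all parameter gradients; a toy stand-in (not the actual fine-tuning loop) might compute it as:
```python
import torch
import torch.nn as nn

# Toy illustration of the tracked quantity: the L2 norm of all parameter
# gradients after a backward pass. The 1.05-1.32 range above came from the
# actual fine-tuning logs; this tiny model is just a stand-in.
model = nn.Linear(16, 16)
loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
per_param = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
total_norm = torch.stack(per_param).norm(2)
print(f"total gradient norm: {total_norm.item():.3f}")
```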
### Validation Loss
- Decreasing validation loss across both rounds suggests effective generalization to unseen data.
---
## Model Comparison
| Metric | First Round | Second Round |
|--------------------|-------------|--------------|
| Final Validation Loss | 2.3410 | 2.2328 |
| Final Perplexity | 10.39 | 9.32 |
**Key Insights**:
- Additional training epochs led to improved generalization and better predictive performance.
- Perplexity improved by approximately 10% in the second round, demonstrating enhanced text fluency and coherence.
---
## How to Use the Model
### Install Dependencies
Ensure you have `transformers` and `torch` installed:
```bash
pip install transformers torch
```
### Load the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "Saeed/TissueGPT"  # Replace with the uploaded repo name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Generate a continuation for a tissue-engineering prompt
input_text = "The extracellular matrix plays a critical role in tissue engineering because"
inputs = tokenizer(input_text, return_tensors="pt")
output = model.generate(**inputs, max_length=50)  # max_length counts prompt + generated tokens
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
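For more varied output, sampling parameters can be passed to `generate`; the values below are illustrative defaults, not settings tuned for TissueGPT:
```python
# Sampling instead of greedy decoding; parameter values are illustrative
output = model.generate(
    **inputs,
    max_new_tokens=100,  # cap newly generated tokens rather than total length
    do_sample=True,      # enable sampling
    top_p=0.9,           # nucleus sampling
    temperature=0.7,     # soften the output distribution
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```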
---
## Intended Use
- **Biomedical text generation and summarization**
- **Assisting researchers, scientists, and medical professionals**
- **Automated scientific writing** in domains such as tissue engineering and scaffold fabrication
---
## Limitations
- The model is fine-tuned on biomedical literature and may not generalize well to non-biomedical domains.
- Outputs should always be validated by experts for accuracy, especially in clinical or research-critical contexts.
---
## Ethical Considerations
- This model is intended for use in biomedical research and not for clinical diagnosis or patient care.
- It may generate plausible-sounding but factually incorrect outputs (hallucinations). Always verify generated content.
---
## Citation
If you use **TissueGPT**, please cite the following:
***The citation details will be provided shortly.***
## License
Licensed under the **CC BY 4.0** License.
## Contact
For questions, issues, or collaboration opportunities, feel free to reach out:
- **Name**: Saeed Rafieyan
- **Website**: Sraf.ir
- **Email**: Raf.Biomed@gmail.com
- **LinkedIn**: https://www.linkedin.com/in/saeed-rafieyan |