<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# BioGPT [[biogpt]]
## Overview [[overview]]

The BioGPT model was proposed in [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. BioGPT is a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining. It follows the Transformer language model backbone and was pre-trained from scratch on 15M PubMed abstracts.

The abstract from the paper is the following:

*Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.*

This model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/BioGPT).
## Usage tips [[usage-tips]]

- BioGPT is a model with absolute position embeddings, so it is usually advised to pad the inputs on the right rather than the left.
- BioGPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next token in a sequence. Leveraging this feature, BioGPT can generate syntactically coherent text, as can be observed in the `run_generation.py` example script.
- The model can take `past_key_values` (for PyTorch) as input, which is the previously computed key/value attention pairs. Using this value prevents the model from re-computing already-computed values during text generation. For PyTorch, see the `past_key_values` argument of the `BioGptForCausalLM.forward()` method for more information on its usage.
### Using Scaled Dot Product Attention (SDPA) [[using-scaled-dot-product-attention-sdpa]]

PyTorch includes a native scaled dot-product attention (SDPA) operator as part of `torch.nn.functional`. This function encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the [official documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html) or the [GPU Inference](https://huggingface.co/docs/transformers/main/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention) page for more information.

SDPA is used by default for `torch>=2.1.1` when an implementation is available, but you may also set `attn_implementation="sdpa"` in `from_pretrained()` to explicitly request that SDPA be used.
```python
import torch
from transformers import BioGptForCausalLM

model = BioGptForCausalLM.from_pretrained("microsoft/biogpt", attn_implementation="sdpa", torch_dtype=torch.float16)
```
On a local benchmark (NVIDIA GeForce RTX 2060-8GB, PyTorch 2.3.1, Ubuntu 20.04) with `float16` and the `microsoft/biogpt` model with a CausalLM head, we saw the following speedups during training.

For the best speedups, we recommend loading the model in half-precision (e.g. `torch.float16` or `torch.bfloat16`).
| num_training_steps | batch_size | seq_len | is cuda | Time per batch (eager - s) | Time per batch (sdpa - s) | Speedup (%) | Eager peak mem (MB) | sdpa peak mem (MB) | Mem saving (%) |
|--------------------|------------|---------|---------|----------------------------|---------------------------|-------------|---------------------|--------------------|----------------|
| 100 | 1 | 128 | False | 0.038 | 0.031 | 21.301 | 1601.862 | 1601.497 | 0.023 |
| 100 | 1 | 256 | False | 0.039 | 0.034 | 15.084 | 1624.944 | 1625.296 | -0.022 |
| 100 | 2 | 128 | False | 0.039 | 0.033 | 16.820 | 1624.567 | 1625.296 | -0.045 |
| 100 | 2 | 256 | False | 0.065 | 0.059 | 10.255 | 1672.164 | 1672.164 | 0.000 |
| 100 | 4 | 128 | False | 0.062 | 0.058 | 6.998 | 1671.435 | 1672.164 | -0.044 |
| 100 | 4 | 256 | False | 0.113 | 0.100 | 13.316 | 2350.179 | 1848.435 | 27.144 |
| 100 | 8 | 128 | False | 0.107 | 0.098 | 9.883 | 2098.521 | 1848.435 | 13.530 |
| 100 | 8 | 256 | False | 0.222 | 0.196 | 13.413 | 3989.980 | 2986.492 | 33.601 |
On the same setup (NVIDIA GeForce RTX 2060-8GB, PyTorch 2.3.1, Ubuntu 20.04) with `float16` and the `microsoft/biogpt` model with an AutoModel head, we saw the following speedups during inference.
| num_batches | batch_size | seq_len | is cuda | is half | use mask | Per token latency eager (ms) | Per token latency SDPA (ms) | Speedup (%) | Mem eager (MB) | Mem BT (MB) | Mem saved (%) |
|-------------|------------|---------|---------|---------|----------|------------------------------|-----------------------------|-------------|----------------|--------------|---------------|
| 50 | 1 | 64 | True | True | True | 0.115 | 0.098 | 17.392 | 716.998 | 716.998 | 0.000 |
| 50 | 1 | 128 | True | True | True | 0.115 | 0.093 | 24.640 | 730.916 | 730.916 | 0.000 |
| 50 | 2 | 64 | True | True | True | 0.114 | 0.096 | 19.204 | 730.900 | 730.900 | 0.000 |
| 50 | 2 | 128 | True | True | True | 0.117 | 0.095 | 23.529 | 759.262 | 759.262 | 0.000 |
| 50 | 4 | 64 | True | True | True | 0.113 | 0.096 | 18.325 | 759.229 | 759.229 | 0.000 |
| 50 | 4 | 128 | True | True | True | 0.186 | 0.178 | 4.289 | 816.478 | 816.478 | 0.000 |
## Resources [[resources]]

- [Causal language modeling task guide](../tasks/language_modeling)
## BioGptConfig [[transformers.BioGptConfig]]
[[autodoc]] BioGptConfig
## BioGptTokenizer [[transformers.BioGptTokenizer]]
[[autodoc]] BioGptTokenizer
- save_vocabulary
## BioGptModel [[transformers.BioGptModel]]
[[autodoc]] BioGptModel
- forward
## BioGptForCausalLM [[transformers.BioGptForCausalLM]]
[[autodoc]] BioGptForCausalLM
- forward
## BioGptForTokenClassification [[transformers.BioGptForTokenClassification]]
[[autodoc]] BioGptForTokenClassification
- forward
## BioGptForSequenceClassification [[transformers.BioGptForSequenceClassification]]
[[autodoc]] BioGptForSequenceClassification
- forward