<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# BioGPT [[biogpt]]

## Overview [[overview]]
BioGPT was proposed in [BioGPT: generative pre-trained transformer for biomedical text generation and mining](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9) by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. BioGPT is a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining. BioGPT follows the Transformer language model backbone, and is pre-trained from scratch on 15M PubMed abstracts.

The abstract from the paper is the following:

*Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical text to generate fluent descriptions for biomedical terms.*

This model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/BioGPT).
## Usage tips [[usage-tips]]

- BioGPT uses absolute position embeddings, so it is usually advised to pad the inputs on the right rather than the left.
- BioGPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next token in a sequence. Leveraging this feature allows BioGPT to generate syntactically coherent text, as can be observed in the `run_generation.py` example script.
- The model can take `past_key_values` (for PyTorch) as input, which are the previously computed key/value attention pairs. Using this value prevents the model from re-computing values that were already computed during text generation. For PyTorch, the `past_key_values` argument is documented in detail in the `BioGptForCausalLM.forward()` method.
### Using Scaled Dot Product Attention (SDPA) [[using-scaled-dot-product-attention-sdpa]]

PyTorch includes a native scaled dot-product attention (SDPA) operator as part of `torch.nn.functional`. This function
encompasses several implementations that can be applied depending on the inputs and the hardware in use. See the
[official documentation](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
or the [GPU Inference](https://huggingface.co/docs/transformers/main/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention)
page for more information.

SDPA is used by default for `torch>=2.1.1` when an implementation is available, but you may also set
`attn_implementation="sdpa"` in `from_pretrained()` to explicitly request SDPA to be used.
```python
import torch
from transformers import BioGptForCausalLM

model = BioGptForCausalLM.from_pretrained("microsoft/biogpt", attn_implementation="sdpa", dtype=torch.float16)
```
On a local benchmark (NVIDIA GeForce RTX 2060-8GB, PyTorch 2.3.1, OS Ubuntu 20.04) with `float16` and the `microsoft/biogpt` model with a CausalLM head, we saw the following speedups during training.

For the best speedups, we recommend loading the model in half-precision (e.g. `torch.float16` or `torch.bfloat16`).
| num_training_steps | batch_size | seq_len | is cuda | Time per batch (eager - s) | Time per batch (sdpa - s) | Speedup (%) | Eager peak mem (MB) | sdpa peak mem (MB) | Mem saving (%) |
|--------------------|------------|---------|---------|----------------------------|---------------------------|-------------|---------------------|--------------------|----------------|
| 100                | 1          | 128     | False   | 0.038                      | 0.031                     | 21.301      | 1601.862            | 1601.497           | 0.023          |
| 100                | 1          | 256     | False   | 0.039                      | 0.034                     | 15.084      | 1624.944            | 1625.296           | -0.022         |
| 100                | 2          | 128     | False   | 0.039                      | 0.033                     | 16.820      | 1624.567            | 1625.296           | -0.045         |
| 100                | 2          | 256     | False   | 0.065                      | 0.059                     | 10.255      | 1672.164            | 1672.164           | 0.000          |
| 100                | 4          | 128     | False   | 0.062                      | 0.058                     | 6.998       | 1671.435            | 1672.164           | -0.044         |
| 100                | 4          | 256     | False   | 0.113                      | 0.100                     | 13.316      | 2350.179            | 1848.435           | 27.144         |
| 100                | 8          | 128     | False   | 0.107                      | 0.098                     | 9.883       | 2098.521            | 1848.435           | 13.530         |
| 100                | 8          | 256     | False   | 0.222                      | 0.196                     | 13.413      | 3989.980            | 2986.492           | 33.601         |
On a local benchmark (NVIDIA GeForce RTX 2060-8GB, PyTorch 2.3.1, OS Ubuntu 20.04) with `float16` and the `microsoft/biogpt` model loaded with the AutoModel class, we saw the following speedups during inference.
| num_batches | batch_size | seq_len | is cuda | is half | use mask | Per token latency eager (ms) | Per token latency SDPA (ms) | Speedup (%) | Mem eager (MB) | Mem BT (MB) | Mem saved (%) |
|-------------|------------|---------|---------|---------|----------|------------------------------|-----------------------------|-------------|----------------|-------------|---------------|
| 50          | 1          | 64      | True    | True    | True     | 0.115                        | 0.098                       | 17.392      | 716.998        | 716.998     | 0.000         |
| 50          | 1          | 128     | True    | True    | True     | 0.115                        | 0.093                       | 24.640      | 730.916        | 730.916     | 0.000         |
| 50          | 2          | 64      | True    | True    | True     | 0.114                        | 0.096                       | 19.204      | 730.900        | 730.900     | 0.000         |
| 50          | 2          | 128     | True    | True    | True     | 0.117                        | 0.095                       | 23.529      | 759.262        | 759.262     | 0.000         |
| 50          | 4          | 64      | True    | True    | True     | 0.113                        | 0.096                       | 18.325      | 759.229        | 759.229     | 0.000         |
| 50          | 4          | 128     | True    | True    | True     | 0.186                        | 0.178                       | 4.289       | 816.478        | 816.478     | 0.000         |
## Resources [[resources]]

- [Causal language modeling task guide](../tasks/language_modeling)
## BioGptConfig [[transformers.BioGptConfig]]

[[autodoc]] BioGptConfig

## BioGptTokenizer [[transformers.BioGptTokenizer]]

[[autodoc]] BioGptTokenizer
    - save_vocabulary

## BioGptModel [[transformers.BioGptModel]]

[[autodoc]] BioGptModel
    - forward

## BioGptForCausalLM [[transformers.BioGptForCausalLM]]

[[autodoc]] BioGptForCausalLM
    - forward

## BioGptForTokenClassification [[transformers.BioGptForTokenClassification]]

[[autodoc]] BioGptForTokenClassification
    - forward

## BioGptForSequenceClassification [[transformers.BioGptForSequenceClassification]]

[[autodoc]] BioGptForSequenceClassification
    - forward