<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# The Transformer model family
Since its introduction in 2017, the [original Transformer](https://arxiv.org/abs/1706.03762) model has inspired many new and exciting models that extend beyond natural language processing (NLP) tasks. There are models for [predicting the folded structure of proteins](https://huggingface.co/blog/deep-learning-with-proteins), [training a cheetah to run](https://huggingface.co/blog/train-decision-transformers), and [time series forecasting](https://huggingface.co/blog/time-series-transformers). With so many Transformer variants available, it can be easy to miss the bigger picture. What all these models have in common is that they are based on the original Transformer architecture. Some models use only the encoder or only the decoder, while others use both. This provides a useful taxonomy for categorizing and examining the high-level differences between models in the Transformer family, and it will help you understand Transformers you haven't encountered before.
If you aren't familiar with the original Transformer model or need a refresher, check out the [How do Transformers work](https://huggingface.co/course/chapter1/4?fw=pt) chapter from the Hugging Face course.
<div align="center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/H39Z_720T5s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
</div>
## Computer vision
<iframe style="border: 1px solid rgba(0, 0, 0, 0.1);" width="1000" height="450" src="https://www.figma.com/embed?embed_host=share&url=https%3A%2F%2Fwww.figma.com%2Ffile%2FacQBpeFBVvrDUlzFlkejoz%2FModelscape-timeline%3Fnode-id%3D0%253A1%26t%3Dm0zJ7m2BQ9oe0WtO-1" allowfullscreen></iframe>
### Convolutional network
For a long time, convolutional networks (CNNs) were the dominant paradigm for computer vision tasks, until the [Vision Transformer](https://arxiv.org/abs/2010.11929) demonstrated its scalability and efficiency. Even then, some of a CNN's best qualities, like translation invariance, are so powerful (especially for certain tasks) that some Transformers incorporate convolutions in their architecture. [ConvNeXt](model_doc/convnext) flipped this exchange around and incorporated design choices from Transformers to modernize a CNN. For example, ConvNeXt uses non-overlapping sliding windows to patchify an image and a larger kernel to increase its global receptive field. ConvNeXt also makes several layer design choices to be more memory-efficient and improve performance, so it competes favorably with Transformers!
### Encoder[[cv-encoder]]
The [Vision Transformer (ViT)](model_doc/vit) opened the door to computer vision tasks without convolutions. ViT uses a standard Transformer encoder, but its main breakthrough was how it treated an image. It splits an image into fixed-size patches and uses them to create an embedding, just like how a sentence is split into tokens. ViT capitalized on the Transformer's efficient architecture to demonstrate results competitive with the CNNs of the time while requiring fewer resources to train. ViT was soon followed by other vision models that could also handle dense vision tasks like segmentation and detection.
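The patch-splitting step can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the image is a single-channel list of rows, `patchify` is a hypothetical helper name, and the learned linear projection that turns each flattened patch into an embedding is omitted.

```python
def patchify(image, patch_size):
    """Split a 2D image (a list of rows) into non-overlapping square patches.

    Returns the flattened patches in row-major (left-to-right, top-to-bottom) order.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel "image" with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = patchify(image, patch_size=2)
print(len(patches))  # 4
print(patches[0])    # [0, 1, 4, 5]
```

Each flattened patch plays the role that a token plays in a sentence: it becomes one element of the input sequence.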
One of these models is the [Swin](model_doc/swin) Transformer. It builds hierarchical feature maps (like a CNN and unlike ViT) from smaller-sized patches and merges them with neighboring patches in deeper layers. Attention is only computed within a local window, and the window is shifted between attention layers to create connections that help the model learn better. Since the Swin Transformer can produce hierarchical feature maps, it is a good candidate for dense prediction tasks like segmentation and detection. [SegFormer](model_doc/segformer) also uses a Transformer encoder to build hierarchical feature maps, but it adds a simple multilayer perceptron (MLP) decoder on top to combine all the feature maps and make a prediction.
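To make the shifted-window idea concrete, the sketch below assigns every position of a small patch grid to a window, then repeats the assignment after a cyclic shift. It is a simplified illustration: `window_ids` is a hypothetical helper, the shift is modeled as a plain cyclic roll of the grid, and any extra masking a real implementation applies to wrapped-around regions is left out.

```python
def window_ids(grid_size, window_size, shift=0):
    """Label each cell of a grid_size x grid_size patch grid with its window index.

    With shift > 0 the grid is rolled cyclically before windows are assigned,
    so cells that sat on opposite sides of a window border can share a window.
    """
    windows_per_side = grid_size // window_size
    ids = []
    for row in range(grid_size):
        id_row = []
        for col in range(grid_size):
            shifted_row = (row + shift) % grid_size
            shifted_col = (col + shift) % grid_size
            window_row = shifted_row // window_size
            window_col = shifted_col // window_size
            id_row.append(window_row * windows_per_side + window_col)
        ids.append(id_row)
    return ids


regular = window_ids(grid_size=4, window_size=2)
shifted = window_ids(grid_size=4, window_size=2, shift=1)
for row in regular:
    print(row)
print()
for row in shifted:
    print(row)
```

In the regular layout, cells `(0, 1)` and `(0, 2)` belong to different windows and cannot attend to each other. After the shift they share a window, which is how information crosses window borders from one layer to the next.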
Other vision models, like BeIT and ViTMAE, drew inspiration from BERT's pretraining objective. [BeIT](model_doc/beit) is pretrained by *masked image modeling (MIM)*: the image patches are randomly masked, and the image is also tokenized into visual tokens. BeIT is trained to predict the visual tokens corresponding to the masked patches. [ViTMAE](model_doc/vitmae) has a similar pretraining objective, except it must predict the pixels instead of visual tokens. What's unusual is that 75% of the image patches are masked! The decoder reconstructs the pixels from the masked tokens and encoded patches. After pretraining, the decoder is thrown away, and the encoder is ready to be used in downstream tasks.
### Decoder[[cv-decoder]]
Decoder-only vision models are rare because most vision models rely on an encoder to learn an image representation. But for use cases like image generation, the decoder is a natural fit, as we've seen from text generation models like GPT-2. [ImageGPT](model_doc/imagegpt) uses the same architecture as GPT-2, but instead of predicting the next token in a sequence, it predicts the next pixel in an image. In addition to image generation, ImageGPT can also be finetuned for image classification.
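Next-pixel prediction can be pictured as follows. The sketch assumes the image is flattened in raster order (left to right, top to bottom) and treats raw pixel values as the sequence elements; `next_pixel_examples` is a hypothetical helper, and how a real model quantizes or encodes pixels is not covered here.

```python
def next_pixel_examples(image):
    """Flatten an image in raster order and pair each prefix with the pixel that follows it."""
    pixels = [pixel for row in image for pixel in row]
    return [(pixels[:i], pixels[i]) for i in range(1, len(pixels))]


image = [[0, 1], [2, 3]]
for context, target in next_pixel_examples(image):
    print(context, "->", target)
# [0] -> 1
# [0, 1] -> 2
# [0, 1, 2] -> 3
```

Each `(context, target)` pair is the image analogue of "given the words so far, predict the next word".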
### Encoder-decoder[[cv-encoder-decoder]]
Vision models commonly use an encoder (also known as a backbone) to extract important image features before passing them to a Transformer decoder. [DETR](model_doc/detr) has a pretrained backbone, but it also uses the complete Transformer encoder-decoder architecture for object detection. The encoder learns image representations and combines them with object queries (each object query is a learned embedding that focuses on a region or object in an image) in the decoder. DETR predicts the bounding box coordinates and class label for each object query.
## Natural language processing
<iframe style="border: 1px solid rgba(0, 0, 0, 0.1);" width="1000" height="450" src="https://www.figma.com/embed?embed_host=share&url=https%3A%2F%2Fwww.figma.com%2Ffile%2FUhbQAZDlpYW5XEpdFy6GoG%2Fnlp-model-timeline%3Fnode-id%3D0%253A1%26t%3D4mZMr4r1vDEYGJ50-1" allowfullscreen></iframe>
### Encoder[[nlp-encoder]]
[BERT](model_doc/bert) is an encoder-only Transformer that randomly masks certain tokens in the input so that it cannot see them, which would otherwise allow it to "cheat". The pretraining objective is to predict the masked token based on the context. This allows BERT to fully use the left and right contexts to help it learn a deeper and richer representation of the inputs. However, there was still room for improvement in BERT's pretraining strategy. [RoBERTa](model_doc/roberta) improved upon this by introducing a new pretraining recipe that includes training for longer and on larger batches, randomly masking tokens at each epoch instead of just once during preprocessing, and removing the next-sentence prediction objective.
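The difference between masking once during preprocessing and masking again at every epoch is easy to see in code. The sketch below is deliberately simplified: `mask_tokens` is a hypothetical helper, the masking probability of 0.3 is chosen only so that the toy sentence visibly changes, and the exact replacement rules used by the real models are not reproduced.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob, rng):
    """Replace each token with [MASK] with probability mask_prob.

    Returns the corrupted sequence and a dict {position: original_token}
    holding the prediction targets.
    """
    corrupted, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[position] = token
        else:
            corrupted.append(token)
    return corrupted, targets


tokens = "the cat sat on the mat".split()
rng = random.Random(0)

# Dynamic masking: draw a fresh mask every epoch instead of reusing one.
for epoch in range(3):
    corrupted, targets = mask_tokens(tokens, mask_prob=0.3, rng=rng)
    print(epoch, corrupted, targets)
```

Static masking would call `mask_tokens` once and reuse the same corrupted sequence in every epoch, so the model would always be asked to predict the same positions.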
The dominant strategy to improve performance is to increase the model size. But training large models is computationally expensive. One way to reduce computational costs is to use a smaller model like [DistilBERT](model_doc/distilbert). DistilBERT uses [knowledge distillation](https://arxiv.org/abs/1503.02531), a compression technique, to create a smaller version of BERT while keeping nearly all of its language understanding capabilities.
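As a rough illustration of knowledge distillation, the sketch below computes a soft-target loss: the cross-entropy between a teacher's and a student's output distributions after both are softened with a temperature. This shows only the general idea from the linked paper; the function names, the temperature of 2.0, and the toy logits are assumptions, and it is not a description of the full training objective any particular model uses.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's softened distribution against the teacher's."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 4.0, 1.0]

print(soft_target_loss(close_student, teacher_logits))
print(soft_target_loss(far_student, teacher_logits))
```

The loss is smaller for the student whose predictions resemble the teacher's, so minimizing it pushes the small model to imitate the large one.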
However, most Transformer models continued to trend towards more parameters, leading to new models focused on improving training efficiency. [ALBERT](model_doc/albert) reduces memory consumption by lowering the number of parameters in two ways: separating the larger vocabulary embedding into two smaller matrices and allowing layers to share parameters. [DeBERTa](model_doc/deberta) added a disentangled attention mechanism where the word and its position are separately encoded in two vectors. The attention is computed from these separate vectors instead of a single vector containing the word and position embeddings. [Longformer](model_doc/longformer) also focused on making attention more efficient, especially for processing documents with longer sequence lengths. It uses a combination of local windowed attention (attention calculated only within a fixed window around each token) and global attention (only for specific task tokens like `[CLS]` for classification) to create a sparse attention matrix instead of a full attention matrix.
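The combination of local and global attention can be visualized by building the attention pattern as a boolean matrix. This is a sketch under assumptions: `sparse_attention_mask` is a hypothetical helper, the window is symmetric, and a global token is taken to attend to, and be attended by, every position.

```python
def sparse_attention_mask(seq_len, window, global_positions=()):
    """Return mask[i][j] == True when token i may attend to token j.

    A pair is allowed if the two tokens are at most `window` positions apart
    (local attention) or if either of them is a global token.
    """
    global_set = set(global_positions)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            is_local = abs(i - j) <= window
            is_global = i in global_set or j in global_set
            mask[i][j] = is_local or is_global
    return mask


# Position 0 stands in for a task token such as [CLS].
mask = sparse_attention_mask(seq_len=8, window=1, global_positions=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

allowed_entries = sum(sum(row) for row in mask)
print(allowed_entries, "of", 8 * 8, "entries are computed")
```

The printed pattern is a narrow band along the diagonal plus one full row and one full column for the global token. The band grows linearly with sequence length, whereas a full attention matrix grows quadratically.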
### Decoder[[nlp-decoder]]
[GPT-2](model_doc/gpt2) is a decoder-only Transformer that predicts the next word in the sequence. It masks tokens to the right so the model can't "cheat" by looking ahead. By pretraining on a massive body of text, GPT-2 became really good at generating text, even if the text is only sometimes accurate or true. But GPT-2 lacked the bidirectional context from BERT's pretraining, which made it unsuitable for certain tasks. [XLNET](model_doc/xlnet) combines the best of both BERT's and GPT-2's pretraining objectives by using a permutation language modeling (PLM) objective that allows it to learn bidirectionally.
After GPT-2, language models grew even bigger and are now known as *large language models (LLMs)*. LLMs demonstrate few-shot or even zero-shot learning if pretrained on a large enough dataset. [GPT-J](model_doc/gptj) is an LLM with 6B parameters trained on 400B tokens. GPT-J was followed by [OPT](model_doc/opt), a family of models, the largest of which has 175B parameters and was trained on 180B tokens. [BLOOM](model_doc/bloom) was released around the same time, and the largest model in the family has 176B parameters and was trained on 366B tokens in 46 languages and 13 programming languages.
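The right-side masking that decoder-only models in this section rely on can be written as a lower-triangular matrix. The helper name `causal_mask` is an assumption made for this sketch.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j, i.e. j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


for row in causal_mask(4):
    print([int(allowed) for allowed in row])
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Row `i` shows what the model can see when predicting the token after position `i`: everything up to and including `i`, and nothing to the right.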
### Encoder-decoder[[nlp-encoder-decoder]]
[BART](model_doc/bart) keeps the original Transformer architecture, but it modifies the pretraining objective with *text infilling* corruption, where some text spans are replaced with a single `mask` token. The decoder predicts the uncorrupted tokens (future tokens are masked) and uses the encoder's hidden states to help it. [Pegasus](model_doc/pegasus) is similar to BART, but Pegasus masks entire sentences instead of text spans. In addition to masked language modeling, Pegasus is pretrained by gap sentence generation (GSG). The GSG objective masks whole sentences that are important to a document, replacing them with a `mask` token. The decoder must generate the output from the remaining sentences. [T5](model_doc/t5) is a more unique model that casts all NLP tasks into a text-to-text problem using specific prefixes. For example, the prefix `Summarize:` indicates a summarization task. T5 is pretrained by supervised training (GLUE and SuperGLUE) and self-supervised training (randomly sampling and dropping out 15% of tokens).
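Text infilling differs from token-level masking in that a whole span collapses into one mask token, so the model must also work out how many tokens are missing. A minimal sketch, assuming a hypothetical `infill_span` helper, a `<mask>` placeholder string, and a span chosen by hand rather than sampled:

```python
def infill_span(tokens, start, length, mask_token="<mask>"):
    """Replace tokens[start:start + length] with a single mask token."""
    if start < 0 or length < 0 or start + length > len(tokens):
        raise ValueError("span must lie inside the token sequence")
    return tokens[:start] + [mask_token] + tokens[start + length:]


tokens = "the quick brown fox jumps".split()
corrupted = infill_span(tokens, start=1, length=2)
print(corrupted)  # ['the', '<mask>', 'fox', 'jumps']
```

The encoder receives `corrupted`, while the decoder is trained to reproduce the original `tokens`.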
## Audio
<iframe style="border: 1px solid rgba(0, 0, 0, 0.1);" width="1000" height="450" src="https://www.figma.com/embed?embed_host=share&url=https%3A%2F%2Fwww.figma.com%2Ffile%2Fvrchl8jDV9YwNVPWu2W0kK%2Fspeech-and-audio-model-timeline%3Fnode-id%3D0%253A1%26t%3DmM4H8pPMuK23rClL-1" allowfullscreen></iframe>
### Encoder[[audio-encoder]]
[Wav2Vec2](model_doc/wav2vec2) uses a Transformer encoder to learn speech representations directly from raw audio waveforms. It is pretrained with a contrastive task to determine the true speech representation from a set of false ones. [HuBERT](model_doc/hubert) is similar to Wav2Vec2 but has a different training process. Target labels are created by a clustering step in which segments of similar audio are assigned to a cluster, which becomes a hidden unit. The hidden unit is mapped to an embedding to make a prediction.
### Encoder-decoder[[audio-encoder-decoder]]
[Speech2Text](model_doc/speech_to_text) is a speech model designed for automatic speech recognition (ASR) and speech translation. The model accepts log mel-filter bank features extracted from the audio waveform and is pretrained autoregressively to generate a transcript or translation. [Whisper](model_doc/whisper) is also an ASR model, but unlike many other speech models, it is pretrained on a massive amount of ✨ labeled ✨ audio transcription data for zero-shot performance. A large chunk of the dataset also contains non-English languages, meaning Whisper can also be used for low-resource languages. Structurally, Whisper is similar to Speech2Text. The audio signal is converted to a log-mel spectrogram, which is then encoded by the encoder. The decoder generates the transcript autoregressively from the encoder's hidden states and the previous tokens.
## Multimodal
<iframe style="border: 1px solid rgba(0, 0, 0, 0.1);" width="1000" height="450" src="https://www.figma.com/embed?embed_host=share&url=https%3A%2F%2Fwww.figma.com%2Ffile%2FcX125FQHXJS2gxeICiY93p%2Fmultimodal%3Fnode-id%3D0%253A1%26t%3DhPQwdx3HFPWJWnVf-1" allowfullscreen></iframe>
### Encoder[[mm-encoder]]
[VisualBERT](model_doc/visual_bert) is a multimodal model for vision-language tasks released shortly after BERT. It combines BERT and a pretrained object detection system to extract image features into visual embeddings, which are passed alongside text embeddings to BERT. VisualBERT predicts the masked text based on the unmasked text and the visual embeddings, and it also has to predict whether the text is aligned with the image. When ViT was released, [ViLT](model_doc/vilt) adopted ViT in its architecture because it was easier to get the image embeddings this way. The image embeddings are jointly processed with the text embeddings. From there, ViLT is pretrained by image-text matching, masked language modeling, and whole word masking.
[CLIP](model_doc/clip) takes a different approach and makes a pair prediction of (`image`, `text`). An image encoder (ViT) and a text encoder (Transformer) are jointly trained on a dataset of (`image`, `text`) pairs to maximize the similarity between the image and text embeddings of the (`image`, `text`) pairs. After pretraining, you can use CLIP to predict the text given an image or vice versa. [OWL-ViT](model_doc/owlvit) builds on top of CLIP by using it as its backbone for zero-shot object detection. After pretraining, an object detection head is added to make a set prediction over the (`class`, `bounding box`) pairs.
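Once image and text embeddings live in the same space, matching them reduces to a similarity lookup. The sketch below uses hand-written two-dimensional vectors in place of real encoder outputs, and cosine similarity as the similarity measure. The vectors, helper names, and choice of measure are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_text_for_each_image(image_embeddings, text_embeddings):
    """For every image, return the index of the most similar text embedding."""
    best = []
    for image in image_embeddings:
        scores = [cosine_similarity(image, text) for text in text_embeddings]
        best.append(max(range(len(scores)), key=scores.__getitem__))
    return best


# Hypothetical embeddings; in practice they would come from the two encoders.
image_embeddings = [[1.0, 0.1], [0.0, 1.0]]
text_embeddings = [[0.9, 0.0], [0.1, 0.8]]
print(best_text_for_each_image(image_embeddings, text_embeddings))  # [0, 1]
```

Training pushes the similarity of matching pairs up, so after pretraining a lookup like this can pick the caption that belongs to an image, or the image that belongs to a caption.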
### Encoder-decoder[[mm-encoder-decoder]]
Optical character recognition (OCR) is a long-standing text recognition task that typically involves several components to understand the image and generate the text. [TrOCR](model_doc/trocr) simplifies the process using an end-to-end Transformer. The encoder is a ViT-style model for image understanding and processes the image as fixed-size patches. The decoder accepts the encoder's hidden states and autoregressively generates text. [Donut](model_doc/donut) is a more general visual document understanding model that doesn't rely on OCR-based approaches. It uses a Swin Transformer as the encoder and multilingual BART as the decoder. Donut is pretrained to read text by predicting the next word based on the image and text annotations. The decoder generates a token sequence given a prompt. The prompt is represented by a special token for each downstream task. For example, document parsing has a special `parsing` token that is combined with the encoder hidden states to parse the document into a structured output format (JSON).
## Reinforcement learning
<iframe style="border: 1px solid rgba(0, 0, 0, 0.1);" width="1000" height="450" src="https://www.figma.com/embed?embed_host=share&url=https%3A%2F%2Fwww.figma.com%2Ffile%2FiB3Y6RvWYki7ZuKO6tNgZq%2Freinforcement-learning%3Fnode-id%3D0%253A1%26t%3DhPQwdx3HFPWJWnVf-1" allowfullscreen></iframe>
### Decoder[[rl-decoder]]
The Decision and Trajectory Transformers cast the state, action, and reward as a sequence modeling problem. The [Decision Transformer](model_doc/decision_transformer) generates a series of actions that lead to a desired future return based on returns-to-go, past states, and actions. For the last *K* timesteps, each of the three modalities is converted into token embeddings and processed by a GPT-like model to predict a future action token. [Trajectory Transformer](model_doc/trajectory_transformer) also tokenizes the states, actions, and rewards and processes them with a GPT architecture. Unlike the Decision Transformer, which is focused on reward conditioning, the Trajectory Transformer generates future actions with beam search.
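The returns-to-go that condition the Decision Transformer can be computed with a single backward pass over the rewards. The sketch assumes an undiscounted return (the plain sum of the rewards from a timestep to the end of the episode) and interleaves the three modalities in (return, state, action) order. The helper names and toy values are hypothetical.

```python
def returns_to_go(rewards):
    """Return-to-go at step t: the sum of rewards from step t to the end of the episode."""
    result = []
    running = 0.0
    for reward in reversed(rewards):
        running += reward
        result.append(running)
    result.reverse()
    return result


def interleave(returns, states, actions):
    """Build the (return, state, action) sequence that would be turned into token embeddings."""
    sequence = []
    for rtg, state, action in zip(returns, states, actions):
        sequence.extend([("return", rtg), ("state", state), ("action", action)])
    return sequence


rewards = [1.0, 0.0, 2.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]

rtg = returns_to_go(rewards)
print(rtg)  # [3.0, 2.0, 2.0]
print(interleave(rtg, states, actions)[:3])
```

At inference time, the first return-to-go would be set to the return you want the agent to achieve, which is what "reward conditioning" refers to.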