
Data2Vec

Overview

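The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities: text, audio and images. Importantly, the prediction targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.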
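To make "contextualized latent representations" concrete, here is a minimal pure-Python sketch of how such targets can be built from a teacher network's hidden states. It assumes, following the data2vec paper as we understand it, that the target at each time step is the average of the teacher's last K block outputs, each normalized before averaging. The parameter-free per-time-step normalization, the function names `normalize` and `build_targets`, and the toy numbers are illustrative assumptions, not the Transformers implementation.

```python
from typing import List

Vector = List[float]
Layer = List[Vector]  # one layer's output: T time steps, each a D-dimensional vector


def normalize(vec: Vector, eps: float = 1e-5) -> Vector:
    """Parameter-free normalization of one time step to zero mean, unit variance.

    Assumption: the paper's exact normalization differs by modality.
    """
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / (var + eps) ** 0.5 for x in vec]


def build_targets(teacher_layers: List[Layer], top_k: int) -> Layer:
    """Average the normalized outputs of the last `top_k` teacher layers per time step.

    Because every layer contributes a vector computed from the whole input,
    each target mixes information from the entire sequence (it is contextualized).
    """
    if not 1 <= top_k <= len(teacher_layers):
        raise ValueError("top_k must be between 1 and the number of layers")
    selected = teacher_layers[-top_k:]
    targets: Layer = []
    for t in range(len(selected[0])):
        normed = [normalize(layer[t]) for layer in selected]
        dim = len(normed[0])
        targets.append([sum(v[d] for v in normed) / top_k for d in range(dim)])
    return targets


if __name__ == "__main__":
    # Three toy "teacher layers", each with 2 time steps of 3 features.
    layers = [
        [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]],
        [[2.0, 4.0, 6.0], [3.0, 3.0, 0.0]],
        [[3.0, 1.0, 2.0], [1.0, 1.0, 1.0]],
    ]
    print(build_targets(layers, top_k=2))
```

Averaging several top layers, rather than using only the final one, is the choice the paper motivates as giving richer targets; `top_k` is left as a parameter here because its value is a per-modality hyperparameter.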

The abstract from the paper is the following:

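While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches. Models and code are available at www.github.com/pytorch/fairseq/tree/master/examples/data2vec.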
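The self-distillation setup described in the abstract can be sketched in a few lines: a teacher sees the full input and produces targets, a student sees a masked view and regresses those targets only at the masked positions, and the teacher's weights follow the student as an exponential moving average (EMA). This is a minimal pure-Python sketch under stated assumptions: weights are flattened into a list of floats, the regression loss is Smooth L1 (the paper's choice as we recall it), and `tau`, `beta` and all function names are hypothetical illustrations rather than the Transformers or fairseq API.

```python
from typing import List, Sequence


def ema_update(teacher: List[float], student: Sequence[float], tau: float) -> List[float]:
    """Teacher weights track the student: teacher <- tau * teacher + (1 - tau) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    return [tau * t + (1.0 - tau) * s for t, s in zip(teacher, student)]


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff ** 2 / beta if diff < beta else diff - 0.5 * beta


def masked_regression_loss(
    student_out: List[List[float]],
    teacher_targets: List[List[float]],
    mask: List[bool],
    beta: float = 1.0,
) -> float:
    """Average Smooth L1 over the time steps the student saw masked.

    student_out: student predictions computed from the masked input.
    teacher_targets: teacher targets computed from the full, unmasked input.
    mask: True where the student's input was masked.
    """
    losses: List[float] = []
    for pred_vec, tgt_vec, is_masked in zip(student_out, teacher_targets, mask):
        if is_masked:
            losses.extend(smooth_l1(p, t, beta) for p, t in zip(pred_vec, tgt_vec))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    print(masked_regression_loss([[0.5], [3.0]], [[0.0], [0.0]], [True, False]))
    print(ema_update([1.0, 0.0], [0.0, 1.0], tau=0.9))
```

Computing the loss only at masked positions forces the student to infer the teacher's context-aware representation of content it cannot see, which is what makes the objective the same across speech, text and images.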

This model was contributed by edugp and patrickvonplaten. sayakpaul and Rocketknight1 contributed Data2Vec for vision in TensorFlow.

The original code (for NLP and speech) can be found here. The original code for vision can be found here.

Usage tips

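Data2VecAudio, Data2VecText, and Data2VecVision have all been trained using the same self-supervised learning method.
For Data2VecAudio, preprocessing is identical to [Wav2Vec2Model], including feature extraction.
For Data2VecText, preprocessing is identical to [RobertaModel], including tokenization.
For Data2VecVision, preprocessing is identical to [BeitModel], including feature extraction.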
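For intuition about what Wav2Vec2-style feature extraction does to raw audio before it reaches Data2VecAudio, the following sketch normalizes each waveform to zero mean and unit variance, then pads the batch and builds an attention mask. It assumes the checkpoint's feature extractor has normalization enabled (this is a per-checkpoint setting), that the audio is already resampled to the rate the model expects, and a small epsilon of 1e-7; the helper names are hypothetical. In practice, load the checkpoint's own feature extractor, for example with `AutoFeatureExtractor.from_pretrained`, rather than re-implementing it.

```python
import math
from typing import List, Tuple


def zero_mean_unit_var(wave: List[float], eps: float = 1e-7) -> List[float]:
    """Normalize one raw waveform to zero mean and unit variance."""
    if not wave:
        raise ValueError("waveform must not be empty")
    mean = sum(wave) / len(wave)
    var = sum((x - mean) ** 2 for x in wave) / len(wave)
    return [(x - mean) / math.sqrt(var + eps) for x in wave]


def prepare_batch(
    waves: List[List[float]], padding_value: float = 0.0
) -> Tuple[List[List[float]], List[List[int]]]:
    """Normalize each waveform, pad to the longest one, and mark real samples with 1."""
    normed = [zero_mean_unit_var(w) for w in waves]
    max_len = max(len(w) for w in normed)
    input_values: List[List[float]] = []
    attention_mask: List[List[int]] = []
    for w in normed:
        pad = max_len - len(w)
        # Padding is appended after normalization so it does not skew the statistics.
        input_values.append(w + [padding_value] * pad)
        attention_mask.append([1] * len(w) + [0] * pad)
    return input_values, attention_mask


if __name__ == "__main__":
    values, mask = prepare_batch([[1.0, 2.0, 3.0], [4.0, 4.0]])
    print(values)
    print(mask)
```

Normalizing each clip on its own, before padding, keeps short clips from having their statistics distorted by the padding value.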

Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Data2Vec.

[Data2VecVisionForImageClassification] is supported by this example script and notebook.
To fine-tune [TFData2VecVisionForImageClassification] on a custom dataset, see this notebook.

Data2VecText documentation resources

Data2VecAudio documentation resources

Data2VecVision documentation resources

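If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it. The resource should ideally demonstrate something new instead of duplicating an existing resource.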

Data2VecTextConfig

[[autodoc]] Data2VecTextConfig

Data2VecAudioConfig

[[autodoc]] Data2VecAudioConfig

Data2VecVisionConfig

[[autodoc]] Data2VecVisionConfig

Data2VecAudioModel

[[autodoc]] Data2VecAudioModel - forward

Data2VecAudioForAudioFrameClassification

[[autodoc]] Data2VecAudioForAudioFrameClassification - forward

Data2VecAudioForCTC

[[autodoc]] Data2VecAudioForCTC - forward

Data2VecAudioForSequenceClassification

[[autodoc]] Data2VecAudioForSequenceClassification - forward

Data2VecAudioForXVector

[[autodoc]] Data2VecAudioForXVector - forward

Data2VecTextModel

[[autodoc]] Data2VecTextModel - forward

Data2VecTextForCausalLM

[[autodoc]] Data2VecTextForCausalLM - forward

Data2VecTextForMaskedLM

[[autodoc]] Data2VecTextForMaskedLM - forward

Data2VecTextForSequenceClassification

[[autodoc]] Data2VecTextForSequenceClassification - forward

Data2VecTextForMultipleChoice

[[autodoc]] Data2VecTextForMultipleChoice - forward

Data2VecTextForTokenClassification

[[autodoc]] Data2VecTextForTokenClassification - forward

Data2VecTextForQuestionAnswering

[[autodoc]] Data2VecTextForQuestionAnswering - forward

Data2VecVisionModel

[[autodoc]] Data2VecVisionModel - forward

Data2VecVisionForImageClassification

[[autodoc]] Data2VecVisionForImageClassification - forward

Data2VecVisionForSemanticSegmentation

[[autodoc]] Data2VecVisionForSemanticSegmentation - forward

TFData2VecVisionModel

[[autodoc]] TFData2VecVisionModel - call

TFData2VecVisionForImageClassification

[[autodoc]] TFData2VecVisionForImageClassification - call

TFData2VecVisionForSemanticSegmentation

[[autodoc]] TFData2VecVisionForSemanticSegmentation - call