# CTRL

## Overview
The CTRL model was proposed in [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://huggingface.co/papers/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and
Richard Socher. It is a causal (unidirectional) transformer pre-trained using language modeling on a very large corpus
of ~140 GB of text data, with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).

The abstract from the paper is the following:

*Large-scale language models show promising text generation capabilities, but users cannot easily control particular
aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model,
trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were
derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning
while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the
training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data
via model-based source attribution.*

This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitishr). The original code can be found
[here](https://github.com/salesforce/ctrl).
## Usage tips
- CTRL makes use of control codes to generate text: it requires generations to be started by certain words, sentences
  or links to generate coherent text. Refer to the [original implementation](https://github.com/salesforce/ctrl) for
  more information.
- CTRL is a model with absolute position embeddings, so it is usually advised to pad the inputs on the right rather
  than the left.
- CTRL was trained with a causal language modeling (CLM) objective, so it is powerful at predicting the next token in
  a sequence. Leveraging this feature allows CTRL to generate syntactically coherent text, as can be observed in the
  *run_generation.py* example script.
- The PyTorch models can take `past_key_values` as input, which is the previously computed key/value attention pairs.
  TensorFlow models accept `past` as input. Using the `past_key_values` value prevents the model from re-computing
  pre-computed values in the context of text generation. See the [`forward`](model_doc/ctrl#transformers.CTRLModel.forward)
  method for more information on the usage of this argument. A minimal generation sketch follows this list.
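As a quick illustration of these tips, the sketch below uses the `Salesforce/ctrl` checkpoint referenced throughout this page; the prompt and generation settings are illustrative choices, not part of the original documentation. The prompt is prefixed with a control code, and `generate()` reuses the cached key/value pairs internally:

```python
import torch
from transformers import CTRLLMHeadModel, CTRLTokenizer

# Load the pretrained tokenizer and model
tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

# The first token should be a control code such as "Links", "Books" or "Wikipedia"
prompt = "Wikipedia The Eiffel Tower is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # generate() reuses past_key_values internally, so each step only processes the newest token
    output_ids = model.generate(
        **inputs,
        max_new_tokens=20,
        repetition_penalty=1.2,  # CTRL tends to repeat itself without a repetition penalty
        do_sample=False,
    )

print(tokenizer.decode(output_ids[0]))
```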
## Resources
- [Text classification task guide (English)](../../en/tasks/sequence_classification)
- [Causal language modeling task guide](../tasks/language_modeling)
## CTRLConfig[[transformers.CTRLConfig]]

#### transformers.CTRLConfig[[transformers.CTRLConfig]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/configuration_ctrl.py#L24)
This is the configuration class to store the configuration of a CTRLModel. It is used to instantiate a CTRL
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the [Salesforce/ctrl](https://huggingface.co/Salesforce/ctrl) architecture.

Configuration objects inherit from [PreTrainedConfig](/docs/transformers/main/ja/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the
documentation from [PreTrainedConfig](/docs/transformers/main/ja/main_classes/configuration#transformers.PreTrainedConfig) for more information.
Examples:

```python
>>> from transformers import CTRLConfig, CTRLModel

>>> # Initializing a CTRL configuration
>>> configuration = CTRLConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = CTRLModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
**Parameters:**

vocab_size (`int`, *optional*, defaults to `246534`) : Vocabulary size of the model. Defines the number of different tokens that can be represented by the `input_ids`.

n_positions (`int`, *optional*, defaults to `256`) : The maximum sequence length that this model might ever be used with.

n_embd (`int`, *optional*, defaults to `1280`) : Dimensionality of the embeddings and hidden states.

dff (`int`, *optional*, defaults to `8192`) : Dimensionality of the inner dimension of the feed forward networks (FFN).

n_layer (`int`, *optional*, defaults to `48`) : Number of hidden layers in the Transformer decoder.

n_head (`int`, *optional*, defaults to `16`) : Number of attention heads for each attention layer in the Transformer decoder.

resid_pdrop (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

embd_pdrop (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout ratio for the embeddings.

layer_norm_epsilon (`float`, *optional*, defaults to `1e-06`) : The epsilon used by the layer normalization layers.

initializer_range (`float`, *optional*, defaults to `0.02`) : The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

use_cache (`bool`, *optional*, defaults to `True`) : Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if `config.is_decoder=True` or when the model is a decoder-only generative model.

pad_token_id (`int`, *optional*) : Token id used for padding in the vocabulary.

bos_token_id (`int`, *optional*) : Token id used for beginning-of-stream in the vocabulary.

eos_token_id (`Union[int, list[int]]`, *optional*) : Token id used for end-of-stream in the vocabulary.

tie_word_embeddings (`bool`, *optional*, defaults to `True`) : Whether to tie weight embeddings according to model's `tied_weights_keys` mapping.
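The defaults above correspond to the [Salesforce/ctrl](https://huggingface.co/Salesforce/ctrl) architecture. For quick experiments, the same arguments can be overridden to build a much smaller, randomly initialized model; the values below are chosen purely for illustration:

```python
from transformers import CTRLConfig, CTRLModel

# A deliberately tiny configuration (illustrative values, not the pretrained architecture)
small_config = CTRLConfig(
    n_positions=128,  # shorter maximum sequence length
    n_embd=256,       # smaller hidden size
    dff=1024,         # smaller feed-forward inner dimension
    n_layer=4,        # fewer decoder layers
    n_head=4,         # fewer attention heads
)

# Instantiate a randomly initialized model from that configuration
model = CTRLModel(small_config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```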
## CTRLTokenizer[[transformers.CTRLTokenizer]]

#### transformers.CTRLTokenizer[[transformers.CTRLTokenizer]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/tokenization_ctrl.py#L107)

Construct a CTRL tokenizer. Based on Byte-Pair-Encoding.
This tokenizer inherits from [PreTrainedTokenizer](/docs/transformers/main/ja/main_classes/tokenizer#transformers.PythonBackend) which contains most of the main methods. Users should refer to
this superclass for more information regarding those methods.
#### save_vocabulary[[transformers.CTRLTokenizer.save_vocabulary]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_python.py#L1358)

`save_vocabulary(save_directory: str, filename_prefix: str | None = None)`

Default implementation for common vocabulary saving patterns.
Saves self.encoder/self.vocab as JSON, optionally with self.bpe_ranks as merges.
Returns an empty tuple if no vocabulary exists.
Override this method if your tokenizer needs custom saving logic (e.g., SentencePiece models,
multiple vocabulary files, or special file formats).

- **save_directory** (`str`) --
  The directory in which to save the vocabulary.
- **filename_prefix** (`str`, *optional*) --
  An optional prefix to add to the names of the saved files.
**Parameters:**

vocab_file (`str`) : Path to the vocabulary file.

merges_file (`str`) : Path to the merges file.

unk_token (`str`, *optional*, defaults to `"<unk>"`) : The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead.

**Returns:**

`tuple[str, ...]`

Paths to the files saved, or empty tuple if no files saved.
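A brief sketch of typical tokenizer usage, including writing the BPE vocabulary and merges files to disk with `save_vocabulary` (the output directory name is an illustrative choice):

```python
import os

from transformers import CTRLTokenizer

# Load the pretrained CTRL tokenizer (BPE vocabulary + merges)
tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")

# Tokenize a prompt that starts with a control code
input_ids = tokenizer("Wikipedia The llama is", return_tensors="pt")["input_ids"]
print(input_ids.shape)

# Write vocab.json and merges.txt to a local directory
os.makedirs("./ctrl-tokenizer", exist_ok=True)
saved_files = tokenizer.save_vocabulary("./ctrl-tokenizer")
print(saved_files)
```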
## CTRLModel[[transformers.CTRLModel]]

#### transformers.CTRLModel[[transformers.CTRLModel]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/modeling_ctrl.py#L196)
The bare CTRL Model outputting raw hidden-states without any specific head on top.

This model inherits from [PreTrainedModel](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads,
etc.).

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.
#### forward[[transformers.CTRLModel.forward]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/modeling_ctrl.py#L227)

`forward(input_ids: torch.LongTensor | None = None, past_key_values: Cache | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.
  Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ja/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and
  [PreTrainedTokenizer.__call__()](/docs/transformers/main/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](../glossary#input-ids)
- **past_key_values** (`~cache_utils.Cache`, *optional*) --
  Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
  blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
  returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
  Only `Cache` instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
  If no `past_key_values` are passed, `DynamicCache` will be initialized by default.
  The model will output the same cache format that is fed as input.
  If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't
  have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids`
  of shape `(batch_size, sequence_length)`.
- **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](../glossary#attention-mask)
- **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a *sentence A* token,
  - 1 corresponds to a *sentence B* token.
  [What are token type IDs?](../glossary#token-type-ids)
- **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
  [What are position IDs?](../glossary#position-ids)
- **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) --
  Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
  is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
  model's internal embedding lookup matrix.
- **use_cache** (`bool`, *optional*) --
  If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
  `past_key_values`).
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
  tensors for more detail.
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
  more detail.
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a [ModelOutput](/docs/transformers/main/ja/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.

Returns: [BaseModelOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPast) or `tuple(torch.FloatTensor)`

A [BaseModelOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPast) or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) and inputs.

The [CTRLModel](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLModel) forward method, overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

- **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model.
  If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1,
  hidden_size)` is output.
- **past_key_values** (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is a `Cache` instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
  Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if
  `config.is_encoder_decoder=True` in the cross-attention blocks) that can be used (see `past_key_values`
  input) to speed up sequential decoding.
- **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
  one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.
  Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
  sequence_length)`.
  Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
  heads.
Example:

```python
>>> from transformers import AutoTokenizer, CTRLModel
>>> import torch

>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
>>> model = CTRLModel.from_pretrained("Salesforce/ctrl")

>>> # CTRL was trained with control codes as the first token
>>> inputs = tokenizer("Opinion My dog is cute", return_tensors="pt")
>>> assert inputs["input_ids"][0, 0].item() in tokenizer.control_codes.values()

>>> outputs = model(**inputs)

>>> last_hidden_states = outputs.last_hidden_state
>>> list(last_hidden_states.shape)
[1, 5, 1280]
```
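The `past_key_values` input described above can also be exercised directly. A minimal sketch, assuming the same `Salesforce/ctrl` checkpoint as the example above and the default `use_cache=True` behavior, runs the prefix once and then feeds only the final token together with the returned cache:

```python
import torch
from transformers import AutoTokenizer, CTRLModel

tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
model = CTRLModel.from_pretrained("Salesforce/ctrl")
model.eval()

# Tokenize a prompt that starts with a control code, then split off the last token
full_ids = tokenizer("Opinion My dog is cute", return_tensors="pt")["input_ids"]
prefix_ids, last_id = full_ids[:, :-1], full_ids[:, -1:]

with torch.no_grad():
    # 1) Run the prefix once and keep the returned key/value cache
    prefix_out = model(input_ids=prefix_ids, use_cache=True)
    cache = prefix_out.past_key_values

    # 2) Feed only the final token together with the cache
    step_out = model(input_ids=last_id, past_key_values=cache, use_cache=True)

    # 3) Compare with a full forward pass over the whole sequence
    full_out = model(input_ids=full_ids)

# The cached step should reproduce the last position's hidden state (up to numerical tolerance)
print(torch.allclose(step_out.last_hidden_state[:, -1], full_out.last_hidden_state[:, -1], atol=1e-4))
```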
**Parameters:**

config ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

[BaseModelOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPast) or `tuple(torch.FloatTensor)`

A [BaseModelOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPast) or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) and inputs.
## CTRLLMHeadModel[[transformers.CTRLLMHeadModel]]

#### transformers.CTRLLMHeadModel[[transformers.CTRLLMHeadModel]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/modeling_ctrl.py#L375)
The CTRL Model transformer with a language modeling head on top (linear layer with weights tied to the input
embeddings).

This model inherits from [PreTrainedModel](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads,
etc.).

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.
#### forward[[transformers.CTRLLMHeadModel.forward]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/modeling_ctrl.py#L386)

`forward(input_ids: torch.LongTensor | None = None, past_key_values: Cache | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, logits_to_keep: int | torch.Tensor = 0, **kwargs)`

- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.
  Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ja/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and
  [PreTrainedTokenizer.__call__()](/docs/transformers/main/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](../glossary#input-ids)
- **past_key_values** (`~cache_utils.Cache`, *optional*) --
  Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
  blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
  returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
  Only `Cache` instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
  If no `past_key_values` are passed, `DynamicCache` will be initialized by default.
  The model will output the same cache format that is fed as input.
  If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't
  have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids`
  of shape `(batch_size, sequence_length)`.
- **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](../glossary#attention-mask)
- **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a *sentence A* token,
  - 1 corresponds to a *sentence B* token.
  [What are token type IDs?](../glossary#token-type-ids)
- **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
  [What are position IDs?](../glossary#position-ids)
- **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) --
  Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
  is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
  model's internal embedding lookup matrix.
- **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
  `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100`
  are ignored (masked); the loss is only computed for labels in `[0, ..., config.vocab_size]`.
- **use_cache** (`bool`, *optional*) --
  If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
  `past_key_values`).
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
  tensors for more detail.
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
  more detail.
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a [ModelOutput](/docs/transformers/main/ja/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.
- **logits_to_keep** (`Union[int, torch.Tensor]`, *optional*, defaults to `0`) --
  If an `int`, compute logits for the last `logits_to_keep` tokens. If `0`, calculate logits for all
  `input_ids` (special case). Only last token logits are needed for generation, and calculating them only for that
  token can save memory, which becomes pretty significant for long sequences or large vocabulary size.
  If a `torch.Tensor`, must be 1D corresponding to the indices to keep in the sequence length dimension.
  This is useful when using packed tensor format (single dimension for batch and sequence length).

Returns: [CausalLMOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast) or `tuple(torch.FloatTensor)`

A [CausalLMOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast) or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) and inputs.

The [CTRLLMHeadModel](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLLMHeadModel) forward method, overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

- **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Language modeling loss (for next-token prediction).
- **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- **past_key_values** (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is a `Cache` instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
  Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
  `past_key_values` input) to speed up sequential decoding.
- **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
  one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.
  Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
  sequence_length)`.
  Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
  heads.
Example:

```python
>>> import torch
>>> from transformers import AutoTokenizer, CTRLLMHeadModel

>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
>>> model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

>>> # CTRL was trained with control codes as the first token
>>> inputs = tokenizer("Wikipedia The llama is", return_tensors="pt")
>>> assert inputs["input_ids"][0, 0].item() in tokenizer.control_codes.values()

>>> sequence_ids = model.generate(inputs["input_ids"])
>>> sequences = tokenizer.batch_decode(sequence_ids)
>>> sequences
['Wikipedia The llama is a member of the family Bovidae. It is native to the Andes of Peru,']

>>> outputs = model(**inputs, labels=inputs["input_ids"])
>>> round(outputs.loss.item(), 2)
9.21

>>> list(outputs.logits.shape)
[1, 5, 246534]
```
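The `logits_to_keep` argument documented above can trim the logits tensor to just the positions needed for next-token selection. A minimal sketch, assuming the same `Salesforce/ctrl` checkpoint and that only the last position's logits are required:

```python
import torch
from transformers import AutoTokenizer, CTRLLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

inputs = tokenizer("Wikipedia The llama is", return_tensors="pt")

with torch.no_grad():
    # Only compute logits for the last position instead of every position in the sequence
    outputs = model(**inputs, logits_to_keep=1)

print(outputs.logits.shape)  # expected (1, 1, vocab_size) rather than (1, sequence_length, vocab_size)
next_token_id = outputs.logits[:, -1].argmax(dim=-1)
print(tokenizer.decode(next_token_id))
```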
**Parameters:**

config ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

[CausalLMOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast) or `tuple(torch.FloatTensor)`

A [CausalLMOutputWithPast](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast) or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) and inputs.
## CTRLForSequenceClassification[[transformers.CTRLForSequenceClassification]]

#### transformers.CTRLForSequenceClassification[[transformers.CTRLForSequenceClassification]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/modeling_ctrl.py#L505)
The CTRL Model transformer with a sequence classification head on top (linear layer).

[CTRLForSequenceClassification](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLForSequenceClassification) uses the last token in order to do the classification, as other causal models
(e.g. GPT-2) do. Since it does classification on the last token, it needs to know the position of the last
token. If a `pad_token_id` is defined in the configuration, it finds the last token that is not a padding token in
each row. If no `pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot
guess the padding tokens when `inputs_embeds` are passed instead of `input_ids`, it does the same (takes the last
value in each row of the batch). A padded-batch sketch illustrating this behavior follows the classification
examples below.

This model inherits from [PreTrainedModel](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the
library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads,
etc.).

This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
and behavior.
#### forward[[transformers.CTRLForSequenceClassification.forward]]

[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/ctrl/modeling_ctrl.py#L515)

`forward(input_ids: torch.LongTensor | None = None, past_key_values: Cache | None = None, attention_mask: torch.FloatTensor | None = None, token_type_ids: torch.LongTensor | None = None, position_ids: torch.LongTensor | None = None, inputs_embeds: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)`

- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Indices of input sequence tokens in the vocabulary. Padding will be ignored by default.
  Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ja/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and
  [PreTrainedTokenizer.__call__()](/docs/transformers/main/ja/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details.
  [What are input IDs?](../glossary#input-ids)
- **past_key_values** (`~cache_utils.Cache`, *optional*) --
  Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
  blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
  returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
  Only `Cache` instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
  If no `past_key_values` are passed, `DynamicCache` will be initialized by default.
  The model will output the same cache format that is fed as input.
  If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't
  have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids`
  of shape `(batch_size, sequence_length)`.
- **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are **not masked**,
  - 0 for tokens that are **masked**.
  [What are attention masks?](../glossary#attention-mask)
- **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
  - 0 corresponds to a *sentence A* token,
  - 1 corresponds to a *sentence B* token.
  [What are token type IDs?](../glossary#token-type-ids)
- **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) --
  Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`.
  [What are position IDs?](../glossary#position-ids)
- **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) --
  Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
  is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
  model's internal embedding lookup matrix.
- **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) --
  Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
  config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
  `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
- **use_cache** (`bool`, *optional*) --
  If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
  `past_key_values`).
- **output_attentions** (`bool`, *optional*) --
  Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
  tensors for more detail.
- **output_hidden_states** (`bool`, *optional*) --
  Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
  more detail.
- **return_dict** (`bool`, *optional*) --
  Whether or not to return a [ModelOutput](/docs/transformers/main/ja/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.

Returns: [SequenceClassifierOutput](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or `tuple(torch.FloatTensor)`

A [SequenceClassifierOutput](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) and inputs.

The [CTRLForSequenceClassification](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLForSequenceClassification) forward method, overrides the `__call__` special method.

Although the recipe for forward pass needs to be defined within this function, one should call the `Module`
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.

- **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss.
- **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
- **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
  one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.
  Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
- **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
  sequence_length)`.
  Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
  heads.
Example of single-label classification:

```python
>>> import torch
>>> from transformers import AutoTokenizer, CTRLForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
>>> model = CTRLForSequenceClassification.from_pretrained("Salesforce/ctrl")

>>> # CTRL was trained with control codes as the first token
>>> inputs = tokenizer("Opinion My dog is cute", return_tensors="pt")
>>> assert inputs["input_ids"][0, 0].item() in tokenizer.control_codes.values()

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
'LABEL_0'
```

```python
>>> import torch
>>> torch.manual_seed(42)

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = CTRLForSequenceClassification.from_pretrained("Salesforce/ctrl", num_labels=num_labels)

>>> labels = torch.tensor(1)
>>> loss = model(**inputs, labels=labels).loss
>>> round(loss.item(), 2)
0.93
```
Example of multi-label classification:

```python
>>> import torch
>>> from transformers import AutoTokenizer, CTRLForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
>>> model = CTRLForSequenceClassification.from_pretrained(
...     "Salesforce/ctrl", problem_type="multi_label_classification"
... )

>>> # CTRL was trained with control codes as the first token
>>> inputs = tokenizer("Opinion My dog is cute", return_tensors="pt")
>>> assert inputs["input_ids"][0, 0].item() in tokenizer.control_codes.values()

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()
>>> model.config.id2label[predicted_class_id]
'LABEL_0'
```

```python
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = CTRLForSequenceClassification.from_pretrained("Salesforce/ctrl", num_labels=num_labels)

>>> labels = torch.nn.functional.one_hot(torch.tensor([predicted_class_id]), num_classes=num_labels).to(
...     torch.float
... )
>>> loss = model(**inputs, labels=labels).loss
>>> loss.backward()
```
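Because CTRL defines no padding token by default, batched classification requires registering one so that the head can locate the last non-padding token of each row, as described above. A minimal sketch, where the `"<pad>"` token and the example prompts are illustrative choices rather than part of the pretrained vocabulary:

```python
import torch
from transformers import AutoTokenizer, CTRLForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Salesforce/ctrl")
model = CTRLForSequenceClassification.from_pretrained("Salesforce/ctrl")

# Register a padding token and tell the model about it so it can skip padded positions
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id

# Two prompts of different lengths, both starting with a control code
batch = tokenizer(
    ["Opinion My dog is cute", "Reviews This movie was surprisingly good and I would watch it again"],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    # The classification head uses the last non-padding token of each row
    logits = model(**batch).logits

print(logits.shape)  # (batch_size, num_labels)
```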
**Parameters:**

config ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.

**Returns:**

[SequenceClassifierOutput](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or `tuple(torch.FloatTensor)`

A [SequenceClassifierOutput](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or a tuple of
`torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various
elements depending on the configuration ([CTRLConfig](/docs/transformers/main/ja/model_doc/ctrl#transformers.CTRLConfig)) and inputs.