| # Cohere | |
| Cohere [Command-R](https://cohere.com/blog/command-r) is a 35B parameter multilingual large language model designed for long context tasks like retrieval-augmented generation (RAG) and calling external APIs and tools. The model is specifically trained for grounded generation and supports both single-step and multi-step tool use. It supports a context length of 128K tokens. | |
| You can find all the original Command-R checkpoints under the [Command Models](https://huggingface.co/collections/CohereForAI/command-models-67652b401665205e17b192ad) collection. | |
| > [!TIP] | |
| > Click on the Cohere models in the right sidebar for more examples of how to apply Cohere to different language tasks. | |
| The example below demonstrates how to generate text with [Pipeline](/docs/transformers/pr_33892/en/main_classes/pipelines#transformers.Pipeline) or the [AutoModel](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoModel), and from the command line. | |
| <hfoptions id="usage"> | |
| <hfoption id="Pipeline"> | |
| ```python | |
| import torch | |
| from transformers import pipeline | |
| # use a distinct name so the imported `pipeline` function is not shadowed | |
| pipe = pipeline( | |
|     task="text-generation", | |
|     model="CohereForAI/c4ai-command-r-v01", | |
|     dtype=torch.float16, | |
|     device=0, | |
| ) | |
| pipe("Plants create energy through a process known as") | |
| ``` | |
| </hfoption> | |
| <hfoption id="AutoModel"> | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01") | |
| model = AutoModelForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01", dtype=torch.float16, device_map="auto", attn_implementation="sdpa") | |
| # format message with the Command-R chat template | |
| messages = [{"role": "user", "content": "How do plants make energy?"}] | |
| input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device) | |
| output = model.generate( | |
| input_ids, | |
| max_new_tokens=100, | |
| do_sample=True, | |
| temperature=0.3, | |
| cache_implementation="static", | |
| ) | |
| print(tokenizer.decode(output[0], skip_special_tokens=True)) | |
| ``` | |
| </hfoption> | |
| <hfoption id="transformers CLI"> | |
| ```bash | |
| # pip install -U flash-attn --no-build-isolation | |
| transformers chat CohereForAI/c4ai-command-r-v01 --dtype auto --attn_implementation flash_attention_2 | |
| ``` | |
| </hfoption> | |
| </hfoptions> | |
| Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends. | |
| The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4-bits. | |
| ```python | |
| import torch | |
| from transformers import BitsAndBytesConfig, AutoTokenizer, AutoModelForCausalLM | |
| bnb_config = BitsAndBytesConfig(load_in_4bit=True) | |
| tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01") | |
| model = AutoModelForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01", dtype=torch.float16, device_map="auto", quantization_config=bnb_config, attn_implementation="sdpa") | |
| # format message with the Command-R chat template | |
| messages = [{"role": "user", "content": "How do plants make energy?"}] | |
| input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device) | |
| output = model.generate( | |
| input_ids, | |
| max_new_tokens=100, | |
| do_sample=True, | |
| temperature=0.3, | |
| cache_implementation="static", | |
| ) | |
| print(tokenizer.decode(output[0], skip_special_tokens=True)) | |
| ``` | |
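To get a feel for why lower precision helps, the arithmetic can be sketched as follows. This is a rough estimate of weight storage only, assuming the 35B parameter count stated above; it ignores activations, the KV cache, and any quantization metadata, and the helper name `weight_memory_gb` is hypothetical.

```python
# Back-of-the-envelope estimate of weight memory only (no activations, no KV cache,
# no quantization metadata such as per-block scales).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


num_params = 35e9  # Command-R is described above as a 35B parameter model

for precision in ("float16", "4bit"):
    print(f"{precision}: ~{weight_memory_gb(num_params, precision):.1f} GB")
```

Under these assumptions, 4-bit weights need roughly a quarter of the memory of fp16 weights; real usage will be somewhat higher.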
| Use the [AttentionMaskVisualizer](https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139) to better understand what tokens the model can and cannot attend to. | |
| ```py | |
| from transformers.utils.attention_visualizer import AttentionMaskVisualizer | |
| visualizer = AttentionMaskVisualizer("CohereForAI/c4ai-command-r-v01") | |
| visualizer("Plants create energy through a process known as") | |
| ``` | |
| <div class="flex justify-center"> | |
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/cohere-attn-mask.png"/> | |
| </div> | |
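For intuition about what such a mask encodes, here is a minimal pure-Python sketch of a plain causal mask, assuming standard decoder-only attention where each token sees itself and earlier tokens. It is not the visualizer's implementation, and the word-level split is only illustrative.

```python
def causal_mask(num_tokens: int) -> list[list[int]]:
    """mask[i][j] is 1 if query position i may attend to key position j, else 0."""
    return [[1 if j <= i else 0 for j in range(num_tokens)] for i in range(num_tokens)]


# Illustrative word-level split, not real tokenizer output.
tokens = ["Plants", "create", "energy"]

for token, row in zip(tokens, causal_mask(len(tokens))):
    # "x" marks a position the token can attend to, "." one it cannot.
    print(f"{token:>8} " + " ".join("x" if allowed else "." for allowed in row))
```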
| ## Notes | |
| - Don't use the dtype parameter in [from_pretrained()](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoModel.from_pretrained) if you're using FlashAttention-2 because it only supports fp16 or bf16. You should use [Automatic Mixed Precision](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html), set fp16 or bf16 to True if using [Trainer](/docs/transformers/pr_33892/en/main_classes/trainer#transformers.Trainer), or use [torch.autocast](https://pytorch.org/docs/stable/amp.html#torch.autocast). | |
| ## CohereConfig[[transformers.CohereConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.CohereConfig</name><anchor>transformers.CohereConfig</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/configuration_cohere.py#L32</source><parameters>[{"name": "vocab_size", "val": ": typing.Optional[int] = 256000"}, {"name": "hidden_size", "val": ": typing.Optional[int] = 8192"}, {"name": "intermediate_size", "val": ": typing.Optional[int] = 22528"}, {"name": "logit_scale", "val": ": typing.Optional[float] = 0.0625"}, {"name": "num_hidden_layers", "val": ": typing.Optional[int] = 40"}, {"name": "num_attention_heads", "val": ": typing.Optional[int] = 64"}, {"name": "num_key_value_heads", "val": ": typing.Optional[int] = None"}, {"name": "hidden_act", "val": ": typing.Optional[str] = 'silu'"}, {"name": "max_position_embeddings", "val": ": typing.Optional[int] = 8192"}, {"name": "initializer_range", "val": ": typing.Optional[float] = 0.02"}, {"name": "layer_norm_eps", "val": ": typing.Optional[int] = 1e-05"}, {"name": "use_cache", "val": ": typing.Optional[bool] = True"}, {"name": "pad_token_id", "val": ": typing.Optional[int] = 0"}, {"name": "bos_token_id", "val": ": typing.Optional[int] = 5"}, {"name": "eos_token_id", "val": ": typing.Optional[int] = 255001"}, {"name": "tie_word_embeddings", "val": ": typing.Optional[bool] = True"}, {"name": "rope_parameters", "val": ": typing.Union[transformers.modeling_rope_utils.RopeParameters, dict[transformers.modeling_rope_utils.RopeParameters], NoneType] = None"}, {"name": "attention_bias", "val": ": typing.Optional[bool] = False"}, {"name": "attention_dropout", "val": ": typing.Optional[float] = 0.0"}, {"name": "use_qk_norm", "val": ": typing.Optional[bool] = False"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_size** (`int`, *optional*, defaults to 256000) -- | |
| Vocabulary size of the Cohere model. Defines the number of different tokens that can be represented by the | |
| `input_ids` passed when calling [CohereModel](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereModel). | |
| - **hidden_size** (`int`, *optional*, defaults to 8192) -- | |
| Dimension of the hidden representations. | |
| - **intermediate_size** (`int`, *optional*, defaults to 22528) -- | |
| Dimension of the MLP representations. | |
| - **logit_scale** (`float`, *optional*, defaults to 0.0625) -- | |
| The scaling factor for the output logits. | |
| - **num_hidden_layers** (`int`, *optional*, defaults to 40) -- | |
| Number of hidden layers in the Transformer decoder. | |
| - **num_attention_heads** (`int`, *optional*, defaults to 64) -- | |
| Number of attention heads for each attention layer in the Transformer decoder. | |
| - **num_key_value_heads** (`int`, *optional*) -- | |
| This is the number of key_value heads that should be used to implement Grouped Query Attention. If | |
| `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if | |
| `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When | |
| converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed | |
| by meanpooling all the original heads within that group. For more details, check out [this | |
| paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to | |
| `num_attention_heads`. | |
| - **hidden_act** (`str` or `function`, *optional*, defaults to `"silu"`) -- | |
| The non-linear activation function (function or string) in the decoder. | |
| - **max_position_embeddings** (`int`, *optional*, defaults to 8192) -- | |
| The maximum sequence length that this model might ever be used with. | |
| - **initializer_range** (`float`, *optional*, defaults to 0.02) -- | |
| The standard deviation of the truncated_normal_initializer for initializing all weight matrices. | |
| - **layer_norm_eps** (`float`, *optional*, defaults to 1e-05) -- | |
| The epsilon used by the layer normalization. | |
| - **use_cache** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not the model should return the last key/values attentions (not used by all models). Only | |
| relevant if `config.is_decoder=True`. | |
| - **pad_token_id** (`int`, *optional*, defaults to 0) -- | |
| Padding token id. | |
| - **bos_token_id** (`int`, *optional*, defaults to 5) -- | |
| Beginning of stream token id. | |
| - **eos_token_id** (`int`, *optional*, defaults to 255001) -- | |
| End of stream token id. | |
| - **tie_word_embeddings** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to tie the input and output word embeddings. | |
| - **rope_parameters** (`RopeParameters`, *optional*) -- | |
| Dictionary containing the configuration parameters for the RoPE embeddings. The dictionary should contain | |
| a value for `rope_theta` and optionally parameters used for scaling in case you want to use RoPE | |
| with longer `max_position_embeddings`. | |
| - **attention_bias** (`bool`, *optional*, defaults to `False`) -- | |
| Whether to use a bias in the query, key, value and output projection layers during self-attention. | |
| - **attention_dropout** (`float`, *optional*, defaults to 0.0) -- | |
| The dropout ratio for the attention probabilities. | |
| - **use_qk_norm** (`bool`, *optional*, defaults to `False`) -- | |
| Whether to use query-key normalization in the attention</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| This is the configuration class to store the configuration of a [CohereModel](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereModel). It is used to instantiate a Cohere | |
| model according to the specified arguments, defining the model architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) for more information. Instantiating a configuration | |
| with the defaults will yield a similar configuration to that of the [CohereForAI/c4ai-command-r-v01](https://huggingface.co/CohereForAI/c4ai-command-r-v01) model. | |
| <ExampleCodeBlock anchor="transformers.CohereConfig.example"> | |
| ```python | |
| >>> from transformers import CohereModel, CohereConfig | |
| >>> # Initializing a Cohere model configuration | |
| >>> configuration = CohereConfig() | |
| >>> # Initializing a model from the Cohere configuration | |
| >>> model = CohereModel(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
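The `num_key_value_heads` rules described above can be sketched in plain Python. Both helper names are hypothetical, each head is simplified to a flat list of weights, and grouping consecutive heads is an assumption made for illustration rather than the library's conversion code.

```python
def attention_variant(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention layout from the two head counts."""
    if num_key_value_heads == num_attention_heads:
        return "MHA"
    if num_key_value_heads == 1:
        return "MQA"
    return "GQA"


def mean_pool_kv_heads(heads: list[list[float]], num_key_value_heads: int) -> list[list[float]]:
    """Average each consecutive group of heads into a single key/value head."""
    if len(heads) % num_key_value_heads != 0:
        raise ValueError("the number of heads must be divisible by num_key_value_heads")
    group_size = len(heads) // num_key_value_heads
    pooled = []
    for start in range(0, len(heads), group_size):
        group = heads[start:start + group_size]
        # Element-wise mean across the heads in this group.
        pooled.append([sum(values) / group_size for values in zip(*group)])
    return pooled


print(attention_variant(64, 64), attention_variant(64, 8), attention_variant(64, 1))

# Four toy heads pooled into two key/value heads.
toy_heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(mean_pool_kv_heads(toy_heads, num_key_value_heads=2))  # [[2.0, 3.0], [6.0, 7.0]]
```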
| ## CohereTokenizerFast[[transformers.CohereTokenizerFast]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.CohereTokenizerFast</name><anchor>transformers.CohereTokenizerFast</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/tokenization_cohere_fast.py#L46</source><parameters>[{"name": "vocab_file", "val": " = None"}, {"name": "merges_file", "val": " = None"}, {"name": "tokenizer_file", "val": " = None"}, {"name": "clean_up_tokenization_spaces", "val": " = False"}, {"name": "unk_token", "val": " = '<UNK>'"}, {"name": "bos_token", "val": " = '<BOS_TOKEN>'"}, {"name": "eos_token", "val": " = '<|END_OF_TURN_TOKEN|>'"}, {"name": "add_bos_token", "val": " = True"}, {"name": "add_eos_token", "val": " = False"}, {"name": "use_default_system_prompt", "val": " = False"}, {"name": "add_prefix_space", "val": " = False"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_file** (`str`, *optional*) -- | |
| Path to the vocabulary file. | |
| - **merges_file** (`str`, *optional*) -- | |
| Path to the merges file. | |
| - **tokenizer_file** (`str`, *optional*) -- | |
| [tokenizers](https://github.com/huggingface/tokenizers) file (generally has a .json extension) that | |
| contains everything needed to load the tokenizer. | |
| - **clean_up_tokenization_spaces** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to clean up spaces after decoding. Cleanup consists of removing potential artifacts like | |
| extra spaces. | |
| - **unk_token** (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"<UNK>"`) -- | |
| The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this | |
| token instead. | |
| - **bos_token** (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"<BOS_TOKEN>"`) -- | |
| The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token. | |
| - **eos_token** (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"<|END_OF_TURN_TOKEN|>"`) -- | |
| The end of sequence token. | |
| - **add_bos_token** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to add a `bos_token` at the start of sequences. | |
| - **add_eos_token** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to add an `eos_token` at the end of sequences. | |
| - **use_default_system_prompt** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not the default system prompt for Cohere tokenizer should be used. | |
| - **add_prefix_space** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not the tokenizer should automatically add a prefix space</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Construct a Cohere tokenizer. Based on byte-level Byte-Pair-Encoding. | |
| Notably, this uses ByteFallback and NFC normalization. | |
| <ExampleCodeBlock anchor="transformers.CohereTokenizerFast.example"> | |
| ```python | |
| >>> from transformers import AutoTokenizer | |
| >>> tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01") | |
| >>> tokenizer.encode("Hello this is a test") | |
| [5, 28339, 2075, 1801, 1671, 3282] | |
| ``` | |
| </ExampleCodeBlock> | |
| If you want to change the `bos_token` or the `eos_token`, make sure to specify them when initializing the tokenizer, or | |
| call `tokenizer.update_post_processor()` to make sure that the post-processing is correctly done (otherwise the | |
| values of the first token and final token of an encoded sequence will not be correct). For more details, check out the | |
| [post-processors](https://huggingface.co/docs/tokenizers/api/post-processors) documentation. | |
| You can get around that behavior by passing `add_prefix_space=True` when instantiating this tokenizer, but since | |
| the model was not pretrained this way, it might yield a decrease in performance. | |
| <Tip> | |
| When used with `is_split_into_words=True`, this tokenizer needs to be instantiated with `add_prefix_space=True`. | |
| </Tip> | |
| This tokenizer inherits from [PreTrainedTokenizerFast](/docs/transformers/pr_33892/en/main_classes/tokenizer#transformers.PreTrainedTokenizerFast) which contains most of the main methods. Users should | |
| refer to this superclass for more information regarding those methods. | |
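The effect of `add_bos_token` and `add_eos_token` on the first and last ids of an encoded sequence can be sketched as below. The helper `add_special_tokens` is hypothetical and only mimics the post-processing step. The ids are taken from the `CohereConfig` defaults and the encoding example above, and it is an assumption that the tokenizer's `eos_token` maps to the config's default `eos_token_id`.

```python
BOS_TOKEN_ID = 5       # default `bos_token_id` in CohereConfig, also the first id in the example above
EOS_TOKEN_ID = 255001  # default `eos_token_id` in CohereConfig (assumed to match the tokenizer's eos_token)


def add_special_tokens(token_ids, add_bos_token=True, add_eos_token=False):
    """Wrap raw token ids with BOS/EOS according to the two flags."""
    bos = [BOS_TOKEN_ID] if add_bos_token else []
    eos = [EOS_TOKEN_ID] if add_eos_token else []
    return bos + list(token_ids) + eos


# Ids of "Hello this is a test" from the example above, without the leading BOS.
text_ids = [28339, 2075, 1801, 1671, 3282]

print(add_special_tokens(text_ids))                      # [5, 28339, 2075, 1801, 1671, 3282]
print(add_special_tokens(text_ids, add_eos_token=True))  # [5, 28339, 2075, 1801, 1671, 3282, 255001]
```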
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>build_inputs_with_special_tokens</name><anchor>transformers.CohereTokenizerFast.build_inputs_with_special_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/tokenization_cohere_fast.py#L490</source><parameters>[{"name": "token_ids_0", "val": ""}, {"name": "token_ids_1", "val": " = None"}]</parameters></docstring> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_special_tokens_mask</name><anchor>transformers.CohereTokenizerFast.get_special_tokens_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/tokenization_utils_base.py#L3927</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}, {"name": "already_has_special_tokens", "val": ": bool = False"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- | |
| List of ids of the first sequence. | |
| - **token_ids_1** (`list[int]`, *optional*) -- | |
| List of ids of the second sequence. | |
| - **already_has_special_tokens** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not the token list is already formatted with special tokens for the model.</paramsdesc><paramgroups>0</paramgroups><rettype>A list of integers in the range [0, 1]</rettype><retdesc>1 for a special token, 0 for a sequence token.</retdesc></docstring> | |
| Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding | |
| special tokens using the tokenizer `prepare_for_model` or `encode_plus` methods. | |
| </div> | |
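The returned mask format (1 for a special token, 0 for a sequence token) can be illustrated with a small sketch. The helper `special_tokens_mask` is hypothetical and corresponds most closely to the case where the ids already contain special tokens; the set of special ids is an assumption based on the default BOS and EOS ids shown earlier.

```python
def special_tokens_mask(token_ids, special_ids):
    """Return 1 for ids that are special tokens and 0 for regular sequence tokens."""
    return [1 if token_id in special_ids else 0 for token_id in token_ids]


# Encoded ids from the tokenizer example above; 5 is the BOS id.
encoded = [5, 28339, 2075, 1801, 1671, 3282]

print(special_tokens_mask(encoded, special_ids={5, 255001}))  # [1, 0, 0, 0, 0, 0]
```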
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>create_token_type_ids_from_sequences</name><anchor>transformers.CohereTokenizerFast.create_token_type_ids_from_sequences</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/tokenization_utils_base.py#L3446</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- The first tokenized sequence. | |
| - **token_ids_1** (`list[int]`, *optional*) -- The second tokenized sequence.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>The token type ids.</retdesc></docstring> | |
| Create the token type IDs corresponding to the sequences passed. [What are token type | |
| IDs?](../glossary#token-type-ids) | |
| Should be overridden in a subclass if the model has a special way of building those. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>update_post_processor</name><anchor>transformers.CohereTokenizerFast.update_post_processor</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/tokenization_cohere_fast.py#L178</source><parameters>[]</parameters></docstring> | |
| Updates the underlying post processor with the current `bos_token` and `eos_token`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>save_vocabulary</name><anchor>transformers.CohereTokenizerFast.save_vocabulary</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/tokenization_utils_base.py#L2651</source><parameters>[{"name": "save_directory", "val": ": str"}, {"name": "filename_prefix", "val": ": typing.Optional[str] = None"}]</parameters><paramsdesc>- **save_directory** (`str`) -- | |
| The directory in which to save the vocabulary. | |
| - **filename_prefix** (`str`, *optional*) -- | |
| An optional prefix to add to the names of the saved files.</paramsdesc><paramgroups>0</paramgroups><rettype>`tuple(str)`</rettype><retdesc>Paths to the files saved.</retdesc></docstring> | |
| Save only the vocabulary of the tokenizer (vocabulary + added tokens). | |
| This method won't save the configuration and special token mappings of the tokenizer. Use | |
| `_save_pretrained()` to save the whole state of the tokenizer. | |
| </div></div> | |
| ## CohereModel[[transformers.CohereModel]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.CohereModel</name><anchor>transformers.CohereModel</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/modeling_cohere.py#L389</source><parameters>[{"name": "config", "val": ": CohereConfig"}]</parameters><paramsdesc>- **config** ([CohereConfig](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The bare Cohere Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.CohereModel.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/modeling_cohere.py#L406</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "past_key_values", "val": ": typing.Optional[transformers.cache_utils.Cache] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "cache_position", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "use_cache", "val": ": typing.Optional[bool] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of the position of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **past_key_values** (`~cache_utils.Cache`, *optional*) -- | |
| Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values` | |
| returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`. | |
| Only [Cache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.Cache) instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| If no `past_key_values` are passed, [DynamicCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.DynamicCache) will be initialized by default. | |
| The model will output the same cache format that is fed as input. | |
| If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't | |
| have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids` | |
| of shape `(batch_size, sequence_length)`. | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **cache_position** (`torch.LongTensor` of shape `(sequence_length)`, *optional*) -- | |
| Indices depicting the position of the input sequence tokens in the sequence. Contrary to `position_ids`, | |
| this tensor is not affected by padding. It is used to update the cache in the correct position and to infer | |
| the complete sequence length. | |
| - **use_cache** (`bool`, *optional*) -- | |
| If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see | |
| `past_key_values`).</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.BaseModelOutputWithPast](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPast) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.BaseModelOutputWithPast](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPast) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([CohereConfig](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereConfig)) and inputs. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1, | |
| hidden_size)` is output. | |
| - **past_key_values** (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is a [Cache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.Cache) instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if | |
| `config.is_encoder_decoder=True` in the cross-attention blocks) that can be used (see `past_key_values` | |
| input) to speed up sequential decoding. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [CohereModel](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereModel) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| </div></div> | |
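The difference between `position_ids` and `cache_position` described in the `forward` parameters can be sketched in plain Python. Both helpers are hypothetical. Deriving positions from the attention mask, with a dummy value of 1 in padded slots, mirrors a common convention in generation code and is an assumption here, not something this page specifies.

```python
def position_ids_from_mask(attention_mask):
    """Count only attended tokens; padded slots receive a dummy position of 1."""
    position_ids, seen = [], 0
    for keep in attention_mask:
        if keep:
            position_ids.append(seen)
            seen += 1
        else:
            position_ids.append(1)
    return position_ids


def cache_positions(num_cached_tokens, num_new_tokens):
    """Slots in the cache that the new tokens will occupy, regardless of padding."""
    return list(range(num_cached_tokens, num_cached_tokens + num_new_tokens))


left_padded_mask = [0, 0, 1, 1, 1]  # two padding tokens followed by three real tokens

print(position_ids_from_mask(left_padded_mask))  # [1, 1, 0, 1, 2]
print(cache_positions(0, 5))                     # prefill: [0, 1, 2, 3, 4]
print(cache_positions(5, 1))                     # one decoding step: [5]
```

The last call also reflects the note on `past_key_values`: once five tokens are cached, only the single unprocessed token is passed in the next step.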
| ## CohereForCausalLM[[transformers.CohereForCausalLM]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.CohereForCausalLM</name><anchor>transformers.CohereForCausalLM</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/modeling_cohere.py#L468</source><parameters>[{"name": "config", "val": ""}]</parameters><paramsdesc>- **config** ([CohereConfig](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The Cohere Model for causal language modeling. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.CohereForCausalLM.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/cohere/modeling_cohere.py#L484</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "past_key_values", "val": ": typing.Optional[transformers.cache_utils.Cache] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "use_cache", "val": ": typing.Optional[bool] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "cache_position", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "logits_to_keep", "val": ": typing.Union[int, torch.Tensor] = 0"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of the position of each input sequence token in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **past_key_values** (`~cache_utils.Cache`, *optional*) -- | |
| Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used to speed up sequential decoding. This typically consists of the `past_key_values` | |
| returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`. | |
| Only a [Cache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.Cache) instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| If no `past_key_values` are passed, [DynamicCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.DynamicCache) will be initialized by default. | |
| The model will output the same cache format that is fed as input. | |
| If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't | |
| have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids` | |
| of shape `(batch_size, sequence_length)`. | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the language modeling (next-token prediction) loss. Indices should either be in `[0, ..., | |
| config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored | |
| (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`. | |
| - **use_cache** (`bool`, *optional*) -- | |
| If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see | |
| `past_key_values`). | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **cache_position** (`torch.LongTensor` of shape `(sequence_length)`, *optional*) -- | |
| Indices depicting the position of the input sequence tokens in the sequence. Contrary to `position_ids`, | |
| this tensor is not affected by padding. It is used to update the cache in the correct position and to infer | |
| the complete sequence length. | |
| - **logits_to_keep** (`Union[int, torch.Tensor]`, defaults to `0`) -- | |
| If an `int`, compute logits for the last `logits_to_keep` tokens. If `0`, calculate logits for all | |
| `input_ids` (special case). Only last token logits are needed for generation, and calculating them only for that | |
| token can save memory, which becomes significant for long sequences or large vocabulary sizes. | |
| If a `torch.Tensor`, must be 1D corresponding to the indices to keep in the sequence length dimension. | |
| This is useful when using packed tensor format (single dimension for batch and sequence length).</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.CausalLMOutputWithPast](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.CausalLMOutputWithPast](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.CausalLMOutputWithPast) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([CohereConfig](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Language modeling loss (for next-token prediction). | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **past_key_values** (`Cache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is a [Cache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.Cache) instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see | |
| `past_key_values` input) to speed up sequential decoding. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
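The difference between `position_ids` and `cache_position` described above can be sketched in plain Python lists, so it runs without PyTorch or a model download. This is only an illustration of the convention, not the library's implementation: the helper names `position_ids_from_mask` and `cache_positions` are hypothetical, and using `0` as the placeholder position for padded slots is an assumption of this sketch.

```python
def position_ids_from_mask(attention_mask):
    """Count real tokens so that padding does not advance the position."""
    positions, seen = [], 0
    for keep in attention_mask:
        # Padded slots get a placeholder of 0 (sketch choice); they are masked out anyway.
        positions.append(seen if keep else 0)
        seen += keep
    return positions


def cache_positions(past_length, new_length):
    """Cache slots occupied by the new tokens; padding plays no role here."""
    return list(range(past_length, past_length + new_length))


mask = [0, 0, 1, 1, 1]                            # left-padded sequence, 3 real tokens
position_ids = position_ids_from_mask(mask)       # depends on the padding
cache_position = cache_positions(0, len(mask))    # depends only on sequence length
next_step = cache_positions(len(mask), 1)         # slot of the next decoded token
```

With left padding, the first real token gets position `0` even though it sits in cache slot `2`, which is why the two index tensors are tracked separately.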
| The [CohereForCausalLM](/docs/transformers/pr_33892/en/model_doc/cohere#transformers.CohereForCausalLM) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.CohereForCausalLM.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, CohereForCausalLM | |
| >>> model = CohereForCausalLM.from_pretrained("CohereForAI/c4ai-command-r-v01") | |
| >>> tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01") | |
| >>> prompt = "Hey, are you conscious? Can you talk to me?" | |
| >>> inputs = tokenizer(prompt, return_tensors="pt") | |
| >>> # Generate | |
| >>> generate_ids = model.generate(inputs.input_ids, max_length=30) | |
| >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0] | |
| "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you." | |
| ``` | |
| </ExampleCodeBlock> | |
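Generation only needs the logits of the last position, which is what `logits_to_keep` controls. The selection rule from the parameter description can be sketched with plain Python lists. The helper `select_positions` is hypothetical, and a list of indices stands in for the 1D `torch.Tensor` form; the real model applies the selection along the sequence dimension of its hidden states.

```python
def select_positions(hidden_states, logits_to_keep=0):
    """Pick the sequence positions whose logits would be computed.

    hidden_states: one sequence, as a list of per-token items.
    logits_to_keep: int (last N positions, 0 means all) or a list of indices.
    """
    if isinstance(logits_to_keep, int):
        # slice(-0, None) is slice(0, None), so 0 keeps every position.
        return hidden_states[slice(-logits_to_keep, None)]
    # Index form: keep exactly the listed positions.
    return [hidden_states[i] for i in logits_to_keep]


tokens = ["h0", "h1", "h2", "h3", "h4"]
all_positions = select_positions(tokens)       # default 0: every position
last_only = select_positions(tokens, 1)        # enough for next-token generation
picked = select_positions(tokens, [0, 4])      # explicit indices
```

Keeping only the last position avoids materializing a `(sequence_length, vocab_size)` logits matrix when a single row is needed.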
| </div></div> | |
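The role of `-100` in `labels` can be illustrated with a small, dependency-free sketch of a next-token cross-entropy. It assumes the usual causal language modeling convention that the position `t` scores are compared against the label at position `t + 1`. The function `causal_lm_loss` is hypothetical and is not the library's loss implementation; returning `0.0` when every label is ignored is a choice made for this sketch.

```python
import math

IGNORE_INDEX = -100  # label value excluded from the loss


def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy for one sequence.

    logits: list of per-position score lists (one score per vocabulary entry).
    labels: list of token ids, same length as logits; -100 entries are skipped.
    """
    total, count = 0.0, 0
    # Position t predicts the token at position t + 1, so shift by one.
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += log_norm - scores[target]  # equals -log softmax(scores)[target]
        count += 1
    return total / count if count else 0.0


logits = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]     # 3 positions, vocabulary of 2
masked_loss = causal_lm_loss(logits, [0, 1, -100])  # last label is ignored
full_loss = causal_lm_loss(logits, [0, 1, 0])       # every shifted label counts
```

A common use of this convention is to set the labels of prompt or padding tokens to `-100` so that only the completion contributes to the loss.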