# Decision Transformer

## Overview

The Decision Transformer model was proposed in [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://huggingface.co/papers/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas and Igor Mordatch.

The abstract from the paper is the following:

_We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. This allows us to draw upon the simplicity and scalability of the Transformer architecture, and associated advances in language modeling such as GPT-x and BERT. In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling. Unlike prior approaches to RL that fit value functions or compute policy gradients, Decision Transformer simply outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired return (reward), past states, and actions, our Decision Transformer model can generate future actions that achieve the desired return. Despite its simplicity, Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on Atari, OpenAI Gym, and Key-to-Door tasks._
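
The quantity the model is conditioned on is the return-to-go: at each timestep, the sum of rewards that should still be collected from that step onward, rather than the rewards already received. As a minimal, framework-agnostic sketch (the helper below is illustrative and not part of `transformers`), returns-to-go can be computed from a reward sequence like this:

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    # Return-to-go at step t is the (optionally discounted) sum of rewards
    # from t onward; the paper uses the undiscounted case (gamma=1.0).
    rtg = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


print(returns_to_go([1.0, 0.0, 2.0, 1.0]))  # [4. 3. 3. 1.]
```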

This version of the model is for tasks where the states are vectors.

This model was contributed by [edbeeching](https://huggingface.co/edbeeching). The original code can be found [here](https://github.com/kzl/decision-transformer).
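
As a quick smoke test, a randomly initialized model can be run on dummy vector inputs. The sketch below is a hedged example assuming the default configuration: the tensor shapes follow the `state_dim`/`act_dim` configuration values, and `action_preds` is the output field holding one predicted action per input step.

```python
>>> import torch
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel

>>> config = DecisionTransformerConfig()  # state_dim=17, act_dim=4 by default
>>> model = DecisionTransformerModel(config)

>>> batch_size, seq_len = 1, 20
>>> states = torch.randn(batch_size, seq_len, config.state_dim)
>>> actions = torch.randn(batch_size, seq_len, config.act_dim)
>>> rewards = torch.zeros(batch_size, seq_len, 1)
>>> returns_to_go = torch.ones(batch_size, seq_len, 1)  # desired return at each step
>>> timesteps = torch.arange(seq_len).unsqueeze(0)
>>> attention_mask = torch.ones(batch_size, seq_len, dtype=torch.long)

>>> with torch.no_grad():
...     outputs = model(
...         states=states,
...         actions=actions,
...         rewards=rewards,
...         returns_to_go=returns_to_go,
...         timesteps=timesteps,
...         attention_mask=attention_mask,
...     )
>>> outputs.action_preds.shape
torch.Size([1, 20, 4])
```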

## DecisionTransformerConfig


[Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/decision_transformer/configuration_decision_transformer.py#L24)

This is the configuration class to store the configuration of a DecisionTransformerModel. It is used to instantiate a Decision Transformer
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the standard DecisionTransformer architecture.

Configuration objects inherit from [PreTrainedConfig](/docs/transformers/main/ja/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the
documentation from [PreTrainedConfig](/docs/transformers/main/ja/main_classes/configuration#transformers.PreTrainedConfig) for more information.

Example:

```python
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel

>>> # Initializing a DecisionTransformer configuration
>>> configuration = DecisionTransformerConfig()

>>> # Initializing a model (with random weights) from the configuration
>>> model = DecisionTransformerModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```
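
The defaults target a 17-dimensional state and 4-dimensional action space; for a different environment, pass the matching sizes explicitly. As a hedged illustration, the dimensions below are those of the classic Gym Hopper environment (11-dimensional observations, 3-dimensional actions, episodes capped at 1000 steps):

```python
>>> from transformers import DecisionTransformerConfig, DecisionTransformerModel

>>> # Hypothetical setup for a Hopper-like environment
>>> configuration = DecisionTransformerConfig(state_dim=11, act_dim=3, max_ep_len=1000)
>>> model = DecisionTransformerModel(configuration)
```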

**Parameters:**

state_dim (`int`, *optional*, defaults to `17`) : The state size for the RL environment.

act_dim (`int`, *optional*, defaults to `4`) : The size of the output action space.

hidden_size (`int`, *optional*, defaults to `128`) : Dimension of the hidden representations.

max_ep_len (`int`, *optional*, defaults to `4096`) : The maximum length of an episode in the environment.

action_tanh (`bool`, *optional*, defaults to `True`) : Whether to use a tanh activation on action prediction.

vocab_size (`int`, *optional*, defaults to `1`) : Vocabulary size of the model. Defines the number of different tokens that can be represented by the `input_ids`.

n_positions (`int`, *optional*, defaults to `1024`) : The maximum sequence length that this model might ever be used with.

n_layer (`int`, *optional*, defaults to `3`) : Number of hidden layers in the Transformer decoder.

n_head (`int`, *optional*, defaults to `1`) : Number of attention heads for each attention layer in the Transformer decoder.

n_inner (`int`, *optional*) : Dimension of the MLP representations.

activation_function (`str`, *optional*, defaults to `"relu"`) : The non-linear activation function (function or string) in the decoder. For example, `"gelu"`, `"relu"`, `"silu"`, etc.

resid_pdrop (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

embd_pdrop (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout ratio for the embeddings.

attn_pdrop (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout ratio for the attention probabilities.

layer_norm_epsilon (`float`, *optional*, defaults to `1e-05`) : The epsilon used by the layer normalization layers.

initializer_range (`float`, *optional*, defaults to `0.02`) : The standard deviation of the truncated_normal_initializer for initializing all weight matrices.

scale_attn_weights (`bool`, *optional*, defaults to `True`) : Scale attention weights by dividing by sqrt(hidden_size).

use_cache (`bool`, *optional*, defaults to `True`) : Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if `config.is_decoder=True` or when the model is a decoder-only generative model.

bos_token_id (`int`, *optional*, defaults to `50256`) : Token id used for beginning-of-stream in the vocabulary.

eos_token_id (`Union[int, list[int]]`, *optional*, defaults to `50256`) : Token id used for end-of-stream in the vocabulary.

scale_attn_by_inverse_layer_idx (`bool`, *optional*, defaults to `False`) : Whether to additionally scale attention weights by `1 / (layer_idx + 1)`.

reorder_and_upcast_attn (`bool`, *optional*, defaults to `False`) : Whether to scale keys (K) prior to computing attention (dot-product) and upcast the attention dot-product/softmax to float32 when training with mixed precision.

add_cross_attention (`bool`, *optional*, defaults to `False`) : Whether cross-attention layers should be added to the model.
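
At evaluation time the model is used autoregressively, as described in the abstract: condition on a target return, predict an action, execute it in the environment, then subtract the observed reward from the return-to-go. The loop below is a minimal, hypothetical sketch, not part of `transformers`: the Gym-style `env` and the context length of 20 are assumptions.

```python
import torch


@torch.no_grad()
def rollout(model, env, target_return, max_steps=1000, context_len=20):
    # Running context of (return-to-go, state, action) triples, batch size 1.
    state_dim, act_dim = model.config.state_dim, model.config.act_dim
    state, _ = env.reset()
    states = torch.as_tensor(state, dtype=torch.float32).reshape(1, 1, state_dim)
    actions = torch.zeros(1, 1, act_dim)
    returns_to_go = torch.full((1, 1, 1), float(target_return))
    timesteps = torch.zeros(1, 1, dtype=torch.long)

    for t in range(max_steps):
        length = min(t + 1, context_len)
        outputs = model(
            states=states[:, -context_len:],
            actions=actions[:, -context_len:],
            rewards=torch.zeros(1, length, 1),  # the model conditions on returns-to-go, not raw rewards
            returns_to_go=returns_to_go[:, -context_len:],
            timesteps=timesteps[:, -context_len:],
            attention_mask=torch.ones(1, length, dtype=torch.long),
        )
        action = outputs.action_preds[0, -1]  # act on the last predicted action
        actions[0, -1] = action

        state, reward, terminated, truncated, _ = env.step(action.numpy())
        if terminated or truncated:
            break

        # Grow the context: new state, a placeholder slot for the next action,
        # and the return-to-go decremented by the reward just received.
        states = torch.cat(
            [states, torch.as_tensor(state, dtype=torch.float32).reshape(1, 1, state_dim)], dim=1
        )
        actions = torch.cat([actions, torch.zeros(1, 1, act_dim)], dim=1)
        returns_to_go = torch.cat([returns_to_go, returns_to_go[:, -1:] - reward], dim=1)
        timesteps = torch.cat([timesteps, torch.full((1, 1), t + 1, dtype=torch.long)], dim=1)
```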

## DecisionTransformerGPT2Model

[[autodoc]] DecisionTransformerGPT2Model - forward

## DecisionTransformerModel

[[autodoc]] DecisionTransformerModel - forward

