anna4142 committed on
Commit 00ed97f · verified · 1 parent: 20de744

hierarchical-decision-transformer

Files changed (1)
README.md CHANGED: +26 -52
@@ -1,72 +1,46 @@
 ---
 tags:
- - decision_transformer
- - reinforcement_learning
- - gym_environment
- - fine-tuned
 model-index:
- - name: Hierarchical Decision Transformer
 results: []
 ---
 
- # Hierarchical Decision Transformer
-
- This model is a fine-tuned version of the Decision Transformer, trained on expert trajectories sampled from the Gym HalfCheetah environment.
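For orientation, a checkpoint in this format is normally loaded through the stock Decision Transformer class in Hugging Face Transformers. A minimal sketch, assuming the repo id matches this repository's name; the custom hierarchical head described below would not be restored by the stock class:

```python
import torch
from transformers import DecisionTransformerModel

# Repo id is an assumption based on this repository's name.
model = DecisionTransformerModel.from_pretrained("anna4142/hierarchical-decision-transformer")
model.eval()

# Dummy rollout tensors with HalfCheetah sizes (17-dim state, 6-dim action).
states = torch.randn(1, 20, 17)
actions = torch.zeros(1, 20, 6)
returns_to_go = torch.ones(1, 20, 1)
timesteps = torch.arange(20).unsqueeze(0)
attention_mask = torch.ones(1, 20, dtype=torch.long)

with torch.no_grad():
    out = model(states=states, actions=actions, rewards=None,
                returns_to_go=returns_to_go, timesteps=timesteps,
                attention_mask=attention_mask, return_dict=True)
# out.action_preds holds the predicted action at each timestep.
```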
 
- ## Model Description
-
- The **Hierarchical Decision Transformer** extends the standard Decision Transformer by incorporating hierarchical reasoning. It introduces clustering and subgoal reasoning capabilities, enabling enhanced performance on tasks requiring multi-level decision-making.
-
- - **Architecture**:
-   - A hierarchical head added to process state embeddings for clustering.
-   - Cluster centroids initialized as learnable parameters.
- - **Loss functions**:
-   - Action prediction loss (MSE between predicted and target actions).
-   - Entropy loss for cluster assignment diversity.
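The card ships no code for these components, but the bullets map onto a small amount of PyTorch. The following is a minimal sketch, not the repository's implementation; `HierarchicalHead`, `hdt_loss`, the dot-product similarity, and the entropy sign and weight are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalHead(nn.Module):
    """Soft-clusters state embeddings against learnable centroids (illustrative)."""

    def __init__(self, hidden_dim: int, num_clusters: int):
        super().__init__()
        # Cluster centroids initialized as learnable parameters.
        self.centroids = nn.Parameter(torch.randn(num_clusters, hidden_dim))

    def forward(self, state_emb: torch.Tensor) -> torch.Tensor:
        # Dot-product similarity of each embedding to each centroid,
        # softmaxed into soft cluster assignments: (batch, num_clusters).
        return F.softmax(state_emb @ self.centroids.t(), dim=-1)


def hdt_loss(pred_actions, target_actions, cluster_probs, entropy_weight=0.1):
    # Action prediction loss: MSE between predicted and target actions.
    action_loss = F.mse_loss(pred_actions, target_actions)
    # Mean assignment entropy; subtracting it rewards diverse (high-entropy)
    # cluster usage. Sign convention and weight are assumptions.
    entropy = -(cluster_probs * cluster_probs.clamp_min(1e-8).log()).sum(-1).mean()
    return action_loss - entropy_weight * entropy
```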
 
- ## Intended Uses & Limitations
-
- ### Intended Uses
- - Offline reinforcement learning tasks using trajectory data.
- - Tasks requiring subgoal reasoning or clustering-based decision-making.
- - Benchmarking on Gym environments like HalfCheetah and Hopper.
-
- ### Limitations
- - Performance depends heavily on clustering configurations and hierarchical design.
- - Additional computational cost due to hierarchical components.
-
- ## Training and Evaluation Data
-
- The model was trained on expert trajectories from the **Gym HalfCheetah environment**. These trajectories were sampled from a pre-trained policy to provide high-quality data for offline reinforcement learning.
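The card does not include the collection script. A sketch of how such trajectories could be gathered with the Gymnasium-style API; `expert_policy` is a hypothetical stand-in for the pre-trained policy, and the episode count is illustrative:

```python
import gymnasium as gym

env = gym.make("HalfCheetah-v4")
expert_policy = lambda obs: env.action_space.sample()  # stand-in for the real expert

trajectories = []
for _ in range(10):  # number of episodes is illustrative
    obs, info = env.reset()
    episode = {"observations": [], "actions": [], "rewards": []}
    done = False
    while not done:
        action = expert_policy(obs)
        next_obs, reward, terminated, truncated, info = env.step(action)
        episode["observations"].append(obs)
        episode["actions"].append(action)
        episode["rewards"].append(reward)
        obs = next_obs
        done = terminated or truncated
    trajectories.append(episode)
```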
 
- ## Training Procedure
-
- ### Training Hyperparameters
- - **Learning Rate**: 0.0001
- - **Train Batch Size**: 64
- - **Eval Batch Size**: 8
- - **Seed**: 42
- - **Optimizer**: `adamw_torch` with:
-   - `betas`: (0.9, 0.999)
-   - `epsilon`: 1e-08
- - **LR Scheduler Type**: `linear`
- - **Warmup Ratio**: 0.1
- - **Number of Epochs**: 200
-
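These settings map one-to-one onto a Transformers `TrainingArguments` configuration. A sketch; only `output_dir` is a placeholder, every other value is taken from the list above:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output",              # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=200,
)
```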
- ### Framework Versions
- - **Transformers**: 4.46.2
- - **PyTorch**: 2.5.1+cu121
- - **Datasets**: 3.1.0
- - **Tokenizers**: 0.20.3
-
- ## References
-
- - [Decision Transformer Paper](https://arxiv.org/abs/2106.01345)
- - [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers/)
- - [Gym Environments](https://www.gymlibrary.dev/)
@@ -75,4 +49,4 @@ The model was trained on expert trajectories from the **Gym HalfCheetah environm
 - Transformers 4.46.2
 - Pytorch 2.5.1+cu121
 - Datasets 3.1.0
- - Tokenizers 0.20.3
 
 ---
+ library_name: transformers
 tags:
+ - generated_from_trainer
 model-index:
+ - name: output
 results: []
 ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # output
+
+ This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 64
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 200
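The auto-generated optimizer line above is hard to read; in plain PyTorch terms it corresponds roughly to the following. A sketch: `model` and the step counts are placeholders, and `total_steps` would be `num_epochs * steps_per_epoch` in practice:

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(17, 6)  # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
                              betas=(0.9, 0.999), eps=1e-8)
total_steps = 10_000            # placeholder; num_epochs * steps_per_epoch
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * total_steps),  # lr_scheduler_warmup_ratio: 0.1
    num_training_steps=total_steps,
)
# Call scheduler.step() after each optimizer.step() during training.
```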
 
+ ### Training results
+
 - Transformers 4.46.2
 - Pytorch 2.5.1+cu121
 - Datasets 3.1.0
+ - Tokenizers 0.20.3