hrezaei committed
Commit 309c7a2 · verified · 1 Parent(s): df5b00c

Update README.md

Files changed (1): README.md (+44, −6)
README.md CHANGED
@@ -20,6 +20,8 @@ model-index:
 - name: Accuracy
   type: accuracy
   value: 0.032223235792499715
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -27,22 +29,58 @@ should probably proofread and complete it, then remove this comment. -->
 
 # T5LA
 
- This model is a fine-tuned version of [](https://huggingface.co/) on the HuggingFaceFW/fineweb sample-10BT dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.5467
 - Accuracy: 0.0322
 
- ## Model description
 
- More information needed
 
 ## Intended uses & limitations
 
- More information needed
 
 ## Training and evaluation data
 
- More information needed
 
 ## Training procedure
 
@@ -173,4 +211,4 @@ The following hyperparameters were used during training:
 - Transformers 4.49.0.dev0
 - Pytorch 2.5.1+cu121
 - Datasets 3.2.0
- - Tokenizers 0.21.0
 
 - name: Accuracy
   type: accuracy
   value: 0.032223235792499715
+ base_model:
+ - google-t5/t5-base
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
 # T5LA
 
+ This model is part of the work published in the paper [Interactive Text Games: Lookahead Is All You Need!](https://openreview.net/pdf?id=D38rTnrkal).
+
+ Four models are introduced in the above paper:
+ - [nanoGPTLA](https://huggingface.co/hrezaei/nanoGPTLookAhead)
+ - [nanoGPTLAA](https://huggingface.co/hrezaei/nanoGPTLookAheadA)
+ - [nanoGPTLAA2](https://huggingface.co/hrezaei/nanoGPTLookAheadA2)
+ - [nanoGPTLAE](https://huggingface.co/hrezaei/nanoGPTLookAheadAE)
+
+ These models are implemented in [this repository](https://github.com/HRezaei/nanoGPT), a customized version of [nanoGPT](https://github.com/karpathy/nanoGPT).
+
+ The same variations are also implemented in [this fork](https://github.com/HRezaei/transformers/tree/feature/lookahead_models) of the Transformers library, on top of the [Google-t5/T5](https://github.com/huggingface/transformers/tree/128387757105c7c0b57b519ac2aaff217a20e3f0/src/transformers/models/t5) implementation.
+ The resulting models are also trained and published:
+ - [T5LA](https://huggingface.co/hrezaei/T5LA)
+ - [T5LAA](https://huggingface.co/hrezaei/T5LAA)
+ - [T5LAA2](https://huggingface.co/hrezaei/T5LAA2)
+ - [T5LAE](https://huggingface.co/hrezaei/T5LAE)
+
+ All of the above models are at the scale of GPT-2 (~100M parameters). Work is in progress to train them at larger scales.
+
+ ## Model description
+
+ This model is not fine-tuned on any instruction or human-feedback dataset. It is only pre-trained on the HuggingFaceFW/fineweb sample-10BT dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.5467
 - Accuracy: 0.0322
 
+ Since the above fork has not been merged into the main Transformers library yet, to load this model with `AutoModel.from_pretrained()`
+ you first need to install Transformers from [this branch](https://github.com/HRezaei/transformers/tree/feature/lookahead_models),
+ which contains the code for the T5LA models. This can be done with:
 
+ ```shell
+ pip install git+https://github.com/HRezaei/transformers.git@feature/lookahead_models
+ ```
 
66
 ## Intended uses & limitations
 
+ The model is designed to predict not only the next immediate token after the prompt (as normal LLMs do), but also the
+ second, third, ..., up to K next tokens, all conditioned on the prompt. These future predictions can be useful for approximate ranking,
+ where a set of candidate responses needs to be ranked based on the approximate probability of their tokens conditioned on the prompt,
+ rather than on their preceding tokens.
+
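The approximate-ranking idea described above can be sketched with a toy example. The distributions, tokens, and helper function below are made up for illustration; in a real T5LA model the per-position distributions would come from the model's K lookahead heads, not from hand-written dictionaries.

```python
import math

# Toy lookahead distributions: p(token at position k | prompt) for k = 1..3.
# All probabilities here are invented for illustration.
lookahead = [
    {"the": 0.5, "a": 0.3, "dragon": 0.2},       # 1st token after the prompt
    {"door": 0.4, "dragon": 0.35, "key": 0.25},  # 2nd token after the prompt
    {"opens": 0.6, "sleeps": 0.4},               # 3rd token after the prompt
]

def approx_score(response_tokens):
    """Sum of log-probabilities, each conditioned only on the prompt,
    not on the response's own previous tokens."""
    logp = 0.0
    for k, tok in enumerate(response_tokens):
        logp += math.log(lookahead[k].get(tok, 1e-9))
    return logp

candidates = [["the", "door", "opens"], ["a", "dragon", "sleeps"]]
# Rank candidates by their approximate joint probability under the prompt.
ranked = sorted(candidates, key=approx_score, reverse=True)
```

Note that each token is scored independently, which is exactly what makes this a ranking approximation rather than a true sequence probability.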
+ The main limitation is that the future predictions are generally not suitable for generating text, as they do not capture token
+ interdependencies, i.e. the future tokens are not conditioned on the previous tokens. Thus, for generation, one should rely only on the
+ next immediate token. However, the quality of next-immediate-token prediction is also somewhat degraded, because during training the loss
+ function has more terms to minimize (one term for the next immediate token, as in original LLMs, plus one extra term per future token).
 
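The extra loss terms can be illustrated with a toy multi-horizon cross-entropy computation. All distributions and token names below are invented; this is only a sketch of the objective's shape, not the actual training code.

```python
import math

def cross_entropy(probs, target):
    # Negative log-likelihood of the target token under a predicted distribution.
    return -math.log(probs[target])

# Hypothetical predictions at one position: the first head predicts the next
# immediate token, the others predict tokens further ahead (numbers made up).
head_probs = [
    {"cat": 0.7, "dog": 0.3},  # next immediate token
    {"sat": 0.6, "ran": 0.4},  # 2nd next token
    {"on": 0.8, "off": 0.2},   # 3rd next token
]
targets = ["cat", "sat", "on"]

# Training loss: one cross-entropy term per horizon, so K lookahead horizons
# add K - 1 extra terms compared with a standard next-token-only objective.
loss = sum(cross_entropy(p, t) for p, t in zip(head_probs, targets))
```

Because the optimizer must trade off all K terms, capacity spent on the future horizons is capacity not spent on the next-immediate-token term, which is the degradation described above.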
 ## Training and evaluation data
 
+ This model is not fine-tuned on any instruction or human-feedback dataset. It is only pre-trained on the HuggingFaceFW/fineweb sample-10BT dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 5.5467
+ - Accuracy: 0.0322
 
 ## Training procedure
 
 - Transformers 4.49.0.dev0
 - Pytorch 2.5.1+cu121
 - Datasets 3.2.0
+ - Tokenizers 0.21.0