@@ -20,6 +20,8 @@ model-index:
     - name: Accuracy
       type: accuracy
       value: 0.032223235792499715
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -27,22 +29,58 @@ should probably proofread and complete it, then remove this comment. -->
 # T5LA
-This model is a fine-tuned version of [](https://huggingface.co/) on the HuggingFaceFW/fineweb sample-10BT dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.5467
 - Accuracy: 0.0322
-## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -173,4 +211,4 @@ The following hyperparameters were used during training:
 - Transformers 4.49.0.dev0
 - Pytorch 2.5.1+cu121
 - Datasets 3.2.0
-- Tokenizers 0.21.0

     - name: Accuracy
       type: accuracy
       value: 0.032223235792499715
+base_model:
+- google-t5/t5-base
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 # T5LA
+This model is part of the work published in the paper [Interactive Text Games: Lookahead Is All You Need!](https://openreview.net/pdf?id=D38rTnrkal).
+Four models are introduced in that paper:
+- [nanoGPTLA](https://huggingface.co/hrezaei/nanoGPTLookAhead)
+- [nanoGPTLAA](https://huggingface.co/hrezaei/nanoGPTLookAheadA)
+- [nanoGPTLAA2](https://huggingface.co/hrezaei/nanoGPTLookAheadA2)
+- [nanoGPTLAE](https://huggingface.co/hrezaei/nanoGPTLookAheadAE)
+These models are implemented in [this repository](https://github.com/HRezaei/nanoGPT), which is a customized version of [nanoGPT](https://github.com/karpathy/nanoGPT).
+The same variations are also implemented in [this fork](https://github.com/HRezaei/transformers/tree/feature/lookahead_models) of the Transformers library, on top of the [Google-t5/T5](https://github.com/huggingface/transformers/tree/128387757105c7c0b57b519ac2aaff217a20e3f0/src/transformers/models/t5) implementation.
+The T5-based variants are trained and published as follows:
+- [T5LA](https://huggingface.co/hrezaei/T5LA)
+- [T5LAA](https://huggingface.co/hrezaei/T5LAA)
+- [T5LAA2](https://huggingface.co/hrezaei/T5LAA2)
+- [T5LAE](https://huggingface.co/hrezaei/T5LAE)
+All of the above models are at roughly GPT-2 scale (~100M parameters). Work is in progress to train them at larger scales.
+## Model description
+This model is not fine-tuned on any instruction or human-feedback dataset. It is only pre-trained on the HuggingFaceFW/fineweb sample-10BT dataset.
 It achieves the following results on the evaluation set:
 - Loss: 5.5467
 - Accuracy: 0.0322
+Since the above fork is not yet merged into the main Transformers library, loading this model with `AutoModel.from_pretrained()`
+requires first installing Transformers from [this branch](https://github.com/HRezaei/transformers/tree/feature/lookahead_models),
+which contains the code for the T5LA models. This can be done with:
+```shell
+pip install git+https://github.com/HRezaei/transformers.git@feature/lookahead_models
+```
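Once the branch is installed, loading could look roughly like the sketch below. It is a minimal illustration, not a tested recipe: the helper name `load_t5la` is hypothetical, and it assumes the `hrezaei/T5LA` repository also ships tokenizer files that `AutoTokenizer` can resolve.

```python
def load_t5la(repo_id="hrezaei/T5LA"):
    """Load the tokenizer and model from the Hub (hypothetical helper).

    Assumes Transformers was installed from the feature/lookahead_models
    branch, so that AutoModel can resolve the T5LA architecture.
    """
    # Imported lazily so that defining this helper does not require the fork.
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the repository contains tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_t5la()` downloads the checkpoint on first use, so it needs network access and the fork installed in the active environment.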
 ## Intended uses & limitations
+The model is designed to predict not only the next token after the prompt (as standard LLMs do), but also the second, third, ..., up to the K-th
+next token, each conditioned on the prompt alone. These future predictions can be useful for approximate ranking,
+where a set of candidate responses must be ranked by the approximate probability of their tokens conditioned on the prompt,
+rather than conditioned on their preceding tokens.
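The ranking idea above can be illustrated with a toy sketch. It does not call the model: the per-offset distributions are made-up numbers standing in for the K lookahead outputs, words stand in for tokens, and the function name `approx_rank` and the summed log-probability score are assumptions chosen for illustration.

```python
import math

def approx_rank(lookahead_probs, candidates, floor=1e-9):
    """Rank candidates by summed log-probability under prompt-only distributions.

    lookahead_probs[k] maps a token to P(token at offset k + 1 | prompt).
    Tokens missing from a distribution get a small floor probability.
    """
    scored = []
    for cand in candidates:
        score = 0.0
        # Only the first K tokens can be scored, one per lookahead offset.
        for k, tok in enumerate(cand[:len(lookahead_probs)]):
            score += math.log(lookahead_probs[k].get(tok, floor))
        scored.append((score, cand))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [cand for _, cand in scored]

# Made-up distributions for K = 2 offsets after some prompt.
lookahead_probs = [
    {"go": 0.6, "take": 0.3, "open": 0.1},
    {"north": 0.5, "lamp": 0.3, "door": 0.2},
]
candidates = [("take", "lamp"), ("go", "north"), ("open", "door")]
ranked = approx_rank(lookahead_probs, candidates)
```

Because every offset is scored against the prompt alone, all candidates can be scored from a single forward pass. Note that a plain sum favors shorter candidates, so length normalization may be needed when candidates differ in length.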
+The main limitation is that future predictions are generally not suitable for generating text, because they ignore token interdependencies,
+i.e. the future tokens are not conditioned on the tokens that precede them. Thus, for generation, one should rely only on the next-token prediction.
+However, the quality of next-token prediction is also degraded, because during training the loss function has more terms to
+minimize (one term for the next token, as in standard LLMs, plus one extra term per future token).
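The shape of such a multi-term objective can be sketched as follows. This is only an illustration under stated assumptions: it uses an unweighted sum of per-offset cross-entropy terms, which may differ from the weighting used in the actual training code, and the names `cross_entropy` and `lookahead_loss` are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability assigned to the target token."""
    return -math.log(probs[target])

def lookahead_loss(per_offset_probs, targets):
    """Sum one cross-entropy term per predicted offset.

    per_offset_probs[0] is the ordinary next-token distribution;
    the remaining entries are the extra lookahead distributions.
    Assumption: all terms are weighted equally.
    """
    terms = [cross_entropy(p, t) for p, t in zip(per_offset_probs, targets)]
    return sum(terms), terms

# Toy example with K = 2 offsets over a two-token vocabulary.
per_offset_probs = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}]
targets = ["a", "a"]
total, terms = lookahead_loss(per_offset_probs, targets)
```

With a single offset this reduces to the usual next-token loss. Each additional offset adds a term that competes for the same model capacity, which is the trade-off described above.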
 ## Training and evaluation data
+This model is not fine-tuned on any instruction or human-feedback dataset. It is only pre-trained on the HuggingFaceFW/fineweb sample-10BT dataset.
+It achieves the following results on the evaluation set:
+- Loss: 5.5467
+- Accuracy: 0.0322
 ## Training procedure
 - Transformers 4.49.0.dev0
 - Pytorch 2.5.1+cu121
 - Datasets 3.2.0
+- Tokenizers 0.21.0