Commit 4a4fda7
Parent(s): f56866a
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ license: apache-2.0
 
 The model was pre-trained using T5's denoising objective on [C4](https://huggingface.co/datasets/c4), subsequently additionally pre-trained using [REALM](https://arxiv.org/pdf/2002.08909.pdf)'s salient span masking objective on [Wikipedia](https://huggingface.co/datasets/wikipedia), and finally fine-tuned on [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions).
 
-**Note**: The model was fine-tuned on 90% of the train splits of [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions) for 20k steps.
+**Note**: The model was fine-tuned on 90% of the train splits of [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions) for 20k steps and validated on the held-out 10% of the train split.
 
 Other community Checkpoints: [here](https://huggingface.co/models?search=ssm)
 
@@ -20,6 +20,17 @@ Paper: [How Much Knowledge Can You Pack
 Into the Parameters of a Language Model?](https://arxiv.org/abs/1910.10683.pdf)
 
 Authors: *Adam Roberts, Colin Raffel, Noam Shazeer*
+
+
+## Results on Natural Questions - Test Set
+
+|Id | link | Exact Match |
+|---|---|---|
+|**T5-large**|**https://huggingface.co/google/t5-large-ssm-nqo**|**29.0**|
+|T5-xxl|https://huggingface.co/google/t5-xxl-ssm-nqo|35.2|
+|T5-3b|https://huggingface.co/google/t5-3b-ssm-nqo|31.7|
+|T5-11b|https://huggingface.co/google/t5-11b-ssm-nqo|34.8|
+
 ## Usage
 
 The model can be used as follows for **closed book question answering**:
@@ -34,8 +45,6 @@ input_ids = t5_tok("When was Franklin D. Roosevelt born?", return_tensors="pt").
 gen_output = t5_qa_model.generate(input_ids)[0]
 
 print(t5_tok.decode(gen_output, skip_special_tokens=True))
-
-# should give "On February 13, 1904" => not correct sadly.
 ```
 
 ## Abstract
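The salient span masking objective mentioned in the diff masks named-entity and date spans so the model must recover them from knowledge stored in its parameters. A minimal sketch of the idea, assuming the salient spans come from an upstream entity tagger (hard-coded character offsets here) and using T5-style sentinel tokens; this is a simplified illustration, not the exact REALM preprocessing pipeline:

```python
# Sketch of salient span masking: replace each salient span with a
# T5 sentinel token and emit the masked spans as the target sequence.
def salient_span_mask(text: str, spans: list[tuple[int, int]]) -> tuple[str, str]:
    """spans are (start, end) character offsets of salient entities/dates."""
    masked_parts, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        masked_parts.append(text[cursor:start] + sentinel)
        targets.append(sentinel + " " + text[start:end])
        cursor = end
    masked_parts.append(text[cursor:])
    return "".join(masked_parts), " ".join(targets)

sentence = "Franklin D. Roosevelt was born in Hyde Park, New York."
inp, tgt = salient_span_mask(sentence, [(0, 21), (34, 53)])
print(inp)  # <extra_id_0> was born in <extra_id_1>.
print(tgt)  # <extra_id_0> Franklin D. Roosevelt <extra_id_1> Hyde Park, New York
```

The model is trained to generate the target from the masked input, which is what makes the later closed book QA fine-tuning effective: answering a question is just filling in a salient span.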
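The results table added in this commit reports Exact Match on the NQ test set. As a rough illustration of the metric (the normalization below — lowercasing, stripping punctuation and articles — is a common convention for QA evaluation, not necessarily the exact official NQ scoring script), a prediction scores 1 only if it matches a gold answer after normalization:

```python
import string

def normalize(text: str) -> str:
    # Lowercase, drop punctuation, remove articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    words = [w for w in text.split() if w not in {"a", "an", "the"}]
    return " ".join(words)

def exact_match(prediction: str, answers: list[str]) -> bool:
    return any(normalize(prediction) == normalize(a) for a in answers)

# FDR was born January 30, 1882, so the model's "February 13, 1904"
# output noted in the old README comment would score 0 here.
print(exact_match("February 13, 1904!", ["february 13 1904"]))  # True
print(exact_match("February 13, 1904", ["January 30, 1882"]))   # False
```

This all-or-nothing scoring is why closed book EM numbers (29.0–35.2 in the table) look low compared to extractive QA benchmarks: a near-miss date or partial name earns no credit.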