---
license: apache-2.0
datasets:
- race
language:
- en
library_name: transformers
pipeline_tag: text2text-generation
---

# t5-large fine-tuned to RACE for Generating Question+Answer

- Input: `context` (e.g. a news article)
- Output: `question <sep> answer`

This model generates **abstractive** questions and answers in the style of the RACE dataset. If you would like **extractive** questions/answers, you can use our model trained on SQuAD: https://huggingface.co/potsawee/t5-large-generation-squad-QuestionAnswer.

## Model Details

The t5-large model is fine-tuned on the RACE dataset, where the input is the context/passage and the output is the question followed by the answer. This is the first component in the question generation pipeline (i.e. `g1`) in our [MQAG paper](https://arxiv.org/abs/2301.12307); please also refer to the GitHub repo of this project: https://github.com/potsawee/mqag0.

## How to Use the Model

Use the code below to get started with the model. You can also set `do_sample=True` in `generate()` to obtain different question-answer pairs.

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-race-QuestionAnswer")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-race-QuestionAnswer")

>>> context = r"""
... World number one Novak Djokovic says he is hoping for a "positive decision" to allow him
... to play at Indian Wells and the Miami Open next month. The United States has extended
... its requirement for international visitors to be vaccinated against Covid-19. Proof of vaccination
... will be required to enter the country until at least 10 April, but the Serbian has previously
... said he is unvaccinated. The 35-year-old has applied for special permission to enter the country.
... Indian Wells and the Miami Open - two of the most prestigious tournaments on the tennis calendar
... outside the Grand Slams - start on 6 and 20 March respectively. Djokovic says he will return to
... the ATP tour in Dubai next week after claiming a record-extending 10th Australian Open title
... and a record-equalling 22nd Grand Slam men's title last month.""".replace("\n", "")

>>> inputs = tokenizer(context, return_tensors="pt")
>>> outputs = model.generate(**inputs, max_length=100)
>>> # keep special tokens so that <sep> survives decoding, then strip pad/eos
>>> question_answer = tokenizer.decode(outputs[0], skip_special_tokens=False)
>>> question_answer = question_answer.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
>>> question, answer = question_answer.split(tokenizer.sep_token)

>>> print("question:", question)
question: What is the best title for the passage?
>>> print("answer:", answer)
answer: Djokovic's application for special permission to enter the United States
```
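
Greedy decoding (the default used above) returns the same question-answer pair on every call. Setting `do_sample=True` samples different pairs instead; here is a minimal sketch, where `top_p=0.95` and `num_return_sequences=3` are illustrative choices rather than settings from the original paper:

```python
>>> # sampling sketch: top_p=0.95 and num_return_sequences=3 are illustrative
>>> # choices, not settings from the original card
>>> sampled = model.generate(**inputs, max_length=100, do_sample=True,
...                          top_p=0.95, num_return_sequences=3)
>>> for output in sampled:
...     text = tokenizer.decode(output, skip_special_tokens=False)
...     text = text.replace(tokenizer.pad_token, "").replace(tokenizer.eos_token, "")
...     question, answer = text.split(tokenizer.sep_token)
...     print("question:", question.strip())
...     print("answer:", answer.strip())
```
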
## Generating Distractors (other options in a multiple-choice setup)

```
Context ---> Question + (A) Answer (B) Distractor1 (C) Distractor2 (D) Distractor3
```

Please refer to our distractor generation model: https://huggingface.co/potsawee/t5-large-generation-race-Distractor
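
For example, the two models can be chained so that the question and answer generated above become the input to the distractor model. A minimal sketch follows; the `question <sep> answer <sep> context` input format is an assumption here, so please verify it against the distractor model's card:

```python
>>> # chaining sketch: the "question <sep> answer <sep> context" input format
>>> # is an assumption -- check the distractor model's card before relying on it
>>> d_tokenizer = AutoTokenizer.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> d_model = AutoModelForSeq2SeqLM.from_pretrained("potsawee/t5-large-generation-race-Distractor")
>>> d_input = f"{question} {d_tokenizer.sep_token} {answer} {d_tokenizer.sep_token} {context}"
>>> d_outputs = d_model.generate(**d_tokenizer(d_input, return_tensors="pt"), max_length=128)
>>> decoded = d_tokenizer.decode(d_outputs[0], skip_special_tokens=False)
>>> decoded = decoded.replace(d_tokenizer.pad_token, "").replace(d_tokenizer.eos_token, "")
>>> distractors = [d.strip() for d in decoded.split(d_tokenizer.sep_token)]
```
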
## Citation

```bibtex
@article{manakul2023mqag,
  title={MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization},
  author={Manakul, Potsawee and Liusie, Adian and Gales, Mark JF},
  journal={arXiv preprint arXiv:2301.12307},
  year={2023}
}
```