---
license: apache-2.0
datasets:
- squad_v2
language:
- en
library_name: transformers
pipeline_tag: text-classification
inference: false
---
# longformer-large-4096 fine-tuned to SQuAD2.0 for answerability score
This model predicts whether a question is answerable given a context.
The output is a probability: values close to 0.0 indicate that the question is unanswerable, and values close to 1.0 indicate that it is answerable.

- Input: `question` and `context`
- Output: `probability` (i.e. logit -> sigmoid)

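The logit-to-probability conversion above is the standard sigmoid; a minimal sketch of what that step computes (illustrative values only, not outputs of this model):

```python
import math

def answerability_probability(logit: float) -> float:
    """Map a raw model logit to a probability via the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-logit))

# Large positive logits map close to 1.0 (answerable);
# large negative logits map close to 0.0 (unanswerable).
print(round(answerability_probability(0.0), 2))  # 0.5
```
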
## Model Details

The longformer-large-4096 model is fine-tuned on the SQuAD2.0 dataset, where the input is the concatenation `question + context`.
Because SQuAD2.0 is class-imbalanced, we resample so that the model is trained on a 50/50 split of answerable and unanswerable examples.

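The resampling step could be sketched as follows. This is a minimal illustration, not the card's documented procedure: the field name `is_answerable` and the choice to downsample the majority class are assumptions.

```python
import random

def balance_answerability(examples, seed=0):
    """Downsample the majority class so answerable and unanswerable
    examples appear in a 50/50 split (one possible balancing scheme)."""
    rng = random.Random(seed)
    answerable = [e for e in examples if e["is_answerable"]]
    unanswerable = [e for e in examples if not e["is_answerable"]]
    n = min(len(answerable), len(unanswerable))
    balanced = rng.sample(answerable, n) + rng.sample(unanswerable, n)
    rng.shuffle(balanced)
    return balanced

# Toy data: 4 answerable (ids 0, 3, 6, 9) and 6 unanswerable examples.
data = [{"id": i, "is_answerable": i % 3 == 0} for i in range(10)]
print(len(balance_answerability(data)))  # 8 (4 from each class)
```
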
## How to Use the Model

Use the code below to get started with the model.

```python
>>> import torch
>>> from transformers import LongformerTokenizer, LongformerForSequenceClassification

>>> tokenizer = LongformerTokenizer.from_pretrained("potsawee/longformer-large-4096-answerable-squad2")
>>> model = LongformerForSequenceClassification.from_pretrained("potsawee/longformer-large-4096-answerable-squad2")

>>> context = """
British government ministers have been banned from using Chinese-owned social media app TikTok on their work phones and devices on security grounds.
The government fears sensitive data held on official phones could be accessed by the Chinese government.
Cabinet Minister Oliver Dowden said the ban was a "precautionary" move but would come into effect immediately.
""".replace("\n", " ").strip()

>>> question1 = "Which application have been banned by the British government?"
>>> input_text1 = question1 + ' ' + tokenizer.sep_token + ' ' + context
>>> inputs1 = tokenizer(input_text1, max_length=4096, truncation=True, return_tensors="pt")
>>> prob1 = torch.sigmoid(model(**inputs1).logits.squeeze(-1))
>>> print("P(answerable|question1, context) = {:.2f}%".format(prob1.item()*100))
P(answerable|question1, context) = 99.21%  # highly answerable

>>> question2 = "Is Facebook popular among young students in America?"
>>> input_text2 = question2 + ' ' + tokenizer.sep_token + ' ' + context
>>> inputs2 = tokenizer(input_text2, max_length=4096, truncation=True, return_tensors="pt")
>>> prob2 = torch.sigmoid(model(**inputs2).logits.squeeze(-1))
>>> print("P(answerable|question2, context) = {:.2f}%".format(prob2.item()*100))
P(answerable|question2, context) = 2.53%  # highly unanswerable
```

## Citation

```bibtex
@misc{manakul2023selfcheckgpt,
      title={SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models},
      author={Potsawee Manakul and Adian Liusie and Mark J. F. Gales},
      year={2023},
      eprint={2303.08896},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```