---
license: mit
datasets:
- web_nlg
language:
- en
widget:
- text: "Bourg-la-Reine is located in France and I love this town. I'm from People's Republic of China. [SEP] A Chinese, Loves, Bourg-la-Reine"
- text: "Bucharest is a city in Romania. [SEP] Romania | is located in | Bucharest"
---
# Model card for Inria-CEDAR/FactSpotter-DeBERTaV3-Base

## Model description

This model accompanies the paper **"FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation"**.

Given a triple in the format "subject, predicate, object" and a text, the model determines whether the triple is expressed in the text.

The delimiter between the triple elements can be ", " or " | ".
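For instance, a minimal sketch of building such a triple string with either delimiter (the `format_triple` helper below is purely illustrative, not part of the released code):

```python
def format_triple(subject, predicate, obj, delimiter=", "):
    # Join the triple elements with the chosen delimiter: ", " or " | ".
    return delimiter.join([subject, predicate, obj])

triple = format_triple("Romania", "is located in", "Bucharest", delimiter=" | ")
text = "Bucharest is a city in Romania."
# The (text, triple) pair is then scored by the classifier shown below.
```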
Unlike the paper, which uses ELECTRA, this model is fine-tuned from DeBERTaV3.

We also provide Small and Large versions of this model:

https://huggingface.co/Inria-CEDAR/FactSpotter-DeBERTaV3-Small

https://huggingface.co/Inria-CEDAR/FactSpotter-DeBERTaV3-Large

## How to use the model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def sentence_cls_score(input_strings, predicate_cls_model, predicate_cls_tokenizer):
    # Tokenize the (text, triple) pairs; the tokenizer inserts [SEP] between the two parts.
    tokenized_cls_input = predicate_cls_tokenizer(input_strings, truncation=True, padding=True,
                                                  return_token_type_ids=True, return_tensors="pt")
    input_ids = tokenized_cls_input["input_ids"].to(device)
    token_type_ids = tokenized_cls_input["token_type_ids"].to(device)
    attention_mask = tokenized_cls_input["attention_mask"].to(device)
    with torch.no_grad():
        cls_output = predicate_cls_model(input_ids, attention_mask=attention_mask,
                                         token_type_ids=token_type_ids)
    # Turn the logits into class probabilities.
    return torch.softmax(cls_output.logits, dim=1)


tokenizer = AutoTokenizer.from_pretrained("Inria-CEDAR/FactSpotter-DeBERTaV3-Base")
model = AutoModelForSequenceClassification.from_pretrained("Inria-CEDAR/FactSpotter-DeBERTaV3-Base")
model.to(device)
model.eval()

# Pairs of texts (as premises) and triples (as hypotheses)
cls_texts = [("the aarhus is the airport of aarhus, denmark", "aarhus airport, city served, aarhus, denmark"),
             ("aarhus airport is 25.0 metres above the sea level", "aarhus airport, elevation above the sea level, 1174")]
cls_scores = sentence_cls_score(cls_texts, model, tokenizer)
# Output dimensions: 0-entailment, 1-neutral, 2-contradiction
label_names = ["entailment", "neutral", "contradiction"]
```
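To read off the predicted label for each pair from the scores above, one can take the argmax over the class probabilities:

```python
# Most probable label for each (text, triple) pair.
for (text, triple), scores in zip(cls_texts, cls_scores):
    print(f"{triple} -> {label_names[int(scores.argmax())]}")
```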
## Citation

If you find the model useful, please cite the paper:

```
@inproceedings{zhang:hal-04257838,
  TITLE = {{FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation}},
  AUTHOR = {Zhang, Kun and Balalau, Oana and Manolescu, Ioana},
  URL = {https://hal.science/hal-04257838},
  BOOKTITLE = {{Findings of EMNLP 2023 - Conference on Empirical Methods in Natural Language Processing}},
  ADDRESS = {Singapore, Singapore},
  YEAR = {2023},
  MONTH = Dec,
  KEYWORDS = {Graph-to-Text Generation ; Factual Faithfulness ; Constrained Text Generation},
  PDF = {https://hal.science/hal-04257838/file/_EMNLP_2023__Evaluating_the_Factual_Faithfulness_of_Graph_to_Text_Generation_Camera.pdf},
  HAL_ID = {hal-04257838},
  HAL_VERSION = {v1},
}
```
## Questions

If you have questions, please contact me by email at zhangkun@ieee.org or kun.zhang@inria.fr.