---
datasets:
- cestwc/anthology
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples </s> Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
  example_title: "Example 1"
- text: "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> Emotion Cause Extraction - A Review of Various Methods and Corpora"
  example_title: "Example 2"
---

# BibTeX classification using RoBERTa

## Model Description

This model is a text classifier that predicts whether a given context paper is likely to be cited by a query paper. It takes the titles of the context and query papers, concatenated with a `</s>` separator, and outputs a binary prediction: `1` indicates a potential citation relationship (though not a guaranteed one), and `0` suggests no such relationship.
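
For quick experimentation, the model can also be queried through the `transformers` pipeline API. The sketch below assumes the checkpoint keeps the default label names `LABEL_0` (no citation) and `LABEL_1` (potential citation); check the model config for the actual mapping.

```python
from transformers import pipeline

# Load the classifier from the Hub.
classifier = pipeline("text-classification", model="cestwc/roberta-base-bib")

# Context title and query title, joined by RoBERTa's </s> separator,
# matching the widget examples above.
result = classifier(
    "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> "
    "Emotion Cause Extraction - A Review of Various Methods and Corpora"
)
print(result)  # e.g. [{'label': 'LABEL_1', 'score': ...}]
```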

### Intended Use

- **Primary Use**: To extract a relevant subset of BibTeX entries from the ACL Anthology so that the resulting `.bib` file stays under 50 MB; see the sketch below.
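
As an illustration of that workflow, the following sketch filters a local BibTeX file with the `predict_citation` helper defined in the How to Use section below. The file names, the query title, and the use of `bibtexparser` are assumptions made for this example, not part of the released tooling.

```python
import bibtexparser
from bibtexparser.bibdatabase import BibDatabase

QUERY_TITLE = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"

# Load the full anthology BibTeX file (path is hypothetical).
with open("anthology.bib") as f:
    full_db = bibtexparser.load(f)

# Keep only entries the classifier marks as potential citations.
# predict_citation is defined in the How to Use section below.
subset = BibDatabase()
subset.entries = [
    entry for entry in full_db.entries
    if predict_citation(entry.get("title", ""), QUERY_TITLE) == "include"
]

with open("anthology_subset.bib", "w") as f:
    bibtexparser.dump(subset, f)
```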

### Model Training

- **Data Description**: The model was trained on the ACL Anthology dataset [cestwc/anthology](https://huggingface.co/datasets/cestwc/anthology), which comprises pairs of paper titles. Each pair is annotated to indicate whether the context paper could plausibly be cited by the query paper.
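
The training data can be inspected directly with the `datasets` library. This snippet only prints the split layout and one record, since the exact field names should be checked against the dataset card (the `train` split name is an assumption).

```python
from datasets import load_dataset

# Download the title-pair dataset from the Hugging Face Hub.
dataset = load_dataset("cestwc/anthology")

print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one annotated title pair (field names may vary)
```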

### Performance

- **Metrics**: The model is evaluated with accuracy and F1, as listed in the metadata above; exact scores have not yet been reported.
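
If you evaluate the model on your own labeled pairs, accuracy and F1 can be computed with the `evaluate` library. The `predictions` and `references` lists below are placeholders; in practice, collect 0/1 outcomes from the model over a labeled test split.

```python
import evaluate

# Placeholder 0/1 labels; replace with real model outputs and gold labels.
predictions = [1, 0, 1, 1]
references = [1, 0, 0, 1]

accuracy = evaluate.load("accuracy").compute(predictions=predictions, references=references)
f1 = evaluate.load("f1").compute(predictions=predictions, references=references)
print(accuracy, f1)  # {'accuracy': 0.75} {'f1': 0.8}
```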

## How to Use

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cestwc/roberta-base-bib"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()


def predict_citation(context_title, query_title):
    # Concatenate the two titles with RoBERTa's </s> separator, matching
    # the format used in the widget examples above.
    inputs = tokenizer(f"{context_title} </s> {query_title}", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Class 1 -> potential citation relationship, class 0 -> none.
    prediction = outputs.logits.argmax(-1).item()
    return "include" if prediction == 1 else "not include"


# Example
context_title = "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
print(predict_citation(context_title, query_title))
```