---
datasets:
- cestwc/anthology
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples </s> Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
  example_title: "Example 1"
- text: "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> Emotion Cause Extraction - A Review of Various Methods and Corpora"
  example_title: "Example 2"
---

# BibTeX classification using RoBERTa

## Model Description

This model is a text classifier that predicts whether a given context paper is likely to be cited by a query paper. It takes the titles of the context and query papers, concatenated with a `</s>` separator, and outputs a binary prediction: `1` indicates a potential citation relationship (though not a guaranteed one), and `0` suggests no such relationship.
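
For quick experimentation, the model can also be queried through the `transformers` pipeline API. The sketch below assumes the checkpoint keeps the default label names `LABEL_0` (no citation) and `LABEL_1` (potential citation); check the model config for the actual mapping.

```python
from transformers import pipeline

# Load the classifier from the Hub.
classifier = pipeline("text-classification", model="cestwc/roberta-base-bib")

# Context title and query title, joined by RoBERTa's </s> separator,
# matching the widget examples above.
result = classifier(
    "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> "
    "Emotion Cause Extraction - A Review of Various Methods and Corpora"
)
print(result)  # e.g. [{'label': 'LABEL_1', 'score': ...}]
```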

### Intended Use

- **Primary Use**: To extract a relevant subset of BibTeX entries from the ACL Anthology so that the resulting `.bib` file stays under 50 MB; see the sketch below.
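
As an illustration of that workflow, the following sketch filters a local BibTeX file with the `predict_citation` helper defined in the How to Use section below. The file names, the query title, and the use of `bibtexparser` are assumptions made for this example, not part of the released tooling.

```python
import bibtexparser
from bibtexparser.bibdatabase import BibDatabase

QUERY_TITLE = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"

# Load the full anthology BibTeX file (path is hypothetical).
with open("anthology.bib") as f:
    full_db = bibtexparser.load(f)

# Keep only entries the classifier marks as potential citations.
# predict_citation is defined in the How to Use section below.
subset = BibDatabase()
subset.entries = [
    entry for entry in full_db.entries
    if predict_citation(entry.get("title", ""), QUERY_TITLE) == "include"
]

with open("anthology_subset.bib", "w") as f:
    bibtexparser.dump(subset, f)
```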

### Model Training

- **Data Description**: The model was trained on the ACL Anthology dataset [cestwc/anthology](https://huggingface.co/datasets/cestwc/anthology), which comprises pairs of paper titles. Each pair is annotated to indicate whether the context paper could plausibly be cited by the query paper.
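
The training data can be inspected directly with the `datasets` library. This snippet only prints the split layout and one record, since the exact field names should be checked against the dataset card (the `train` split name is an assumption).

```python
from datasets import load_dataset

# Download the title-pair dataset from the Hugging Face Hub.
dataset = load_dataset("cestwc/anthology")

print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one annotated title pair (field names may vary)
```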

### Performance

- **Metrics**: The model is evaluated with accuracy and F1, as listed in the metadata above; exact scores have not yet been reported.
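
If you evaluate the model on your own labeled pairs, accuracy and F1 can be computed with the `evaluate` library. The `predictions` and `references` lists below are placeholders; in practice, collect 0/1 outcomes from the model over a labeled test split.

```python
import evaluate

# Placeholder 0/1 labels; replace with real model outputs and gold labels.
predictions = [1, 0, 1, 1]
references = [1, 0, 0, 1]

accuracy = evaluate.load("accuracy").compute(predictions=predictions, references=references)
f1 = evaluate.load("f1").compute(predictions=predictions, references=references)
print(accuracy, f1)  # {'accuracy': 0.75} {'f1': 0.8}
```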

## How to Use

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cestwc/roberta-base-bib"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()


def predict_citation(context_title, query_title):
    # Concatenate the two titles with RoBERTa's </s> separator, matching
    # the format used in the widget examples above.
    inputs = tokenizer(f"{context_title} </s> {query_title}", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Class 1 -> potential citation relationship, class 0 -> none.
    prediction = outputs.logits.argmax(-1).item()
    return "include" if prediction == 1 else "not include"


# Example
context_title = "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
print(predict_citation(context_title, query_title))
```