---
datasets:
- cestwc/anthology
metrics:
- accuracy
- f1
pipeline_tag: text-classification
widget:
- text: "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples </s> Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
  example_title: "Example 1"
- text: "Incongruent Headlines: Yet Another Way to Mislead Your Readers </s> Emotion Cause Extraction - A Review of Various Methods and Corpora"
  example_title: "Example 2"
---

# BibTeX classification using RoBERTa

## Model Description
This model is a text classifier that predicts the likelihood that a given context paper is cited by a query paper. It takes the concatenated titles of the context and query papers as input and outputs a binary prediction: `1` indicates a potential citation relationship (though not a guaranteed one), and `0` suggests no such relationship.
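The input convention can be sketched as follows; the `make_input` helper and `LABELS` map are illustrative names, not part of the released model:

```python
# Illustrative sketch: the two titles are joined with RoBERTa's "</s>"
# separator token, matching the widget examples in this card.
def make_input(context_title: str, query_title: str) -> str:
    return f"{context_title} </s> {query_title}"

# Label mapping assumed by this card: 1 = potential citation, 0 = none.
LABELS = {0: "not include", 1: "include"}

pair = make_input(
    "Incongruent Headlines: Yet Another Way to Mislead Your Readers",
    "Emotion Cause Extraction - A Review of Various Methods and Corpora",
)
```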

### Intended Use
- **Primary Use**: To extract a subset of the ACL Anthology BibTeX so that the resulting file is under 50 MB.
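As a minimal sketch of this use case: score each candidate title against the query paper and keep only the entries the classifier marks as potential citations. The `predict_citation` function here is a stand-in stub (in practice it would call the model as shown in "How to Use"); `filter_titles` is an illustrative helper, not part of the released model.

```python
# Stand-in for the model call: a stub that keeps titles mentioning
# "Robustness", purely so this sketch is self-contained and runnable.
def predict_citation(context_title: str, query_title: str) -> str:
    return "include" if "Robustness" in context_title else "not include"

def filter_titles(context_titles, query_title):
    """Keep only titles predicted as potential citations of the query."""
    return [t for t in context_titles
            if predict_citation(t, query_title) == "include"]

titles = [
    "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples",
    "Incongruent Headlines: Yet Another Way to Mislead Your Readers",
]
kept = filter_titles(
    titles,
    "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility",
)
```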

### Model Training
- **Data Description**: The model was trained on the ACL Anthology dataset [cestwc/anthology](https://huggingface.co/datasets/cestwc/anthology), which comprises pairs of paper titles. Each pair is annotated to indicate whether the context paper could plausibly be cited by the query paper.

### Performance
- **Metrics**: [Include performance metrics like accuracy, precision, recall, F1-score, etc.]

## How to Use
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cestwc/roberta-base-bib"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def predict_citation(context_title, query_title):
    # Join the two titles with RoBERTa's separator token, as at training time.
    inputs = tokenizer(f"{context_title} </s> {query_title}", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = outputs.logits.argmax(-1).item()
    return "include" if prediction == 1 else "not include"

# Example
context_title = "Evaluating and Enhancing the Robustness of Neural Network-based Dependency Parsing Models with Adversarial Examples"
query_title = "Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility"
print(predict_citation(context_title, query_title))
```