Paper: Rethinking the Authorship Verification Experimental Setups (arXiv:2112.05125)
Fine-tuned BERT model for determining if two texts were written by the same author.
from sentence_transformers import SentenceTransformer
import numpy as np
# Load model
model = SentenceTransformer('swan07/bert-authorship-verification')
# Encode texts
text1 = "Your first text here"
text2 = "Your second text here"
emb1 = model.encode(text1)
emb2 = model.encode(text2)
# Calculate cosine similarity
similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
# Predict
prediction = "Same Author" if similarity >= 0.5 else "Different Authors"
print(f"Prediction: {prediction}")
print(f"Similarity: {similarity:.3f}")
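To score many pairs at once, the same cosine-similarity rule can be applied row by row. The following is a minimal sketch, not part of the model's own API: the helper names `pairwise_cosine` and `same_author` are hypothetical, and the 0.5 default simply mirrors the threshold used in the snippet above. It works on any pair of equally shaped embedding arrays.

```python
import numpy as np


def pairwise_cosine(emb_a, emb_b, eps: float = 1e-12) -> np.ndarray:
    """Cosine similarity between matching rows of two (n, dim) arrays.

    Hypothetical helper for illustration; `eps` guards against division
    by zero if an embedding happens to be the zero vector.
    """
    emb_a = np.atleast_2d(np.asarray(emb_a, dtype=np.float64))
    emb_b = np.atleast_2d(np.asarray(emb_b, dtype=np.float64))
    if emb_a.shape != emb_b.shape:
        raise ValueError("emb_a and emb_b must have the same shape")
    dots = np.sum(emb_a * emb_b, axis=1)
    norms = np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    return dots / np.maximum(norms, eps)


def same_author(emb_a, emb_b, threshold: float = 0.5) -> np.ndarray:
    """Boolean decision per pair: True where similarity >= threshold.

    The default threshold is an assumption copied from the usage example;
    tune it on labeled pairs from your own domain if you have them.
    """
    return pairwise_cosine(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real sentence embeddings.
    a = np.array([[1.0, 0.0], [1.0, 0.0]])
    b = np.array([[2.0, 0.0], [0.0, 3.0]])
    print(pairwise_cosine(a, b))
    print(same_author(a, b))
```

With the real model, passing a list of texts to `model.encode` returns one embedding row per text, so two aligned lists of texts can be encoded separately and handed to `same_author` directly.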
Trained on 50K pairs from the swan07/authorship-verification dataset.
swan07/authorship-verification - 325K text pairs from 12 sources including PAN competitions (2011-2020).
@article{manolache2021transferring,
title={Transferring BERT-like Transformers' Knowledge for Authorship Verification},
author={Manolache, Andrei and Brad, Florin and Burceanu, Elena and Barbalau, Antonio and Ionescu, Radu Tudor and Popescu, Marius},
journal={arXiv preprint arXiv:2112.05125},
year={2021}
}