---
language: en
license: mit
tags:
- authorship-verification
- sentence-transformers
- sentence-similarity
datasets:
- swan07/authorship-verification
metrics:
- accuracy
- auc
model-index:
- name: swan07/bert-authorship-verification
  results:
  - task:
      type: authorship-verification
      name: Authorship Verification
    dataset:
      name: swan07/authorship-verification
      type: authorship-verification
    metrics:
    - type: accuracy
      value: 0.739
      name: Accuracy
    - type: auc
      value: 0.821
      name: AUC
---

# BERT for Authorship Verification

Fine-tuned BERT model for determining if two texts were written by the same author.

## Model Details

- **Base Model**: sentence-transformers/all-MiniLM-L6-v2
- **Training Data**: 50K text pairs from swan07/authorship-verification dataset
- **Task**: Authorship verification (binary classification)
- **Performance**: 73.9% accuracy, 0.821 AUC

## Usage

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Load model
model = SentenceTransformer('swan07/bert-authorship-verification')

# Encode texts
text1 = "Your first text here"
text2 = "Your second text here"

emb1 = model.encode(text1)
emb2 = model.encode(text2)

# Calculate cosine similarity
similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

# Predict
prediction = "Same Author" if similarity >= 0.5 else "Different Authors"
print(f"Prediction: {prediction}")
print(f"Similarity: {similarity:.3f}")
```

## Training

Trained on 50K pairs from the swan07/authorship-verification dataset using:

- Learning rate: 2e-5
- Batch size: 16
- Epochs: 4
- Loss: CosineSimilarityLoss

## Dataset

[swan07/authorship-verification](https://huggingface.co/datasets/swan07/authorship-verification) - 325K text pairs from 12 sources including PAN competitions (2011-2020).
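
## Evaluating the Threshold

The Usage snippet classifies a pair as "Same Author" when cosine similarity is at least 0.5, and the card reports accuracy and AUC. The sketch below shows one way such metrics could be computed from a list of similarity scores and gold labels. It is not necessarily how the reported 73.9% and 0.821 were obtained. The function names (`accuracy_at_threshold`, `roc_auc`, `best_threshold`) and the toy scores are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def accuracy_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> float:
    """Fraction of pairs classified correctly when
    score >= threshold is read as 'same author' (label 1)."""
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random same-author pair scores higher than a
    random different-author pair; ties count as half a win."""
    pos: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    neg: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pair of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Try each observed score as a cutoff and return the
    (threshold, accuracy) pair with the highest accuracy."""
    candidates = sorted(set(scores))
    return max(
        ((t, accuracy_at_threshold(scores, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Toy data (made up): similarity scores and same-author labels (1 = same).
toy_scores = [0.91, 0.72, 0.48, 0.65, 0.30, 0.12]
toy_labels = [1, 1, 1, 0, 0, 0]

print(f"Accuracy @ 0.5: {accuracy_at_threshold(toy_scores, toy_labels):.3f}")
print(f"AUC: {roc_auc(toy_scores, toy_labels):.3f}")
print(f"Best threshold, accuracy: {best_threshold(toy_scores, toy_labels)}")
```

AUC does not depend on the cutoff, so it describes how well the similarity scores rank pairs overall, while accuracy depends on the chosen threshold. If 0.5 performs poorly on your own texts, tuning the cutoff on a held-out set of labeled pairs may help.
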
## Citation

```bibtex
@article{manolache2021transferring,
  title={Transferring BERT-like Transformers' Knowledge for Authorship Verification},
  author={Manolache, Andrei and Brad, Florin and Burceanu, Elena and Barbalau, Antonio and Ionescu, Radu Tudor and Popescu, Marius},
  journal={arXiv preprint arXiv:2112.05125},
  year={2021}
}
```

## Links

- **Live Demo**: [same-writer-detector.streamlit.app](https://same-writer-detector.streamlit.app/)
- **Code**: [github.com/swan-07/authorship-verification](https://github.com/swan-07/authorship-verification)
- **Dataset**: [huggingface.co/datasets/swan07/authorship-verification](https://huggingface.co/datasets/swan07/authorship-verification)