---
language: en
license: mit
tags:
- authorship-verification
- sentence-transformers
- sentence-similarity
datasets:
- swan07/authorship-verification
metrics:
- accuracy
- auc
model-index:
- name: swan07/bert-authorship-verification
  results:
  - task:
      type: authorship-verification
      name: Authorship Verification
    dataset:
      name: swan07/authorship-verification
      type: authorship-verification
    metrics:
    - type: accuracy
      value: 0.739
      name: Accuracy
    - type: auc
      value: 0.821
      name: AUC
---
# BERT for Authorship Verification
Fine-tuned sentence-transformers model (all-MiniLM-L6-v2 base) for determining whether two texts were written by the same author.
## Model Details
- **Base Model**: sentence-transformers/all-MiniLM-L6-v2
- **Training Data**: 50K text pairs from swan07/authorship-verification dataset
- **Task**: Authorship verification (binary classification)
- **Performance**: 73.9% accuracy, 0.821 AUC
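The reported accuracy and AUC can be computed directly from raw similarity scores and pair labels; a minimal numpy sketch (the scores and labels below are made-up placeholders, not model output, and the 0.5 decision threshold matches the usage example):

```python
import numpy as np

def accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs where the thresholded similarity matches the label."""
    return np.mean((scores >= threshold) == labels)

def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) statistic."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Probability that a random same-author pair outscores a random different-author pair
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

scores = np.array([0.9, 0.7, 0.4, 0.2])   # placeholder cosine similarities
labels = np.array([1, 1, 0, 0])           # 1 = same author, 0 = different authors
print(accuracy(scores, labels))  # 1.0
print(auc(scores, labels))       # 1.0
```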
## Usage
```python
from sentence_transformers import SentenceTransformer
import numpy as np
# Load model
model = SentenceTransformer('swan07/bert-authorship-verification')
# Encode texts
text1 = "Your first text here"
text2 = "Your second text here"
emb1 = model.encode(text1)
emb2 = model.encode(text2)
# Calculate cosine similarity
similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
# Predict
prediction = "Same Author" if similarity >= 0.5 else "Different Authors"
print(f"Prediction: {prediction}")
print(f"Similarity: {similarity:.3f}")
```
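For scoring many pairs at once, the same cosine computation vectorizes over the embedding matrices; a minimal numpy sketch (the random vectors below are placeholders standing in for `model.encode` output, and 384 is the embedding size of all-MiniLM-L6-v2):

```python
import numpy as np

def cosine_batch(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_norm * b_norm, axis=1)

rng = np.random.default_rng(0)
emb1 = rng.normal(size=(4, 384))  # placeholder for model.encode(list_of_texts1)
emb2 = rng.normal(size=(4, 384))  # placeholder for model.encode(list_of_texts2)
sims = cosine_batch(emb1, emb2)
preds = sims >= 0.5  # True -> "Same Author", as in the single-pair example
print(sims.shape, preds.dtype)
```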
## Training
Trained on 50K pairs from the swan07/authorship-verification dataset using:
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 4
- Loss: CosineSimilarityLoss
## Dataset
[swan07/authorship-verification](https://huggingface.co/datasets/swan07/authorship-verification) - 325K text pairs from 12 sources including PAN competitions (2011-2020).
## Citation
```bibtex
@article{manolache2021transferring,
title={Transferring BERT-like Transformers' Knowledge for Authorship Verification},
author={Manolache, Andrei and Brad, Florin and Burceanu, Elena and Barbalau, Antonio and Ionescu, Radu Tudor and Popescu, Marius},
journal={arXiv preprint arXiv:2112.05125},
year={2021}
}
```
## Links
- **Live Demo**: [same-writer-detector.streamlit.app](https://same-writer-detector.streamlit.app/)
- **Code**: [github.com/swan-07/authorship-verification](https://github.com/swan-07/authorship-verification)
- **Dataset**: [huggingface.co/datasets/swan07/authorship-verification](https://huggingface.co/datasets/swan07/authorship-verification)