---
language: en
license: mit
tags:
- authorship-verification
- sentence-transformers
- sentence-similarity
datasets:
- swan07/authorship-verification
metrics:
- accuracy
- auc
model-index:
- name: swan07/bert-authorship-verification
  results:
  - task:
      type: authorship-verification
      name: Authorship Verification
    dataset:
      name: swan07/authorship-verification
      type: authorship-verification
    metrics:
    - type: accuracy
      value: 0.739
      name: Accuracy
    - type: auc
      value: 0.821
      name: AUC
---

# BERT for Authorship Verification

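A BERT-style sentence-embedding model, fine-tuned from all-MiniLM-L6-v2, for determining whether two texts were written by the same author. Two texts are compared by the cosine similarity of their embeddings.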

## Model Details

- **Base Model**: sentence-transformers/all-MiniLM-L6-v2
- **Training Data**: 50K text pairs from the swan07/authorship-verification dataset
- **Task**: Authorship verification (binary classification)
- **Performance**: 73.9% accuracy, 0.821 AUC
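
Both metrics above are computed from per-pair similarity scores and binary same-author labels. The following is a minimal sketch of how they can be calculated, not the evaluation script used for this model. The function names and the toy data are hypothetical, and the 0.5 threshold is assumed to match the one in the Usage section.

```python
import numpy as np

def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the label."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    predictions = (scores >= threshold).astype(int)
    return float(np.mean(predictions == labels))

def roc_auc(scores, labels):
    """AUC as the probability that a random same-author pair outscores
    a random different-author pair (ties count as one half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one pair of each class")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

# Toy scores and labels, made up for illustration only.
toy_scores = [0.91, 0.62, 0.48, 0.55, 0.20, 0.35]
toy_labels = [1, 1, 1, 0, 0, 0]
print(f"Accuracy: {accuracy_at_threshold(toy_scores, toy_labels):.3f}")
print(f"AUC: {roc_auc(toy_scores, toy_labels):.3f}")
```

Accuracy depends on the chosen threshold, while AUC does not. The all-pairs comparison in `roc_auc` is only practical for small evaluation sets; for larger ones, `sklearn.metrics.roc_auc_score` is a better fit.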

## Usage

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Load model
model = SentenceTransformer('swan07/bert-authorship-verification')

# Encode texts
text1 = "Your first text here"
text2 = "Your second text here"

emb1 = model.encode(text1)
emb2 = model.encode(text2)

# Calculate cosine similarity
similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

# Predict: a similarity at or above the 0.5 threshold counts as the same author
prediction = "Same Author" if similarity >= 0.5 else "Different Authors"
print(f"Prediction: {prediction}")
print(f"Similarity: {similarity:.3f}")
```
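
To score many pairs at once, the same comparison can be batched. This is a rough sketch: `pairwise_cosine` and `verify_pairs` are hypothetical helper names, not part of this repository, and the 0.5 default simply mirrors the snippet above.

```python
import numpy as np

def pairwise_cosine(emb_a, emb_b):
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    norms = np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    return np.sum(emb_a * emb_b, axis=1) / norms

def verify_pairs(model, texts_a, texts_b, threshold=0.5):
    """Return (similarities, same_author_flags) for two aligned text lists.

    `model` is any object whose encode() maps a list of strings to an
    (n, d) array, such as a loaded SentenceTransformer.
    """
    emb_a = model.encode(list(texts_a))
    emb_b = model.encode(list(texts_b))
    sims = pairwise_cosine(emb_a, emb_b)
    return sims, sims >= threshold
```

With the `model` loaded as above, `sims, same = verify_pairs(model, first_texts, second_texts)` returns one similarity and one boolean per pair. Encoding lists rather than single strings lets `encode` batch the inputs internally.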

## Training

Trained on 50K pairs from the swan07/authorship-verification dataset using:
- Learning rate: 2e-5
- Batch size: 16
- Epochs: 4
- Loss: CosineSimilarityLoss
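
For orientation, the settings above map onto the classic sentence-transformers `fit` API roughly as follows. This is a sketch under assumptions, not the actual training script (see the Code link below for that). In particular, the row field names `text1`, `text2` and `same` are guesses about the dataset schema, and `to_training_triples` and `train` are hypothetical helpers.

```python
# Hyperparameters as listed in this model card.
TRAIN_CONFIG = {
    "base_model": "sentence-transformers/all-MiniLM-L6-v2",
    "learning_rate": 2e-5,
    "batch_size": 16,
    "epochs": 4,
}

def to_training_triples(rows):
    """Convert dataset rows into (text1, text2, float_label) triples.

    ASSUMPTION: each row is a dict with 'text1', 'text2' and a 0/1 'same'
    field; the real column names may differ.
    """
    return [(r["text1"], r["text2"], float(r["same"])) for r in rows]

def train(rows, config=None):
    """Fine-tune the base model on the given rows and return it."""
    # Imported lazily so the helpers above work without these packages.
    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    config = config or TRAIN_CONFIG
    model = SentenceTransformer(config["base_model"])
    examples = [
        InputExample(texts=[a, b], label=y)
        for a, b, y in to_training_triples(rows)
    ]
    loader = DataLoader(examples, shuffle=True, batch_size=config["batch_size"])
    # CosineSimilarityLoss regresses cos(emb1, emb2) onto the float label.
    loss = losses.CosineSimilarityLoss(model)
    model.fit(
        train_objectives=[(loader, loss)],
        epochs=config["epochs"],
        optimizer_params={"lr": config["learning_rate"]},
    )
    return model
```

Because CosineSimilarityLoss pushes same-author pairs toward a similarity of 1 and different-author pairs toward 0, thresholding the cosine similarity at inference time is a natural decision rule.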

## Dataset

[swan07/authorship-verification](https://huggingface.co/datasets/swan07/authorship-verification): 325K text pairs from 12 sources, including PAN competitions (2011-2020). This model was trained on a 50K-pair subset.

## Citation

```bibtex
@article{manolache2021transferring,
  title={Transferring BERT-like Transformers' Knowledge for Authorship Verification},
  author={Manolache, Andrei and Brad, Florin and Burceanu, Elena and Barbalau, Antonio and Ionescu, Radu Tudor and Popescu, Marius},
  journal={arXiv preprint arXiv:2112.05125},
  year={2021}
}
```

## Links

- **Live Demo**: [same-writer-detector.streamlit.app](https://same-writer-detector.streamlit.app/)
- **Code**: [github.com/swan-07/authorship-verification](https://github.com/swan-07/authorship-verification)
- **Dataset**: [huggingface.co/datasets/swan07/authorship-verification](https://huggingface.co/datasets/swan07/authorship-verification)