metadata
language: en
license: mit
tags:
  - authorship-verification
  - sentence-transformers
  - sentence-similarity
datasets:
  - swan07/authorship-verification
metrics:
  - accuracy
  - auc
model-index:
  - name: swan07/bert-authorship-verification
    results:
      - task:
          type: authorship-verification
          name: Authorship Verification
        dataset:
          name: swan07/authorship-verification
          type: authorship-verification
        metrics:
          - type: accuracy
            value: 0.739
            name: Accuracy
          - type: auc
            value: 0.821
            name: AUC

BERT for Authorship Verification

A BERT-style sentence-transformers model, fine-tuned from all-MiniLM-L6-v2, for determining whether two texts were written by the same author.

Model Details

  • Base Model: sentence-transformers/all-MiniLM-L6-v2
  • Training Data: 50K text pairs from the swan07/authorship-verification dataset
  • Task: Authorship verification (binary classification)
  • Performance: 73.9% accuracy, 0.821 AUC
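For readers who want to compute comparable numbers on their own labelled pairs, the following is a minimal sketch of both metrics. It assumes similarity scores are thresholded at 0.5 for accuracy (the value used in the Usage section) and that labels are 1 for same-author pairs and 0 otherwise; the card does not state the exact evaluation split or procedure, so the helper names and defaults here are illustrative only.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(scores, labels):
    """Probability that a random same-author pair outscores a random
    different-author pair (ties count as 0.5).

    This pairwise form is O(n_pos * n_neg); fine for a sketch, slow for
    large evaluation sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one pair of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: two same-author pairs, two different-author pairs.
scores = [0.9, 0.55, 0.4, 0.6]
labels = [1, 1, 0, 0]
print(accuracy(scores, labels))
print(roc_auc(scores, labels))
```

Unlike accuracy, AUC does not depend on the threshold, so it is a useful check on whether 0.5 is a sensible cut-off for a given domain.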

Usage

from sentence_transformers import SentenceTransformer
import numpy as np

# Load model
model = SentenceTransformer('swan07/bert-authorship-verification')

# Encode texts
text1 = "Your first text here"
text2 = "Your second text here"

emb1 = model.encode(text1)
emb2 = model.encode(text2)

# Calculate cosine similarity
similarity = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

# Predict
prediction = "Same Author" if similarity >= 0.5 else "Different Authors"
print(f"Prediction: {prediction}")
print(f"Similarity: {similarity:.3f}")
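When scoring many pairs, it is usually faster to encode the texts in batches and compare the embeddings row by row. The sketch below shows one way to do that with NumPy. The helper names (pairwise_cosine, predict_same_author) are hypothetical and not part of the model or the sentence-transformers library, and the 0.5 threshold is carried over from the snippet above.

```python
import numpy as np


def pairwise_cosine(emb_a, emb_b):
    """Row-wise cosine similarity between two (n_pairs, dim) arrays."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    dots = np.sum(emb_a * emb_b, axis=1)
    norms = np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    return dots / norms


def predict_same_author(emb_a, emb_b, threshold=0.5):
    """Return (similarities, boolean same-author predictions) per pair."""
    sims = pairwise_cosine(emb_a, emb_b)
    return sims, sims >= threshold


# With the model loaded as in the snippet above (left commented out
# because it downloads the weights):
# emb_a = model.encode(["first text of pair 1", "first text of pair 2"])
# emb_b = model.encode(["second text of pair 1", "second text of pair 2"])
# sims, same = predict_same_author(emb_a, emb_b)
```

Row i of emb_a is compared only with row i of emb_b, so both lists must hold the pairs in the same order.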

Training

Trained on 50K pairs from the swan07/authorship-verification dataset using:

  • Learning rate: 2e-5
  • Batch size: 16
  • Epochs: 4
  • Loss: CosineSimilarityLoss
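To make the objective concrete: with its default settings, CosineSimilarityLoss in sentence-transformers takes the cosine similarity of the two embeddings in a pair and penalises its squared difference from the pair's label. The NumPy sketch below reproduces that computation for illustration. It is not the training code used for this model, and it assumes labels of 1.0 for same-author pairs and 0.0 for different-author pairs, which the card does not state explicitly.

```python
import numpy as np


def cosine_similarity_loss(emb_a, emb_b, labels):
    """Mean squared error between per-pair cosine similarity and labels."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    labels = np.asarray(labels, dtype=float)
    cos = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    )
    return float(np.mean((cos - labels) ** 2))


# Toy batch: the first pair is identical, the second is orthogonal.
emb_a = [[1.0, 0.0], [1.0, 0.0]]
emb_b = [[1.0, 0.0], [0.0, 1.0]]
print(cosine_similarity_loss(emb_a, emb_b, [1.0, 0.0]))  # labels match the geometry
print(cosine_similarity_loss(emb_a, emb_b, [0.0, 1.0]))  # labels contradict it
```

Under this objective, same-author pairs are pushed toward a similarity of 1 and different-author pairs toward 0, which is why a mid-range threshold such as 0.5 is a natural starting point at inference time.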

Dataset

swan07/authorship-verification: 325K text pairs from 12 sources, including PAN competitions (2011-2020).

Citation

@article{manolache2021transferring,
  title={Transferring BERT-like Transformers' Knowledge for Authorship Verification},
  author={Manolache, Andrei and Brad, Florin and Burceanu, Elena and Barbalau, Antonio and Ionescu, Radu Tudor and Popescu, Marius},
  journal={arXiv preprint arXiv:2112.05125},
  year={2021}
}
