BookNLP Speaker Attribution Models

Speaker attribution models trained on the PDNC (Project Dialogism Novel Corpus) dataset using 5-fold leave-x-out cross-validation.

Performance Summary

Metric	Mean (5-fold)	Best Fold (#2)
Test Accuracy	60.54%	72.49%
Test F1	0.5352	0.6852

Per-Fold Results

Fold	Test Accuracy	Test F1	Note
0	62.27%	0.5580
1	54.27%	0.4427
2	72.49%	0.6852	⭐ Best
3	56.86%	0.4741
4	56.80%	0.5158
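
The summary row can be reproduced from the per-fold table: the reported means match the unweighted average over the five folds, and the best fold is the one with the highest test accuracy. The check below is a small sketch with the scores copied from the table, not part of the training code:

```python
# Per-fold test scores copied from the Per-Fold Results table (accuracy as a fraction).
fold_scores = {
    0: {"accuracy": 0.6227, "f1": 0.5580},
    1: {"accuracy": 0.5427, "f1": 0.4427},
    2: {"accuracy": 0.7249, "f1": 0.6852},
    3: {"accuracy": 0.5686, "f1": 0.4741},
    4: {"accuracy": 0.5680, "f1": 0.5158},
}

# Unweighted mean: every fold counts equally, regardless of its test-set size.
mean_acc = sum(s["accuracy"] for s in fold_scores.values()) / len(fold_scores)
mean_f1 = sum(s["f1"] for s in fold_scores.values()) / len(fold_scores)

# Best fold = the fold with the highest test accuracy.
best_fold = max(fold_scores, key=lambda k: fold_scores[k]["accuracy"])

print(f"Mean accuracy: {mean_acc:.2%}")  # Mean accuracy: 60.54%
print(f"Mean F1: {mean_f1:.4f}")         # Mean F1: 0.5352
print(f"Best fold: {best_fold}")         # Best fold: 2
```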

Models

This repository contains five fold models, one per leave-x-out split:

Leave-x-out: Each fold trains on some novels and tests on completely unseen novels
Best for: Processing new books (recommended for audiobook generators)
Best fold: split_2 with 72.49% accuracy
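
Leave-x-out splits at the level of whole novels, so no text from a test novel is seen during training. The sketch below illustrates the idea by partitioning novel IDs into five disjoint test groups. The novel names, the helper `leave_x_out_splits`, and the round-robin assignment are all hypothetical; this is not the actual PDNC split used to train these models.

```python
from typing import Dict, List, Sequence


def leave_x_out_splits(novels: Sequence[str], n_folds: int = 5) -> List[Dict[str, List[str]]]:
    """Hypothetical novel-level splitter: each novel lands in exactly one test fold."""
    # Round-robin assignment of novels to test groups (illustrative only).
    test_groups: List[List[str]] = [[] for _ in range(n_folds)]
    for i, novel in enumerate(novels):
        test_groups[i % n_folds].append(novel)

    splits = []
    for k in range(n_folds):
        test = test_groups[k]
        # Train on every novel that is not in this fold's test group.
        train = [n for n in novels if n not in test]
        splits.append({"train": train, "test": test})
    return splits


# Hypothetical novel identifiers.
novels = [f"novel_{i:02d}" for i in range(10)]
for k, split in enumerate(leave_x_out_splits(novels)):
    print(f"split_{k}: train={len(split['train'])} novels, test={split['test']}")
```

Because train and test never share a novel, the per-fold scores estimate performance on books the model has not seen, which is the situation the "Best for" line above describes.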

Files

leave-x-out/
├── split_0/best_model.model
├── split_1/best_model.model
├── split_2/best_model.model
├── split_3/best_model.model
└── split_4/best_model.model
evaluation_results.json
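
Every fold follows the same `leave-x-out/split_{k}/best_model.model` layout, so the in-repo filename for a fold can be built from its index. A minimal sketch; the helper name `fold_model_path` is hypothetical, and the returned string is what you would pass as `filename` to `huggingface_hub.hf_hub_download`:

```python
NUM_FOLDS = 5  # split_0 ... split_4, per the file tree above


def fold_model_path(fold: int) -> str:
    """Return the in-repo path of a fold's checkpoint (hypothetical helper)."""
    if not 0 <= fold < NUM_FOLDS:
        raise ValueError(f"fold must be in 0..{NUM_FOLDS - 1}, got {fold}")
    return f"leave-x-out/split_{fold}/best_model.model"


all_paths = [fold_model_path(k) for k in range(NUM_FOLDS)]
print(fold_model_path(2))  # leave-x-out/split_2/best_model.model
```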

Usage

Get all fold scores via API

from huggingface_hub import model_info

info = model_info("bodyanats/booknlp-plus-speaker-attribution")
folds = info.card_data["folds"]

# Show all folds
for fold_id, data in folds.items():
    print(f"Fold {fold_id}: Acc={data['accuracy']:.2%}, F1={data['f1']:.4f}")

# Get best fold
best_fold = info.card_data["best_fold"]
print(f"\nBest: Fold {best_fold} ({folds[str(best_fold)]['accuracy']:.2%})")
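
If the card metadata lacks a `best_fold` entry, or you prefer to rank folds by F1 instead of accuracy, the best fold can be picked from the `folds` mapping directly. This sketch assumes the shape the snippet above reads (string fold IDs mapping to dicts with `accuracy` and `f1` keys); the sample dict is filled in from the per-fold table rather than fetched from the Hub, and `pick_best_fold` is a hypothetical helper.

```python
from typing import Dict


def pick_best_fold(folds: Dict[str, Dict[str, float]], metric: str = "accuracy") -> str:
    """Return the fold ID with the highest value of `metric`."""
    return max(folds, key=lambda fold_id: folds[fold_id][metric])


# Assumed shape of info.card_data["folds"], filled in from the per-fold table.
folds = {
    "0": {"accuracy": 0.6227, "f1": 0.5580},
    "1": {"accuracy": 0.5427, "f1": 0.4427},
    "2": {"accuracy": 0.7249, "f1": 0.6852},
    "3": {"accuracy": 0.5686, "f1": 0.4741},
    "4": {"accuracy": 0.5680, "f1": 0.5158},
}
print(pick_best_fold(folds))        # 2
print(pick_best_fold(folds, "f1"))  # 2
```

With these scores fold 2 wins on both metrics, consistent with the best fold marked in the table.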

Download best model (via API)

from huggingface_hub import model_info, hf_hub_download

info = model_info("bodyanats/booknlp-plus-speaker-attribution")
best_path = info.card_data["best_model_path"]
model_path = hf_hub_download(
    repo_id="bodyanats/booknlp-plus-speaker-attribution",
    filename=best_path,
)

Download specific fold model

from huggingface_hub import model_info, hf_hub_download

info = model_info("bodyanats/booknlp-plus-speaker-attribution")
fold_id = "2"  # Fold IDs are string keys: "0" to "4"
model_path = hf_hub_download(
    repo_id="bodyanats/booknlp-plus-speaker-attribution", 
    filename=info.card_data["folds"][fold_id]["model_path"]
)

Download all models for ensemble

from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="bodyanats/booknlp-plus-speaker-attribution")
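
The card does not say how the five fold models should be combined. One simple option, sketched here as an assumption rather than the authors' method, is a per-quote majority vote over the speakers predicted by each fold, breaking ties in favour of the best fold (split_2). Running the models is out of scope; `predictions` is a hypothetical list of per-fold outputs and `majority_vote` a hypothetical helper. To fetch only the checkpoints instead of the whole repository, `snapshot_download` also accepts an `allow_patterns` argument (e.g. `allow_patterns=["leave-x-out/*/best_model.model"]`).

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], tiebreak_fold: int = 2) -> List[str]:
    """Combine per-fold speaker predictions by majority vote.

    predictions[f][q] is the speaker that fold f predicts for quote q (hypothetical format).
    Ties go to `tiebreak_fold`'s prediction if it is among the tied speakers,
    otherwise to the alphabetically first tied speaker so the result is deterministic.
    """
    n_quotes = len(predictions[0])
    if any(len(fold_preds) != n_quotes for fold_preds in predictions):
        raise ValueError("every fold must predict the same number of quotes")

    combined = []
    for q in range(n_quotes):
        votes = Counter(fold_preds[q] for fold_preds in predictions)
        top = max(votes.values())
        tied = sorted(speaker for speaker, count in votes.items() if count == top)
        preferred = predictions[tiebreak_fold][q]
        combined.append(preferred if preferred in tied else tied[0])
    return combined


# Hypothetical outputs of folds 0-4 for three quotes.
predictions = [
    ["Elizabeth", "Darcy", "Jane"],
    ["Elizabeth", "Darcy", "Bingley"],
    ["Darcy", "Bingley", "Jane"],
    ["Elizabeth", "Bingley", "Bingley"],
    ["Darcy", "Darcy", "Lydia"],
]
print(majority_vote(predictions))  # ['Elizabeth', 'Darcy', 'Jane']
```

The third quote is a 2-2 tie between Bingley and Jane; fold 2 predicted Jane, so Jane is kept. Weighting votes by each fold's test accuracy is a natural variant of the same idea.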

Training

Trained using the speaker-attribution-acl2023 repository.

Paper: Improving Automatic Quotation Attribution in Literary Novels (ACL 2023)


Evaluation results

Self-reported on PDNC (Leave-x-out):

Metric	Value
Mean Test Accuracy (5-fold)	0.605
Mean Test F1 (5-fold)	0.535
Best Fold Test Accuracy	0.725
Best Fold Test F1	0.685