BookNLP Speaker Attribution Models

Speaker attribution models trained on the PDNC (Project Dialogism Novel Corpus) dataset using five-fold leave-x-out cross-validation, in which each fold holds out whole novels for testing.

Performance Summary

Metric          Mean (5 folds)   Best fold (fold 2)
Test accuracy   60.54%           72.49%
Test F1         0.5352           0.6852

Per-Fold Results

Fold   Test accuracy   Test F1   Note
0      62.27%          0.5580
1      54.27%          0.4427
2      72.49%          0.6852    ⭐ Best
3      56.86%          0.4741
4      56.80%          0.5158
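The summary row can be reproduced from the per-fold rows. A minimal sketch with the numbers copied from the per-fold table; the population standard deviation is an extra statistic that the card itself does not report.

```python
from statistics import mean, pstdev

# Per-fold test results copied from the table: fold -> (accuracy in %, F1)
FOLD_RESULTS = {
    0: (62.27, 0.5580),
    1: (54.27, 0.4427),
    2: (72.49, 0.6852),
    3: (56.86, 0.4741),
    4: (56.80, 0.5158),
}

accuracies = [acc for acc, _ in FOLD_RESULTS.values()]
f1_scores = [f1 for _, f1 in FOLD_RESULTS.values()]

mean_acc = mean(accuracies)
mean_f1 = mean(f1_scores)
# Population standard deviation across the five folds (not reported in the card)
std_acc = pstdev(accuracies)
# Fold with the highest test accuracy
best_fold = max(FOLD_RESULTS, key=lambda k: FOLD_RESULTS[k][0])

print(f"Mean accuracy: {mean_acc:.2f}% (std {std_acc:.2f} points)")
print(f"Mean F1: {mean_f1:.4f}")
print(f"Best fold: {best_fold}")
```

Each fold is scored on a different set of held-out novels, so the spread between folds (54.27% to 72.49% accuracy) is worth keeping in mind alongside the mean.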

Models

This repository contains five fold models, one for each split of the leave-x-out scheme:

  • Leave-x-out: each fold trains on a subset of the novels and is tested on novels it never saw during training
  • Best for: Processing new books (recommended for audiobook generators)
  • Best fold: split_2 with 72.49% accuracy
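The leave-x-out idea can be sketched as partitioning whole novels, rather than individual quotes, into folds, so that no test novel contributes any training data to its fold. This is a minimal illustration with hypothetical novel titles and a simple round-robin assignment; the actual assignment of PDNC novels to split_0 through split_4 comes from the speaker-attribution-acl2023 training code and may differ.

```python
from typing import Dict, List, Tuple


def leave_x_out_splits(novels: List[str], n_folds: int = 5) -> List[Tuple[List[str], List[str]]]:
    """Partition whole novels into n_folds disjoint test groups.

    Returns one (train_novels, test_novels) pair per fold. Novels are assigned
    round-robin over the sorted titles, a hypothetical choice for illustration.
    """
    ordered = sorted(novels)
    groups: Dict[int, List[str]] = {k: [] for k in range(n_folds)}
    for i, title in enumerate(ordered):
        groups[i % n_folds].append(title)

    splits = []
    for k in range(n_folds):
        test = groups[k]
        held_out = set(test)
        # Training set is every novel not held out in this fold
        train = [t for t in ordered if t not in held_out]
        splits.append((train, test))
    return splits


# Hypothetical corpus of ten novels (placeholder titles, not the PDNC list)
novels = [f"novel_{i:02d}" for i in range(10)]
for k, (train, test) in enumerate(leave_x_out_splits(novels)):
    # No novel in a fold's test set appears in that fold's training set
    assert not set(train) & set(test)
    print(f"split_{k}: {len(train)} training novels, test novels {test}")
```

Splitting at the novel level is what makes each fold's test novels completely unseen, which is the property the "new books" recommendation relies on.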

Files

leave-x-out/
├── split_0/best_model.model
├── split_1/best_model.model
├── split_2/best_model.model
├── split_3/best_model.model
└── split_4/best_model.model
evaluation_results.json
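Because every fold follows the same `leave-x-out/split_<k>/best_model.model` layout, the fold checkpoints in a local copy of the repository (for example, the directory returned by `snapshot_download`) can be located by pattern. A minimal sketch using only `pathlib`; the helper name `find_fold_models` is hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

# Directory names from the Files tree: split_0 ... split_4
SPLIT_DIR = re.compile(r"split_(\d+)")


def find_fold_models(local_dir: str) -> Dict[int, Path]:
    """Map fold id -> path of that fold's best_model.model inside local_dir."""
    models: Dict[int, Path] = {}
    for path in Path(local_dir).glob("leave-x-out/split_*/best_model.model"):
        match = SPLIT_DIR.fullmatch(path.parent.name)
        if match:
            models[int(match.group(1))] = path
    # Return folds in numeric order (0, 1, 2, ...)
    return dict(sorted(models.items()))
```

For example, calling `find_fold_models` on the path returned by `snapshot_download(repo_id="bodyanats/booknlp-plus-speaker-attribution")` should yield entries for folds 0 to 4, assuming the repository matches the tree shown.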

Usage

Get all fold scores via API

```python
from huggingface_hub import model_info

info = model_info("bodyanats/booknlp-plus-speaker-attribution")
folds = info.card_data["folds"]

# Show all folds
for fold_id, data in folds.items():
    print(f"Fold {fold_id}: Acc={data['accuracy']:.2%}, F1={data['f1']:.4f}")

# Get best fold
best_fold = info.card_data["best_fold"]
print(f"\nBest: Fold {best_fold} ({folds[str(best_fold)]['accuracy']:.2%})")
```
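If you prefer to rank folds by F1 instead of accuracy, or the card does not expose `best_fold`, the choice can be made from the same `folds` mapping. A minimal sketch, assuming each entry carries the `accuracy` and `f1` keys used in the loop above; the sample dictionary mirrors the per-fold table, with accuracy stored as a fraction because the loop formats it with `:.2%`. The helper `pick_best_fold` is hypothetical.

```python
from typing import Dict


def pick_best_fold(folds: Dict[str, Dict[str, float]], metric: str = "accuracy") -> str:
    """Return the fold id with the highest value of `metric` ("accuracy" or "f1")."""
    return max(folds, key=lambda fold_id: folds[fold_id][metric])


# Sample data shaped like info.card_data["folds"], copied from the per-fold table
folds = {
    "0": {"accuracy": 0.6227, "f1": 0.5580},
    "1": {"accuracy": 0.5427, "f1": 0.4427},
    "2": {"accuracy": 0.7249, "f1": 0.6852},
    "3": {"accuracy": 0.5686, "f1": 0.4741},
    "4": {"accuracy": 0.5680, "f1": 0.5158},
}

print(pick_best_fold(folds))        # → 2
print(pick_best_fold(folds, "f1"))  # → 2
```

Both metrics select fold 2 on this data, but they disagree lower down: fold 3 is slightly ahead of fold 4 on accuracy (56.86% vs 56.80%), while fold 4 is clearly ahead on F1 (0.5158 vs 0.4741).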

Download best model (via API)

```python
from huggingface_hub import model_info, hf_hub_download

info = model_info("bodyanats/booknlp-plus-speaker-attribution")
best_path = info.card_data["best_model_path"]
model_path = hf_hub_download(
    repo_id="bodyanats/booknlp-plus-speaker-attribution",
    filename=best_path,
)
```

Download specific fold model

```python
from huggingface_hub import model_info, hf_hub_download

info = model_info("bodyanats/booknlp-plus-speaker-attribution")
fold_id = "2"  # Choose fold 0-4
model_path = hf_hub_download(
    repo_id="bodyanats/booknlp-plus-speaker-attribution",
    filename=info.card_data["folds"][fold_id]["model_path"]
)
```
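The per-fold checkpoints follow the layout listed under Files, so a fold id can also be turned into a filename without reading the card metadata. A minimal sketch; the helper `fold_model_filename` is hypothetical and assumes the `leave-x-out/split_<k>/best_model.model` naming stays as listed.

```python
from typing import Union

N_FOLDS = 5  # split_0 ... split_4, per the Files section


def fold_model_filename(fold_id: Union[int, str]) -> str:
    """Return the repo-relative checkpoint path for a fold id such as 2 or "2"."""
    k = int(fold_id)
    if not 0 <= k < N_FOLDS:
        raise ValueError(f"fold_id must be between 0 and {N_FOLDS - 1}, got {fold_id!r}")
    return f"leave-x-out/split_{k}/best_model.model"


print(fold_model_filename("2"))  # → leave-x-out/split_2/best_model.model
```

The returned string can be passed as `filename=` to `hf_hub_download` in place of `info.card_data["folds"][fold_id]["model_path"]`.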

Download all models for ensemble

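```python
from huggingface_hub import snapshot_download

# Downloads the whole repository (all five fold checkpoints plus
# evaluation_results.json) and returns the local directory path
local_dir = snapshot_download(repo_id="bodyanats/booknlp-plus-speaker-attribution")
```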
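The card does not say how the five fold models should be combined. One common option is a per-quote majority vote over the speakers each fold predicts. A minimal sketch, assuming you have already run each fold model and collected one predicted speaker label per quote; producing those predictions with BookNLP is not shown. The tie-break (prefer the fold with the higher test accuracy from the per-fold table) is an illustrative heuristic, not part of the original training code, since each fold's accuracy was measured on different novels.

```python
from collections import Counter
from typing import Dict, List

# Test accuracy per fold, from the per-fold results table (used only for tie-breaking)
FOLD_ACCURACY = {0: 0.6227, 1: 0.5427, 2: 0.7249, 3: 0.5686, 4: 0.5680}


def majority_vote(predictions: Dict[int, List[str]]) -> List[str]:
    """Combine per-fold speaker predictions into one label per quote.

    predictions maps fold id -> predicted speakers, one per quote,
    with every list in the same quote order.
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all folds must predict the same number of quotes")

    # Folds ordered from highest to lowest test accuracy, for tie-breaking
    ranked = sorted(predictions, key=lambda k: FOLD_ACCURACY.get(k, 0.0), reverse=True)
    combined = []
    for i in range(lengths.pop()):
        votes = Counter(predictions[k][i] for k in predictions)
        top = max(votes.values())
        tied = {speaker for speaker, n in votes.items() if n == top}
        # Among tied speakers, take the one predicted by the most accurate fold
        winner = next(predictions[k][i] for k in ranked if predictions[k][i] in tied)
        combined.append(winner)
    return combined


# Hypothetical predictions for three quotes (speaker names are placeholders)
preds = {
    0: ["Alice", "Bob", "Carol"],
    1: ["Alice", "Dave", "Carol"],
    2: ["Alice", "Bob", "Eve"],
    3: ["Frank", "Dave", "Eve"],
    4: ["Alice", "Bob", "Grace"],
}
print(majority_vote(preds))  # → ['Alice', 'Bob', 'Eve']
```

With five voters, a tie at the top is only possible when no speaker receives three or more votes, as in the third quote here, where fold 2 breaks the Carol/Eve tie.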

Training

Trained using the speaker-attribution-acl2023 repository.

Paper: Improving Automatic Quotation Attribution in Literary Novels (ACL 2023)
