BookNLP Speaker Attribution Models
Speaker attribution models trained on the PDNC dataset using leave-x-out cross-validation.
Performance Summary
| Metric | Mean (5-fold) | Best Fold (#2) |
|---|---|---|
| Test Accuracy | 60.54% | 72.49% |
| Test F1 | 0.5352 | 0.6852 |
Per-Fold Results
| Fold | Test Accuracy | Test F1 | Note |
|---|---|---|---|
| 0 | 62.27% | 0.5580 | |
| 1 | 54.27% | 0.4427 | |
| 2 | 72.49% | 0.6852 | Best |
| 3 | 56.86% | 0.4741 | |
| 4 | 56.80% | 0.5158 | |
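As a quick consistency check, the summary figures can be recomputed from the per-fold rows. The sketch below copies the five rows from the table into a dictionary (the variable names are illustrative, not part of the repository) and derives the mean accuracy, the mean F1, and the best fold by accuracy.

```python
from statistics import mean

# Per-fold test results copied from the table: fold -> (accuracy in %, F1).
fold_results = {
    0: (62.27, 0.5580),
    1: (54.27, 0.4427),
    2: (72.49, 0.6852),
    3: (56.86, 0.4741),
    4: (56.80, 0.5158),
}

mean_accuracy = mean(acc for acc, _ in fold_results.values())
mean_f1 = mean(f1 for _, f1 in fold_results.values())

# The best fold is the one with the highest test accuracy.
best_fold = max(fold_results, key=lambda fold: fold_results[fold][0])

print(f"Mean accuracy: {mean_accuracy:.2f}%")  # 60.54%
print(f"Mean F1: {mean_f1:.4f}")               # 0.5352
print(f"Best fold: {best_fold}")               # 2
```

The recomputed means match the Performance Summary. The spread between fold 1 (54.27%) and fold 2 (72.49%) is large, so the mean is likely a better estimate of performance on a new book than the best-fold figure.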
Models
This repository contains five models, one per fold of the leave-x-out split:
- Leave-x-out: Each fold trains on some novels and tests on completely unseen novels
- Best for: Processing new books (recommended for audiobook generators)
- Best fold: split_2, with 72.49% accuracy
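The idea behind a leave-x-out split can be sketched as follows: novels, not individual quotations, are partitioned into folds, so every test novel is absent from that fold's training set. This is only an illustration under assumptions. The function name, the seed, and the placeholder novel IDs are hypothetical, and the actual assignment of PDNC novels to folds in the training repository may differ.

```python
import random


def leave_x_out_folds(novels, n_folds=5, seed=0):
    """Partition novels into n_folds disjoint test groups.

    Each fold tests on its own group and trains on all remaining novels,
    so no test novel is ever seen during that fold's training.
    """
    if n_folds < 2 or n_folds > len(novels):
        raise ValueError("n_folds must be between 2 and the number of novels")

    # Sort first so the shuffle is reproducible regardless of input order.
    shuffled = sorted(novels)
    random.Random(seed).shuffle(shuffled)

    folds = []
    for i in range(n_folds):
        test_novels = shuffled[i::n_folds]  # every n_folds-th novel
        held_out = set(test_novels)
        train_novels = [n for n in shuffled if n not in held_out]
        folds.append({"train": train_novels, "test": test_novels})
    return folds


# Hypothetical placeholder IDs, not the actual PDNC novels.
novels = [f"novel_{i:02d}" for i in range(10)]
folds = leave_x_out_folds(novels)

for fold_id, fold in enumerate(folds):
    print(f"Fold {fold_id}: test={fold['test']}")
```

Splitting at the novel level is what makes the reported scores relevant to processing new books: the test characters and narrative style are unseen during training.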
Files
leave-x-out/
├── split_0/best_model.model
├── split_1/best_model.model
├── split_2/best_model.model
├── split_3/best_model.model
└── split_4/best_model.model
evaluation_results.json
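Given this layout, the repository-relative path of a fold's model can be built directly from the fold number. The helper below is a hypothetical convenience, not part of the repository, and it assumes the listing above reflects the actual paths in the repository.

```python
from pathlib import PurePosixPath

NUM_FOLDS = 5  # split_0 through split_4, per the file listing


def fold_model_path(fold_id: int) -> str:
    """Return the repository-relative path of a fold's model file."""
    if not 0 <= fold_id < NUM_FOLDS:
        raise ValueError(f"fold_id must be in 0..{NUM_FOLDS - 1}, got {fold_id}")
    # Hub file names use forward slashes, hence PurePosixPath.
    return str(PurePosixPath("leave-x-out") / f"split_{fold_id}" / "best_model.model")


print(fold_model_path(2))  # leave-x-out/split_2/best_model.model
```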
Usage
Get all fold scores via API
from huggingface_hub import model_info
info = model_info("bodyanats/booknlp-plus-speaker-attribution")
folds = info.card_data["folds"]
# Show all folds
for fold_id, data in folds.items():
    print(f"Fold {fold_id}: Acc={data['accuracy']:.2%}, F1={data['f1']:.4f}")
# Get best fold
best_fold = info.card_data["best_fold"]
print(f"\nBest: Fold {best_fold} ({folds[str(best_fold)]['accuracy']:.2%})")
Download best model (via API)
from huggingface_hub import model_info, hf_hub_download
info = model_info("bodyanats/booknlp-plus-speaker-attribution")
best_path = info.card_data["best_model_path"]
model_path = hf_hub_download(repo_id="bodyanats/booknlp-plus-speaker-attribution", filename=best_path)
Download specific fold model
from huggingface_hub import model_info, hf_hub_download
info = model_info("bodyanats/booknlp-plus-speaker-attribution")
fold_id = "2"  # Choose fold 0-4
model_path = hf_hub_download(
    repo_id="bodyanats/booknlp-plus-speaker-attribution",
    filename=info.card_data["folds"][fold_id]["model_path"],
)
Download all models for ensemble
from huggingface_hub import snapshot_download
local_dir = snapshot_download(repo_id="bodyanats/booknlp-plus-speaker-attribution")
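This model card does not specify how the five fold models should be combined. One common option is a per-quotation majority vote, sketched below. It assumes each model has already produced one predicted speaker per quotation, in the same order. How those predictions are obtained from the `.model` files depends on the training repository and is not shown. The function name and the speaker labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine aligned per-model predictions by majority vote.

    predictions_per_model: one list per model, each holding one predicted
    speaker per quotation, in the same quotation order.
    Ties go to the label predicted by the earliest model in the list,
    because Counter.most_common keeps first-seen order among equal counts.
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_quotes = len(predictions_per_model[0])
    if any(len(preds) != n_quotes for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of quotations")

    voted = []
    for quote_preds in zip(*predictions_per_model):
        label, _count = Counter(quote_preds).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from three models for two quotations.
example = [
    ["Elizabeth", "Darcy"],
    ["Elizabeth", "Jane"],
    ["Darcy", "Jane"],
]
print(majority_vote(example))  # ['Elizabeth', 'Jane']
```

Because ties favor earlier models, listing the strongest fold first (split_2 here) is a reasonable default ordering.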
Training
Trained using the speaker-attribution-acl2023 repository.
Paper: Improving Automatic Quotation Attribution in Literary Novels (ACL 2023)
Evaluation results
- Mean Test Accuracy (5-fold) on PDNC (Leave-x-out), self-reported: 0.605
- Mean Test F1 (5-fold) on PDNC (Leave-x-out), self-reported: 0.535
- Best Fold Test Accuracy on PDNC (Leave-x-out), self-reported: 0.725
- Best Fold Test F1 on PDNC (Leave-x-out), self-reported: 0.685