---
license: mit
tags:
- quote-attribution
- speaker-identification
- dialogue-attribution
- nlp
- transformers
- bert
language:
- en
datasets:
- aNameNobodyChose/quote-speaker-attribution
---
# 🗣️ QuoteCaster: Speaker-Aware Quote Encoder
**QuoteCaster** is a fine-tuned BERT-based model designed to encode dialogue quotes along with their surrounding context in order to **identify or group quotes by speaker**, even in stories the model has never seen before.
This encoder supports unsupervised or few-shot quote attribution by mapping quotes with similar speaking styles and context to nearby points in embedding space, which makes it suitable for clustering and nearest-neighbor speaker inference.
---
## 📦 Model Details
- **Base model**: `bert-base-uncased`
- **Trained with**: Triplet Margin Loss
- **Objective**: Pull quotes from the same speaker together, push quotes from different speakers apart
- **Input**: `context [SEP] quote`
- **Output**: `[CLS]` embedding as a 768-dimensional vector
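
As a rough illustration of the objective listed above, here is a minimal pure-Python sketch of a triplet margin loss. The Euclidean distance, the `margin=1.0` value, and the toy 2-dimensional vectors are assumptions made for illustration; the model card does not state the distance function or margin used in training, and a real training loop would more likely use a library implementation on batches of 768-dimensional embeddings.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0).

    The margin of 1.0 is an assumed value, not taken from the model card.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Toy embeddings (hypothetical): anchor and positive share a speaker.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative_far = [3.0, 4.0]    # different speaker, already far from the anchor
negative_near = [0.0, 1.5]   # different speaker, too close to the anchor

loss_satisfied = triplet_margin_loss(anchor, positive, negative_far)
loss_violated = triplet_margin_loss(anchor, positive, negative_near)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so only triplets that violate that gap contribute a gradient.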
---
## 📊 Use Case
QuoteCaster is ideal for:
- 🧠 Clustering quotes by speaker using KMeans or Agglomerative Clustering
- πŸ” Zero-shot speaker inference on unseen stories
- πŸ§ͺ Dialogue structure analysis in novels, scripts, or plays
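
To make the speaker-inference use case concrete, the following sketch assigns an unlabeled quote to the speaker of its most similar labeled quote using cosine similarity. The speaker names, the 3-dimensional toy vectors, and the helper `nearest_speaker` are all hypothetical stand-ins; in practice the vectors would be the 768-dimensional `[CLS]` embeddings produced by the encoder, and cosine similarity is an assumed choice of metric.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_speaker(query, labeled):
    """Return the label of the labeled embedding most similar to `query`.

    `labeled` is a list of (speaker, embedding) pairs.
    """
    best_label, _ = max(labeled, key=lambda item: cosine_similarity(query, item[1]))
    return best_label

# Hypothetical labeled embeddings for a handful of already-attributed quotes.
labeled_quotes = [
    ("Alice", [0.9, 0.1, 0.0]),
    ("Alice", [0.8, 0.2, 0.1]),
    ("Bob", [0.1, 0.9, 0.2]),
]

# Hypothetical embedding of a quote whose speaker is unknown.
unlabeled = [0.85, 0.15, 0.05]
predicted = nearest_speaker(unlabeled, labeled_quotes)
```

With no labeled quotes at all, the same embeddings could instead be passed to a clustering algorithm such as the ones named in the list above.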
---
## 🚀 Example: Inference with QuoteCaster
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load fine-tuned encoder
model = AutoModel.from_pretrained("aNameNobodyChose/quote-caster-encoder")
tokenizer = AutoTokenizer.from_pretrained("aNameNobodyChose/quote-caster-encoder")
model.eval()  # disable dropout so embeddings are deterministic

# Encode a quote with its surrounding context
def encode_quote(context, quote):
    text = f"{context} [SEP] {quote}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding, shape (1, 768)