---
language: en
tags:
- deberta
- deberta-v3
- text-classification
- conversational
- dialogue
- reranking
pipeline_tag: text-classification
license: mit
---

# Cross Talk

## Multiturn Conversation by Reranking

Cross Talk is a [DeBERTa V3 Large](https://huggingface.co/microsoft/deberta-v3-large) fine-tune designed to identify which outgoing text candidates best follow a conversational history. It is trained on a "next-like, not next-like" corpus derived from Open Subtitles and is intended for _recreational use_.

## Concept

Suppose we have the following exchange in a conversational history:

```
[user] I'm out of coffee.
[bot] What will you do about it?
[user] I guess I'll buy more.
```

And by some generative model, we have three candidate texts:

1. `Buy more what?`
2. `Is it expensive?`
3. `Colorless green ideas sleep furiously.`

We should be able to score item 2 as the most ideal and item 3 as the least ideal.

## Motivation

This is a product of [an experiment](https://joecooper.me/blog/crosstalk/) to run multiturn conversational AI entirely by reranking candidates.

In the experiment, a Markov text generator produces wholly random candidates with no regard to user input. These candidates are then scored by the model; the trained model acts exclusively as a judge.

## Input / Output

Candidates are scored pointwise; the model should identify whether a given candidate, in isolation, fits the context. To this end, the model is trained on context + candidate pairs. The context is presented as a series of lines, concatenated together using a `|` pipe (token id `1540`), like so:

```
[cls] line | line | … [sep] candidate [sep] [pad] …
```

Output is a single logit.

The speaker of each line is _not_ labeled, and there is no "assistant" / "user" distinction. The text under consideration is the outgoing text, and the last line of context is the proximate stimulus.
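
For example, one way to inspect this layout is to encode a context/candidate pair with the standard `transformers` text-pair API and print the resulting tokens (a minimal sketch, reusing the example exchange from above):

```python
from transformers import AutoTokenizer

# Slow tokenizer, as in the usage example below.
tokenizer = AutoTokenizer.from_pretrained("thejoephase/crosstalk", use_fast=False)

history = [
    "I'm out of coffee.",
    "What will you do about it?",
    "I guess I'll buy more."
]
context = "|".join(history)   # context lines joined with the pipe separator
candidate = "Is it expensive?"

# Encoding the texts as a pair yields: [CLS] context [SEP] candidate [SEP]
encoded = tokenizer(context, candidate)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```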

## Usage

The model is loaded exactly like DeBERTa-v3, but with model id `thejoephase/crosstalk`.

```python
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("thejoephase/crosstalk", use_fast=False)
model = DebertaV2ForSequenceClassification.from_pretrained(
    "thejoephase/crosstalk",
    num_labels=1)

history = [
    "I'm out of coffee.",
    "What will you do about it?",
    "I guess I'll buy more."
]
context = '|'.join(history)
candidate = "Is it expensive?"

inputs = tokenizer(context, candidate, return_tensors="pt")

with torch.no_grad():
    logit = model(**inputs).logits.item()
score = torch.sigmoid(torch.tensor(logit)).item()

print(f"Score: {score:.4f}")  # Higher = better fit
```
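
Building on the snippet above, reranking several candidates amounts to scoring each one against the same context and keeping the highest score. A minimal sketch, reusing `tokenizer`, `model`, and `context` from the snippet above and the candidates from the Concept section:

```python
candidates = [
    "Buy more what?",
    "Is it expensive?",
    "Colorless green ideas sleep furiously.",
]

# Encode one (context, candidate) pair per candidate and score them as a batch.
batch = tokenizer(
    [context] * len(candidates),
    candidates,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    logits = model(**batch).logits.squeeze(-1)  # one logit per candidate
scores = torch.sigmoid(logits)

best = scores.argmax().item()
print(f"Best candidate: {candidates[best]!r} (score {scores[best].item():.4f})")
```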

## Training

The model was trained for 40 hours on a single NVIDIA RTX 3090, on 130M tokens of content from Open Subtitles with some pruning and processing of the data. The model scores 92% on a test set derived from content unrelated to the training set.