---
language: en
tags:
- deberta
- deberta-v3
- text-classification
- conversational
- dialogue
- reranking
pipeline_tag: text-classification
license: mit
---

# Cross Talk

## Multiturn Conversation by Reranking

Cross Talk is a [DeBERTa V3 Large](https://huggingface.co/microsoft/deberta-v3-large) finetune designed to identify which outgoing text candidates best follow a conversational history. It is trained on a "next-like, not next-like" corpus derived from Open Subtitles. It is intended for _recreational use._

## Concept

Suppose we have the following exchange in a conversational history:

```
[user] I'm out of coffee.
[bot] What will you do about it?
[user] I guess I'll buy more.
```

And, from some generative model, we have three candidate texts:

1. `Buy more what?`
2. `Is it expensive?`
3. `Colorless green ideas sleep furiously.`

We should be able to score item 2 as the most ideal, and item 3 as the least ideal.

## Motivation

This is a product of [an experiment](https://joecooper.me/blog/crosstalk/) to run multiturn conversational AI entirely by reranking candidates. In the experiment, a Markov text generator produces wholly random candidates with no regard to user input. These candidates are then scored by the model. The trained model acts exclusively as a judge.

## Input / Output

Candidates are scored pointwise; the model should identify whether a given candidate, in isolation, fits the context. To this end, the model is trained on context + candidate pairs. The context is presented as a series of lines joined with a `|` pipe (token id `1540`), like so:

```
[cls] line | line | … [sep] candidate [sep] [pad] …
```

The output is a single logit. Speakers are _not_ labeled, and there is no "assistant" / "user" distinction. The text under consideration is the outgoing text, and the last line of context is the proximate stimulus.

## Usage

The model is loaded exactly like DeBERTa-v3, but with model id `thejoephase/crosstalk`.
```python
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("thejoephase/crosstalk", use_fast=False)
model = DebertaV2ForSequenceClassification.from_pretrained(
    "thejoephase/crosstalk", num_labels=1)

history = [
    "I'm out of coffee.",
    "What will you do about it?",
    "I guess I'll buy more."
]
context = '|'.join(history)
candidate = "Is it expensive?"

# Tokenize the (context, candidate) pair and score it.
inputs = tokenizer(context, candidate, return_tensors="pt")
with torch.no_grad():
    logit = model(**inputs).logits.item()
score = torch.sigmoid(torch.tensor(logit)).item()
print(f"Score: {score:.4f}")  # Higher = better fit
```

## Training

The model was trained for 40 hours on a single NVIDIA RTX 3090, on 130M tokens of Open Subtitles content, with some pruning and processing of the data. The model scores 92% on a test set derived from content unrelated to the training set.
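Since the experiment reranks many candidates per turn, it can be convenient to score a whole batch of candidates at once and sort by score. A minimal sketch, using the example candidates from the Concept section (the batching approach here is illustrative, not part of the model card):

```python
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("thejoephase/crosstalk", use_fast=False)
model = DebertaV2ForSequenceClassification.from_pretrained(
    "thejoephase/crosstalk", num_labels=1)

context = "I'm out of coffee.|What will you do about it?|I guess I'll buy more."
candidates = [
    "Buy more what?",
    "Is it expensive?",
    "Colorless green ideas sleep furiously.",
]

# Tokenize every (context, candidate) pair as one padded batch.
inputs = tokenizer([context] * len(candidates), candidates,
                   padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze(-1)
scores = torch.sigmoid(logits).tolist()

# Print candidates from best to worst fit.
for cand, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.4f}  {cand}")
```

The best candidate is then simply the first element of the sorted list.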