First upload.

Files changed (5)

README.md +54 -0
config.json +42 -0
model.safetensors +3 -0
spm.model +3 -0
tokenizer_config.json +4 -0

README.md CHANGED

@@ -1,3 +1,57 @@
 ---
+language: en
+tags:
+  - deberta
+  - deberta-v3
 license: mit
 ---
+# Cross Talk
+## Multiturn Conversation by Reranking
+Cross Talk is a [DeBERTa V3 Large](https://huggingface.co/microsoft/deberta-v3-large) fine-tune designed to identify which outgoing text candidates best follow a conversational history. It is trained on a "next-like, not next-like" corpus derived from Open Subtitles and is intended for _recreational use_.
+## Concept
+Suppose we have the following exchange in a conversational history:
+```
+[user] I'm out of coffee.
+[bot] What will you do about it?
+[user] I guess I'll buy more.
+```
+Suppose also that some generative model has produced three candidate replies:
+1. `Buy more what?`
+2. `Is it expensive?`
+3. `Colorless green ideas sleep furiously.`
+
+The model should score item 2 as the best fit and item 3 as the worst.
+## Motivation
+This is a product of [an experiment](https://joecooper.me/blog/crosstalk/) to
+run multiturn conversational AI entirely by reranking candidates.
+In the experiment, a Markov text generator produces wholly random candidates
+with no regard to user input. These candidates are then scored by the model.
+The trained model acts exclusively as a judge.
+## Input / Output
+Candidates are scored pointwise: the model judges whether a single
+candidate, taken in isolation, fits the context. To this end, it is trained
+on context + candidate pairs. The context is presented as a series of lines
+joined with a `|` pipe (token id `1540`), like so:
+```
+[CLS] line | line | … [SEP] candidate [SEP] [PAD] …
+```
+The output is a single logit per context + candidate pair.
+## Training
+The model was trained for 40 hours on a single NVIDIA RTX 3090, using 130M tokens of Open Subtitles content after some pruning and processing. It scores 92% on the test set, which is derived from content unrelated to the training set.
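The README's Concept and Motivation sections describe a generate-then-judge loop: a Markov generator proposes candidates blindly, and the model scores each one pointwise. Below is a minimal sketch of that loop under stated assumptions. `score_fn` is a hypothetical placeholder for a call into the model that returns one logit per (context, candidate) pair. The word-bigram generator is an illustrative stand-in for the experiment's Markov generator, not the author's code. All function names here are illustrative.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical judge signature: (context lines, candidate) -> one logit.
# In the real system this would wrap the Cross Talk model.
ScoreFn = Callable[[Sequence[str], str], float]


def train_bigram_chain(corpus: Sequence[str]) -> Dict[str, List[str]]:
    """Record, for each word (and a start marker), every word seen after it."""
    chain: Dict[str, List[str]] = defaultdict(list)
    for sentence in corpus:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            chain[prev].append(nxt)
    return dict(chain)


def sample_candidate(chain: Dict[str, List[str]], rng: random.Random,
                     max_words: int = 20) -> str:
    """Random walk through the chain; it never looks at the conversation."""
    words: List[str] = []
    current = "<s>"
    while len(words) < max_words:
        current = rng.choice(chain[current])
        if current == "</s>":
            break
        words.append(current)
    return " ".join(words)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping a logit into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(context: Sequence[str], candidates: Sequence[str],
           score_fn: ScoreFn) -> List[Tuple[str, float]]:
    """Score each candidate on its own (pointwise) and sort best-first."""
    unique = list(dict.fromkeys(candidates))  # drop repeated samples, keep order
    scored = [(cand, score_fn(context, cand)) for cand in unique]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reply(context: Sequence[str], chain: Dict[str, List[str]],
          score_fn: ScoreFn, n_candidates: int = 32,
          seed: Optional[int] = None) -> str:
    """Generate blind candidates, then let the judge pick the best one."""
    rng = random.Random(seed)
    candidates = [sample_candidate(chain, rng) for _ in range(n_candidates)]
    best, _logit = rerank(context, candidates, score_fn)[0]
    return best


if __name__ == "__main__":
    history = ["I'm out of coffee.", "What will you do about it?",
               "I guess I'll buy more."]
    chain = train_bigram_chain(["is it expensive ?", "buy more what ?",
                                "what will you do about it ?"])

    # Toy stand-in for the model: prefers replies of about four words.
    def toy_score(ctx: Sequence[str], cand: str) -> float:
        return -abs(len(cand.split()) - 4)

    print(reply(history, chain, toy_score, seed=0))
```

The sigmoid is monotonic, so ranking by the raw logit and ranking by `sigmoid(logit)` produce the same order. The sigmoid only matters if you want a score between 0 and 1, for example to reject every candidate below a threshold. That use assumes the logit was trained as a binary next-like/not-next-like score.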

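The Input / Output section's layout can be sketched at the token-ID level. This minimal sketch rests on several assumptions. `[CLS]` = 1 and `[SEP]` = 2 are the ids in the stock DeBERTa V3 vocabulary; this repo does not state them, so check them against the tokenizer. `[PAD]` = 0 comes from `pad_token_id` in config.json, and the pipe id `1540` comes from the README. The 512-token limit mirrors `max_position_embeddings`. Pipes are placed only between lines, and the oldest lines are dropped first when the history is too long; the README does not say how long histories were handled during training. No `token_type_ids` are built, because config.json sets `type_vocab_size` to 0. `build_pair` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0      # "pad_token_id": 0 in config.json
CLS_ID = 1      # assumption: [CLS] id in the stock DeBERTa V3 vocabulary
SEP_ID = 2      # assumption: [SEP] id in the stock DeBERTa V3 vocabulary
PIPE_ID = 1540  # "|" separator id stated in the README


def build_pair(context_lines: Sequence[Sequence[int]],
               candidate: Sequence[int],
               max_len: int = 512) -> Tuple[List[int], List[int]]:
    """Lay out [CLS] line | line | ... [SEP] candidate [SEP] [PAD]...

    Takes already-tokenized lines (oldest first) and returns
    (input_ids, attention_mask), both exactly max_len long.
    """
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and two [SEP]s")
    cand = list(candidate)[: max_len - 3]  # an oversized candidate keeps its head
    budget = max_len - 3 - len(cand)       # tokens left over for the context

    # Keep the newest lines that fit whole; older lines are dropped first.
    kept: List[Sequence[int]] = []
    used = 0
    for line in reversed(context_lines):
        cost = len(line) + (1 if kept else 0)  # +1 for the pipe joining it on
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    kept.reverse()  # restore chronological order

    context: List[int] = []
    for i, line in enumerate(kept):
        if i:
            context.append(PIPE_ID)  # pipe between lines, not after the last
        context.extend(line)

    ids = [CLS_ID] + context + [SEP_ID] + cand + [SEP_ID]
    mask = [1] * len(ids)
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [0] * pad
```

In practice, each line and the candidate would first be encoded with the repo's SentencePiece tokenizer without special tokens. With a Hugging Face tokenizer, for example, that is `tokenizer.encode(text, add_special_tokens=False)`. The resulting pair is then batched as `input_ids` and `attention_mask` tensors.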
config.json ADDED

@@ -0,0 +1,42 @@

+{
+  "architectures": [
+    "DebertaV2ForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "dtype": "float32",
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-07,
+  "legacy": true,
+  "max_position_embeddings": 512,
+  "max_relative_positions": -1,
+  "model_type": "deberta-v2",
+  "norm_rel_ebd": "layer_norm",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "pad_token_id": 0,
+  "pooler_dropout": 0,
+  "pooler_hidden_act": "gelu",
+  "pooler_hidden_size": 1024,
+  "pos_att_type": [
+    "p2c",
+    "c2p"
+  ],
+  "position_biased_input": false,
+  "position_buckets": 256,
+  "problem_type": "single_label_classification",
+  "relative_attention": true,
+  "share_att_key": true,
+  "transformers_version": "4.57.0",
+  "type_vocab_size": 0,
+  "vocab_size": 128100
+}
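As a sanity check on config.json, the parameter count can be worked out from its fields. This sketch assumes the Hugging Face transformers layout of `DebertaV2ForSequenceClassification`, with biases on every linear layer and weight + bias on every LayerNorm. `position_biased_input: false` and `type_vocab_size: 0` mean no absolute-position or token-type embeddings. `share_att_key: true` means no separate positional key/query projections. Relative embeddings use `2 * position_buckets` rows followed by a LayerNorm (`norm_rel_ebd: layer_norm`). The result is an estimate, not a value read from the checkpoint, and `deberta_v2_param_count` is an illustrative name.

```python
from typing import Dict

# The config.json fields this estimate uses (values copied verbatim).
CONFIG = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "num_hidden_layers": 24,
    "vocab_size": 128100,
    "position_buckets": 256,
    "pooler_hidden_size": 1024,
    "id2label": {"0": "LABEL_0"},
}


def deberta_v2_param_count(cfg: dict) -> Dict[str, int]:
    """Estimate per-component parameter counts for a DeBERTa-v2 classifier."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]
    pooled = cfg["pooler_hidden_size"]
    n_labels = len(cfg["id2label"])
    layer_norm = 2 * h  # weight + bias

    # Word embeddings plus the embedding LayerNorm.
    embeddings = cfg["vocab_size"] * h + layer_norm
    # Query, key, value and output projections, each h x h plus bias.
    attention = 4 * (h * h + h)
    # Up-projection to the intermediate size and back down, with biases.
    feed_forward = (h * ff + ff) + (ff * h + h)
    # Two LayerNorms per layer: after attention and after the feed-forward.
    per_layer = attention + feed_forward + 2 * layer_norm
    parts = {
        "embeddings": embeddings,
        "encoder_layers": cfg["num_hidden_layers"] * per_layer,
        "relative_embeddings": 2 * cfg["position_buckets"] * h + layer_norm,
        "pooler": h * pooled + pooled,
        "classifier": pooled * n_labels + n_labels,
    }
    parts["total"] = sum(parts.values())
    return parts


parts = deberta_v2_param_count(CONFIG)
for name, count in parts.items():
    print(f"{name:>20}: {count:,}")
# the total line reads 435,062,785
```

This comes to roughly 435 million parameters, or about 1.74 GB at four bytes per `float32` value. That lines up with the size recorded for `model.safetensors`. A safetensors file also stores a JSON header, so the file should be slightly larger than the raw tensor bytes.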

model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fd20627d3656d85c5cf2357a369265501af1a85d1cee5d5cdefa0705aca44bd
+size 1740300340
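The `model.safetensors` entry is a Git LFS pointer, not the weights themselves. It holds the spec version, the sha256 `oid` of the real file, and that file's size in bytes. The sketch below parses such a pointer and checks a downloaded file against it. It then turns the byte count into a rough upper bound on the number of weights, assuming four bytes each per the `"dtype": "float32"` entry in config.json. The helper names are illustrative.

```python
import hashlib
import os
from typing import Dict

# Pointer text exactly as committed for model.safetensors.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:5fd20627d3656d85c5cf2357a369265501af1a85d1cee5d5cdefa0705aca44bd
size 1740300340
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def file_matches_pointer(path: str, fields: Dict[str, str]) -> bool:
    """Check a downloaded file's size and sha256 digest against the pointer."""
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256" or os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest() == expected


fields = parse_lfs_pointer(POINTER)
size_bytes = int(fields["size"])
# 4 bytes per float32 weight. The file also holds a small safetensors
# header, so this is an upper bound rather than an exact count.
approx_params = size_bytes // 4
print(f"{size_bytes:,} bytes -> at most ~{approx_params:,} float32 values")
```

Checking the size first is cheap and catches truncated downloads before hashing a 1.7 GB file.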

spm.model ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+size 2464616

tokenizer_config.json ADDED

@@ -0,0 +1,4 @@

+{
+  "do_lower_case": false,
+  "vocab_type": "spm"
+}