Commit 7b605d8
Parent(s): afacc0b

initial commit

- results/runs/Apr01_01-35-38_my-fastai-instance/1648776940.8445003/events.out.tfevents.1648776940.my-fastai-instance.3351.1 +3 -0
- results/runs/Apr01_01-35-38_my-fastai-instance/events.out.tfevents.1648776940.my-fastai-instance.3351.0 +3 -0
- results/runs/Apr01_01-40-32_my-fastai-instance/1648777241.5917044/events.out.tfevents.1648777241.my-fastai-instance.1257.1 +3 -0
- results/runs/Apr01_01-40-32_my-fastai-instance/events.out.tfevents.1648777241.my-fastai-instance.1257.0 +3 -0
- summarize-linydub.ipynb +540 -0
- test.json +27 -0
- train.json +27 -0
- val.json +27 -0
results/runs/Apr01_01-35-38_my-fastai-instance/1648776940.8445003/events.out.tfevents.1648776940.my-fastai-instance.3351.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cb9c54d10c29b6bfc4a2f208eeef2788426484cbd893dfdda066c63d7254731e
+size 4976
results/runs/Apr01_01-35-38_my-fastai-instance/events.out.tfevents.1648776940.my-fastai-instance.3351.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b4492c90cc11ce44cdfd38e553875437c1580f10a896fbf60ab92cb400fae61b
+size 4348
results/runs/Apr01_01-40-32_my-fastai-instance/1648777241.5917044/events.out.tfevents.1648777241.my-fastai-instance.1257.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:faea226f3d63e936b3d31a37ffb0d41f7d2841903e6180c6a7fbf3c977de899e
+size 4976
results/runs/Apr01_01-40-32_my-fastai-instance/events.out.tfevents.1648777241.my-fastai-instance.1257.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2111c3e33f69a6d90cc3d536e96f434897b76436c8cfdce6999857d47ffd39e4
+size 5494
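The four `events.out.tfevents.*` files above are stored via Git LFS, so the repository itself only holds a three-line text pointer (version, oid, size). A minimal sketch of reading such a pointer — `parse_lfs_pointer` is a hypothetical helper, not part of any library:

```python
# Parse a Git LFS pointer file into its key/value fields.
# Each line is "<key> <value>"; "size" is the byte count of the real file.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is numeric
    return fields

# Pointer content taken from the first diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cb9c54d10c29b6bfc4a2f208eeef2788426484cbd893dfdda066c63d7254731e
size 4976"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # → 4976
```

The actual TensorBoard event data lives in LFS storage; only the sha256 oid and size are committed.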
summarize-linydub.ipynb ADDED
@@ -0,0 +1,540 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "fe34671e-5117-4b12-a2b5-dc07fbb49021",
+   "metadata": {},
+   "source": [
+    "## Testing out Hugging Face Inference API. Goal is to get working model & inference API on Hugging Face hub."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "92575bb9-bcdf-4357-aa08-e8814b02eafb",
+   "metadata": {},
+   "source": [
+    "For starters, just trying out inference api from starter script here: https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html#summarization-task\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "f44dba0e-72bd-49be-93a8-68895dbf994d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[{'summary_text': 'CNN.com is celebrating its 10th anniversary this year. We are celebrating by asking our team members to share their thoughts and ideas. We want to hear from you, our readers, about what you think and what you want to share with the world. Share your ideas and help our team to become a little bit better today.'}]"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import json\n",
+    "\n",
+    "import requests\n",
+    "\n",
+    "API_TOKEN = \"hugging_face_access_token\"\n",
+    "\n",
+    "headers = {\"Authorization\": f\"Bearer {API_TOKEN}\"}\n",
+    "API_URL = \"https://api-inference.huggingface.co/models/facebook/bart-large-cnn\"\n",
+    "\n",
+    "# frank question = why POST? Maybe Willian can help answer that\n",
+    "def query(payload):\n",
+    "    data = json.dumps(payload)\n",
+    "    response = requests.request(\"POST\", API_URL, headers=headers, data=data)\n",
+    "    return json.loads(response.content.decode(\"utf-8\"))\n",
+    "\n",
+    "data = query(\n",
+    "    {\n",
+    "        \"inputs\": \"Picture this it’s the early morning, I’m sitting down with a hot cup of French roast coffee. I’m putting my AirPods in everything’s quiet. My mind is fresh and there are no distractions. I have a feeling of anticipation. It’s a mix of focus, excitement, and a tiny bit of apprehension. I’m about to hear your voice because some about to listen, to ponder replies. I have the space to connect with my team members to listen to them and learn something about you. This usually forces me to learn something about myself _____. I press play, and the learning begins. Thank you for being engaged here and sharing your ideas and helping our team to become just a little bit better today.\",\n",
+    "        \"parameters\": {\"do_sample\": False},\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3e786be3-d888-4faa-9527-11237fea3882",
+   "metadata": {},
+   "source": [
+    "Ok. Crappy summary, but you can tell it works from the last sentence..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d3455b8-ec89-49d2-be27-a2e87218d902",
+   "metadata": {},
+   "source": [
+    "### Next, fine-tune linydub and push model to hugging face hub"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4cdd2db4-95ea-4212-9f99-8d65f62fb349",
+   "metadata": {},
+   "source": [
+    "We need the model..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "ca901ecf-7369-47f9-bbdd-ddc358cc8ff1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"linydub/bart-large-samsum\")\n",
+    "\n",
+    "model = AutoModelForSeq2SeqLM.from_pretrained(\"linydub/bart-large-samsum\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f88d654b-f9de-421d-aab0-bd8522e01e69",
+   "metadata": {},
+   "source": [
+    "We need training data...\n",
+    "\n",
+    "Let's see what the linydub training data looks like so we can replicate the format for fine-tuning. https://github.com/linydub/azureml-greenai-txtsum/tree/main/examples/assets/data/hf-samsum/train\n",
+    "\n",
+    "I had no idea how to open an arrow file, so I got the samsum data from here instead: https://paperswithcode.com/dataset/samsum-corpus\n",
+    "\n",
+    "And then this blog post helped me figure out the training part: https://medium.com/rocket-mortgage-technology-blog/conversational-summarization-with-natural-language-processing-c073a6bcaa3a\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "cea91bb7-b1eb-4363-841a-88502a368669",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "val_path = 'val.json'\n",
+    "test_path = 'test.json'\n",
+    "train_path = 'train.json'\n",
+    "\n",
+    "with open(val_path) as in_file:\n",
+    "    val = json.load(in_file)\n",
+    "    in_file.close()\n",
+    "\n",
+    "with open(test_path) as in_file:\n",
+    "    test = json.load(in_file)\n",
+    "    in_file.close()\n",
+    "\n",
+    "with open(train_path) as in_file:\n",
+    "    train = json.load(in_file)\n",
+    "    in_file.close()\n",
+    "    \n",
+    "data = train + test + val\n",
+    "assert len(data) == len(train) + len(test) + len(val)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "a4fc8ad6-1c3c-4839-9db7-ea811d8d1cc3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "df = pd.DataFrame(data)\n",
+    "df['dialogue'] = df['dialogue'].str.replace('\\r', '')\n",
+    "df['dialogue'] = df['dialogue'].str.replace('\\n', '')\n",
+    "df['summary'] = df['summary'].str.replace('\\r', '')\n",
+    "df['summary'] = df['summary'].str.replace('\\n', '')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "4db4c0de-1a68-4977-b10b-694923f692e1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "validator = df.head(1)\n",
+    "df = df.iloc[1:,]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "83afe4d7-b410-432d-9b1a-a2983f3858cc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import datasets\n",
+    "\n",
+    "data_as_dataset = datasets.Dataset.from_pandas(df, preserve_index=False)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "5539d549-2306-42cc-9975-aa2fb4ecab43",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dd = data_as_dataset.train_test_split(test_size=0.1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "e013ce63-8252-41f7-8ebb-41e926a889d6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "prefix = \"summarize: \"\n",
+    "\n",
+    "def preprocess_function(examples):\n",
+    "    inputs = [prefix + doc for doc in examples[\"dialogue\"]]\n",
+    "    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)\n",
+    "\n",
+    "    with tokenizer.as_target_tokenizer():\n",
+    "        labels = tokenizer(examples[\"summary\"], max_length=128, truncation=True)\n",
+    "\n",
+    "    model_inputs[\"labels\"] = labels[\"input_ids\"]\n",
+    "    return model_inputs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "64f15b64-34b2-4849-9d7f-8fe712656613",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a2ba058964994fd2b55f7c569c000f13",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/1 [00:00<?, ?ba/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "d4a48e263d50461ba596bd5033a1e920",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/1 [00:00<?, ?ba/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "tokenized_dd = dd.map(preprocess_function, batched=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "9b5ded79-f0ed-4c0a-8a96-ae694f232aa6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "1f051a1f-9292-4d91-a2c3-89f1d03a0935",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import DataCollatorForSeq2Seq\n",
+    "\n",
+    "data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "c04e187b-6e44-479f-911e-278733884c30",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pynvml import *\n",
+    "\n",
+    "def print_gpu_utilization():\n",
+    "    nvmlInit()\n",
+    "    handle = nvmlDeviceGetHandleByIndex(0)\n",
+    "    info = nvmlDeviceGetMemoryInfo(handle)\n",
+    "    print(f\"GPU memory occupied: {info.used//1024**2} MB.\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "bdbaf892-ce5d-4fba-8133-db147de51d53",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "GPU memory occupied: 0 MB.\n"
+     ]
+    }
+   ],
+   "source": [
+    "print_gpu_utilization()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "ed721ebc-11c9-431d-a303-dcd1d39a6a6a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Using amp half precision backend\n",
+      "The following columns in the training set don't have a corresponding argument in `BartForConditionalGeneration.forward` and have been ignored: id, dialogue, summary. If id, dialogue, summary are not expected by `BartForConditionalGeneration.forward`, you can safely ignore this message.\n",
+      "/opt/conda/lib/python3.7/site-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n",
+      "  FutureWarning,\n",
+      "***** Running training *****\n",
+      "  Num examples = 12\n",
+      "  Num Epochs = 3\n",
+      "  Instantaneous batch size per device = 8\n",
+      "  Total train batch size (w. parallel, distributed & accumulation) = 8\n",
+      "  Gradient Accumulation steps = 1\n",
+      "  Total optimization steps = 6\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "    <div>\n",
+       "      \n",
+       "      <progress value='6' max='6' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       "      [6/6 00:02, Epoch 3/3]\n",
+       "    </div>\n",
+       "    <table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: left;\">\n",
+       "      <th>Epoch</th>\n",
+       "      <th>Training Loss</th>\n",
+       "      <th>Validation Loss</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <td>1</td>\n",
+       "      <td>No log</td>\n",
+       "      <td>1.015452</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2</td>\n",
+       "      <td>No log</td>\n",
+       "      <td>1.071723</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>3</td>\n",
+       "      <td>No log</td>\n",
+       "      <td>1.088332</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table><p>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "The following columns in the evaluation set don't have a corresponding argument in `BartForConditionalGeneration.forward` and have been ignored: id, dialogue, summary. If id, dialogue, summary are not expected by `BartForConditionalGeneration.forward`, you can safely ignore this message.\n",
+      "***** Running Evaluation *****\n",
+      "  Num examples = 2\n",
+      "  Batch size = 8\n",
+      "The following columns in the evaluation set don't have a corresponding argument in `BartForConditionalGeneration.forward` and have been ignored: id, dialogue, summary. If id, dialogue, summary are not expected by `BartForConditionalGeneration.forward`, you can safely ignore this message.\n",
+      "***** Running Evaluation *****\n",
+      "  Num examples = 2\n",
+      "  Batch size = 8\n",
+      "The following columns in the evaluation set don't have a corresponding argument in `BartForConditionalGeneration.forward` and have been ignored: id, dialogue, summary. If id, dialogue, summary are not expected by `BartForConditionalGeneration.forward`, you can safely ignore this message.\n",
+      "***** Running Evaluation *****\n",
+      "  Num examples = 2\n",
+      "  Batch size = 8\n",
+      "\n",
+      "\n",
+      "Training completed. Do not forget to share your model on huggingface.co/models =)\n",
+      "\n",
+      "\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "TrainOutput(global_step=6, training_loss=1.0018877983093262, metrics={'train_runtime': 4.8179, 'train_samples_per_second': 7.472, 'train_steps_per_second': 1.245, 'total_flos': 22551432265728.0, 'train_loss': 1.0018877983093262, 'epoch': 3.0})"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "training_args = Seq2SeqTrainingArguments(\n",
+    "    output_dir=\"./results\",\n",
+    "    evaluation_strategy=\"epoch\",\n",
+    "    learning_rate=2e-5,\n",
+    "    per_device_train_batch_size=8,\n",
+    "    per_device_eval_batch_size=8,\n",
+    "    weight_decay=0.01,\n",
+    "    save_total_limit=3,\n",
+    "    num_train_epochs=3,\n",
+    "    fp16=True,\n",
+    ")\n",
+    "\n",
+    "trainer = Seq2SeqTrainer(\n",
+    "    model=model,\n",
+    "    args=training_args,\n",
+    "    train_dataset=tokenized_dd[\"train\"],\n",
+    "    eval_dataset=tokenized_dd[\"test\"],\n",
+    "    tokenizer=tokenizer,\n",
+    "    data_collator=data_collator,\n",
+    ")\n",
+    "\n",
+    "trainer.train()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "eece8dd8-29ab-45cf-9a3e-7cded880f385",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "784ecc7b7797495f95651cd2e61a7c3d",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "VBox(children=(HTML(value='<center>\\n<img src=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from huggingface_hub import notebook_login\n",
+    "\n",
+    "notebook_login()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "e4a7de93-5acc-4173-b6fe-98ae15e2373a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "To disable this warning, you can either:\n",
+      "\t- Avoid using `tokenizers` before the fork if possible\n",
+      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "To disable this warning, you can either:\n",
+      "\t- Avoid using `tokenizers` before the fork if possible\n",
+      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "To disable this warning, you can either:\n",
+      "\t- Avoid using `tokenizers` before the fork if possible\n",
+      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
+      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+      "To disable this warning, you can either:\n",
+      "\t- Avoid using `tokenizers` before the fork if possible\n",
+      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
+     ]
+    },
+    {
+     "ename": "OSError",
+     "evalue": "Tried to clone a repository in a non-empty folder that isn't a git repository. If you really want to do this, do it manually:\ngit init && git remote add origin && git pull origin main\n or clone repo to a new folder and move your existing files there afterwards.",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[0;31mOSError\u001b[0m                                   Traceback (most recent call last)",
+      "\u001b[0;32m/tmp/ipykernel_1257/1405518398.py\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtrainer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpush_to_hub\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/transformers/trainer.py\u001b[0m in \u001b[0;36mpush_to_hub\u001b[0;34m(self, commit_message, blocking, **kwargs)\u001b[0m\n\u001b[1;32m   2827\u001b[0m         \u001b[0;31m# it might fail.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2828\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"repo\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2829\u001b[0;31m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minit_git_repo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2830\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2831\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshould_save\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/transformers/trainer.py\u001b[0m in \u001b[0;36minit_git_repo\u001b[0;34m(self, at_init)\u001b[0m\n\u001b[1;32m   2709\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moutput_dir\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2710\u001b[0m                 \u001b[0mclone_from\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mrepo_name\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2711\u001b[0;31m                 \u001b[0muse_auth_token\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0muse_auth_token\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2712\u001b[0m             )\n\u001b[1;32m   2713\u001b[0m         \u001b[0;32mexcept\u001b[0m \u001b[0mEnvironmentError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, local_dir, clone_from, repo_type, use_auth_token, git_user, git_email, revision, private, skip_lfs_files)\u001b[0m\n\u001b[1;32m    419\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    420\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mclone_from\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 421\u001b[0;31m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclone_from\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrepo_url\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mclone_from\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    422\u001b[0m         \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    423\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mis_git_repo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlocal_dir\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;32m/opt/conda/lib/python3.7/site-packages/huggingface_hub/repository.py\u001b[0m in \u001b[0;36mclone_from\u001b[0;34m(self, repo_url, use_auth_token)\u001b[0m\n\u001b[1;32m    620\u001b[0m                 \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0min_repository\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    621\u001b[0m                     raise EnvironmentError(\n\u001b[0;32m--> 622\u001b[0;31m                         \u001b[0;34m\"Tried to clone a repository in a non-empty folder that isn't a git repository. If you really \"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    623\u001b[0m                         \u001b[0;34m\"want to do this, do it manually:\\n\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    624\u001b[0m                         \u001b[0;34m\"git init && git remote add origin && git pull origin main\\n\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+      "\u001b[0;31mOSError\u001b[0m: Tried to clone a repository in a non-empty folder that isn't a git repository. If you really want to do this, do it manually:\ngit init && git remote add origin && git pull origin main\n or clone repo to a new folder and move your existing files there afterwards."
+     ]
+    }
+   ],
+   "source": [
+    "trainer.push_to_hub()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "53e539de-637d-4127-99c5-4f3dab3ab286",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "environment": {
+   "kernel": "python3",
+   "name": "pytorch-gpu.1-10.m87",
+   "type": "gcloud",
+   "uri": "gcr.io/deeplearning-platform-release/pytorch-gpu.1-10:m87"
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
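The notebook above cleans each SAMSum record by stripping carriage returns and newlines before building the DataFrame. A standalone sketch of that cleaning step, without pandas (`clean_record` is a hypothetical helper, and the sample record is taken from test.json below):

```python
# Strip \r and \n from the dialogue and summary fields so each
# record collapses to a single line, mirroring the notebook's
# df['dialogue'].str.replace(...) calls.
def clean_record(record: dict) -> dict:
    cleaned = dict(record)  # shallow copy; leave the input untouched
    for key in ("dialogue", "summary"):
        cleaned[key] = cleaned[key].replace("\r", "").replace("\n", "")
    return cleaned

sample = {
    "id": "13729565",
    "summary": "Eric and Rob are going to watch a stand-up on youtube.",
    "dialogue": "Eric: MACHINE!\r\nRob: That's so gr8!\r\n",
}

print(clean_record(sample)["dialogue"])  # → Eric: MACHINE!Rob: That's so gr8!
```

Note that this concatenates speaker turns with no separator at all, which is exactly what the notebook's two successive `str.replace` calls do; replacing `\r\n` with a space would be a gentler alternative.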
test.json
ADDED
@@ -0,0 +1,27 @@
+[
+{
+"id": "13862856",
+"summary": "Hannah needs Betty's number but Amanda doesn't have it. She needs to contact Larry.",
+"dialogue": "Hannah: Hey, do you have Betty's number?\nAmanda: Lemme check\nHannah: <file_gif>\nAmanda: Sorry, can't find it.\nAmanda: Ask Larry\nAmanda: He called her last time we were at the park together\nHannah: I don't know him well\nHannah: <file_gif>\nAmanda: Don't be shy, he's very nice\nHannah: If you say so..\nHannah: I'd rather you texted him\nAmanda: Just text him 🙂\nHannah: Urgh.. Alright\nHannah: Bye\nAmanda: Bye bye"
+},
+{
+"id": "13729565",
+"summary": "Eric and Rob are going to watch a stand-up on youtube.",
+"dialogue": "Eric: MACHINE!\r\nRob: That's so gr8!\r\nEric: I know! And shows how Americans see Russian ;)\r\nRob: And it's really funny!\r\nEric: I know! I especially like the train part!\r\nRob: Hahaha! No one talks to the machine like that!\r\nEric: Is this his only stand-up?\r\nRob: Idk. I'll check.\r\nEric: Sure.\r\nRob: Turns out no! There are some of his stand-ups on youtube.\r\nEric: Gr8! I'll watch them now!\r\nRob: Me too!\r\nEric: MACHINE!\r\nRob: MACHINE!\r\nEric: TTYL?\r\nRob: Sure :)"
+},
+{
+"id": "13680171",
+"summary": "Lenny can't decide which trousers to buy. Bob advised Lenny on that topic. Lenny goes with Bob's advice to pick the trousers that are of best quality.",
+"dialogue": "Lenny: Babe, can you help me with something?\r\nBob: Sure, what's up?\r\nLenny: Which one should I pick?\r\nBob: Send me photos\r\nLenny: <file_photo>\r\nLenny: <file_photo>\r\nLenny: <file_photo>\r\nBob: I like the first ones best\r\nLenny: But I already have purple trousers. Does it make sense to have two pairs?\r\nBob: I have four black pairs :D :D\r\nLenny: yeah, but shouldn't I pick a different color?\r\nBob: what matters is what you'll give you the most outfit options\r\nLenny: So I guess I'll buy the first or the third pair then\r\nBob: Pick the best quality then\r\nLenny: ur right, thx\r\nBob: no prob :)"
+},
+{
+"id": "13729438",
+"summary": "Emma will be home soon and she will let Will know.",
+"dialogue": "Will: hey babe, what do you want for dinner tonight?\r\nEmma: gah, don't even worry about it tonight\r\nWill: what do you mean? everything ok?\r\nEmma: not really, but it's ok, don't worry about cooking though, I'm not hungry\r\nWill: Well what time will you be home?\r\nEmma: soon, hopefully\r\nWill: you sure? Maybe you want me to pick you up?\r\nEmma: no no it's alright. I'll be home soon, i'll tell you when I get home. \r\nWill: Alright, love you. \r\nEmma: love you too. "
+},
+{
+"id": "13828600",
+"summary": "Jane is in Warsaw. Ollie and Jane has a party. Jane lost her calendar. They will get a lunch this week on Friday. Ollie accidentally called Jane and talked about whisky. Jane cancels lunch. They'll meet for a tea at 6 pm.",
+"dialogue": "Ollie: Hi , are you in Warsaw\r\nJane: yes, just back! Btw are you free for diner the 19th?\r\nOllie: nope!\r\nJane: and the 18th?\r\nOllie: nope, we have this party and you must be there, remember?\r\nJane: oh right! i lost my calendar.. thanks for reminding me\r\nOllie: we have lunch this week?\r\nJane: with pleasure!\r\nOllie: friday?\r\nJane: ok\r\nJane: what do you mean \" we don't have any more whisky!\" lol..\r\nOllie: what!!!\r\nJane: you just call me and the all thing i heard was that sentence about whisky... what's wrong with you?\r\nOllie: oh oh... very strange! i have to be carefull may be there is some spy in my mobile! lol\r\nJane: dont' worry, we'll check on friday.\r\nOllie: don't forget to bring some sun with you\r\nJane: I can't wait to be in Morocco..\r\nOllie: enjoy and see you friday\r\nJane: sorry Ollie, i'm very busy, i won't have time for lunch tomorrow, but may be at 6pm after my courses?this trip to Morocco was so nice, but time consuming!\r\nOllie: ok for tea!\r\nJane: I'm on my way..\r\nOllie: tea is ready, did you bring the pastries?\r\nJane: I already ate them all... see you in a minute\r\nOllie: ok"
+}
+]
train.json
ADDED
@@ -0,0 +1,27 @@
+[
+{
+"id": "13818513",
+"summary": "Amanda baked cookies and will bring Jerry some tomorrow.",
+"dialogue": "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"
+},
+{
+"id": "13728867",
+"summary": "Olivia and Olivier are voting for liberals in this election. ",
+"dialogue": "Olivia: Who are you voting for in this election? \r\nOliver: Liberals as always.\r\nOlivia: Me too!!\r\nOliver: Great"
+},
+{
+"id": "13681000",
+"summary": "Kim may try the pomodoro technique recommended by Tim to get more stuff done.",
+"dialogue": "Tim: Hi, what's up?\r\nKim: Bad mood tbh, I was going to do lots of stuff but ended up procrastinating\r\nTim: What did you plan on doing?\r\nKim: Oh you know, uni stuff and unfucking my room\r\nKim: Maybe tomorrow I'll move my ass and do everything\r\nKim: We were going to defrost a fridge so instead of shopping I'll eat some defrosted veggies\r\nTim: For doing stuff I recommend Pomodoro technique where u use breaks for doing chores\r\nTim: It really helps\r\nKim: thanks, maybe I'll do that\r\nTim: I also like using post-its in kaban style"
+},
+{
+"id": "13730747",
+"summary": "Edward thinks he is in love with Bella. Rachel wants Edward to open his door. Rachel is outside. ",
+"dialogue": "Edward: Rachel, I think I'm in ove with Bella..\r\nrachel: Dont say anything else..\r\nEdward: What do you mean??\r\nrachel: Open your fu**ing door.. I'm outside"
+},
+{
+"id": "13728094",
+"summary": "Sam is confused, because he overheard Rick complaining about him as a roommate. Naomi thinks Sam should talk to Rick. Sam is not sure what to do.",
+"dialogue": "Sam: hey overheard rick say something\r\nSam: i don't know what to do :-/\r\nNaomi: what did he say??\r\nSam: he was talking on the phone with someone\r\nSam: i don't know who\r\nSam: and he was telling them that he wasn't very happy here\r\nNaomi: damn!!!\r\nSam: he was saying he doesn't like being my roommate\r\nNaomi: wow, how do you feel about it?\r\nSam: i thought i was a good rommate\r\nSam: and that we have a nice place\r\nNaomi: that's true man!!!\r\nNaomi: i used to love living with you before i moved in with me boyfriend\r\nNaomi: i don't know why he's saying that\r\nSam: what should i do???\r\nNaomi: honestly if it's bothering you that much you should talk to him\r\nNaomi: see what's going on\r\nSam: i don't want to get in any kind of confrontation though\r\nSam: maybe i'll just let it go\r\nSam: and see how it goes in the future\r\nNaomi: it's your choice sam\r\nNaomi: if i were you i would just talk to him and clear the air"
+}
+]
val.json
ADDED
@@ -0,0 +1,27 @@
+[
+{
+"id": "13817023",
+"summary": "A will go to the animal shelter tomorrow to get a puppy for her son. They already visited the shelter last Monday and the son chose the puppy. ",
+"dialogue": "A: Hi Tom, are you busy tomorrow’s afternoon?\r\nB: I’m pretty sure I am. What’s up?\r\nA: Can you go with me to the animal shelter?.\r\nB: What do you want to do?\r\nA: I want to get a puppy for my son.\r\nB: That will make him so happy.\r\nA: Yeah, we’ve discussed it many times. I think he’s ready now.\r\nB: That’s good. Raising a dog is a tough issue. Like having a baby ;-) \r\nA: I'll get him one of those little dogs.\r\nB: One that won't grow up too big;-)\r\nA: And eat too much;-))\r\nB: Do you know which one he would like?\r\nA: Oh, yes, I took him there last Monday. He showed me one that he really liked.\r\nB: I bet you had to drag him away.\r\nA: He wanted to take it home right away ;-).\r\nB: I wonder what he'll name it.\r\nA: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))"
+},
+{
+"id": "13716628",
+"summary": "Emma and Rob love the advent calendar. Lauren fits inside calendar various items, for instance, small toys and Christmas decorations. Her children are excited whenever they get the calendar.",
+"dialogue": "Emma: I’ve just fallen in love with this advent calendar! Awesome! I wanna one for my kids!\r\nRob: I used to get one every year as a child! Loved them! \r\nEmma: Yeah, i remember! they were filled with chocolates!\r\nLauren: they are different these days! much more sophisticated! Haha!\r\nRob: yeah, they can be fabric/ wooden, shop bought/ homemade, filled with various stuff\r\nEmma: what do you fit inside?\r\nLauren: small toys, Christmas decorations, creative stuff, hair bands & clips, stickers, pencils & rubbers, small puzzles, sweets\r\nEmma: WOW! That’s brill! X\r\nLauren: i add one more very special thing as well- little notes asking my children to do something nice for someone else\r\nRob: i like that! My sister adds notes asking her kids questions about christmas such as What did the 3 wise men bring? etc\r\nLauren: i reckon it prepares them for Christmas \r\nEmma: and makes it more about traditions and being kind to other people\r\nLauren: my children get very excited every time they get one!\r\nEmma: i can see why! :)"
+},
+{
+"id": "13829420",
+"summary": "Madison is pregnant but she doesn't want to talk about it. Patricia Stevens got married and she thought she was pregnant. ",
+"dialogue": "Jackie: Madison is pregnant\r\nJackie: but she doesn't wanna talk about it\r\nIggy: why\r\nJackie: I don't know why because she doesn't wanna talk about it\r\nIggy: ok\r\nJackie: I wanted to prepare you for it because people get super excited and ask lots of questions\r\nJackie: and she looked way more anxious than excited\r\nIggy: she's probably worrying about it\r\nIggy: she's taking every commitment really seriously\r\nJackie: it could be money problems or relationship problems\r\nIggy: or maybe she wants an abortion\r\nJackie: it could be all of the above\r\nIggy: but you know what?\r\nIggy: once my friend was pregnant and I couldn't bring myself to be happy about it\r\nJackie: why?\r\nIggy: I felt they were immature and I couldn't picture this couple as parents\r\nJackie: I felt similar way on Patricia's wedding\r\nIggy: Patricia Stevens?\r\nJackie: yes\r\nIggy: so we're talking about the same person\r\nJackie: what a coincidence\r\nJackie: so she's pregnant?\r\nIggy: she thought she was\r\nJackie: damn..."
+},
+{
+"id": "13819648",
+"summary": "Marla found a pair of boxers under her bed.",
+"dialogue": "Marla: <file_photo>\r\nMarla: look what I found under my bed\r\nKiki: lol\r\nTamara: is that someone's underwear?\r\nMarla: it certainly isn't mine, my ass is big but it isn't huge\r\nKiki: it looks like male underwear\r\nTamara: not necessarily, maybe some butch had fun in your room while you were gone\r\nMarla: ok but how can you leave your underwear after hooking up? wtf is wrong with people\r\nKiki: she or he could be too wasted to notice\r\nTamara: or maybe someone put their pants there to piss you off\r\nMarla: that makes no sense\r\nMarla: it's so fucking childish\r\nKiki: if it's childish then it must have been your sister's idea\r\nMarla: she's 13, she doesn't have underwear that isn't pink\r\nTamara: maybe it belonged to one of your exes?\r\nKiki: she would have recognized it\r\nMarla: lol we're doing total CSI investigation on one pair of boxers :D\r\nKiki: <file_gif>\r\nTamara: lol\r\nTamara: I think your sister convinced someone to put their underwear in your room as a dare\r\nMarla: sounds legit\r\nKiki: Tamara, you just cracked the case!\r\nTamara: <file_gif>\r\nTamara: always happy to help"
+},
+{
+"id": "13728448",
+"summary": "Robert wants Fred to send him the address of the music shop as he needs to buy guitar cable.",
+"dialogue": "Robert: Hey give me the address of this music shop you mentioned before\r\nRobert: I have to buy guitar cable\r\nFred: <file_other>\r\nFred: Catch it on google maps\r\nRobert: thx m8\r\nFred: ur welcome"
+}
+]