Spaces:

nuriyev
/

text2mcdm

Paused

nuriyev commited on Dec 21, 2025

Commit

791e77b

1 Parent(s): 9ce26a7

organize data in the data/ folder. use dedicated test set in finetune notebook

Files changed (4) hide show

test.jsonl → data/test.jsonl RENAMED Viewed

File without changes

train.jsonl → data/train.jsonl RENAMED Viewed

File without changes

val.jsonl → data/val.jsonl RENAMED Viewed

File without changes

qwen3_finetune.ipynb CHANGED Viewed

@@ -61,8 +61,9 @@
       "outputs": [],
       "source": [
         "\n",
-        "TRAIN_FILE = \"train.jsonl\"\n",
-        "VAL_FILE = \"val.jsonl\"\n",
         "MODEL_NAME = \"Qwen/Qwen3-4B-Instruct-2507\"\n",
         "HF_TOKEN = \"...\""
       ]
@@ -321,6 +322,16 @@
         "val_dataset = val_dataset.map(formatting_prompts_func, batched = True,)"
       ]
     },
     {
       "cell_type": "markdown",
       "metadata": {
@@ -824,16 +835,7 @@
     },
     {
       "cell_type": "code",
-      "execution_count": 18,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "val_dataset = load_data('./val.jsonl')"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 19,
       "metadata": {},
       "outputs": [
         {
@@ -851,7 +853,7 @@
         }
       ],
       "source": [
-        "messages = val_dataset[1][\"messages\"][:2]\n",
         "messages"
       ]
     },

       "outputs": [],
       "source": [
         "\n",
+        "TRAIN_FILE = \"data/train.jsonl\"\n",
+        "VAL_FILE = \"data/val.jsonl\"\n",
+        "TEST_FILE = \"data/test.jsonl\"\n",
         "MODEL_NAME = \"Qwen/Qwen3-4B-Instruct-2507\"\n",
         "HF_TOKEN = \"...\""
       ]
         "val_dataset = val_dataset.map(formatting_prompts_func, batched = True,)"
       ]
     },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "test_dataset = load_data(TEST_FILE)\n",
+        "test_dataset = test_dataset.map(formatting_prompts_func, batched = True,)"
+      ]
+    },
     {
       "cell_type": "markdown",
       "metadata": {
     },
     {
       "cell_type": "code",
+      "execution_count": null,
       "metadata": {},
       "outputs": [
         {
         }
       ],
       "source": [
+        "messages = test_dataset[1][\"messages\"][:2]\n",
         "messages"
       ]
     },