Spaces: Pujan-Dev/AI_API

Pujan-Dev committed on Apr 11

Commit

5298fcc

1 Parent(s): 33fb2d7

Added documentation

Files changed (3)

notebook/ai_vs_human/main.ipynb +1110 -0
notebook/ai_vs_human/mainv2.ipynb +1170 -0
notebook/ai_vs_human_nepali/notebook/documentation.md +435 -0

notebook/ai_vs_human/main.ipynb ADDED

	@@ -0,0 +1,1110 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e522047b",
+   "metadata": {},
+   "source": [
+    "# AI vs Human Text Detector using BERT\n",
+    "Fine-tunes google-bert/bert-base-cased with LoRA on the HC3 dataset and/or local data (capped at ~20k samples)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "16eddd36",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from functools import partial\n",
+    "\n",
+    "import datasets\n",
+    "from datasets import Dataset, DatasetDict, concatenate_datasets\n",
+    "import evaluate\n",
+    "import numpy as np\n",
+    "import torch\n",
+    "from transformers import (\n",
+    "    AutoModelForSequenceClassification,\n",
+    "    AutoTokenizer,\n",
+    "    PreTrainedTokenizer,\n",
+    "    BatchEncoding,\n",
+    "    DataCollatorWithPadding,\n",
+    "    Trainer,\n",
+    "    TrainingArguments,\n",
+    ")\n",
+    "from peft import LoraConfig, get_peft_model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99bca750",
+   "metadata": {},
+   "source": [
+    "## Load AI Detection Dataset (~20k samples)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "2945f87a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def get_raid_dataset(max_samples: int = 20000, use_local: bool = True) -> DatasetDict:\n",
+    "    \"\"\"Load HC3 and/or local human-vs-AI text, cap it at max_samples, and return a 95/5 train/test split.\"\"\"\n",
+    "    \n",
+    "    print(\"Loading AI vs Human text dataset...\")\n",
+    "    \n",
+    "    all_texts = []\n",
+    "    all_labels = []\n",
+    "    \n",
+    "    # Try loading HC3 dataset (Human ChatGPT Comparison Corpus)\n",
+    "    try:\n",
+    "        print(\"Attempting to load HC3 dataset...\")\n",
+    "        dataset = datasets.load_dataset(\"Hello-SimpleAI/HC3\", \"all\", split=\"train\")\n",
+    "        \n",
+    "        # HC3 format: has 'question', 'human_answers', 'chatgpt_answers'\n",
+    "        for item in dataset:\n",
+    "            # Add human answers\n",
+    "            if 'human_answers' in item and item['human_answers']:\n",
+    "                for answer in item['human_answers'][:1]:  # Take first answer\n",
+    "                    if answer and len(answer.strip()) > 0:\n",
+    "                        all_texts.append(answer)\n",
+    "                        all_labels.append(0)  # 0 for human\n",
+    "            \n",
+    "            # Add AI answers\n",
+    "            if 'chatgpt_answers' in item and item['chatgpt_answers']:\n",
+    "                for answer in item['chatgpt_answers'][:1]:  # Take first answer\n",
+    "                    if answer and len(answer.strip()) > 0:\n",
+    "                        all_texts.append(answer)\n",
+    "                        all_labels.append(1)  # 1 for AI\n",
+    "        \n",
+    "        print(f\"✓ Loaded {len(all_texts)} samples from HC3 dataset\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"⚠ Could not load HC3 dataset: {e}\")\n",
+    "    \n",
+    "    # Load local data and combine\n",
+    "    if use_local:\n",
+    "        try:\n",
+    "            print(\"Loading local dataset...\")\n",
+    "            import pandas as pd\n",
+    "            df = pd.read_json(\"./DATASET/basic_Data.jsonl\", lines=True)\n",
+    "            \n",
+    "            # Build a proper binary classification dataset: human_text -> 0, ai_text -> 1\n",
+    "            if {\"human_text\", \"ai_text\"}.issubset(df.columns):\n",
+    "                local_texts = list(df[\"human_text\"].dropna()) + list(df[\"ai_text\"].dropna())\n",
+    "                local_labels = [0] * len(df[\"human_text\"].dropna()) + [1] * len(df[\"ai_text\"].dropna())\n",
+    "                \n",
+    "                all_texts.extend(local_texts)\n",
+    "                all_labels.extend(local_labels)\n",
+    "                \n",
+    "                print(f\"✓ Loaded {len(local_texts)} samples from local data\")\n",
+    "            else:\n",
+    "                print(\"⚠ Local dataset doesn't have required columns\")\n",
+    "        except Exception as e:\n",
+    "            print(f\"⚠ Could not load local dataset: {e}\")\n",
+    "    \n",
+    "    # Check if we have any data\n",
+    "    if len(all_texts) == 0:\n",
+    "        raise ValueError(\"No data loaded! Check HC3 dataset or local data availability\")\n",
+    "    \n",
+    "    # Create combined dataset\n",
+    "    combined_dataset = Dataset.from_dict({\n",
+    "        \"text\": all_texts,\n",
+    "        \"label\": all_labels\n",
+    "    })\n",
+    "    \n",
+    "    print(f\"Total combined samples: {len(combined_dataset)}\")\n",
+    "    \n",
+    "    # Shuffle and limit to max_samples\n",
+    "    combined_dataset = combined_dataset.shuffle(seed=42)\n",
+    "    if len(combined_dataset) > max_samples:\n",
+    "        combined_dataset = combined_dataset.select(range(max_samples))\n",
+    "        print(f\"Limited to {max_samples} samples\")\n",
+    "    \n",
+    "    # Filter out empty texts\n",
+    "    combined_dataset = combined_dataset.filter(lambda x: x['text'] is not None and len(x['text'].strip()) > 0)\n",
+    "    \n",
+    "    # Split into train/test (95/5 split)\n",
+    "    dataset_split = combined_dataset.train_test_split(test_size=0.05, seed=42)\n",
+    "    \n",
+    "    print(f\"\\n✓ Dataset ready!\")\n",
+    "    print(f\"  Train samples: {len(dataset_split['train'])}\")\n",
+    "    print(f\"  Test samples: {len(dataset_split['test'])}\")\n",
+    "    \n",
+    "    # Check label distribution\n",
+    "    import numpy as np\n",
+    "    train_labels = np.array(dataset_split['train']['label'])\n",
+    "    print(f\"  Label distribution (train):\")\n",
+    "    print(f\"    Human (0): {(train_labels == 0).sum()}\")\n",
+    "    print(f\"    AI (1): {(train_labels == 1).sum()}\")\n",
+    "    \n",
+    "    return dataset_split"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "38d8478c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loading AI vs Human text dataset...\n",
+      "Attempting to load HC3 dataset...\n",
+      "⚠ Could not load HC3 dataset: Dataset scripts are no longer supported, but found HC3.py\n",
+      "Loading local dataset...\n",
+      "✓ Loaded 19940 samples from local data\n",
+      "Total combined samples: 19940\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Filter: 100%|██████████| 19940/19940 [00:00<00:00, 95584.60 examples/s] \n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "✓ Dataset ready!\n",
+      "  Train samples: 18943\n",
+      "  Test samples: 997\n",
+      "  Label distribution (train):\n",
+      "    Human (0): 9477\n",
+      "    AI (1): 9466\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load dataset\n",
+    "raw_datasets = get_raid_dataset(max_samples=20000)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f60191f6",
+   "metadata": {},
+   "source": [
+    "## Initialize Model and Tokenizer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "315bb737",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Loading weights: 100%|██████████| 199/199 [00:00<00:00, 1208.24it/s, Materializing param=bert.pooler.dense.weight]                               \n",
+      "BertForSequenceClassification LOAD REPORT from: google-bert/bert-base-cased\n",
+      "Key                                        | Status     | \n",
+      "-------------------------------------------+------------+-\n",
+      "cls.predictions.transform.LayerNorm.bias   | UNEXPECTED | \n",
+      "cls.seq_relationship.weight                | UNEXPECTED | \n",
+      "cls.predictions.transform.dense.weight     | UNEXPECTED | \n",
+      "cls.seq_relationship.bias                  | UNEXPECTED | \n",
+      "cls.predictions.bias                       | UNEXPECTED | \n",
+      "cls.predictions.transform.dense.bias       | UNEXPECTED | \n",
+      "cls.predictions.transform.LayerNorm.weight | UNEXPECTED | \n",
+      "classifier.weight                          | MISSING    | \n",
+      "classifier.bias                            | MISSING    | \n",
+      "\n",
+      "Notes:\n",
+      "- UNEXPECTED\t:can be ignored when loading from different task/architecture; not ok if you expect identical arch.\n",
+      "- MISSING\t:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model loaded: google-bert/bert-base-cased\n",
+      "Device: cuda\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Use google-bert/bert-base-cased\n",
+    "base_model_name = \"google-bert/bert-base-cased\"\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(base_model_name)\n",
+    "model = AutoModelForSequenceClassification.from_pretrained(\n",
+    "    base_model_name,\n",
+    "    num_labels=2,\n",
+    ").to(device='cuda' if torch.cuda.is_available() else 'cpu')\n",
+    "\n",
+    "print(f\"Model loaded: {base_model_name}\")\n",
+    "print(f\"Device: {'cuda' if torch.cuda.is_available() else 'cpu'}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0a772192",
+   "metadata": {},
+   "source": [
+    "## Apply LoRA for Parameter-Efficient Fine-tuning"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "ba294e50",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "trainable params: 2,680,322 || all params: 110,992,132 || trainable%: 2.4149\n"
+     ]
+    }
+   ],
+   "source": [
+    "peft_config = LoraConfig(\n",
+    "    r=16,\n",
+    "    target_modules=\"all-linear\",\n",
+    "    lora_alpha=16,\n",
+    "    bias=\"none\",\n",
+    "    lora_dropout=0.05,\n",
+    "    use_rslora=True,\n",
+    "    modules_to_save=[\"classifier\"],\n",
+    ")\n",
+    "\n",
+    "model = get_peft_model(model, peft_config)\n",
+    "model.print_trainable_parameters()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3cf58dd8",
+   "metadata": {},
+   "source": [
+    "## Preprocessing and Tokenization"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "c7992ba4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Map: 100%|██████████| 18943/18943 [00:01<00:00, 10132.04 examples/s]\n",
+      "Map: 100%|██████████| 997/997 [00:00<00:00, 11498.07 examples/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Tokenization complete!\n",
+      "Tensor columns: ['input_ids', 'attention_mask', 'token_type_ids', 'labels']\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "def _preprocess_function(\n",
+    "    batch: dict,\n",
+    "    tokenizer: PreTrainedTokenizer,\n",
+    "    max_length: int = 512,\n",
+    ") -> BatchEncoding:\n",
+    "    model_inputs = tokenizer(\n",
+    "        batch[\"text\"],\n",
+    "        max_length=max_length,\n",
+    "        truncation=True,\n",
+    "    )\n",
+    "    model_inputs[\"labels\"] = batch[\"label\"]\n",
+    "    return model_inputs\n",
+    "\n",
+    "\n",
+    "preprocess_function = partial(_preprocess_function, tokenizer=tokenizer)\n",
+    "tokenized_datasets = raw_datasets.map(\n",
+    "    preprocess_function,\n",
+    "    batched=True,\n",
+    "    remove_columns=[\"text\", \"label\"],\n",
+    ")\n",
+    "\n",
+    "# Ensure PyTorch tensors and expected columns\n",
+    "available_columns = tokenized_datasets[\"train\"].column_names\n",
+    "tensor_columns = [\n",
+    "    column_name\n",
+    "    for column_name in [\"input_ids\", \"attention_mask\", \"token_type_ids\", \"labels\"]\n",
+    "    if column_name in available_columns\n",
+    "]\n",
+    "tokenized_datasets.set_format(type=\"torch\", columns=tensor_columns)\n",
+    "\n",
+    "print(\"Tokenization complete!\")\n",
+    "print(\"Tensor columns:\", tensor_columns)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "31db700b",
+   "metadata": {},
+   "source": [
+    "## Define Metrics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "899e4408",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metric_accuracy = evaluate.load(\"accuracy\")\n",
+    "metric_f1 = evaluate.load(\"f1\")\n",
+    "\n",
+    "\n",
+    "def _compute_metrics(\n",
+    "    eval_pred: tuple[np.ndarray, np.ndarray],\n",
+    "    metric_accuracy: evaluate.EvaluationModule,\n",
+    "    metric_f1: evaluate.EvaluationModule,\n",
+    ") -> dict[str, float]:\n",
+    "    predictions, labels = eval_pred\n",
+    "\n",
+    "    if isinstance(predictions, tuple):\n",
+    "        predictions = predictions[0]\n",
+    "\n",
+    "    predictions = np.argmax(predictions, axis=1)\n",
+    "\n",
+    "    accuracy = metric_accuracy.compute(predictions=predictions, references=labels)\n",
+    "    f1 = metric_f1.compute(predictions=predictions, references=labels)\n",
+    "\n",
+    "    assert accuracy is not None and f1 is not None\n",
+    "\n",
+    "    result = {\n",
+    "        \"accuracy\": accuracy[\"accuracy\"],\n",
+    "        \"f1\": f1[\"f1\"],\n",
+    "    }\n",
+    "\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "compute_metrics = partial(\n",
+    "    _compute_metrics, metric_accuracy=metric_accuracy, metric_f1=metric_f1\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "34890c4d",
+   "metadata": {},
+   "source": [
+    "## Training Configuration"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "9717d666",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_batch_size = 4\n",
+    "gradient_accumulation_steps = 8\n",
+    "eval_batch_size = 4\n",
+    "\n",
+    "training_args = TrainingArguments(\n",
+    "    \"./models/bert-base-raid-classifier\",\n",
+    "    num_train_epochs=5,\n",
+    "    learning_rate=5e-5,\n",
+    "    weight_decay=0.1,\n",
+    "    per_device_train_batch_size=train_batch_size,\n",
+    "    per_device_eval_batch_size=eval_batch_size,\n",
+    "    gradient_accumulation_steps=gradient_accumulation_steps,\n",
+    "    fp16=torch.cuda.is_available(),\n",
+    "    save_strategy=\"steps\",\n",
+    "    save_total_limit=2,\n",
+    "    save_steps=64,\n",
+    "    metric_for_best_model=\"eval_accuracy\",\n",
+    "    load_best_model_at_end=True,\n",
+    "    eval_strategy=\"steps\",\n",
+    "    eval_steps=64,\n",
+    "    logging_strategy=\"steps\",\n",
+    "    logging_steps=16,\n",
+    "    remove_unused_columns=False,\n",
+    ")\n",
+    "\n",
+    "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e840a954",
+   "metadata": {},
+   "source": [
+    "## Initialize Trainer and Train"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "0fa3ed58",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Starting training...\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "    <div>\n",
+       "      \n",
+       "      <progress value='2960' max='2960' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       "      [2960/2960 1:03:52, Epoch 5/5]\n",
+       "    </div>\n",
+       "    <table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       " <tr style=\"text-align: left;\">\n",
+       "      <th>Step</th>\n",
+       "      <th>Training Loss</th>\n",
+       "      <th>Validation Loss</th>\n",
+       "      <th>Accuracy</th>\n",
+       "      <th>F1</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <td>64</td>\n",
+       "      <td>5.212345</td>\n",
+       "      <td>0.625602</td>\n",
+       "      <td>0.661986</td>\n",
+       "      <td>0.634093</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>128</td>\n",
+       "      <td>3.753965</td>\n",
+       "      <td>0.458432</td>\n",
+       "      <td>0.771314</td>\n",
+       "      <td>0.809045</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>192</td>\n",
+       "      <td>3.100017</td>\n",
+       "      <td>0.287685</td>\n",
+       "      <td>0.889669</td>\n",
+       "      <td>0.891089</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>256</td>\n",
+       "      <td>2.328572</td>\n",
+       "      <td>0.390553</td>\n",
+       "      <td>0.830491</td>\n",
+       "      <td>0.855432</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>320</td>\n",
+       "      <td>2.129814</td>\n",
+       "      <td>0.238838</td>\n",
+       "      <td>0.911735</td>\n",
+       "      <td>0.917757</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>384</td>\n",
+       "      <td>1.657923</td>\n",
+       "      <td>0.388610</td>\n",
+       "      <td>0.856570</td>\n",
+       "      <td>0.874671</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>448</td>\n",
+       "      <td>1.758504</td>\n",
+       "      <td>0.179176</td>\n",
+       "      <td>0.933801</td>\n",
+       "      <td>0.937262</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>512</td>\n",
+       "      <td>1.352967</td>\n",
+       "      <td>0.344061</td>\n",
+       "      <td>0.867603</td>\n",
+       "      <td>0.882979</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>576</td>\n",
+       "      <td>1.528169</td>\n",
+       "      <td>0.143238</td>\n",
+       "      <td>0.945838</td>\n",
+       "      <td>0.947368</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>640</td>\n",
+       "      <td>1.692302</td>\n",
+       "      <td>0.185934</td>\n",
+       "      <td>0.925777</td>\n",
+       "      <td>0.930582</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>704</td>\n",
+       "      <td>1.194244</td>\n",
+       "      <td>0.189194</td>\n",
+       "      <td>0.927783</td>\n",
+       "      <td>0.932203</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>768</td>\n",
+       "      <td>1.089103</td>\n",
+       "      <td>0.191697</td>\n",
+       "      <td>0.926780</td>\n",
+       "      <td>0.931455</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>832</td>\n",
+       "      <td>1.313780</td>\n",
+       "      <td>0.133464</td>\n",
+       "      <td>0.949850</td>\n",
+       "      <td>0.950593</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>896</td>\n",
+       "      <td>1.144064</td>\n",
+       "      <td>0.161593</td>\n",
+       "      <td>0.943831</td>\n",
+       "      <td>0.946463</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>960</td>\n",
+       "      <td>1.503407</td>\n",
+       "      <td>0.211920</td>\n",
+       "      <td>0.921765</td>\n",
+       "      <td>0.927374</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1024</td>\n",
+       "      <td>1.106765</td>\n",
+       "      <td>0.182482</td>\n",
+       "      <td>0.931795</td>\n",
+       "      <td>0.935606</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1088</td>\n",
+       "      <td>1.450451</td>\n",
+       "      <td>0.127360</td>\n",
+       "      <td>0.956871</td>\n",
+       "      <td>0.958212</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1152</td>\n",
+       "      <td>1.380015</td>\n",
+       "      <td>0.131538</td>\n",
+       "      <td>0.957874</td>\n",
+       "      <td>0.959064</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1216</td>\n",
+       "      <td>0.755666</td>\n",
+       "      <td>0.158870</td>\n",
+       "      <td>0.940822</td>\n",
+       "      <td>0.943432</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1280</td>\n",
+       "      <td>0.863713</td>\n",
+       "      <td>0.157785</td>\n",
+       "      <td>0.943831</td>\n",
+       "      <td>0.946565</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1344</td>\n",
+       "      <td>0.821364</td>\n",
+       "      <td>0.172321</td>\n",
+       "      <td>0.944835</td>\n",
+       "      <td>0.947469</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1408</td>\n",
+       "      <td>0.957095</td>\n",
+       "      <td>0.226298</td>\n",
+       "      <td>0.922768</td>\n",
+       "      <td>0.927835</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1472</td>\n",
+       "      <td>0.868089</td>\n",
+       "      <td>0.197520</td>\n",
+       "      <td>0.934804</td>\n",
+       "      <td>0.938505</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1536</td>\n",
+       "      <td>1.310811</td>\n",
+       "      <td>0.140865</td>\n",
+       "      <td>0.953862</td>\n",
+       "      <td>0.955426</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1600</td>\n",
+       "      <td>0.708888</td>\n",
+       "      <td>0.152195</td>\n",
+       "      <td>0.943831</td>\n",
+       "      <td>0.946565</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1664</td>\n",
+       "      <td>0.717255</td>\n",
+       "      <td>0.176768</td>\n",
+       "      <td>0.942828</td>\n",
+       "      <td>0.945663</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1728</td>\n",
+       "      <td>1.143681</td>\n",
+       "      <td>0.156816</td>\n",
+       "      <td>0.951856</td>\n",
+       "      <td>0.953757</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1792</td>\n",
+       "      <td>0.638254</td>\n",
+       "      <td>0.176596</td>\n",
+       "      <td>0.944835</td>\n",
+       "      <td>0.947469</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1856</td>\n",
+       "      <td>1.133300</td>\n",
+       "      <td>0.119119</td>\n",
+       "      <td>0.967904</td>\n",
+       "      <td>0.968317</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1920</td>\n",
+       "      <td>1.061837</td>\n",
+       "      <td>0.140624</td>\n",
+       "      <td>0.957874</td>\n",
+       "      <td>0.959381</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1984</td>\n",
+       "      <td>0.708067</td>\n",
+       "      <td>0.189490</td>\n",
+       "      <td>0.940822</td>\n",
+       "      <td>0.943863</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2048</td>\n",
+       "      <td>0.761451</td>\n",
+       "      <td>0.150488</td>\n",
+       "      <td>0.951856</td>\n",
+       "      <td>0.953846</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2112</td>\n",
+       "      <td>0.609547</td>\n",
+       "      <td>0.189622</td>\n",
+       "      <td>0.940822</td>\n",
+       "      <td>0.943863</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2176</td>\n",
+       "      <td>0.803254</td>\n",
+       "      <td>0.173354</td>\n",
+       "      <td>0.946841</td>\n",
+       "      <td>0.949282</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2240</td>\n",
+       "      <td>0.664540</td>\n",
+       "      <td>0.154308</td>\n",
+       "      <td>0.952859</td>\n",
+       "      <td>0.954764</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2304</td>\n",
+       "      <td>0.691763</td>\n",
+       "      <td>0.144127</td>\n",
+       "      <td>0.963892</td>\n",
+       "      <td>0.964706</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2368</td>\n",
+       "      <td>1.092195</td>\n",
+       "      <td>0.157182</td>\n",
+       "      <td>0.957874</td>\n",
+       "      <td>0.959381</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2432</td>\n",
+       "      <td>0.752286</td>\n",
+       "      <td>0.231035</td>\n",
+       "      <td>0.933801</td>\n",
+       "      <td>0.937736</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2496</td>\n",
+       "      <td>0.757014</td>\n",
+       "      <td>0.185019</td>\n",
+       "      <td>0.948847</td>\n",
+       "      <td>0.951103</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2560</td>\n",
+       "      <td>0.766771</td>\n",
+       "      <td>0.153019</td>\n",
+       "      <td>0.958877</td>\n",
+       "      <td>0.960078</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2624</td>\n",
+       "      <td>0.434590</td>\n",
+       "      <td>0.201383</td>\n",
+       "      <td>0.946841</td>\n",
+       "      <td>0.949282</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2688</td>\n",
+       "      <td>0.565482</td>\n",
+       "      <td>0.181478</td>\n",
+       "      <td>0.952859</td>\n",
+       "      <td>0.954764</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2752</td>\n",
+       "      <td>0.568177</td>\n",
+       "      <td>0.201250</td>\n",
+       "      <td>0.946841</td>\n",
+       "      <td>0.949282</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2816</td>\n",
+       "      <td>0.611295</td>\n",
+       "      <td>0.173839</td>\n",
+       "      <td>0.954865</td>\n",
+       "      <td>0.956606</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2880</td>\n",
+       "      <td>0.716351</td>\n",
+       "      <td>0.187448</td>\n",
+       "      <td>0.948847</td>\n",
+       "      <td>0.951103</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2944</td>\n",
+       "      <td>0.603852</td>\n",
+       "      <td>0.184578</td>\n",
+       "      <td>0.948847</td>\n",
+       "      <td>0.951103</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table><p>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "TrainOutput(global_step=2960, training_loss=1.3125710455146995, metrics={'train_runtime': 3832.8474, 'train_samples_per_second': 24.711, 'train_steps_per_second': 0.772, 'total_flos': 8360830141838376.0, 'train_loss': 1.3125710455146995, 'epoch': 5.0})"
+      ]
+     },
+     "execution_count": 43,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "trainer = Trainer(\n",
+    "    model,\n",
+    "    training_args,\n",
+    "    train_dataset=tokenized_datasets[\"train\"],\n",
+    "    eval_dataset=tokenized_datasets[\"test\"],\n",
+    "    data_collator=data_collator,\n",
+    "    compute_metrics=compute_metrics,\n",
+    ")\n",
+    "\n",
+    "print(\"Starting training...\")\n",
+    "trainer.train()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cde9bbb1",
+   "metadata": {},
+   "source": [
+    "## Final Evaluation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "bb81afb9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Evaluating model...\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "    <div>\n",
+       "      \n",
+       "      <progress value='250' max='250' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       "      [250/250 00:14]\n",
+       "    </div>\n",
+       "    "
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "Final Results:\n",
+      "Accuracy: 0.9679\n",
+      "F1 Score: 0.9683\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Evaluating model...\")\n",
+    "eval_results = trainer.evaluate()\n",
+    "print(\"\\nFinal Results:\")\n",
+    "print(f\"Accuracy: {eval_results['eval_accuracy']:.4f}\")\n",
+    "print(f\"F1 Score: {eval_results['eval_f1']:.4f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8bf17a40",
+   "metadata": {},
+   "source": [
+    "## Save Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "id": "e580bfd6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model saved successfully!\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Save the final model\n",
+    "trainer.save_model(\"./trained_model/bert-base-raid-final\")\n",
+    "print(\"Model saved successfully!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99c0a2f0",
+   "metadata": {},
+   "source": [
+    "## Test the Model\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "016cc53e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Prediction for human-written text:\n",
+      "{'predicted_label': 0, 'probability_human': 0.9988395571708679, 'probability_ai': 0.0011604195460677147}\n",
+      "\n",
+      "Prediction for AI-generated text:\n",
+      "{'predicted_label': 0, 'probability_human': 0.9988927245140076, 'probability_ai': 0.0011073390487581491}\n"
+     ]
+    }
+   ],
+   "source": [
+    "def predict(text: str) -> dict[str, float]:\n",
+    "    inputs = tokenizer(\n",
+    "        text,\n",
+    "        max_length=512,\n",
+    "        truncation=True,\n",
+    "        return_tensors=\"pt\",\n",
+    "    ).to(model.device)\n",
+    "\n",
+    "    with torch.no_grad():\n",
+    "        outputs = model(**inputs)\n",
+    "        logits = outputs.logits\n",
+    "        probabilities = torch.softmax(logits, dim=-1).cpu().numpy()[0]\n",
+    "        predicted_label = np.argmax(probabilities)\n",
+    "\n",
+    "    return {\n",
+    "        \"predicted_label\": int(predicted_label),\n",
+    "        \"probability_human\": float(probabilities[0]),\n",
+    "        \"probability_ai\": float(probabilities[1]),\n",
+    "    }\n",
+    "    \n",
+    "text = \"Ai will replace this world. today in the nepal election someone might win by using ai.\"\n",
+    "text_by_ai = \"This is a sample text generated by AI.Also This is an long text by AI.\"\n",
+    "print(\"Prediction for human-written text:\")\n",
+    "print(predict(text))\n",
+    "print(\"\\nPrediction for AI-generated text:\")\n",
+    "print(predict(text_by_ai))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7c6c2a5d",
+   "metadata": {},
+   "source": [
+    "## Quick evaluation on the RAID benchmark"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "id": "1b287605",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Using 512 samples for RAID quick test\n"
+     ]
+    },
+    {
+     "ename": "OutOfMemoryError",
+     "evalue": "CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 3.68 GiB of which 719.12 MiB is free. Process 2034 has 46.03 MiB memory in use. Process 1961 has 6.78 MiB memory in use. Including non-PyTorch memory, this process has 2.90 GiB memory in use. Of the allocated memory 2.71 GiB is allocated by PyTorch, and 85.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+      "\u001b[31mOutOfMemoryError\u001b[39m                          Traceback (most recent call last)",
+      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[47]\u001b[39m\u001b[32m, line 32\u001b[39m\n\u001b[32m     28\u001b[39m     \u001b[38;5;66;03m# Return AI-class probability for each input text\u001b[39;00m\n\u001b[32m     29\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m probabilities[:, \u001b[32m1\u001b[39m].astype(\u001b[38;5;28mfloat\u001b[39m).tolist()\n\u001b[32m---> \u001b[39m\u001b[32m32\u001b[39m predictions = \u001b[43mrun_detection\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmy_detector\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtest_df\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m     33\u001b[39m evaluation_result = run_evaluation(predictions, test_df)\n\u001b[32m     35\u001b[39m evaluation_result\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/raid/detect.py:6\u001b[39m, in \u001b[36mrun_detection\u001b[39m\u001b[34m(f, df)\u001b[39m\n\u001b[32m      3\u001b[39m scores_df = df[[\u001b[33m\"\u001b[39m\u001b[33mid\u001b[39m\u001b[33m\"\u001b[39m]].copy()\n\u001b[32m      5\u001b[39m \u001b[38;5;66;03m# Run the detector function on the dataset and put output in score column\u001b[39;00m\n\u001b[32m----> \u001b[39m\u001b[32m6\u001b[39m scores_df[\u001b[33m\"\u001b[39m\u001b[33mscore\u001b[39m\u001b[33m\"\u001b[39m] = \u001b[43mf\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdf\u001b[49m\u001b[43m[\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mgeneration\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m.\u001b[49m\u001b[43mtolist\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m      8\u001b[39m \u001b[38;5;66;03m# Convert scores and ids to dict in 'records' format for seralization\u001b[39;00m\n\u001b[32m      9\u001b[39m \u001b[38;5;66;03m# e.g. [{'id':'...', 'score':0}, {'id':'...', 'score':1}, ...]\u001b[39;00m\n\u001b[32m     10\u001b[39m results = scores_df[[\u001b[33m\"\u001b[39m\u001b[33mid\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mscore\u001b[39m\u001b[33m\"\u001b[39m]].to_dict(orient=\u001b[33m\"\u001b[39m\u001b[33mrecords\u001b[39m\u001b[33m\"\u001b[39m)\n",
+      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[47]\u001b[39m\u001b[32m, line 24\u001b[39m, in \u001b[36mmy_detector\u001b[39m\u001b[34m(texts)\u001b[39m\n\u001b[32m     22\u001b[39m model.eval()\n\u001b[32m     23\u001b[39m \u001b[38;5;28;01mwith\u001b[39;00m torch.no_grad():\n\u001b[32m---> \u001b[39m\u001b[32m24\u001b[39m     outputs = \u001b[43mmodel\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43minputs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m     25\u001b[39m     logits = outputs.logits\n\u001b[32m     26\u001b[39m     probabilities = torch.softmax(logits, dim=-\u001b[32m1\u001b[39m).cpu().numpy()\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1736\u001b[39m, in \u001b[36mModule._wrapped_call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1734\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m._compiled_call_impl(*args, **kwargs)  \u001b[38;5;66;03m# type: ignore[misc]\u001b[39;00m\n\u001b[32m   1735\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1736\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_call_impl\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1747\u001b[39m, in \u001b[36mModule._call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1742\u001b[39m \u001b[38;5;66;03m# If we don't have any hooks, we want to skip the rest of the logic in\u001b[39;00m\n\u001b[32m   1743\u001b[39m \u001b[38;5;66;03m# this function, and just call forward.\u001b[39;00m\n\u001b[32m   1744\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (\u001b[38;5;28mself\u001b[39m._backward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_pre_hooks\n\u001b[32m   1745\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_backward_hooks\n\u001b[32m   1746\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_forward_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_forward_pre_hooks):\n\u001b[32m-> \u001b[39m\u001b[32m1747\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mforward_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1749\u001b[39m result = \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m   1750\u001b[39m called_always_called_hooks = \u001b[38;5;28mset\u001b[39m()\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/accelerate/utils/operations.py:819\u001b[39m, in \u001b[36mconvert_outputs_to_fp32.<locals>.forward\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m    818\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mforward\u001b[39m(*args, **kwargs):\n\u001b[32m--> \u001b[39m\u001b[32m819\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mmodel_forward\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/accelerate/utils/operations.py:807\u001b[39m, in \u001b[36mConvertOutputsToFp32.__call__\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m    806\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34m__call__\u001b[39m(\u001b[38;5;28mself\u001b[39m, *args, **kwargs):\n\u001b[32m--> \u001b[39m\u001b[32m807\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m convert_to_fp32(\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mmodel_forward\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m)\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/amp/autocast_mode.py:44\u001b[39m, in \u001b[36mautocast_decorator.<locals>.decorate_autocast\u001b[39m\u001b[34m(*args, **kwargs)\u001b[39m\n\u001b[32m     41\u001b[39m \u001b[38;5;129m@functools\u001b[39m.wraps(func)\n\u001b[32m     42\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mdecorate_autocast\u001b[39m(*args, **kwargs):\n\u001b[32m     43\u001b[39m     \u001b[38;5;28;01mwith\u001b[39;00m autocast_instance:\n\u001b[32m---> \u001b[39m\u001b[32m44\u001b[39m         \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/peft/peft_model.py:921\u001b[39m, in \u001b[36mPeftModel.forward\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m    919\u001b[39m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mself\u001b[39m._enable_peft_forward_hooks(*args, **kwargs):\n\u001b[32m    920\u001b[39m     kwargs = {k: v \u001b[38;5;28;01mfor\u001b[39;00m k, v \u001b[38;5;129;01min\u001b[39;00m kwargs.items() \u001b[38;5;28;01mif\u001b[39;00m k \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m.special_peft_forward_args}\n\u001b[32m--> \u001b[39m\u001b[32m921\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mget_base_model\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1736\u001b[39m, in \u001b[36mModule._wrapped_call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1734\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m._compiled_call_impl(*args, **kwargs)  \u001b[38;5;66;03m# type: ignore[misc]\u001b[39;00m\n\u001b[32m   1735\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1736\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_call_impl\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1747\u001b[39m, in \u001b[36mModule._call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1742\u001b[39m \u001b[38;5;66;03m# If we don't have any hooks, we want to skip the rest of the logic in\u001b[39;00m\n\u001b[32m   1743\u001b[39m \u001b[38;5;66;03m# this function, and just call forward.\u001b[39;00m\n\u001b[32m   1744\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (\u001b[38;5;28mself\u001b[39m._backward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_pre_hooks\n\u001b[32m   1745\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_backward_hooks\n\u001b[32m   1746\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_forward_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_forward_pre_hooks):\n\u001b[32m-> \u001b[39m\u001b[32m1747\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mforward_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1749\u001b[39m result = \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m   1750\u001b[39m called_always_called_hooks = \u001b[38;5;28mset\u001b[39m()\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/utils/generic.py:835\u001b[39m, in \u001b[36mcan_return_tuple.<locals>.wrapper\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m    833\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m return_dict_passed \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m    834\u001b[39m     return_dict = return_dict_passed\n\u001b[32m--> \u001b[39m\u001b[32m835\u001b[39m output = \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    836\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m return_dict \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(output, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[32m    837\u001b[39m     output = output.to_tuple()\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py:1162\u001b[39m, in \u001b[36mBertForSequenceClassification.forward\u001b[39m\u001b[34m(self, input_ids, attention_mask, token_type_ids, position_ids, inputs_embeds, labels, **kwargs)\u001b[39m\n\u001b[32m   1144\u001b[39m \u001b[38;5;129m@can_return_tuple\u001b[39m\n\u001b[32m   1145\u001b[39m \u001b[38;5;129m@auto_docstring\u001b[39m\n\u001b[32m   1146\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mforward\u001b[39m(\n\u001b[32m   (...)\u001b[39m\u001b[32m   1154\u001b[39m     **kwargs: Unpack[TransformersKwargs],\n\u001b[32m   1155\u001b[39m ) -> \u001b[38;5;28mtuple\u001b[39m[torch.Tensor] | SequenceClassifierOutput:\n\u001b[32m   1156\u001b[39m \u001b[38;5;250m    \u001b[39m\u001b[33mr\u001b[39m\u001b[33;03m\"\"\"\u001b[39;00m\n\u001b[32m   1157\u001b[39m \u001b[33;03m    labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):\u001b[39;00m\n\u001b[32m   1158\u001b[39m \u001b[33;03m        Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,\u001b[39;00m\n\u001b[32m   1159\u001b[39m \u001b[33;03m        config.num_labels - 1]`. 
If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If\u001b[39;00m\n\u001b[32m   1160\u001b[39m \u001b[33;03m        `config.num_labels > 1` a classification loss is computed (Cross-Entropy).\u001b[39;00m\n\u001b[32m   1161\u001b[39m \u001b[33;03m    \"\"\"\u001b[39;00m\n\u001b[32m-> \u001b[39m\u001b[32m1162\u001b[39m     outputs = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mbert\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m   1163\u001b[39m \u001b[43m        \u001b[49m\u001b[43minput_ids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1164\u001b[39m \u001b[43m        \u001b[49m\u001b[43mattention_mask\u001b[49m\u001b[43m=\u001b[49m\u001b[43mattention_mask\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1165\u001b[39m \u001b[43m        \u001b[49m\u001b[43mtoken_type_ids\u001b[49m\u001b[43m=\u001b[49m\u001b[43mtoken_type_ids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1166\u001b[39m \u001b[43m        \u001b[49m\u001b[43mposition_ids\u001b[49m\u001b[43m=\u001b[49m\u001b[43mposition_ids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1167\u001b[39m \u001b[43m        \u001b[49m\u001b[43minputs_embeds\u001b[49m\u001b[43m=\u001b[49m\u001b[43minputs_embeds\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1168\u001b[39m \u001b[43m        \u001b[49m\u001b[43mreturn_dict\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[32m   1169\u001b[39m \u001b[43m        \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m   1170\u001b[39m \u001b[43m    \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1172\u001b[39m     pooled_output = outputs[\u001b[32m1\u001b[39m]\n\u001b[32m   1174\u001b[39m     pooled_output = \u001b[38;5;28mself\u001b[39m.dropout(pooled_output)\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1736\u001b[39m, in \u001b[36mModule._wrapped_call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1734\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m._compiled_call_impl(*args, **kwargs)  \u001b[38;5;66;03m# type: ignore[misc]\u001b[39;00m\n\u001b[32m   1735\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1736\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_call_impl\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1747\u001b[39m, in \u001b[36mModule._call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1742\u001b[39m \u001b[38;5;66;03m# If we don't have any hooks, we want to skip the rest of the logic in\u001b[39;00m\n\u001b[32m   1743\u001b[39m \u001b[38;5;66;03m# this function, and just call forward.\u001b[39;00m\n\u001b[32m   1744\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (\u001b[38;5;28mself\u001b[39m._backward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_pre_hooks\n\u001b[32m   1745\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_backward_hooks\n\u001b[32m   1746\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_forward_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_forward_pre_hooks):\n\u001b[32m-> \u001b[39m\u001b[32m1747\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mforward_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1749\u001b[39m result = \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m   1750\u001b[39m called_always_called_hooks = \u001b[38;5;28mset\u001b[39m()\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/utils/generic.py:1002\u001b[39m, in \u001b[36mcheck_model_inputs.<locals>.wrapped_fn.<locals>.wrapper\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1000\u001b[39m             outputs = func(\u001b[38;5;28mself\u001b[39m, *args, **kwargs)\n\u001b[32m   1001\u001b[39m     \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1002\u001b[39m         outputs = \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1003\u001b[39m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m original_exception:\n\u001b[32m   1004\u001b[39m     \u001b[38;5;66;03m# If we get a TypeError, it's possible that the model is not receiving the recordable kwargs correctly.\u001b[39;00m\n\u001b[32m   1005\u001b[39m     \u001b[38;5;66;03m# Get a TypeError even after removing the recordable kwargs -> re-raise the original exception\u001b[39;00m\n\u001b[32m   1006\u001b[39m     \u001b[38;5;66;03m# Otherwise -> we're probably missing `**kwargs` in the decorated function\u001b[39;00m\n\u001b[32m   1007\u001b[39m     kwargs_without_recordable = {k: v \u001b[38;5;28;01mfor\u001b[39;00m k, v \u001b[38;5;129;01min\u001b[39;00m kwargs.items() \u001b[38;5;28;01mif\u001b[39;00m k \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m recordable_keys}\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py:679\u001b[39m, in \u001b[36mBertModel.forward\u001b[39m\u001b[34m(self, input_ids, attention_mask, token_type_ids, position_ids, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, cache_position, **kwargs)\u001b[39m\n\u001b[32m    676\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m cache_position \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m    677\u001b[39m     cache_position = torch.arange(past_key_values_length, past_key_values_length + seq_length, device=device)\n\u001b[32m--> \u001b[39m\u001b[32m679\u001b[39m embedding_output = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43membeddings\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m    680\u001b[39m \u001b[43m    \u001b[49m\u001b[43minput_ids\u001b[49m\u001b[43m=\u001b[49m\u001b[43minput_ids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    681\u001b[39m \u001b[43m    \u001b[49m\u001b[43mposition_ids\u001b[49m\u001b[43m=\u001b[49m\u001b[43mposition_ids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    682\u001b[39m \u001b[43m    \u001b[49m\u001b[43mtoken_type_ids\u001b[49m\u001b[43m=\u001b[49m\u001b[43mtoken_type_ids\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    683\u001b[39m \u001b[43m    \u001b[49m\u001b[43minputs_embeds\u001b[49m\u001b[43m=\u001b[49m\u001b[43minputs_embeds\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    684\u001b[39m \u001b[43m    \u001b[49m\u001b[43mpast_key_values_length\u001b[49m\u001b[43m=\u001b[49m\u001b[43mpast_key_values_length\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    685\u001b[39m \u001b[43m\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    687\u001b[39m attention_mask, encoder_attention_mask = \u001b[38;5;28mself\u001b[39m._create_attention_masks(\n\u001b[32m    688\u001b[39m     attention_mask=attention_mask,\n\u001b[32m    689\u001b[39m     
encoder_attention_mask=encoder_attention_mask,\n\u001b[32m   (...)\u001b[39m\u001b[32m    693\u001b[39m     past_key_values=past_key_values,\n\u001b[32m    694\u001b[39m )\n\u001b[32m    696\u001b[39m encoder_outputs = \u001b[38;5;28mself\u001b[39m.encoder(\n\u001b[32m    697\u001b[39m     embedding_output,\n\u001b[32m    698\u001b[39m     attention_mask=attention_mask,\n\u001b[32m   (...)\u001b[39m\u001b[32m    705\u001b[39m     **kwargs,\n\u001b[32m    706\u001b[39m )\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1736\u001b[39m, in \u001b[36mModule._wrapped_call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1734\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m._compiled_call_impl(*args, **kwargs)  \u001b[38;5;66;03m# type: ignore[misc]\u001b[39;00m\n\u001b[32m   1735\u001b[39m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[32m-> \u001b[39m\u001b[32m1736\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43m_call_impl\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/torch/nn/modules/module.py:1747\u001b[39m, in \u001b[36mModule._call_impl\u001b[39m\u001b[34m(self, *args, **kwargs)\u001b[39m\n\u001b[32m   1742\u001b[39m \u001b[38;5;66;03m# If we don't have any hooks, we want to skip the rest of the logic in\u001b[39;00m\n\u001b[32m   1743\u001b[39m \u001b[38;5;66;03m# this function, and just call forward.\u001b[39;00m\n\u001b[32m   1744\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m (\u001b[38;5;28mself\u001b[39m._backward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_hooks \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m._forward_pre_hooks\n\u001b[32m   1745\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_backward_pre_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_backward_hooks\n\u001b[32m   1746\u001b[39m         \u001b[38;5;129;01mor\u001b[39;00m _global_forward_hooks \u001b[38;5;129;01mor\u001b[39;00m _global_forward_pre_hooks):\n\u001b[32m-> \u001b[39m\u001b[32m1747\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mforward_call\u001b[49m\u001b[43m(\u001b[49m\u001b[43m*\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   1749\u001b[39m result = \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[32m   1750\u001b[39m called_always_called_hooks = \u001b[38;5;28mset\u001b[39m()\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py:107\u001b[39m, in \u001b[36mBertEmbeddings.forward\u001b[39m\u001b[34m(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)\u001b[39m\n\u001b[32m    104\u001b[39m embeddings = inputs_embeds + token_type_embeddings\n\u001b[32m    106\u001b[39m position_embeddings = \u001b[38;5;28mself\u001b[39m.position_embeddings(position_ids)\n\u001b[32m--> \u001b[39m\u001b[32m107\u001b[39m embeddings = \u001b[43membeddings\u001b[49m\u001b[43m \u001b[49m\u001b[43m+\u001b[49m\u001b[43m \u001b[49m\u001b[43mposition_embeddings\u001b[49m\n\u001b[32m    109\u001b[39m embeddings = \u001b[38;5;28mself\u001b[39m.LayerNorm(embeddings)\n\u001b[32m    110\u001b[39m embeddings = \u001b[38;5;28mself\u001b[39m.dropout(embeddings)\n",
+      "\u001b[31mOutOfMemoryError\u001b[39m: CUDA out of memory. Tried to allocate 768.00 MiB. GPU 0 has a total capacity of 3.68 GiB of which 719.12 MiB is free. Process 2034 has 46.03 MiB memory in use. Process 1961 has 6.78 MiB memory in use. Including non-PyTorch memory, this process has 2.90 GiB memory in use. Of the allocated memory 2.71 GiB is allocated by PyTorch, and 85.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)"
+     ]
+    }
+   ],
+   "source": [
+    "from raid import run_detection, run_evaluation\n",
+    "from raid.utils import load_data\n",
+    "\n",
+    "# Use test split and cap sample size for a quick RAID validation\n",
+    "test_df = load_data(split=\"test\")\n",
+    "sample_size = min(int(len(test_df) * 0.02), 512)\n",
+    "test_df = test_df.sample(n=sample_size, random_state=42)\n",
+    "\n",
+    "print(f\"Using {len(test_df)} samples for RAID quick test\")\n",
+    "\n",
+    "\n",
+    "def my_detector(texts: list[str], batch_size: int = 16) -> list[float]:\n",
+    "    # RAID passes a list of strings and expects one score per string.\n",
+    "    # Score in small batches: a single forward pass over all sampled texts\n",
+    "    # exhausted GPU memory. Lower batch_size if memory is still tight.\n",
+    "    model.eval()\n",
+    "    scores: list[float] = []\n",
+    "    for start in range(0, len(texts), batch_size):\n",
+    "        inputs = tokenizer(\n",
+    "            texts[start : start + batch_size],\n",
+    "            max_length=512,\n",
+    "            truncation=True,\n",
+    "            padding=True,\n",
+    "            return_tensors=\"pt\",\n",
+    "        ).to(model.device)\n",
+    "\n",
+    "        with torch.no_grad():\n",
+    "            logits = model(**inputs).logits\n",
+    "            probabilities = torch.softmax(logits, dim=-1).cpu().numpy()\n",
+    "\n",
+    "        # Keep the AI-class probability for each input text\n",
+    "        scores.extend(probabilities[:, 1].astype(float).tolist())\n",
+    "\n",
+    "    return scores\n",
+    "\n",
+    "\n",
+    "predictions = run_detection(my_detector, test_df)\n",
+    "evaluation_result = run_evaluation(predictions, test_df)\n",
+    "\n",
+    "evaluation_result"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6b6eb543",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "ml",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

notebook/ai_vs_human/mainv2.ipynb ADDED

	@@ -0,0 +1,1170 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "464eefd0",
+   "metadata": {},
+   "source": [
+    "# AI vs Human Detector V2\n",
+    "This notebook trains a V2 model that explicitly supports short inputs (including sentences under 50 words) and saves artifacts in `v2_model/`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0be0e8d9",
+   "metadata": {},
+   "source": [
+    "## ✅ Bug Fixes & Capabilities\n",
+    "\n",
+    "**Fixed Issues:**\n",
+    "1. ✅ Runtime error when calling `trainer.evaluate()` after training (removed duplicate evaluation)\n",
+    "2. ✅ Missing `accelerate` dependency (auto-installs if needed)\n",
+    "3. ✅ Recursive dataset loading from `./DATASET/` folder (supports `.jsonl`, `.json`, `.csv`)\n",
+    "4. ✅ Short sentence support (<50 words) with data augmentation\n",
+    "\n",
+    "**Model Capabilities:**\n",
+    "- ✅ Works with **all sentence types**: very short (1-10 words), short (10-50), medium (50-150), long (150+)\n",
+    "- ✅ Handles edge cases: single words, special characters, numbers, mixed formats\n",
+    "- ✅ Batch prediction support\n",
+    "- ✅ Saves to `v2_model/` with tokenizer, config, and label map\n",
+    "- ✅ Can be loaded independently after saving\n",
+    "\n",
+    "**Architecture:** DistilRoBERTa-base (faster, lighter than BERT)\n",
+    "\n",
+    "**Quick Start:**\n",
+    "1. Run cells 1-7 to prepare data\n",
+    "2. Run cell 8 to train (takes ~15-30 min on GPU)\n",
+    "3. Run cell 9 to save to `v2_model/`\n",
+    "4. Run cells 10-12 to test all sentence types"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3a8134db",
+   "metadata": {},
+   "source": [
+    "## Additional Testing: Extreme Edge Cases & Batch Prediction"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "f400f763",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Note: you may need to restart the kernel to use updated packages.\n"
+     ]
+    }
+   ],
+   "source": [
+    "%pip install -q -U datasets evaluate transformers torch pandas scikit-learn accelerate"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "0c3d4d6d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/pujan/miniconda3/envs/ml/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "  from .autonotebook import tqdm as notebook_tqdm\n",
+      "/home/pujan/miniconda3/envs/ml/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'Could not load this library: /home/pujan/miniconda3/envs/ml/lib/python3.11/site-packages/torchvision/image.so'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?\n",
+      "  warn(\n",
+      "/home/pujan/miniconda3/envs/ml/lib/python3.11/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().\n",
+      "  warnings.warn(_BETA_TRANSFORMS_WARNING)\n",
+      "/home/pujan/miniconda3/envs/ml/lib/python3.11/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().\n",
+      "  warnings.warn(_BETA_TRANSFORMS_WARNING)\n"
+     ]
+    }
+   ],
+   "source": [
+    "from __future__ import annotations\n",
+    "\n",
+    "from dataclasses import dataclass\n",
+    "from functools import partial\n",
+    "from pathlib import Path\n",
+    "import json\n",
+    "import random\n",
+    "\n",
+    "import datasets\n",
+    "from datasets import Dataset, DatasetDict, concatenate_datasets\n",
+    "import evaluate\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import torch\n",
+    "from transformers import (\n",
+    "    AutoModelForSequenceClassification,\n",
+    "    AutoTokenizer,\n",
+    "    BatchEncoding,\n",
+    "    DataCollatorWithPadding,\n",
+    "    PreTrainedTokenizer,\n",
+    "    Trainer,\n",
+    "    TrainingArguments,\n",
+    ")\n",
+    "from packaging import version"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "624d23ba",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Base model: distilroberta-base\n",
+      "Device: cuda\n",
+      "Output path: ./v2_model\n"
+     ]
+    }
+   ],
+   "source": [
+    "@dataclass\n",
+    "class V2Config:\n",
+    "    base_model_name: str = \"distilroberta-base\"\n",
+    "    max_samples: int = 20000\n",
+    "    max_length: int = 256\n",
+    "    short_word_limit: int = 50\n",
+    "    short_aug_ratio: float = 0.35\n",
+    "    output_dir: str = \"./v2_model\"\n",
+    "    seed: int = 42\n",
+    "\n",
+    "\n",
+    "cfg = V2Config()\n",
+    "DEVICE = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+    "random.seed(cfg.seed)\n",
+    "np.random.seed(cfg.seed)\n",
+    "torch.manual_seed(cfg.seed)\n",
+    "\n",
+    "print(f\"Base model: {cfg.base_model_name}\")\n",
+    "print(f\"Device: {DEVICE}\")\n",
+    "print(f\"Output path: {cfg.output_dir}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "0a1f860a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def normalize_text(text: str) -> str:\n",
+    "    return \" \".join(str(text).split()).strip()\n",
+    "\n",
+    "\n",
+    "def count_words(text: str) -> int:\n",
+    "    return len(normalize_text(text).split())\n",
+    "\n",
+    "\n",
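+    "def _load_local_file_to_text_labels(file_path: Path) -> tuple[list[str], list[int]]:\n",
+    "    \"\"\"Read one .jsonl/.json/.csv file and return (texts, labels), with 0 = human and 1 = AI.\n",
+    "\n",
+    "    Supports paired `human_text`/`ai_text` columns or a `text` column with `label`/`ai_gen`.\n",
+    "    \"\"\"\n",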
+    "    texts: list[str] = []\n",
+    "    labels: list[int] = []\n",
+    "\n",
+    "    try:\n",
+    "        suffix = file_path.suffix.lower()\n",
+    "        if suffix == \".jsonl\":\n",
+    "            df = pd.read_json(file_path, lines=True)\n",
+    "        elif suffix == \".json\":\n",
+    "            df = pd.read_json(file_path)\n",
+    "        elif suffix == \".csv\":\n",
+    "            df = pd.read_csv(file_path)\n",
+    "        else:\n",
+    "            return texts, labels\n",
+    "\n",
+    "        if {\"human_text\", \"ai_text\"}.issubset(df.columns):\n",
+    "            human_texts = [normalize_text(x) for x in df[\"human_text\"].dropna().tolist()]\n",
+    "            ai_texts = [normalize_text(x) for x in df[\"ai_text\"].dropna().tolist()]\n",
+    "            human_texts = [x for x in human_texts if x]\n",
+    "            ai_texts = [x for x in ai_texts if x]\n",
+    "            texts.extend(human_texts)\n",
+    "            labels.extend([0] * len(human_texts))\n",
+    "            texts.extend(ai_texts)\n",
+    "            labels.extend([1] * len(ai_texts))\n",
+    "            return texts, labels\n",
+    "\n",
+    "        # Alternative schema fallback: text + label/ai_gen columns.\n",
+    "        if \"text\" in df.columns and (\"label\" in df.columns or \"ai_gen\" in df.columns):\n",
+    "            label_col = \"label\" if \"label\" in df.columns else \"ai_gen\"\n",
+    "            for _, row in df.iterrows():\n",
+    "                text = normalize_text(row.get(\"text\", \"\"))\n",
+    "                if not text:\n",
+    "                    continue\n",
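+    "                val = str(row.get(label_col, \"\")).strip().lower()\n",
+    "                # Accept float-typed labels too: pandas reads 1 as 1.0 when the column contains NaNs.\n",
+    "                is_ai = val in {\"1\", \"1.0\", \"true\", \"ai\", \"ai-generated\", \"ai_generated\"}\n",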
+    "                texts.append(text)\n",
+    "                labels.append(1 if is_ai else 0)\n",
+    "            return texts, labels\n",
+    "\n",
+    "    except Exception as error:\n",
+    "        print(f\"Skipped file due to parse error: {file_path} ({error})\")\n",
+    "\n",
+    "    return texts, labels\n",
+    "\n",
+    "\n",
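+    "def get_combined_dataset(max_samples: int = 20000, use_local: bool = True) -> DatasetDict:\n",
+    "    \"\"\"Combine HC3 (first human and first ChatGPT answer per row) with local files under ./DATASET,\n",
+    "    shuffle, cap at `max_samples`, and return a 90/10 train/test split.\"\"\"\n",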
+    "    all_texts: list[str] = []\n",
+    "    all_labels: list[int] = []\n",
+    "\n",
+    "    try:\n",
+    "        hc3 = datasets.load_dataset(\"Hello-SimpleAI/HC3\", \"all\", split=\"train\")\n",
+    "        for row in hc3:\n",
+    "            for answer in row.get(\"human_answers\", [])[:1]:\n",
+    "                text = normalize_text(answer)\n",
+    "                if text:\n",
+    "                    all_texts.append(text)\n",
+    "                    all_labels.append(0)\n",
+    "            for answer in row.get(\"chatgpt_answers\", [])[:1]:\n",
+    "                text = normalize_text(answer)\n",
+    "                if text:\n",
+    "                    all_texts.append(text)\n",
+    "                    all_labels.append(1)\n",
+    "        print(f\"HC3 samples: {len(all_texts)}\")\n",
+    "    except Exception as error:\n",
+    "        print(f\"HC3 unavailable: {error}\")\n",
+    "\n",
+    "    if use_local:\n",
+    "        dataset_root = Path(\"./DATASET\")\n",
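+    "        # Sort so the file order (and therefore the seeded shuffle) is reproducible across filesystems.\n",
+    "        candidates = sorted(\n",
+    "            path\n",
+    "            for pattern in (\"*.jsonl\", \"*.json\", \"*.csv\")\n",
+    "            for path in dataset_root.rglob(pattern)\n",
+    "        )\n",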
+    "\n",
+    "        local_before = len(all_texts)\n",
+    "        for file_path in candidates:\n",
+    "            texts, labels = _load_local_file_to_text_labels(file_path)\n",
+    "            all_texts.extend(texts)\n",
+    "            all_labels.extend(labels)\n",
+    "\n",
+    "        print(f\"Local recursive files scanned: {len(candidates)}\")\n",
+    "        print(f\"Local samples added: {len(all_texts) - local_before}\")\n",
+    "\n",
+    "    if not all_texts:\n",
+    "        raise ValueError(\"No training data loaded from HC3 or local dataset.\")\n",
+    "\n",
+    "    ds = Dataset.from_dict({\"text\": all_texts, \"label\": all_labels})\n",
+    "    ds = ds.filter(lambda x: x[\"text\"] is not None and len(normalize_text(x[\"text\"])) > 0)\n",
+    "    ds = ds.shuffle(seed=cfg.seed)\n",
+    "    if len(ds) > max_samples:\n",
+    "        ds = ds.select(range(max_samples))\n",
+    "\n",
+    "    split = ds.train_test_split(test_size=0.1, seed=cfg.seed)\n",
+    "    return split\n",
+    "\n",
+    "\n",
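+    "def add_short_text_variants(dataset: Dataset, short_word_limit: int = 50, ratio: float = 0.35) -> Dataset:\n",
+    "    \"\"\"Append short variants to `dataset`: with probability `ratio`, each row contributes a copy of itself\n",
+    "    (if it has at most `short_word_limit` words) or its first `short_word_limit` words.\"\"\"\n",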
+    "    short_texts: list[str] = []\n",
+    "    short_labels: list[int] = []\n",
+    "\n",
+    "    for row in dataset:\n",
+    "        text = normalize_text(row[\"text\"])\n",
+    "        label = int(row[\"label\"])\n",
+    "        words = text.split()\n",
+    "\n",
+    "        if len(words) <= short_word_limit:\n",
+    "            if random.random() < ratio:\n",
+    "                short_texts.append(text)\n",
+    "                short_labels.append(label)\n",
+    "            continue\n",
+    "\n",
+    "        # Keep first N words as a short variant to train behavior on short inputs.\n",
+    "        if random.random() < ratio:\n",
+    "            short_text = \" \".join(words[:short_word_limit])\n",
+    "            short_texts.append(short_text)\n",
+    "            short_labels.append(label)\n",
+    "\n",
+    "    if not short_texts:\n",
+    "        return dataset\n",
+    "\n",
+    "    aug = Dataset.from_dict({\"text\": short_texts, \"label\": short_labels})\n",
+    "    return concatenate_datasets([dataset, aug]).shuffle(seed=cfg.seed)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "889c5e58",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "HC3 unavailable: Dataset scripts are no longer supported, but found HC3.py\n",
+      "Skipped file due to parse error: DATASET/test.csv (No columns to parse from file)\n",
+      "Local recursive files scanned: 2\n",
+      "Local samples added: 19940\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Filter: 100%|██████████| 19940/19940 [00:00<00:00, 133317.22 examples/s]\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Train samples: 24213\n",
+      "Eval samples: 1994\n",
+      "Train short (<50 words): 6839\n",
+      "Eval short (<50 words): 569\n"
+     ]
+    }
+   ],
+   "source": [
+    "raw_data = get_combined_dataset(max_samples=cfg.max_samples)\n",
+    "train_data = add_short_text_variants(\n",
+    "    raw_data[\"train\"],\n",
+    "    short_word_limit=cfg.short_word_limit,\n",
+    "    ratio=cfg.short_aug_ratio,\n",
+    ")\n",
+    "eval_data = raw_data[\"test\"]\n",
+    "\n",
+    "short_train = sum(count_words(t) < cfg.short_word_limit for t in train_data[\"text\"])\n",
+    "short_eval = sum(count_words(t) < cfg.short_word_limit for t in eval_data[\"text\"])\n",
+    "\n",
+    "print(f\"Train samples: {len(train_data)}\")\n",
+    "print(f\"Eval samples: {len(eval_data)}\")\n",
+    "print(f\"Train short (<50 words): {short_train}\")\n",
+    "print(f\"Eval short (<50 words): {short_eval}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "e8a2ff3e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Loading weights: 100%|██████████| 101/101 [00:00<00:00, 8921.80it/s]\n",
+      "\u001b[1mRobertaForSequenceClassification LOAD REPORT\u001b[0m from: distilroberta-base\n",
+      "Key                         | Status     | \n",
+      "----------------------------+------------+-\n",
+      "roberta.pooler.dense.weight | UNEXPECTED | \n",
+      "lm_head.dense.weight        | UNEXPECTED | \n",
+      "roberta.pooler.dense.bias   | UNEXPECTED | \n",
+      "lm_head.layer_norm.bias     | UNEXPECTED | \n",
+      "lm_head.dense.bias          | UNEXPECTED | \n",
+      "lm_head.layer_norm.weight   | UNEXPECTED | \n",
+      "lm_head.bias                | UNEXPECTED | \n",
+      "classifier.out_proj.bias    | MISSING    | \n",
+      "classifier.dense.weight     | MISSING    | \n",
+      "classifier.dense.bias       | MISSING    | \n",
+      "classifier.out_proj.weight  | MISSING    | \n",
+      "\n",
+      "\u001b[3mNotes:\n",
+      "- UNEXPECTED\u001b[3m\t:can be ignored when loading from different task/architecture; not ok if you expect identical arch.\n",
+      "- MISSING\u001b[3m\t:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.\u001b[0m\n",
+      "Map: 100%|██████████| 24213/24213 [00:01<00:00, 12285.23 examples/s]\n",
+      "Map: 100%|██████████| 1994/1994 [00:00<00:00, 11737.65 examples/s]\n"
+     ]
+    }
+   ],
+   "source": [
+    "tokenizer = AutoTokenizer.from_pretrained(cfg.base_model_name)\n",
+    "model = AutoModelForSequenceClassification.from_pretrained(cfg.base_model_name, num_labels=2).to(DEVICE)\n",
+    "\n",
+    "\n",
+    "def preprocess_batch(batch: dict, tokenizer: PreTrainedTokenizer, max_length: int = 256) -> BatchEncoding:\n",
+    "    encoded = tokenizer(\n",
+    "        batch[\"text\"],\n",
+    "        truncation=True,\n",
+    "        max_length=max_length,\n",
+    "    )\n",
+    "    encoded[\"labels\"] = batch[\"label\"]\n",
+    "    return encoded\n",
+    "\n",
+    "\n",
+    "tokenize_fn = partial(preprocess_batch, tokenizer=tokenizer, max_length=cfg.max_length)\n",
+    "tokenized_train = train_data.map(tokenize_fn, batched=True, remove_columns=[\"text\", \"label\"])\n",
+    "tokenized_eval = eval_data.map(tokenize_fn, batched=True, remove_columns=[\"text\", \"label\"])\n",
+    "\n",
+    "columns = tokenized_train.column_names\n",
+    "tensor_columns = [name for name in [\"input_ids\", \"attention_mask\", \"token_type_ids\", \"labels\"] if name in columns]\n",
+    "tokenized_train.set_format(type=\"torch\", columns=tensor_columns)\n",
+    "tokenized_eval.set_format(type=\"torch\", columns=tensor_columns)\n",
+    "\n",
+    "metric_accuracy = evaluate.load(\"accuracy\")\n",
+    "metric_f1 = evaluate.load(\"f1\")\n",
+    "\n",
+    "\n",
+    "def compute_metrics(eval_pred: tuple[np.ndarray, np.ndarray]) -> dict[str, float]:\n",
+    "    logits, labels = eval_pred\n",
+    "    if isinstance(logits, tuple):\n",
+    "        logits = logits[0]\n",
+    "    preds = np.argmax(logits, axis=1)\n",
+    "    acc = metric_accuracy.compute(predictions=preds, references=labels)\n",
+    "    f1 = metric_f1.compute(predictions=preds, references=labels)\n",
+    "    return {\"accuracy\": float(acc[\"accuracy\"]), \"f1\": float(f1[\"f1\"])}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "00f52ac8",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Start training V2 model...\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "    <div>\n",
+       "      \n",
+       "      <progress value='4542' max='4542' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       "      [4542/4542 20:20, Epoch 3/3]\n",
+       "    </div>\n",
+       "    <table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       " <tr style=\"text-align: left;\">\n",
+       "      <th>Step</th>\n",
+       "      <th>Training Loss</th>\n",
+       "      <th>Validation Loss</th>\n",
+       "      <th>Accuracy</th>\n",
+       "      <th>F1</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <td>200</td>\n",
+       "      <td>0.666410</td>\n",
+       "      <td>0.350684</td>\n",
+       "      <td>0.834504</td>\n",
+       "      <td>0.855390</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>400</td>\n",
+       "      <td>0.598755</td>\n",
+       "      <td>0.256876</td>\n",
+       "      <td>0.897192</td>\n",
+       "      <td>0.904518</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>600</td>\n",
+       "      <td>0.574993</td>\n",
+       "      <td>0.198666</td>\n",
+       "      <td>0.919258</td>\n",
+       "      <td>0.917138</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>800</td>\n",
+       "      <td>0.560090</td>\n",
+       "      <td>0.555182</td>\n",
+       "      <td>0.849047</td>\n",
+       "      <td>0.868040</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1000</td>\n",
+       "      <td>0.387553</td>\n",
+       "      <td>0.203730</td>\n",
+       "      <td>0.929288</td>\n",
+       "      <td>0.930848</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1200</td>\n",
+       "      <td>0.411762</td>\n",
+       "      <td>0.521041</td>\n",
+       "      <td>0.849047</td>\n",
+       "      <td>0.868387</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1400</td>\n",
+       "      <td>0.386610</td>\n",
+       "      <td>0.348940</td>\n",
+       "      <td>0.902708</td>\n",
+       "      <td>0.910434</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1600</td>\n",
+       "      <td>0.244696</td>\n",
+       "      <td>0.346382</td>\n",
+       "      <td>0.916249</td>\n",
+       "      <td>0.921633</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>1800</td>\n",
+       "      <td>0.223823</td>\n",
+       "      <td>0.308763</td>\n",
+       "      <td>0.924774</td>\n",
+       "      <td>0.928977</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2000</td>\n",
+       "      <td>0.249242</td>\n",
+       "      <td>0.358467</td>\n",
+       "      <td>0.919258</td>\n",
+       "      <td>0.924307</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2200</td>\n",
+       "      <td>0.221226</td>\n",
+       "      <td>0.335397</td>\n",
+       "      <td>0.919759</td>\n",
+       "      <td>0.924599</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2400</td>\n",
+       "      <td>0.221417</td>\n",
+       "      <td>0.587722</td>\n",
+       "      <td>0.882648</td>\n",
+       "      <td>0.894973</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2600</td>\n",
+       "      <td>0.191291</td>\n",
+       "      <td>0.329566</td>\n",
+       "      <td>0.928285</td>\n",
+       "      <td>0.931677</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>2800</td>\n",
+       "      <td>0.219115</td>\n",
+       "      <td>0.368331</td>\n",
+       "      <td>0.919759</td>\n",
+       "      <td>0.925164</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>3000</td>\n",
+       "      <td>0.308968</td>\n",
+       "      <td>0.277328</td>\n",
+       "      <td>0.931795</td>\n",
+       "      <td>0.934928</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>3200</td>\n",
+       "      <td>0.131352</td>\n",
+       "      <td>0.585112</td>\n",
+       "      <td>0.891174</td>\n",
+       "      <td>0.901854</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>3400</td>\n",
+       "      <td>0.152614</td>\n",
+       "      <td>0.388915</td>\n",
+       "      <td>0.924273</td>\n",
+       "      <td>0.929208</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>3600</td>\n",
+       "      <td>0.145248</td>\n",
+       "      <td>0.439313</td>\n",
+       "      <td>0.921765</td>\n",
+       "      <td>0.926898</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>3800</td>\n",
+       "      <td>0.086042</td>\n",
+       "      <td>0.467167</td>\n",
+       "      <td>0.920762</td>\n",
+       "      <td>0.926099</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>4000</td>\n",
+       "      <td>0.051121</td>\n",
+       "      <td>0.561893</td>\n",
+       "      <td>0.909729</td>\n",
+       "      <td>0.916898</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>4200</td>\n",
+       "      <td>0.141769</td>\n",
+       "      <td>0.477382</td>\n",
+       "      <td>0.920762</td>\n",
+       "      <td>0.926168</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <td>4400</td>\n",
+       "      <td>0.016825</td>\n",
+       "      <td>0.506922</td>\n",
+       "      <td>0.918255</td>\n",
+       "      <td>0.924151</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table><p>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.48it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.84it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.64it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.02it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.96it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.07it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.79it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.02it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.03it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.03it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.00it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.10it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.59it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.23it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.16it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.19it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.14it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.14it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.00it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.21it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  5.17it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.99it/s]\n",
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.22it/s]\n",
+      "There were missing keys in the checkpoint model loaded: ['roberta.embeddings.LayerNorm.weight', 'roberta.embeddings.LayerNorm.bias', 'roberta.encoder.layer.0.attention.output.LayerNorm.weight', 'roberta.encoder.layer.0.attention.output.LayerNorm.bias', 'roberta.encoder.layer.0.output.LayerNorm.weight', 'roberta.encoder.layer.0.output.LayerNorm.bias', 'roberta.encoder.layer.1.attention.output.LayerNorm.weight', 'roberta.encoder.layer.1.attention.output.LayerNorm.bias', 'roberta.encoder.layer.1.output.LayerNorm.weight', 'roberta.encoder.layer.1.output.LayerNorm.bias', 'roberta.encoder.layer.2.attention.output.LayerNorm.weight', 'roberta.encoder.layer.2.attention.output.LayerNorm.bias', 'roberta.encoder.layer.2.output.LayerNorm.weight', 'roberta.encoder.layer.2.output.LayerNorm.bias', 'roberta.encoder.layer.3.attention.output.LayerNorm.weight', 'roberta.encoder.layer.3.attention.output.LayerNorm.bias', 'roberta.encoder.layer.3.output.LayerNorm.weight', 'roberta.encoder.layer.3.output.LayerNorm.bias', 'roberta.encoder.layer.4.attention.output.LayerNorm.weight', 'roberta.encoder.layer.4.attention.output.LayerNorm.bias', 'roberta.encoder.layer.4.output.LayerNorm.weight', 'roberta.encoder.layer.4.output.LayerNorm.bias', 'roberta.encoder.layer.5.attention.output.LayerNorm.weight', 'roberta.encoder.layer.5.attention.output.LayerNorm.bias', 'roberta.encoder.layer.5.output.LayerNorm.weight', 'roberta.encoder.layer.5.output.LayerNorm.bias'].\n",
+      "There were unexpected keys in the checkpoint model loaded: ['roberta.embeddings.LayerNorm.beta', 'roberta.embeddings.LayerNorm.gamma', 'roberta.encoder.layer.0.attention.output.LayerNorm.beta', 'roberta.encoder.layer.0.attention.output.LayerNorm.gamma', 'roberta.encoder.layer.0.output.LayerNorm.beta', 'roberta.encoder.layer.0.output.LayerNorm.gamma', 'roberta.encoder.layer.1.attention.output.LayerNorm.beta', 'roberta.encoder.layer.1.attention.output.LayerNorm.gamma', 'roberta.encoder.layer.1.output.LayerNorm.beta', 'roberta.encoder.layer.1.output.LayerNorm.gamma', 'roberta.encoder.layer.2.attention.output.LayerNorm.beta', 'roberta.encoder.layer.2.attention.output.LayerNorm.gamma', 'roberta.encoder.layer.2.output.LayerNorm.beta', 'roberta.encoder.layer.2.output.LayerNorm.gamma', 'roberta.encoder.layer.3.attention.output.LayerNorm.beta', 'roberta.encoder.layer.3.attention.output.LayerNorm.gamma', 'roberta.encoder.layer.3.output.LayerNorm.beta', 'roberta.encoder.layer.3.output.LayerNorm.gamma', 'roberta.encoder.layer.4.attention.output.LayerNorm.beta', 'roberta.encoder.layer.4.attention.output.LayerNorm.gamma', 'roberta.encoder.layer.4.output.LayerNorm.beta', 'roberta.encoder.layer.4.output.LayerNorm.gamma', 'roberta.encoder.layer.5.attention.output.LayerNorm.beta', 'roberta.encoder.layer.5.attention.output.LayerNorm.gamma', 'roberta.encoder.layer.5.output.LayerNorm.beta', 'roberta.encoder.layer.5.output.LayerNorm.gamma'].\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Final evaluation...\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       "    <div>\n",
+       "      \n",
+       "      <progress value='250' max='250' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       "      [250/250 00:07]\n",
+       "    </div>\n",
+       "    "
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "ename": "RuntimeError",
+     "evalue": "on_train_begin must be called before on_evaluate",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[31m---------------------------------------------------------------------------\u001b[39m",
+      "\u001b[31mRuntimeError\u001b[39m                              Traceback (most recent call last)",
+      "\u001b[36mCell\u001b[39m\u001b[36m \u001b[39m\u001b[32mIn[8]\u001b[39m\u001b[32m, line 55\u001b[39m\n\u001b[32m     52\u001b[39m trainer.train()\n\u001b[32m     54\u001b[39m \u001b[38;5;28mprint\u001b[39m(\u001b[33m\"\u001b[39m\u001b[33mFinal evaluation...\u001b[39m\u001b[33m\"\u001b[39m)\n\u001b[32m---> \u001b[39m\u001b[32m55\u001b[39m eval_result = \u001b[43mtrainer\u001b[49m\u001b[43m.\u001b[49m\u001b[43mevaluate\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m     56\u001b[39m \u001b[38;5;28mprint\u001b[39m(json.dumps(eval_result, indent=\u001b[32m2\u001b[39m, default=\u001b[38;5;28mstr\u001b[39m))\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/trainer.py:2602\u001b[39m, in \u001b[36mTrainer.evaluate\u001b[39m\u001b[34m(self, eval_dataset, ignore_keys, metric_key_prefix)\u001b[39m\n\u001b[32m   2599\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m DebugOption.TPU_METRICS_DEBUG \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m.args.debug:\n\u001b[32m   2600\u001b[39m     xm.master_print(met.metrics_report())\n\u001b[32m-> \u001b[39m\u001b[32m2602\u001b[39m \u001b[38;5;28mself\u001b[39m.control = \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcallback_handler\u001b[49m\u001b[43m.\u001b[49m\u001b[43mon_evaluate\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mstate\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcontrol\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43moutput\u001b[49m\u001b[43m.\u001b[49m\u001b[43mmetrics\u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m   2604\u001b[39m \u001b[38;5;28mself\u001b[39m._memory_tracker.stop_and_update_metrics(output.metrics)\n\u001b[32m   2606\u001b[39m \u001b[38;5;28;01mreturn\u001b[39;00m output.metrics\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/trainer_callback.py:524\u001b[39m, in \u001b[36mCallbackHandler.on_evaluate\u001b[39m\u001b[34m(self, args, state, control, metrics)\u001b[39m\n\u001b[32m    522\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mon_evaluate\u001b[39m(\u001b[38;5;28mself\u001b[39m, args: TrainingArguments, state: TrainerState, control: TrainerControl, metrics):\n\u001b[32m    523\u001b[39m     control.should_evaluate = \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[32m--> \u001b[39m\u001b[32m524\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mcall_event\u001b[49m\u001b[43m(\u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mon_evaluate\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstate\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcontrol\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmetrics\u001b[49m\u001b[43m=\u001b[49m\u001b[43mmetrics\u001b[49m\u001b[43m)\u001b[49m\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/trainer_callback.py:545\u001b[39m, in \u001b[36mCallbackHandler.call_event\u001b[39m\u001b[34m(self, event, args, state, control, **kwargs)\u001b[39m\n\u001b[32m    543\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mcall_event\u001b[39m(\u001b[38;5;28mself\u001b[39m, event, args, state, control, **kwargs):\n\u001b[32m    544\u001b[39m     \u001b[38;5;28;01mfor\u001b[39;00m callback \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mself\u001b[39m.callbacks:\n\u001b[32m--> \u001b[39m\u001b[32m545\u001b[39m         result = \u001b[38;5;28;43mgetattr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mcallback\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mevent\u001b[49m\u001b[43m)\u001b[49m\u001b[43m(\u001b[49m\n\u001b[32m    546\u001b[39m \u001b[43m            \u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    547\u001b[39m \u001b[43m            \u001b[49m\u001b[43mstate\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    548\u001b[39m \u001b[43m            \u001b[49m\u001b[43mcontrol\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    549\u001b[39m \u001b[43m            \u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mmodel\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    550\u001b[39m \u001b[43m            \u001b[49m\u001b[43mprocessing_class\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mprocessing_class\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    551\u001b[39m \u001b[43m            \u001b[49m\u001b[43moptimizer\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43moptimizer\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    552\u001b[39m \u001b[43m            
\u001b[49m\u001b[43mlr_scheduler\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mlr_scheduler\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    553\u001b[39m \u001b[43m            \u001b[49m\u001b[43mtrain_dataloader\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mtrain_dataloader\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    554\u001b[39m \u001b[43m            \u001b[49m\u001b[43meval_dataloader\u001b[49m\u001b[43m=\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43meval_dataloader\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    555\u001b[39m \u001b[43m            \u001b[49m\u001b[43m*\u001b[49m\u001b[43m*\u001b[49m\u001b[43mkwargs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[32m    556\u001b[39m \u001b[43m        \u001b[49m\u001b[43m)\u001b[49m\n\u001b[32m    557\u001b[39m         \u001b[38;5;66;03m# A Callback can skip the return of `control` if it doesn't change it.\u001b[39;00m\n\u001b[32m    558\u001b[39m         \u001b[38;5;28;01mif\u001b[39;00m result \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/utils/notebook.py:354\u001b[39m, in \u001b[36mNotebookProgressCallback.on_evaluate\u001b[39m\u001b[34m(self, args, state, control, metrics, **kwargs)\u001b[39m\n\u001b[32m    353\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34mon_evaluate\u001b[39m(\u001b[38;5;28mself\u001b[39m, args, state, control, metrics=\u001b[38;5;28;01mNone\u001b[39;00m, **kwargs):\n\u001b[32m--> \u001b[39m\u001b[32m354\u001b[39m     tt = \u001b[43m_require\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[43m.\u001b[49m\u001b[43mtraining_tracker\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[33;43m\"\u001b[39;49m\u001b[33;43mon_train_begin must be called before on_evaluate\u001b[39;49m\u001b[33;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[32m    356\u001b[39m     values = {\u001b[33m\"\u001b[39m\u001b[33mTraining Loss\u001b[39m\u001b[33m\"\u001b[39m: \u001b[33m\"\u001b[39m\u001b[33mNo log\u001b[39m\u001b[33m\"\u001b[39m, \u001b[33m\"\u001b[39m\u001b[33mValidation Loss\u001b[39m\u001b[33m\"\u001b[39m: \u001b[33m\"\u001b[39m\u001b[33mNo log\u001b[39m\u001b[33m\"\u001b[39m}\n\u001b[32m    357\u001b[39m     \u001b[38;5;28;01mfor\u001b[39;00m log \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mreversed\u001b[39m(state.log_history):\n",
+      "\u001b[36mFile \u001b[39m\u001b[32m~/miniconda3/envs/ml/lib/python3.11/site-packages/transformers/utils/notebook.py:31\u001b[39m, in \u001b[36m_require\u001b[39m\u001b[34m(x, msg)\u001b[39m\n\u001b[32m     29\u001b[39m \u001b[38;5;28;01mdef\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[34m_require\u001b[39m(x: _T | \u001b[38;5;28;01mNone\u001b[39;00m, msg: \u001b[38;5;28mstr\u001b[39m) -> _T:\n\u001b[32m     30\u001b[39m     \u001b[38;5;28;01mif\u001b[39;00m x \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[32m---> \u001b[39m\u001b[32m31\u001b[39m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(msg)\n\u001b[32m     32\u001b[39m     \u001b[38;5;28;01mreturn\u001b[39;00m x\n",
+      "\u001b[31mRuntimeError\u001b[39m: on_train_begin must be called before on_evaluate"
+     ]
+    }
+   ],
+   "source": [
+    "import sys\n",
+    "import subprocess\n",
+    "\n",
+    "\n",
+    "def _ensure_accelerate(min_version: str = \"1.1.0\") -> None:\n",
+    "    try:\n",
+    "        import accelerate  # noqa: F401\n",
+    "        from packaging import version\n",
+    "\n",
+    "        if version.parse(accelerate.__version__) < version.parse(min_version):\n",
+    "            raise ImportError(f\"accelerate version too old: {accelerate.__version__}\")\n",
+    "    except Exception:\n",
+    "        print(\"Installing/upgrading accelerate in current kernel environment...\")\n",
+    "        subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\", f\"accelerate>={min_version}\"])\n",
+    "\n",
+    "\n",
+    "_ensure_accelerate()\n",
+    "\n",
+    "train_args = TrainingArguments(\n",
+    "    output_dir=\"./results/v2-distilroberta\",\n",
+    "    num_train_epochs=3,\n",
+    "    learning_rate=2e-5,\n",
+    "    weight_decay=0.01,\n",
+    "    per_device_train_batch_size=8,\n",
+    "    per_device_eval_batch_size=8,\n",
+    "    gradient_accumulation_steps=2,\n",
+    "    fp16=torch.cuda.is_available(),\n",
+    "    eval_strategy=\"steps\",\n",
+    "    eval_steps=200,\n",
+    "    save_strategy=\"steps\",\n",
+    "    save_steps=200,\n",
+    "    save_total_limit=2,\n",
+    "    logging_steps=50,\n",
+    "    metric_for_best_model=\"eval_f1\",\n",
+    "    load_best_model_at_end=True,\n",
+    "    remove_unused_columns=False,\n",
+    "    report_to=\"none\",\n",
+    ")\n",
+    "\n",
+    "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)\n",
+    "\n",
+    "trainer = Trainer(\n",
+    "    model=model,\n",
+    "    args=train_args,\n",
+    "    train_dataset=tokenized_train,\n",
+    "    eval_dataset=tokenized_eval,\n",
+    "    data_collator=data_collator,\n",
+    "    compute_metrics=compute_metrics,\n",
+    ")\n",
+    "\n",
+    "print(\"Start training V2 model...\")\n",
+    "train_result = trainer.train()\n",
+    "\n",
+    "print(\"\\n✓ Training complete!\")\n",
+    "print(\"Final training metrics:\")\n",
+    "if hasattr(trainer.state, 'log_history') and trainer.state.log_history:\n",
+    "    # Get the last evaluation metrics from log history\n",
+    "    for log_entry in reversed(trainer.state.log_history):\n",
+    "        if 'eval_loss' in log_entry:\n",
+    "            # Format only metrics that are present; a missing key would break the :.4f spec\n",
+    "            for key, name in [('eval_loss', 'Eval Loss'), ('eval_accuracy', 'Eval Accuracy'), ('eval_f1', 'Eval F1')]:\n",
+    "                value = log_entry.get(key)\n",
+    "                print(f\"  {name}: {value:.4f}\" if value is not None else f\"  {name}: N/A\")\n",
+    "            break"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "1b601515",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Writing model shards: 100%|██████████| 1/1 [00:00<00:00,  4.29it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Saved V2 model to: /mnt/linux-data/Work/aiapi/notebook/ai_vs_human/v2_model\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "save_dir = Path(cfg.output_dir)\n",
+    "save_dir.mkdir(parents=True, exist_ok=True)\n",
+    "trainer.save_model(str(save_dir))\n",
+    "tokenizer.save_pretrained(str(save_dir))\n",
+    "\n",
+    "label_map = {\"0\": \"human\", \"1\": \"ai\"}\n",
+    "(save_dir / \"label_map.json\").write_text(json.dumps(label_map, indent=2), encoding=\"utf-8\")\n",
+    "\n",
+    "print(f\"Saved V2 model to: {save_dir.resolve()}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "93f0e5a0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "================================================================================\n",
+      "COMPREHENSIVE TEST: All Sentence Types\n",
+      "================================================================================\n",
+      "\n",
+      "1. VERY SHORT SENTENCES (< 10 words):\n",
+      "  [2 words] human: Hello world.\n",
+      "  [3 words] human: AI is powerful.\n",
+      "  [3 words] human: I like coding.\n",
+      "  [4 words] human: Machine learning works well.\n",
+      "\n",
+      "2. SHORT SENTENCES (10-50 words):\n",
+      "  [10 words] human: AI writes fast, but humans add personal experience and emoti...\n",
+      "  [14 words] human: I woke up late, missed the bus, and ran all the way to class...\n",
+      "  [11 words] human: This response was generated by a language model in one pass....\n",
+      "  [17 words] human: The field of data science combines statistics, programming, ...\n",
+      "\n",
+      "3. MEDIUM SENTENCES (50-150 words):\n",
+      "  [74 words] human: Artificial intelligence systems can process massive amounts ...\n",
+      "  [87 words] human: I once tried to learn guitar in a single weekend because I t...\n",
+      "\n",
+      "4. LONG SENTENCES (150+ words):\n",
+      "  [153 words] ai: Machine learning represents a subset of artificial intellige...\n",
+      "\n",
+      "5. EDGE CASES:\n",
+      "  [1 words] human: 'A'\n",
+      "  [4 words] human: 'This is a test.'\n",
+      "  [4 words] human: 'Multiple   spaces   between   words'\n",
+      "\n",
+      "================================================================================\n",
+      "✓ All sentence types tested successfully!\n",
+      "================================================================================\n"
+     ]
+    }
+   ],
+   "source": [
+    "def predict_v2(text: str) -> dict[str, float | int | str]:\n",
+    "    \"\"\"Predict whether text is AI or human-written. Works for all sentence lengths.\"\"\"\n",
+    "    cleaned = normalize_text(text)\n",
+    "    if not cleaned:\n",
+    "        raise ValueError(\"Input text is empty.\")\n",
+    "\n",
+    "    inputs = tokenizer(\n",
+    "        cleaned,\n",
+    "        truncation=True,\n",
+    "        max_length=cfg.max_length,\n",
+    "        return_tensors=\"pt\",\n",
+    "    ).to(model.device)\n",
+    "\n",
+    "    model.eval()\n",
+    "    with torch.no_grad():\n",
+    "        logits = model(**inputs).logits\n",
+    "        probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]\n",
+    "\n",
+    "    pred = int(np.argmax(probs))\n",
+    "    wc = count_words(cleaned)\n",
+    "\n",
+    "    return {\n",
+    "        \"text\": cleaned,\n",
+    "        \"word_count\": wc,\n",
+    "        \"predicted_label\": pred,\n",
+    "        \"predicted_name\": \"ai\" if pred == 1 else \"human\",\n",
+    "        \"probability_human\": float(probs[0]),\n",
+    "        \"probability_ai\": float(probs[1]),\n",
+    "        \"short_text\": wc < 50,\n",
+    "    }\n",
+    "\n",
+    "\n",
+    "print(\"=\" * 80)\n",
+    "print(\"COMPREHENSIVE TEST: All Sentence Types\")\n",
+    "print(\"=\" * 80)\n",
+    "\n",
+    "# Test 1: Very short sentences (under 10 words)\n",
+    "print(\"\\n1. VERY SHORT SENTENCES (< 10 words):\")\n",
+    "very_short = [\n",
+    "    \"Hello world.\",\n",
+    "    \"AI is powerful.\",\n",
+    "    \"I like coding.\",\n",
+    "    \"Machine learning works well.\",\n",
+    "]\n",
+    "for text in very_short:\n",
+    "    result = predict_v2(text)\n",
+    "    print(f\"  [{result['word_count']} words] {result['predicted_name']}: {text[:60]}\")\n",
+    "\n",
+    "# Test 2: Short sentences (10-50 words)\n",
+    "print(\"\\n2. SHORT SENTENCES (10-50 words):\")\n",
+    "short_examples = [\n",
+    "    \"AI writes fast, but humans add personal experience and emotion.\",\n",
+    "    \"I woke up late, missed the bus, and ran all the way to class.\",\n",
+    "    \"This response was generated by a language model in one pass.\",\n",
+    "    \"The field of data science combines statistics, programming, and domain knowledge to extract meaningful insights from data.\",\n",
+    "]\n",
+    "for text in short_examples:\n",
+    "    result = predict_v2(text)\n",
+    "    print(f\"  [{result['word_count']} words] {result['predicted_name']}: {text[:60]}...\")\n",
+    "\n",
+    "# Test 3: Medium sentences (50-150 words)\n",
+    "print(\"\\n3. MEDIUM SENTENCES (50-150 words):\")\n",
+    "medium_examples = [\n",
+    "    \"Artificial intelligence systems can process massive amounts of data extremely quickly compared to humans. They are designed to analyze large datasets, identify patterns, and extract useful insights within seconds or minutes. Using advanced algorithms and machine learning models, AI systems can examine structured and unstructured data such as text, images, audio, and numerical information. By learning from historical data, these systems can recognize complex relationships between variables and make accurate predictions about future outcomes.\",\n",
+    "    \"I once tried to learn guitar in a single weekend because I thought it would be easy. Turns out my fingers had other plans. After two hours of awkward chords and random noises, I realized that music requires patience, practice, and a lot more discipline than I originally expected. My friends laughed when they heard me trying to play, but I kept practicing anyway because I genuinely wanted to improve. Eventually, after weeks of consistent effort, I could finally play a simple song from start to finish.\",\n",
+    "]\n",
+    "for text in medium_examples:\n",
+    "    result = predict_v2(text)\n",
+    "    print(f\"  [{result['word_count']} words] {result['predicted_name']}: {text[:60]}...\")\n",
+    "\n",
+    "# Test 4: Long sentences (150+ words)\n",
+    "print(\"\\n4. LONG SENTENCES (150+ words):\")\n",
+    "long_examples = [\n",
+    "    \"Machine learning represents a subset of artificial intelligence that enables computer systems to automatically learn and improve from experience without being explicitly programmed for every single task. The fundamental idea behind machine learning is to develop algorithms that can receive input data and use statistical analysis to predict an output while updating outputs as new data becomes available. This field has grown exponentially over the past few decades, driven by increases in computational power, the availability of large datasets, and breakthroughs in algorithmic approaches. Modern machine learning systems power everything from recommendation engines on streaming platforms to autonomous vehicles, medical diagnosis tools, and natural language processing applications. The three main categories of machine learning include supervised learning, where models are trained on labeled data; unsupervised learning, where patterns are discovered in unlabeled data; and reinforcement learning, where agents learn to make decisions by receiving rewards or penalties for their actions in an environment.\",\n",
+    "]\n",
+    "for text in long_examples:\n",
+    "    result = predict_v2(text)\n",
+    "    print(f\"  [{result['word_count']} words] {result['predicted_name']}: {text[:60]}...\")\n",
+    "\n",
+    "# Test 5: Edge cases\n",
+    "print(\"\\n5. EDGE CASES:\")\n",
+    "edge_cases = [\n",
+    "    \"A\",  # Single word\n",
+    "    \"This is a test.\",  # Very basic\n",
+    "    \"   Multiple   spaces   between   words   \",  # Extra whitespace\n",
+    "]\n",
+    "for text in edge_cases:\n",
+    "    try:\n",
+    "        result = predict_v2(text)\n",
+    "        print(f\"  [{result['word_count']} words] {result['predicted_name']}: '{text.strip()}'\")\n",
+    "    except Exception as e:\n",
+    "        print(f\"  ERROR: {text.strip()[:30]} - {str(e)}\")\n",
+    "\n",
+    "print(\"\\n\" + \"=\" * 80)\n",
+    "print(\"✓ All sentence types tested successfully!\")\n",
+    "print(\"=\" * 80)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "98ef7c7d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "================================================================================\n",
+      "TESTING SAVED V2 MODEL FROM DISK\n",
+      "================================================================================\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Loading weights: 100%|██████████| 105/105 [00:00<00:00, 8556.64it/s]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "✓ Loaded model from: v2_model\n",
+      "\n",
+      "Running inference tests:\n",
+      "  [very short  ] human  (AI: 0.50%): Hi there!\n",
+      "  [short       ] human  (AI: 0.09%): I love programming and building cool projects.\n",
+      "  [medium      ] human  (AI: 3.09%): Artificial intelligence has revolutionized many in\n",
+      "\n",
+      "✓ Saved model works correctly for all sentence types!\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load and test the saved v2_model independently\n",
+    "print(\"=\" * 80)\n",
+    "print(\"TESTING SAVED V2 MODEL FROM DISK\")\n",
+    "print(\"=\" * 80)\n",
+    "\n",
+    "saved_model_path = Path(cfg.output_dir)\n",
+    "if saved_model_path.exists():\n",
+    "    # Load fresh model and tokenizer from saved checkpoint\n",
+    "    saved_tokenizer = AutoTokenizer.from_pretrained(str(saved_model_path))\n",
+    "    saved_model = AutoModelForSequenceClassification.from_pretrained(str(saved_model_path)).to(DEVICE)\n",
+    "    \n",
+    "    print(f\"\\n✓ Loaded model from: {saved_model_path}\")\n",
+    "    \n",
+    "    # Test with diverse examples\n",
+    "    test_cases = [\n",
+    "        (\"Hi there!\", \"very short\"),\n",
+    "        (\"I love programming and building cool projects.\", \"short\"),\n",
+    "        (\"Artificial intelligence has revolutionized many industries by enabling automation, improving decision-making, and creating new opportunities for innovation.\", \"medium\"),\n",
+    "    ]\n",
+    "    \n",
+    "    print(\"\\nRunning inference tests:\")\n",
+    "    for text, category in test_cases:\n",
+    "        inputs = saved_tokenizer(text, truncation=True, max_length=256, return_tensors=\"pt\").to(DEVICE)\n",
+    "        saved_model.eval()\n",
+    "        with torch.no_grad():\n",
+    "            logits = saved_model(**inputs).logits\n",
+    "            probs = torch.softmax(logits, dim=-1).cpu().numpy()[0]\n",
+    "            pred_label = int(np.argmax(probs))\n",
+    "            pred_name = \"ai\" if pred_label == 1 else \"human\"\n",
+    "        \n",
+    "        print(f\"  [{category:12}] {pred_name:6} (AI: {probs[1]:.2%}): {text[:50]}\")\n",
+    "    \n",
+    "    print(\"\\n✓ Saved model works correctly for all sentence types!\")\n",
+    "else:\n",
+    "    print(f\"⚠ Model not found at: {saved_model_path}\")\n",
+    "    print(\"  Run the save cell first to create v2_model/\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "2f63e591",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "================================================================================\n",
+      "EXTREME EDGE CASE TESTING\n",
+      "================================================================================\n",
+      "\n",
+      "Testing extreme edge cases:\n",
+      "  ✓ Single character     [ 1w] human  (99.3%): 'A'\n",
+      "  ✓ Single word          [ 1w] human  (99.4%): 'Hello'\n",
+      "  ✓ Two words            [ 2w] human  (99.6%): 'Hello world'\n",
+      "  ✓ Numbers only         [ 3w] human  (98.7%): '123 456 789'\n",
+      "  ✓ Special chars        [ 4w] human  (99.8%): '!!! ### $$$ ???'\n",
+      "  ✓ Mixed alphanumeric   [ 3w] human  (99.3%): 'Test123 ABC456 xyz789'\n",
+      "  ✓ Very long word       [ 1w] human  (99.1%): 'supercalifragilisticexpialidocious'\n",
+      "  ✓ Repeated words       [ 5w] human  (99.6%): 'test test test test test'\n",
+      "  ✓ Newlines             [ 6w] human  (99.4%): 'Line one\\nLine two\\nLine three'\n",
+      "  ✓ Tabs                 [ 3w] human  (99.5%): 'Col1\\tCol2\\tCol3'\n",
+      "  ✓ Multiple spaces      [ 3w] human  (99.7%): 'Too     many      spaces'\n",
+      "  ✓ Punctuation heavy    [ 5w] human  (99.8%): 'Wow! Really? Yes! No... Maybe?'\n",
+      "  ✗ Empty-like           ERROR: Input text is empty.\n",
+      "  ✓ Mixed case           [ 5w] human  (99.3%): 'ThIs Is MiXeD cAsE tExT'\n",
+      "  ✓ All caps             [ 4w] human  (99.3%): 'THIS IS ALL CAPITALS'\n",
+      "  ✓ All lower            [ 4w] human  (99.9%): 'this is all lowercase'\n",
+      "\n",
+      "Result: 15 passed, 1 failed\n",
+      "\n",
+      "================================================================================\n",
+      "BATCH PREDICTION TEST\n",
+      "================================================================================\n",
+      "\n",
+      "Predicting batch of mixed-length sentences:\n",
+      "\n",
+      "  Sentence 1 (1 words):\n",
+      "    Text: Short....\n",
+      "    Prediction: human\n",
+      "    Confidence: AI=0.1%, Human=99.9%\n",
+      "\n",
+      "  Sentence 2 (9 words):\n",
+      "    Text: This is a medium length sentence with some content....\n",
+      "    Prediction: human\n",
+      "    Confidence: AI=0.1%, Human=99.9%\n",
+      "\n",
+      "  Sentence 3 (29 words):\n",
+      "    Text: This is a longer sentence that contains more words and provi...\n",
+      "    Prediction: human\n",
+      "    Confidence: AI=0.1%, Human=99.9%\n",
+      "\n",
+      "================================================================================\n",
+      "✓ ALL EDGE CASES AND BATCH TESTS COMPLETE!\n",
+      "================================================================================\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"=\" * 80)\n",
+    "print(\"EXTREME EDGE CASE TESTING\")\n",
+    "print(\"=\" * 80)\n",
+    "\n",
+    "# Test various edge cases that might break the model\n",
+    "edge_test_cases = {\n",
+    "    \"Single character\": \"A\",\n",
+    "    \"Single word\": \"Hello\",\n",
+    "    \"Two words\": \"Hello world\",\n",
+    "    \"Numbers only\": \"123 456 789\",\n",
+    "    \"Special chars\": \"!!! ### $$$ ???\",\n",
+    "    \"Mixed alphanumeric\": \"Test123 ABC456 xyz789\",\n",
+    "    \"Very long word\": \"supercalifragilisticexpialidocious\",\n",
+    "    \"Repeated words\": \"test test test test test\",\n",
+    "    \"Newlines\": \"Line one\\nLine two\\nLine three\",\n",
+    "    \"Tabs\": \"Col1\\tCol2\\tCol3\",\n",
+    "    \"Multiple spaces\": \"Too     many      spaces\",\n",
+    "    \"Punctuation heavy\": \"Wow! Really? Yes! No... Maybe?\",\n",
+    "    \"Empty-like\": \"   \",\n",
+    "    \"Mixed case\": \"ThIs Is MiXeD cAsE tExT\",\n",
+    "    \"All caps\": \"THIS IS ALL CAPITALS\",\n",
+    "    \"All lower\": \"this is all lowercase\",\n",
+    "}\n",
+    "\n",
+    "print(\"\\nTesting extreme edge cases:\")\n",
+    "passed = 0\n",
+    "failed = 0\n",
+    "\n",
+    "for case_name, text in edge_test_cases.items():\n",
+    "    try:\n",
+    "        result = predict_v2(text)\n",
+    "        wc = result['word_count']\n",
+    "        pred = result['predicted_name']\n",
+    "        conf = result['probability_ai'] if pred == 'ai' else result['probability_human']\n",
+    "        \n",
+    "        # Handle display of text with special characters\n",
+    "        display_text = text.replace('\\n', '\\\\n').replace('\\t', '\\\\t')[:40]\n",
+    "        print(f\"  ✓ {case_name:20} [{wc:2}w] {pred:6} ({conf:.1%}): '{display_text}'\")\n",
+    "        passed += 1\n",
+    "    except Exception as e:\n",
+    "        print(f\"  ✗ {case_name:20} ERROR: {str(e)[:50]}\")\n",
+    "        failed += 1\n",
+    "\n",
+    "print(f\"\\nResult: {passed} passed, {failed} failed\")\n",
+    "\n",
+    "# Batch prediction test\n",
+    "print(\"\\n\" + \"=\" * 80)\n",
+    "print(\"BATCH PREDICTION TEST\")\n",
+    "print(\"=\" * 80)\n",
+    "\n",
+    "batch_texts = [\n",
+    "    \"Short.\",\n",
+    "    \"This is a medium length sentence with some content.\",\n",
+    "    \"This is a longer sentence that contains more words and provides more context for the model to analyze and make predictions based on the patterns it learned during training.\",\n",
+    "]\n",
+    "\n",
+    "print(\"\\nPredicting batch of mixed-length sentences:\")\n",
+    "batch_results = [predict_v2(text) for text in batch_texts]\n",
+    "\n",
+    "for i, (text, result) in enumerate(zip(batch_texts, batch_results), 1):\n",
+    "    print(f\"\\n  Sentence {i} ({result['word_count']} words):\")\n",
+    "    print(f\"    Text: {text[:60]}...\")\n",
+    "    print(f\"    Prediction: {result['predicted_name']}\")\n",
+    "    print(f\"    Confidence: AI={result['probability_ai']:.1%}, Human={result['probability_human']:.1%}\")\n",
+    "\n",
+    "print(\"\\n\" + \"=\" * 80)\n",
+    "print(\"✓ ALL EDGE CASES AND BATCH TESTS COMPLETE!\")\n",
+    "print(\"=\" * 80)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "ml",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

notebook/ai_vs_human_nepali/notebook/documentation.md ADDED

	@@ -0,0 +1,435 @@

+# Nepali AI vs Human Notebook Documentation
+This folder contains a small notebook series for building an AI-vs-human text detector for Nepali text. The notebooks are not identical copies; they represent the evolution of the project from a lightweight scikit-learn baseline to a stronger hybrid model and a transformer-based experiment.
+## Notebook Inventory
+The notebooks in this directory are:
+- [main.ipynb](main.ipynb)
+- [working model.ipynb](working%20model.ipynb)
+- [Nepali_Ai_vs_Human.ipynb](Nepali_Ai_vs_Human.ipynb)
+- [final_main.ipynb](final_main.ipynb)
+## Shared Goal
+All notebooks solve the same binary classification task:
+- Class 0 = Human-written Nepali text
+- Class 1 = AI-generated Nepali text
+The notebooks differ in how they prepare the data, which features they extract, and which model family they train.
+## Shared Data Sources
+Across the notebooks, the dataset is built from one or more CSV files under the notebook dataset folders. The common column pattern is:
+- human_text
+- ai_generated_text
+Some notebooks also use:
+- title
+- label
+- paragraph
+The data preparation usually performs some combination of:
+- dropping null rows
+- stripping whitespace
+- removing duplicates
+- converting two source columns into one text column plus one label column
+- balancing classes by sampling
+- splitting long texts into smaller chunks
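The preparation steps listed above can be sketched without any dataframe library. This is only an illustration under assumptions: the notebooks work on dataframes, and the function name `build_labeled_rows`, the `seed` parameter, and the down-sampling strategy are hypothetical choices, not taken from the notebooks. Only the column names `human_text` and `ai_generated_text` come from the documentation.

```python
import random


def build_labeled_rows(rows, seed=42):
    """Turn rows with human_text / ai_generated_text into balanced (text, label) pairs."""
    human, ai = set(), set()
    for row in rows:
        # Drop null or empty values and strip surrounding whitespace.
        h = (row.get("human_text") or "").strip()
        a = (row.get("ai_generated_text") or "").strip()
        if h:
            human.add(h)  # sets remove exact duplicates
        if a:
            ai.add(a)

    # Balance classes by down-sampling the larger class (assumed strategy).
    n = min(len(human), len(ai))
    rng = random.Random(seed)
    human_sample = rng.sample(sorted(human), n)
    ai_sample = rng.sample(sorted(ai), n)

    # Label 0 = human-written, label 1 = AI-generated.
    labeled = [(t, 0) for t in human_sample] + [(t, 1) for t in ai_sample]
    rng.shuffle(labeled)
    return labeled
```

Each input row is a dict, such as one produced by `csv.DictReader`. Chunking of long texts is a separate step and is not covered by this sketch.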
+## Notebook Relationship
+The notebooks form a progression:
+1. main.ipynb is the first lightweight sklearn baseline.
+2. working model.ipynb refines the baseline with better text chunking.
+3. Nepali_Ai_vs_Human.ipynb switches to a transformer-style neural model.
+4. final_main.ipynb is the most complete hybrid notebook and is the closest thing to a production workflow.
+## main.ipynb
+### Purpose
+This is the earliest baseline notebook. It focuses on a CPU-friendly approach using TF-IDF plus hand-crafted text features, then compares several classic machine learning models.
+### Data Preparation
+The notebook loads several CSV files and concatenates them into one dataframe. The data is drawn from:
+- ../DATASET/data.csv
+- ../DATASET/new_data.csv
+- /mnt/linux-data/Work/aiapi/notebook/ai_vs_human_nepali/news_scrap_new2.fixed.csv
+The notebook creates separate cleaned columns for human text and AI text, then stacks them into a single training dataframe with labels.
+Important preprocessing steps:
+- remove URLs
+- keep only Devanagari (Nepali script) Unicode characters and whitespace
+- lowercase the text
+- remove consecutive repeated words
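A minimal sketch of these cleaning steps, assuming that "Nepali Unicode characters" refers to the Devanagari block (U+0900 to U+097F). The function name `clean_nepali_text` and the exact regular expressions are hypothetical; the notebook's own implementation may differ.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: the kept characters are the Devanagari block plus whitespace.
NON_DEVANAGARI_RE = re.compile(r"[^\u0900-\u097F\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_nepali_text(text: str) -> str:
    text = URL_RE.sub(" ", text)             # remove URLs
    text = NON_DEVANAGARI_RE.sub(" ", text)  # keep Devanagari and whitespace only
    text = text.lower()                      # no effect on Devanagari; mirrors the listed step
    words = WHITESPACE_RE.sub(" ", text).strip().split(" ")
    # Remove consecutive repeated words, keeping the first occurrence.
    deduped = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
    return " ".join(deduped)
```

Under this assumption the danda (U+0964) survives cleaning because it sits inside the Devanagari block, while Latin letters and ASCII digits are dropped.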
+### Feature Engineering
+The notebook combines two feature families:
+- Word-level TF-IDF with 1-2 gram features
+- Dense, hand-crafted features based on text structure
+The hand-crafted features include:
+- burstiness statistics from sentence lengths
+- average word length
+- average sentence length
+- lexical diversity
+- punctuation ratio
+- repeated bigram ratio
+- Devanagari diacritic density
+The sparse TF-IDF matrix is concatenated with the dense feature matrix using horizontal stacking.
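To make the dense feature family more concrete, here is a rough sketch of a few of the listed features. The definitions are assumptions: burstiness is taken to be the coefficient of variation of sentence lengths, sentences are split on the danda, and the function name `stylometric_features` and the punctuation set are hypothetical.

```python
import statistics

# Assumed punctuation set; the notebook may count a different one.
PUNCTUATION = set("।,;:!?-\"'()")


def stylometric_features(text: str) -> list[float]:
    """Return [burstiness, avg_word_len, avg_sentence_len, lexical_diversity, punctuation_ratio]."""
    sentences = [s.split() for s in text.split("।") if s.strip()]
    words = text.split()
    if not words or not sentences:
        return [0.0] * 5

    lengths = [len(s) for s in sentences]
    mean_len = statistics.fmean(lengths)
    # Burstiness as standard deviation of sentence length divided by its mean.
    burstiness = statistics.pstdev(lengths) / mean_len

    avg_word_len = sum(len(w) for w in words) / len(words)
    lexical_diversity = len(set(words)) / len(words)
    punctuation_ratio = sum(ch in PUNCTUATION for ch in text) / len(text)
    return [burstiness, avg_word_len, mean_len, lexical_diversity, punctuation_ratio]
```

One such vector per document would form a row of the dense matrix. Stacking it next to the sparse TF-IDF matrix is commonly done with `scipy.sparse.hstack`, though the exact call used in the notebook is not shown in this documentation.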
+### Models Trained
+The notebook compares several standard classifiers:
+- LogisticRegressionCV
+- RidgeClassifierCV
+- MultinomialNB
+- BernoulliNB
+- RandomForestClassifier
+- GradientBoostingClassifier
+- LinearSVC
+- KNeighborsClassifier
+Dense conversion is applied only where needed, such as for LinearSVC and KNeighborsClassifier.
+### Evaluation
+The notebook evaluates the models with:
+- validation accuracy
+- weighted F1 score
+- classification reports
+- confusion matrices
+- ROC curves
+The top models are selected by validation accuracy and reused in later prediction cells.
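For readers unfamiliar with the weighted F1 score, the following library-free sketch shows how it is computed: a per-class F1, averaged with weights proportional to each class's support. The function name `weighted_f1` is hypothetical; the notebook most likely relies on a library metric rather than code like this.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        total += f1 * n / len(y_true)
    return total
```

With balanced classes, as in this project, the weighted score equals the plain average of the two per-class F1 values.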
+### Prediction Demo
+The notebook includes several sample Nepali texts for inference. It prints per-model predictions and, where possible, confidence values.
+### Saved Artifacts
+Each model is saved as a pickle file in a local saved_models directory.
+### Known Issues
+- Several cells are duplicated, especially the dataset loading cells.
+- The vectorizer and the feature builder are not saved with the models, so full reloading is incomplete.
+- There are repeated prediction sections, which makes the notebook harder to maintain.
+- Some cells appear to be placeholders or empty.
+## working model.ipynb
+### Purpose
+This notebook is a refinement of main.ipynb. It keeps the same overall classifier strategy but improves how long Nepali articles are handled.
+### Main Difference From main.ipynb
+The key improvement is sentence chunking:
+- long texts are split into smaller chunks
+- chunk boundaries prefer Nepali danda punctuation
+- each chunk is limited to a small number of sentences or words
+This makes the dataset more granular and helps the classifier train on smaller, more uniform samples.
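The chunking idea can be sketched as follows. This is an illustration only: the function name `chunk_by_danda` and the `max_sentences` and `max_words` defaults are hypothetical, and the notebook's own `split_into_sentence_chunks` may use different limits and boundary rules.

```python
import re


def chunk_by_danda(text: str, max_sentences: int = 3, max_words: int = 60) -> list[str]:
    # Split after each danda while keeping the punctuation attached to its sentence.
    sentences = [s.strip() for s in re.split(r"(?<=।)\s*", text) if s.strip()]
    chunks, current, word_count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed either limit.
        if current and (len(current) >= max_sentences or word_count + n_words > max_words):
            chunks.append(" ".join(current))
            current, word_count = [], 0
        current.append(sentence)
        word_count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In this sketch a single sentence longer than `max_words` is kept whole rather than cut mid-sentence, which favors clean boundaries over strictly uniform chunk sizes.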
+### Preprocessing
+The notebook defines:
+- clean_text
+- remove_auto_repeating
+- split_into_sentence_chunks
+- expand_texts_to_chunks
+These functions preserve sentence punctuation for chunking, then normalize the cleaned chunks for downstream training.
+### Feature Engineering and Models
+The rest of the pipeline is essentially the same as main.ipynb:
+- TF-IDF word n-grams
+- burstiness and stylometric features
+- concatenated sparse + dense representation
+- the same family of sklearn classifiers
+### Evaluation and Inference
+The notebook follows the same model comparison, ROC plotting, confusion matrix plotting, and sample prediction pattern as the baseline notebook.
+### Saved Artifacts
+Like main.ipynb, the fitted sklearn models are stored under saved_models as individual pickle files.
+### Known Issues
+- The notebook has redundant cells and repeated code blocks.
+- It still does not serialize the vectorizer and feature transformer together with the model artifacts.
+- Some prediction logic is repeated more than once.
+## Nepali_Ai_vs_Human.ipynb
+### Purpose
+This notebook is the deep learning branch of the project. Instead of hand-crafted features plus classical classifiers, it uses a transformer encoder with a classification head.
+### Data Preparation
+The notebook reads one CSV file and converts the two-column source format into a single text-label dataframe.
+Important preparation steps:
+- validate required columns
+- drop nulls
+- build a unified dataframe with text and label
+- filter short texts
+- drop duplicate text rows
+- shuffle the dataset
+The notebook keeps the raw text mostly intact rather than applying aggressive regex cleaning.
+### Model Architecture
+The model pipeline is built around Hugging Face transformers and PyTorch:
+- tokenizer from a multilingual BERT-style model
+- AutoModel backbone
+- classification head with dropout
+- binary output layer
+The notebook defines a custom PyTorch module named IndicBERTClassifier.
+### Training Setup
+The notebook uses:
+- train/validation split with stratification
+- DataLoader-based batching
+- AdamW optimizer
+- cross-entropy loss
+- linear warmup scheduler
+- gradient accumulation
+- mixed precision when CUDA is available
+- early stopping on validation F1
+This makes it more GPU-oriented than the sklearn notebooks.
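+Early stopping on validation F1 can be sketched independently of the training loop. The class name EarlyStopping, the patience and min_delta parameters, and the F1 values below are illustrative assumptions, not values taken from the notebook.

```python
class EarlyStopping:
    """Track the best validation F1 and signal when training should stop."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_f1 = float("-inf")
        self.bad_epochs = 0

    def step(self, val_f1):
        """Return (improved, should_stop) for the latest validation F1."""
        if val_f1 > self.best_f1 + self.min_delta:
            self.best_f1 = val_f1
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
history = []
for epoch, val_f1 in enumerate([0.71, 0.78, 0.77, 0.76, 0.80], start=1):
    improved, should_stop = stopper.step(val_f1)
    history.append((epoch, improved))
    # A real loop would save the best checkpoint here when improved is True.
    if should_stop:
        break
```

+With a patience of 2, this run stops after the second consecutive epoch without improvement, so the final 0.80 is never reached. That trade-off is worth keeping in mind when choosing the patience value.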
+### Evaluation
+Per-epoch evaluation includes:
+- accuracy
+- F1 score
+- classification report
+The notebook also saves improved checkpoints when validation F1 improves.
+### Prediction Demo
+The notebook defines a predict function that:
+- tokenizes the input text
+- runs the transformer model
+- applies softmax
+- returns the predicted class and confidence
+Several sample Nepali sentences are passed through the predictor at the end of the notebook.
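+The last two steps of the predictor (softmax, then class and confidence) can be shown without loading a model. In this sketch the logits are passed in directly, and the LABELS mapping is a hypothetical choice of class names and ids.

```python
import math

LABELS = {0: "human", 1: "ai"}  # hypothetical id-to-name mapping


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Return (label, confidence) for one example's raw model outputs."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[class_id], probs[class_id]


label, confidence = logits_to_prediction([0.0, 2.0])
```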
+### Saved Artifacts
+The notebook saves:
+- model_best.pth
+- model_latest.pth
+- tokenizer files in ./nepali_xlmr_classifier
+There is also a Colab-oriented zip export section.
+### Known Issues
+- The notebook mixes local notebook execution with Colab-specific code.
+- Some cells show CUDA or environment-related warnings.
+- The training flow is more complex and less polished than the final hybrid notebook.
+- Paths are hard-coded in a few places.
+## final_main.ipynb
+### Purpose
+This is the most complete notebook in the folder. It combines semantic embeddings from Sentence Transformers with stylometric features, then trains a linear model and an XGBoost model on the fused feature vector.
+### Data Preparation
+The notebook reads the dataset from:
+- ../DATASET/Final_data/final_news345.csv
+- /mnt/linux-data/Work/aiapi/notebook/ai_vs_human_nepali/Final_data/final_news345.csv
+The notebook expects a label column with string values and maps them to binary classes.
+It also includes a preprocessing utility that can:
+- split very long Nepali texts into chunks
+- preserve danda-based sentence boundaries
+- filter out extremely short chunks
+- balance the dataset by sampling each class to the same count
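+Balancing by sampling each class down to the same count can be sketched as below. The function name balance_by_downsampling, the row format, and the fixed seed are assumptions, and the notebook's own utility may differ.

```python
import random
from collections import defaultdict


def balance_by_downsampling(rows, label_key="label", seed=42):
    """Sample every class down to the size of the smallest class (sketch)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    if not by_label:
        return []
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))  # sample without replacement
    rng.shuffle(balanced)
    return balanced
```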
+### Visualization
+The notebook includes exploratory plots for:
+- class distribution
+- character count distribution
+- word count distribution
+- sentence count distribution
+- cleaned text length distribution
+- stylometric feature comparison plots
+This makes it the most documented and inspection-friendly notebook in the folder.
+### Text Cleaning
+The notebook defines clean_nepali_text, which:
+- lowercases the text
+- normalizes Nepali and common Unicode punctuation
+- removes unwanted characters
+- collapses repeated whitespace
+- trims the result
+This cleaned text is used for both embeddings and stylometric extraction.
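+A cleaning function along these lines could look like the sketch below. It is not the notebook's clean_nepali_text: the punctuation map, the set of characters treated as wanted, and the name clean_text_sketch are all assumptions.

```python
import re
import unicodedata

# Hypothetical punctuation normalization table.
PUNCT_MAP = str.maketrans({
    "\u0965": "\u0964",             # double danda -> danda
    "\u201c": '"', "\u201d": '"',   # curly double quotes -> straight
    "\u2018": "'", "\u2019": "'",   # curly single quotes -> straight
    "\u2013": "-", "\u2014": "-",   # en/em dash -> hyphen
})

# Keep Devanagari, ZWNJ/ZWJ, lowercase ASCII letters, digits, whitespace
# and basic punctuation; everything else is treated as unwanted.
UNWANTED = re.compile(r"[^\u0900-\u097F\u200c\u200da-z0-9\s.,!?;:'\"()\-]")


def clean_text_sketch(text):
    text = unicodedata.normalize("NFC", text).lower()  # lowercase
    text = text.translate(PUNCT_MAP)                   # normalize punctuation
    text = UNWANTED.sub(" ", text)                     # remove unwanted characters
    text = re.sub(r"\s+", " ", text)                   # collapse whitespace
    return text.strip()                                # trim
```

+Devanagari has no letter case, so the lowercasing step only affects Latin-script text embedded in a Nepali sample.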
+### Stylometric Features
+The notebook uses six hand-crafted features:
+- word_count
+- sentence_count
+- avg_word_length
+- avg_sentence_length
+- type_token_ratio
+- punctuation_ratio
+These features are extracted from the cleaned text and then standardized with StandardScaler.
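+One plausible way to compute the six features is sketched below. The exact definitions are assumptions (for example, words are whitespace tokens and sentences end at a danda, period, ! or ?), and the notebook may define them differently.

```python
import re

STYLO_COLS = [
    "word_count", "sentence_count", "avg_word_length",
    "avg_sentence_length", "type_token_ratio", "punctuation_ratio",
]
PUNCT = set("\u0964.,!?;:'\"()-")  # assumed punctuation inventory


def stylometric_features(text):
    """Return the six stylometric features for one cleaned text (sketch)."""
    words = text.split()
    sentences = [s for s in re.split(r"[\u0964.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "word_count": n_words,
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "punctuation_ratio": sum(ch in PUNCT for ch in text) / max(len(text), 1),
    }
```

+The resulting columns sit on very different scales (counts versus ratios), which is why standardizing them before fusion matters.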
+### Semantic Embeddings
+The notebook uses the Sentence Transformers model:
+- sentence-transformers/paraphrase-multilingual-mpnet-base-v2
+This produces 768-dimensional multilingual sentence embeddings. The notebook loads the embedder on CPU to reduce CUDA memory pressure.
+### Feature Fusion
+The final feature matrix is built by concatenating:
+- 768 embedding dimensions
+- 6 scaled stylometric dimensions
+So each sample becomes a 774-dimensional vector.
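+The shape arithmetic can be checked with a small NumPy sketch. Random arrays stand in for the real embeddings and stylometric features, and the manual standardization mirrors what StandardScaler does with its default settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 4

# Stand-ins for the real inputs (assumed shapes: 768 and 6 columns).
embeddings = rng.normal(size=(n_samples, 768))
stylo_raw = rng.normal(loc=50.0, scale=10.0, size=(n_samples, 6))

# Standardize each stylometric column to zero mean and unit variance.
mean = stylo_raw.mean(axis=0)
std = stylo_raw.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns
stylo_scaled = (stylo_raw - mean) / std

# Concatenate the two feature blocks column-wise: 768 + 6 = 774.
X = np.hstack([embeddings, stylo_scaled])
```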
+### Models Trained
+Two models are trained on the fused features:
+- Logistic Regression
+- XGBoost
+XGBoost is configured with class imbalance handling through scale_pos_weight.
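+A common way to choose scale_pos_weight is the ratio of negative to positive training examples. The helper below is a sketch under that assumption. The source does not state how the notebook computes its value, and the function name is hypothetical.

```python
def compute_scale_pos_weight(labels):
    """Return n_negative / n_positive for binary labels encoded as 0 and 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive examples in labels")
    return n_neg / n_pos


weight = compute_scale_pos_weight([0, 0, 0, 1])
```

+On a dataset that has already been balanced by sampling, this ratio is 1.0 and the parameter has no effect.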
+### Evaluation
+The notebook evaluates both models using:
+- accuracy
+- precision
+- recall
+- F1 score
+- confusion matrices
+- ROC curves and AUC
+It also computes and visualizes XGBoost feature importance.
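+For reference, the scalar metrics above all derive from the four confusion-matrix counts. The sketch below computes them in plain Python with class 1 as the positive class, which is an assumption about the label encoding. The notebook presumably relies on library functions instead.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and the confusion matrix."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


metrics = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```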
+### Prediction Flow
+The prediction function follows this exact sequence:
+1. clean the input
+2. extract stylometric features
+3. build the sentence embedding
+4. scale the stylometric vector
+5. concatenate the two feature blocks
+6. predict with XGBoost
+The function returns a dictionary containing the label, numeric class id, and probability.
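+The six steps can be expressed as one function with each component passed in. This is a structural sketch only: the parameter names, the dictionary keys, the 0.5 threshold, and the toy stand-ins in the usage call are assumptions. In the notebook the components would be the cleaning function, the stylometric extractor, the sentence embedder, the fitted scaler, and the XGBoost model.

```python
def predict_text(text, clean, extract_stylo, embed, scale, classify, label_names):
    """Run the six-step prediction flow with injected components (sketch)."""
    cleaned = clean(text)                             # 1. clean the input
    stylo = extract_stylo(cleaned)                    # 2. stylometric features
    embedding = embed(cleaned)                        # 3. sentence embedding
    stylo_scaled = scale(stylo)                       # 4. scale stylometric vector
    features = list(embedding) + list(stylo_scaled)   # 5. concatenate blocks
    prob_positive = classify(features)                # 6. classifier probability
    class_id = int(prob_positive >= 0.5)
    return {
        "label": label_names[class_id],
        "class_id": class_id,
        "probability": prob_positive,
    }


# Toy stand-ins so the flow can be exercised without any model.
result = predict_text(
    "  Some TEXT ",
    clean=lambda t: t.strip().lower(),
    extract_stylo=lambda t: [len(t.split()), len(t)],
    embed=lambda t: [0.1, 0.2, 0.3],
    scale=lambda v: [x / 10 for x in v],
    classify=lambda f: 0.9 if len(f) == 5 else 0.1,
    label_names={0: "human", 1: "ai"},
)
```

+Scaling only the stylometric block, and applying it before concatenation, keeps the inference path identical to the feature fusion used at training time.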
+### Saved Artifacts
+The notebook saves a joblib bundle at:
+- ../models/ai_text_detector_model.pkl
+The saved artifact includes:
+- xgb_model
+- lr_model
+- scaler
+- embed_model name string
+- stylo_cols
+- label_map
+### Known Issues
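+- The XGBoost fit call passes the test set as eval_set. That is acceptable for monitoring, but if early stopping or hyperparameter choices depend on it, the reported test metrics become optimistically biased; a separate validation split would avoid this.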
+- The embedding model name is saved, but the embedder itself is not serialized.
+- The notebook is the strongest production candidate, but it still lacks a separate load-and-predict helper for end users.
+## Comparison Summary
+| Notebook | Main Approach | Strength | Weakness |
+|---|---|---|---|
+| main.ipynb | TF-IDF + stylometry + classic ML | Simple baseline, easy to inspect | Repetitive and not fully serializable |
+| working model.ipynb | TF-IDF + stylometry + chunking | Better handling of long text | Still mostly a baseline notebook |
+| Nepali_Ai_vs_Human.ipynb | Transformer classifier | Strong semantic modeling | Heavier, more environment-sensitive |
+| final_main.ipynb | Sentence embeddings + stylometry + XGBoost | Best balance of performance, clarity, and deployability | Uses a saved model name string instead of serializing the embedder |
+## Recommended Reading Order
+If you want to understand the project evolution, read the notebooks in this order:
+1. main.ipynb
+2. working model.ipynb
+3. Nepali_Ai_vs_Human.ipynb
+4. final_main.ipynb
+If you only want the most useful notebook for reuse or deployment, start with final_main.ipynb.
+## Practical Notes
+- Several notebooks contain duplicated or stale cells from experimentation.
+- Not every cell has been executed successfully.
+- Paths are sometimes hard-coded for the local workspace, so moving the folder may require path cleanup.
+- The project alternates between three styles of modeling: classical sklearn, transformer fine-tuning, and hybrid embedding-based classification.
+## Suggested Next Step
+The next useful document to add is an inference guide that explains how to load the saved model bundle from final_main.ipynb and run predictions on new Nepali text.