Pankaj001/Opensource_attack_examples


Pankaj001 committed on Jul 31, 2024

Commit d35af4f (verified), 1 parent: 3e33b5d

Upload txt_attk.ipynb


Generate attack vectors for a text classification model (NLP)
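The notebook in this commit builds its attack vectors with TextAttack's `PWWSRen2019` recipe, which greedily swaps salient words for WordNet synonyms until the predicted label flips. As a rough, library-free illustration of that idea, the sketch below ranks words by how much deleting them lowers the original-class score and then tries synonym swaps in that order. It is not the PWWS algorithm itself, which also weights each word's saliency by how effective its best substitution is. The scorer, the synonym table, the penalties, and every function name are hypothetical stand-ins for the trained LSTM and WordNet.

```python
# Toy stand-ins (all hypothetical): integer penalties replace the trained LSTM,
# and a two-entry table replaces WordNet.
STOPWORDS = {"this", "is", "a", "the", "i", "am"}
SYNONYMS = {"terrible": ["dreadful", "awful"], "waste": ["desolate", "squander"]}
PENALTY = {"terrible": 6, "dreadful": 3, "awful": 5, "waste": 3, "squander": 2}


def positive_points(words):
    """Toy sentiment score: 9 points minus a penalty per negative word."""
    return 9 - sum(PENALTY.get(w, 0) for w in words)


def predict(words):
    """Label 1 (positive) when the score reaches 5, else label 0."""
    return 1 if positive_points(words) >= 5 else 0


def class_score(words, label):
    """Score of `label`; higher means the toy model is more confident in it."""
    points = positive_points(words)
    return points if label == 1 else 10 - points


def greedy_swap_attack(text):
    """Return (perturbed_text, succeeded) for a greedy synonym-swap attack."""
    words = text.lower().split()
    label = predict(words)
    base = class_score(words, label)

    # Rank attackable positions by how much deleting the word hurts `label`.
    ranked = []
    for i, word in enumerate(words):
        if word in STOPWORDS or word not in SYNONYMS:
            continue  # mirrors the stopword constraint; nothing to swap otherwise
        drop = base - class_score(words[:i] + words[i + 1:], label)
        ranked.append((drop, i))
    ranked.sort(reverse=True)

    # Visit each position once (no repeat modification), most salient first.
    for _, i in ranked:
        candidates = [words[:i] + [s] + words[i + 1:] for s in SYNONYMS[words[i]]]
        best = min(candidates, key=lambda c: class_score(c, label))
        if class_score(best, label) < class_score(words, label):
            words = best
        if predict(words) != label:
            return " ".join(words), True
    return " ".join(words), False


print(greedy_swap_attack("This book is a terrible waste"))
print(greedy_swap_attack("I am happy"))
```

On these toy inputs the first call flips the label after two swaps, while the second finds no swappable word and reports failure, much like the "I am happy" sample in the notebook output.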

Files changed (1)

txt_attk.ipynb +588 -0

txt_attk.ipynb ADDED

	@@ -0,0 +1,588 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ea969734-6e63-4b44-ac0f-8442f785616a",
+   "metadata": {},
+   "source": [
+    "# TextAttack example\n",
+    "\n",
+    "This notebook demonstrates a simple use of TextAttack with TensorFlow v2.x. It trains a small model on the IMDB\n",
+    "dataset and then uses TextAttack to create adversarial examples. A pretrained model can also be passed to TextAttack instead.\n",
+    "The parameters are chosen to reduce the computational requirements of the notebook and are not optimised for accuracy.\n",
+    "\n",
+    "* Reference: https://textattack.readthedocs.io/en/master/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "609d3a4c-647c-498c-ab0a-54fef4f5eed6",
+   "metadata": {},
+   "source": [
+    "### Text Classification\n",
+    "\n",
+    "* Date: 07/30/2024\n",
+    "* Author: Pawan Kumar\n",
+    "* Type of attack: TextAttack PWWS (WordNet synonym substitution)\n",
+    "\n",
+    "### Metadata\n",
+    "* Dataset: IMDB\n",
+    "* Size of training set: 25,000\n",
+    "* Size of testing set: 25,000\n",
+    "* Number of classes: 2\n",
+    "* Original model: LSTM model trained on IMDB"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "e5cd330e-0bc5-4676-8b7d-03bea1e0e8cb",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:50:51.373277Z",
+     "iopub.status.busy": "2024-07-30T06:50:51.372281Z",
+     "iopub.status.idle": "2024-07-30T06:50:51.379825Z",
+     "shell.execute_reply": "2024-07-30T06:50:51.379825Z",
+     "shell.execute_reply.started": "2024-07-30T06:50:51.373277Z"
+    }
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'\\nDescription: Uncomment and run to install libraries. Needed for running first time only. \\n'"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "\"\"\"\n",
+    "Description: Uncomment and run to install libraries. Needed for running first time only. \n",
+    "\"\"\"\n",
+    "# !pip install textattack[tensorflow]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "d49c8793-9032-4af1-aa8f-29ff05d2409a",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:40:27.398710Z",
+     "iopub.status.busy": "2024-07-30T06:40:27.398710Z",
+     "iopub.status.idle": "2024-07-30T06:40:27.517530Z",
+     "shell.execute_reply": "2024-07-30T06:40:27.516313Z",
+     "shell.execute_reply.started": "2024-07-30T06:40:27.398710Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Importing necessary libraries\n",
+    "import os\n",
+    "import numpy as np\n",
+    "\n",
+    "import tensorflow as tf\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
+    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
+    "from tensorflow.keras.models import Sequential\n",
+    "from tensorflow.keras.layers import LSTM, Embedding, Dense, Dropout, SimpleRNN\n",
+    "from tensorflow.keras.datasets import imdb\n",
+    "\n",
+    "from transformers import TFAutoModelForSequenceClassification, AutoTokenizer\n",
+    "\n",
+    "from textattack.models.wrappers import ModelWrapper\n",
+    "from textattack.datasets import HuggingFaceDataset\n",
+    "from textattack.attack_recipes import PWWSRen2019\n",
+    "from textattack import Attacker\n",
+    "import textattack"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "6dc3c1c1-42a5-490e-bbf3-723c051b8054",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:40:30.160388Z",
+     "iopub.status.busy": "2024-07-30T06:40:30.160388Z",
+     "iopub.status.idle": "2024-07-30T06:40:30.169572Z",
+     "shell.execute_reply": "2024-07-30T06:40:30.169071Z",
+     "shell.execute_reply.started": "2024-07-30T06:40:30.160388Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "# Flag to determine whether to train a new model or use a pre-trained one\n",
+    "model_train = True  # False -> load a pre-trained model from Hugging Face"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3fb6b427-f7d7-42d9-9939-5b90159e60ed",
+   "metadata": {},
+   "source": [
+    "# Step 1: Load the IMDB dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "0e121e62-50bb-46a9-99b8-131d55e3f105",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:40:31.010886Z",
+     "iopub.status.busy": "2024-07-30T06:40:31.010886Z",
+     "iopub.status.idle": "2024-07-30T06:40:33.803320Z",
+     "shell.execute_reply": "2024-07-30T06:40:33.803320Z",
+     "shell.execute_reply.started": "2024-07-30T06:40:31.010886Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "082b6240-13d2-4e32-854e-0011e8f2fd6d",
+   "metadata": {},
+   "source": [
+    "# Step 2: Create the model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "3d732d28-d0e8-483c-a1b4-fb6a1650e7de",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:40:34.958372Z",
+     "iopub.status.busy": "2024-07-30T06:40:34.957385Z",
+     "iopub.status.idle": "2024-07-30T06:48:25.810813Z",
+     "shell.execute_reply": "2024-07-30T06:48:25.810813Z",
+     "shell.execute_reply.started": "2024-07-30T06:40:34.958372Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch 1/30\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "C:\\Users\\CUP3KOR\\.conda\\envs\\env_torch\\lib\\site-packages\\keras\\src\\layers\\core\\embedding.py:90: UserWarning: Argument `input_length` is deprecated. Just remove it.\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 17s 20ms/step - accuracy: 0.6455 - loss: 0.6067 - val_accuracy: 0.7898 - val_loss: 0.4815\n",
+      "Epoch 2/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 19ms/step - accuracy: 0.8382 - loss: 0.3954 - val_accuracy: 0.8062 - val_loss: 0.4408\n",
+      "Epoch 3/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.8786 - loss: 0.3209 - val_accuracy: 0.8006 - val_loss: 0.4716\n",
+      "Epoch 4/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.8982 - loss: 0.2883 - val_accuracy: 0.8050 - val_loss: 0.4872\n",
+      "Epoch 5/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9219 - loss: 0.2219 - val_accuracy: 0.7987 - val_loss: 0.4955\n",
+      "Epoch 6/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9352 - loss: 0.1917 - val_accuracy: 0.7968 - val_loss: 0.5100\n",
+      "Epoch 7/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9362 - loss: 0.1891 - val_accuracy: 0.7905 - val_loss: 0.6276\n",
+      "Epoch 8/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.8274 - loss: 0.3710 - val_accuracy: 0.7948 - val_loss: 0.5578\n",
+      "Epoch 9/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9198 - loss: 0.2287 - val_accuracy: 0.7854 - val_loss: 0.5871\n",
+      "Epoch 10/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9081 - loss: 0.2333 - val_accuracy: 0.7876 - val_loss: 0.6009\n",
+      "Epoch 11/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9553 - loss: 0.1420 - val_accuracy: 0.7883 - val_loss: 0.6265\n",
+      "Epoch 12/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9598 - loss: 0.1232 - val_accuracy: 0.7889 - val_loss: 0.6716\n",
+      "Epoch 13/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9661 - loss: 0.1096 - val_accuracy: 0.7870 - val_loss: 0.7236\n",
+      "Epoch 14/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9676 - loss: 0.0999 - val_accuracy: 0.7833 - val_loss: 0.6662\n",
+      "Epoch 15/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9692 - loss: 0.1014 - val_accuracy: 0.7816 - val_loss: 0.7717\n",
+      "Epoch 16/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9722 - loss: 0.0873 - val_accuracy: 0.7804 - val_loss: 0.8158\n",
+      "Epoch 17/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 19ms/step - accuracy: 0.9716 - loss: 0.0925 - val_accuracy: 0.6377 - val_loss: 0.7183\n",
+      "Epoch 18/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.7759 - loss: 0.4766 - val_accuracy: 0.7776 - val_loss: 0.6318\n",
+      "Epoch 19/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 21ms/step - accuracy: 0.9597 - loss: 0.1204 - val_accuracy: 0.7835 - val_loss: 0.7238\n",
+      "Epoch 20/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9777 - loss: 0.0756 - val_accuracy: 0.7847 - val_loss: 0.8460\n",
+      "Epoch 21/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 19ms/step - accuracy: 0.9848 - loss: 0.0564 - val_accuracy: 0.7824 - val_loss: 0.8455\n",
+      "Epoch 22/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9850 - loss: 0.0542 - val_accuracy: 0.7817 - val_loss: 0.8955\n",
+      "Epoch 23/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 19ms/step - accuracy: 0.9776 - loss: 0.0691 - val_accuracy: 0.7771 - val_loss: 0.9468\n",
+      "Epoch 24/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9638 - loss: 0.1353 - val_accuracy: 0.7729 - val_loss: 0.8872\n",
+      "Epoch 25/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9812 - loss: 0.0623 - val_accuracy: 0.7790 - val_loss: 0.9489\n",
+      "Epoch 26/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9879 - loss: 0.0441 - val_accuracy: 0.7706 - val_loss: 1.1105\n",
+      "Epoch 27/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9912 - loss: 0.0328 - val_accuracy: 0.7786 - val_loss: 1.0273\n",
+      "Epoch 28/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9910 - loss: 0.0322 - val_accuracy: 0.7792 - val_loss: 1.1005\n",
+      "Epoch 29/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 15s 20ms/step - accuracy: 0.9577 - loss: 0.1139 - val_accuracy: 0.7803 - val_loss: 0.7637\n",
+      "Epoch 30/30\n",
+      "782/782 ━━━━━━━━━━━━━━━━━━━━ 16s 20ms/step - accuracy: 0.9815 - loss: 0.0598 - val_accuracy: 0.7768 - val_loss: 0.9161\n"
+     ]
+    }
+   ],
+   "source": [
+    "if model_train:\n",
+    "    # Setting up parameters for the IMDB dataset and model\n",
+    "    vocab_size = 10000  # Number of words to keep in the vocabulary\n",
+    "    max_length = 100    # Maximum length of each sequence\n",
+    "    embedding_dim = 16  # Embedding dimensions\n",
+    "    oov_tok = \"<OOV>\"   # Out of vocabulary token\n",
+    "    \n",
+    "    # Loading the IMDB dataset\n",
+    "    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)\n",
+    "\n",
+    "    # Padding sequences to ensure uniform length\n",
+    "    x_train = pad_sequences(x_train, maxlen=max_length, padding='post', truncating='post')\n",
+    "    x_test = pad_sequences(x_test, maxlen=max_length, padding='post', truncating='post')\n",
+    "    \n",
+    "    # Creating word index for vocabulary\n",
+    "    word_index = imdb.get_word_index()\n",
+    "    word_index = {k: (v + 3) for k, v in word_index.items() if v < vocab_size}\n",
+    "    word_index[\"<PAD>\"] = 0\n",
+    "    word_index[\"<START>\"] = 1\n",
+    "    word_index[\"<UNK>\"] = 2\n",
+    "    word_index[\"<UNUSED>\"] = 3\n",
+    "    \n",
+    "    # Create an inverse word index to decode integer sequences back to words (if needed)\n",
+    "    inverse_word_index = {v: k for k, v in word_index.items()}\n",
+    "    \n",
+    "    # creating the tokenizer\n",
+    "    tokenizer = Tokenizer(num_words=vocab_size)\n",
+    "    tokenizer.word_index = word_index\n",
+    "    \n",
+    "    # Defining the model architecture\n",
+    "    model = Sequential([\n",
+    "        Embedding(vocab_size, embedding_dim, input_length=max_length),\n",
+    "        LSTM(32),\n",
+    "        Dense(1, activation='sigmoid')\n",
+    "    ])\n",
+    "\n",
+    "    # Compiling and training the model\n",
+    "    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
+    "    model.fit(x_train, y_train, epochs=30, validation_data=(x_test, y_test))\n",
+    "\n",
+    "else:\n",
+    "    # Using a pre-trained model from Hugging Face\n",
+    "    model_name = \"finiteautomata/bertweet-base-sentiment-analysis\"\n",
+    "\n",
+    "    # Load the model\n",
+    "    model = TFAutoModelForSequenceClassification.from_pretrained(model_name)\n",
+    "    \n",
+    "    # Load the tokenizer\n",
+    "    tokenizer = AutoTokenizer.from_pretrained(model_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2096e6b0-cbdb-40af-bb4b-49d4aa3f6867",
+   "metadata": {},
+   "source": [
+    "# Step 3: Create the TextAttack model wrapper"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "de8e1990-bb40-4154-af2f-4a728945a893",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:48:25.812810Z",
+     "iopub.status.busy": "2024-07-30T06:48:25.811814Z",
+     "iopub.status.idle": "2024-07-30T06:48:25.828265Z",
+     "shell.execute_reply": "2024-07-30T06:48:25.827447Z",
+     "shell.execute_reply.started": "2024-07-30T06:48:25.812810Z"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "class CustomTensorFlowModelWrapper(ModelWrapper):\n",
+    "    def __init__(self, model,tokenizer,model_type,max_length = None,preprocess_text = None):\n",
+    "        self.model = model\n",
+    "        self.tokenizer = tokenizer\n",
+    "        self.max_length = max_length\n",
+    "        self.preprocess_text = preprocess_text\n",
+    "        self.model_type = model_type\n",
+    "\n",
+    "    def __call__(self, text_list):\n",
+    "        for idx,text in enumerate(text_list):\n",
+    "            if self.model_type.lower() == \"transformer\":\n",
+    "                # Preprocessing for transformer models\n",
+    "                preprocessed_text = self.tokenizer.encode(text,return_tensors=\"tf\")\n",
+    "                preds = self.model(preprocessed_text).logits\n",
+    "                # Softmax turns the class logits into probabilities that sum to 1\n",
+    "                final_preds = tf.nn.softmax(preds, axis=-1).numpy()\n",
+    "            else:\n",
+    "                # Preprocessing for other (Keras Tokenizer based) models\n",
+    "                sequences = self.tokenizer.texts_to_sequences([text])\n",
+    "                preprocessed_text = pad_sequences(sequences, maxlen=self.max_length, padding='post', truncating='post')\n",
+    "                preds = self.model(preprocessed_text).numpy()\n",
+    "                # The sigmoid output is P(class 1); build a [P(class 0), P(class 1)] row\n",
+    "                probs = np.array(preds[0])\n",
+    "                final_preds = np.stack((1 - probs, probs), axis=1)\n",
+    "                \n",
+    "            if idx == 0:\n",
+    "                all_preds = final_preds\n",
+    "            else:\n",
+    "                all_preds = np.concatenate((all_preds, final_preds), axis=0)\n",
+    "        return all_preds"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29bfd315-1b01-414b-ae8d-421aad21767f",
+   "metadata": {},
+   "source": [
+    "# Step 4: Create the attack vectors from benign test examples"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "b1c3280d-03b0-4c06-bdb8-1e5e57700bb2",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:48:25.829899Z",
+     "iopub.status.busy": "2024-07-30T06:48:25.829271Z",
+     "iopub.status.idle": "2024-07-30T06:49:06.376939Z",
+     "shell.execute_reply": "2024-07-30T06:49:06.376939Z",
+     "shell.execute_reply.started": "2024-07-30T06:48:25.829899Z"
+    },
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[nltk_data] Error loading omw-1.4: <urlopen error [Errno 11001]\n",
+      "[nltk_data]     getaddrinfo failed>\n",
+      "textattack: Unknown if model of class <class 'keras.src.models.sequential.Sequential'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n",
+      "textattack: Attempting to attack 10 samples when only 2 are available.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Attack(\n",
+      "  (search_method): GreedyWordSwapWIR(\n",
+      "    (wir_method):  weighted-saliency\n",
+      "  )\n",
+      "  (goal_function):  UntargetedClassification\n",
+      "  (transformation):  WordSwapWordNet\n",
+      "  (constraints): \n",
+      "    (0): RepeatModification\n",
+      "    (1): StopwordModification\n",
+      "  (is_black_box):  True\n",
+      ") \n",
+      "\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      " 10%|████████▎                                                                          | 1/10 [00:35<05:18, 35.40s/it]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "--------------------------------------------- Result 1 ---------------------------------------------\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  10%|██▉                          | 1/10 [00:36<05:24, 36.06s/it]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[[0 (96%)]] --> [[1 (91%)]]\n",
+      "\n",
+      "Don't [[waste]] your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
+      "\n",
+      "Don't [[desolate]] your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
+      "\n",
+      "\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[Succeeded / Failed / Skipped / Total] 1 / 1 / 0 / 2:  20%|█████▊                       | 2/10 [00:38<02:34, 19.30s/it]"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "--------------------------------------------- Result 2 ---------------------------------------------\n",
+      "[[1 (94%)]] --> [[[FAILED]]]\n",
+      "\n",
+      "I am happy\n",
+      "\n",
+      "\n",
+      "\n",
+      "+-------------------------------+--------+\n",
+      "| Attack Results                |        |\n",
+      "+-------------------------------+--------+\n",
+      "| Number of successful attacks: | 1      |\n",
+      "| Number of failed attacks:     | 1      |\n",
+      "| Number of skipped attacks:    | 0      |\n",
+      "| Original accuracy:            | 100.0% |\n",
+      "| Accuracy under attack:        | 50.0%  |\n",
+      "| Attack success rate:          | 50.0%  |\n",
+      "| Average perturbed word %:     | 2.33%  |\n",
+      "| Average num. words per input: | 23.0   |\n",
+      "| Avg num queries:              | 158.5  |\n",
+      "+-------------------------------+--------+\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Wrapping the model for TextAttack\n",
+    "model_wrapper = CustomTensorFlowModelWrapper(model,tokenizer,\"lstm\",max_length)\n",
+    "\n",
+    "# Preparing input data for the attack\n",
+    "input_data = [(\"\"\"Don't waste your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\"\"\", 0),\n",
+    "             (\"I am happy\",1)]\n",
+    "dataset = textattack.datasets.Dataset(input_data)\n",
+    "\n",
+    "# Setting up the attack\n",
+    "attack = PWWSRen2019.build(model_wrapper)\n",
+    "\n",
+    "# Launching the attack\n",
+    "attacker = Attacker(attack, dataset)\n",
+    "attacked_data = attacker.attack_dataset()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7965f887-b4b4-4907-938c-08dbcbbf8f77",
+   "metadata": {},
+   "source": [
+    "# Step 5: Results of TextAttack on benign test examples"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "bef21ec2-d1d0-4752-9b0a-2c99c006291e",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2024-07-30T06:54:43.238422Z",
+     "iopub.status.busy": "2024-07-30T06:54:43.238422Z",
+     "iopub.status.idle": "2024-07-30T06:54:43.254191Z",
+     "shell.execute_reply": "2024-07-30T06:54:43.253694Z",
+     "shell.execute_reply.started": "2024-07-30T06:54:43.238422Z"
+    }
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Original_text -> Don't waste your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
+      "Original_text_Label -> 0\n",
+      "\n",
+      "Perturbed_text -> Don't desolate your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
+      "Perturbed_text_Label -> 1\n",
+      "\n",
+      "---------------------------------------------------------------------------\n",
+      "Original_text -> I am happy\n",
+      "Original_text_Label -> 1\n",
+      "\n",
+      "Perturbed_text -> 1 am happy\n",
+      "Perturbed_text_Label -> 1\n",
+      "\n",
+      "---------------------------------------------------------------------------\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Displaying the results of the attack\n",
+    "for data in attacked_data:\n",
+    "    print(f\"Original_text -> {data.original_text()}\")\n",
+    "    print(f\"Original_text_Label -> {data.original_result.ground_truth_output}\")\n",
+    "    print()\n",
+    "    print(f\"Perturbed_text -> {data.perturbed_text()}\")\n",
+    "    print(f\"Perturbed_text_Label -> {data.perturbed_result.output}\")\n",
+    "    print()\n",
+    "    print('-'*75)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "062d4ac2-7d76-44ad-b9db-9da31a461ddb",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "env_torch",
+   "language": "python",
+   "name": "env_torch"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
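Step 3 of the notebook wraps the Keras model so that TextAttack can call it with a list of strings and get back one row of class scores per string. The library-free sketch below illustrates that contract under stated assumptions. A tiny hypothetical vocabulary stands in for the Keras `Tokenizer`, a dummy scoring function stands in for the trained LSTM, and whitespace splitting replaces the tokenizer's own text filtering. The names `encode`, `dummy_positive_probability`, and `wrapper_call` are illustrative and do not appear in the notebook.

```python
# Hypothetical stand-ins: a tiny vocabulary and a dummy scoring function replace
# the Keras Tokenizer and the trained LSTM used in the notebook.
PAD_ID = 0
TOY_WORD_INDEX = {"good": 4, "bad": 5, "movie": 6}  # hypothetical ids


def encode(text, word_index, max_length):
    """Map words to ids, drop unknown words, then truncate and pad at the end."""
    ids = [word_index[w] for w in text.lower().split() if w in word_index]
    ids = ids[:max_length]  # same effect as truncating='post'
    return ids + [PAD_ID] * (max_length - len(ids))  # same effect as padding='post'


def dummy_positive_probability(ids):
    """Stand-in for the model call: returns P(label == 1) for one encoded text."""
    good, bad = ids.count(4), ids.count(5)
    return (1 + good) / (2 + good + bad)


def wrapper_call(text_list, max_length=4):
    """Return one [P(class 0), P(class 1)] row per input text."""
    rows = []
    for text in text_list:
        p = dummy_positive_probability(encode(text, TOY_WORD_INDEX, max_length))
        rows.append([1.0 - p, p])
    return rows


print(encode("Good movie", TOY_WORD_INDEX, 4))
print(wrapper_call(["good movie", "bad bad movie"]))
```

The point of the two-column row is that a single sigmoid output only gives the probability of class 1, whereas an untargeted classification goal function compares scores across all classes, so the complement is added as the class 0 score.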