{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ea969734-6e63-4b44-ac0f-8442f785616a",
   "metadata": {},
   "source": [
    "# Text-Attack example\n",
    "\n",
    "The script demonstrates a simple example of using Text-Attack with TensorFlow v2.x. The example train a small model on the IMDB\n",
    "dataset. Here we use the Text-Attack to create the Adversial example, it would also be possible to provide a pretrained model to the Text-Attack.\n",
    "The parameters are chosen for reduced computational requirements of the script and not optimised for accuracy.\n",
    "\n",
    "* reference: https://textattack.readthedocs.io/en/master/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "609d3a4c-647c-498c-ab0a-54fef4f5eed6",
   "metadata": {},
   "source": [
    "### Text Classification\n",
    "\n",
    "* Date: 07/30/2024\n",
    "* Author: Pawan Kumar\n",
    "* Type of attack: Text-attack\n",
    "\n",
    "### Metadata\n",
    "* Dataset: IMDB\n",
    "* Size of training set: 25,000\n",
    "* Size of testing set : 25,000\n",
    "* Number of class : 2\n",
    "* Original Model: LSTM model trained "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "e5cd330e-0bc5-4676-8b7d-03bea1e0e8cb",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-30T06:50:51.373277Z",
     "iopub.status.busy": "2024-07-30T06:50:51.372281Z",
     "iopub.status.idle": "2024-07-30T06:50:51.379825Z",
     "shell.execute_reply": "2024-07-30T06:50:51.379825Z",
     "shell.execute_reply.started": "2024-07-30T06:50:51.373277Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'\\nDescription: Uncomment and run to install libraries. Needed for running first time only. \\n'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "Description: Uncomment and run to install libraries. Needed for running first time only. \n",
    "\"\"\"\n",
    "# !pip install textattack[tensorflow]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d49c8793-9032-4af1-aa8f-29ff05d2409a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-30T06:40:27.398710Z",
     "iopub.status.busy": "2024-07-30T06:40:27.398710Z",
     "iopub.status.idle": "2024-07-30T06:40:27.517530Z",
     "shell.execute_reply": "2024-07-30T06:40:27.516313Z",
     "shell.execute_reply.started": "2024-07-30T06:40:27.398710Z"
    }
   },
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Description: import library \n",
    "\"\"\"\n",
    "# Importing necessary libraries\n",
    "import os\n",
    "import numpy as np\n",
    "\n",
    "import tensorflow as tf\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from sklearn.model_selection import train_test_split\n",
    "from tensorflow.keras.preprocessing.text import Tokenizer\n",
    "from tensorflow.keras.preprocessing.sequence import pad_sequences\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import LSTM, Embedding, Dense, Dropout, SimpleRNN\n",
    "from tensorflow.keras.datasets import imdb\n",
    "\n",
    "from transformers import TFAutoModelForSequenceClassification, AutoTokenizer\n",
    "\n",
    "from textattack.models.wrappers import ModelWrapper\n",
    "from textattack.datasets import HuggingFaceDataset\n",
    "from textattack.attack_recipes import PWWSRen2019\n",
    "from textattack import Attacker\n",
    "import textattack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6dc3c1c1-42a5-490e-bbf3-723c051b8054",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-30T06:40:30.160388Z",
     "iopub.status.busy": "2024-07-30T06:40:30.160388Z",
     "iopub.status.idle": "2024-07-30T06:40:30.169572Z",
     "shell.execute_reply": "2024-07-30T06:40:30.169071Z",
     "shell.execute_reply.started": "2024-07-30T06:40:30.160388Z"
    }
   },
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Description: Assigning a flag value for Model Training  or Loading from huggingface.  \n",
    "\"\"\"\n",
    "\n",
    "# Flag to determine whether to train a new model or use a pre-trained one\n",
    "model_train = True # False-> download from Huggingface"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fb6b427-f7d7-42d9-9939-5b90159e60ed",
   "metadata": {},
   "source": [
    "# Step 1: Load the IMDB dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0e121e62-50bb-46a9-99b8-131d55e3f105",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-30T06:40:31.010886Z",
     "iopub.status.busy": "2024-07-30T06:40:31.010886Z",
     "iopub.status.idle": "2024-07-30T06:40:33.803320Z",
     "shell.execute_reply": "2024-07-30T06:40:33.803320Z",
     "shell.execute_reply.started": "2024-07-30T06:40:31.010886Z"
    }
   },
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Description: Load IMDB data with art functionality.  \n",
    "\"\"\"\n",
    "(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=10000)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "082b6240-13d2-4e32-854e-0011e8f2fd6d",
   "metadata": {},
   "source": [
    "# Step 2: Create the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3d732d28-d0e8-483c-a1b4-fb6a1650e7de",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-30T06:40:34.958372Z",
     "iopub.status.busy": "2024-07-30T06:40:34.957385Z",
     "iopub.status.idle": "2024-07-30T06:48:25.810813Z",
     "shell.execute_reply": "2024-07-30T06:48:25.810813Z",
     "shell.execute_reply.started": "2024-07-30T06:40:34.958372Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/30\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\CUP3KOR\\.conda\\envs\\env_torch\\lib\\site-packages\\keras\\src\\layers\\core\\embedding.py:90: UserWarning: Argument `input_length` is deprecated. Just remove it.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m17s\u001b[0m 20ms/step - accuracy: 0.6455 - loss: 0.6067 - val_accuracy: 0.7898 - val_loss: 0.4815\n",
      "Epoch 2/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 19ms/step - accuracy: 0.8382 - loss: 0.3954 - val_accuracy: 0.8062 - val_loss: 0.4408\n",
      "Epoch 3/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.8786 - loss: 0.3209 - val_accuracy: 0.8006 - val_loss: 0.4716\n",
      "Epoch 4/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.8982 - loss: 0.2883 - val_accuracy: 0.8050 - val_loss: 0.4872\n",
      "Epoch 5/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9219 - loss: 0.2219 - val_accuracy: 0.7987 - val_loss: 0.4955\n",
      "Epoch 6/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9352 - loss: 0.1917 - val_accuracy: 0.7968 - val_loss: 0.5100\n",
      "Epoch 7/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9362 - loss: 0.1891 - val_accuracy: 0.7905 - val_loss: 0.6276\n",
      "Epoch 8/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.8274 - loss: 0.3710 - val_accuracy: 0.7948 - val_loss: 0.5578\n",
      "Epoch 9/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9198 - loss: 0.2287 - val_accuracy: 0.7854 - val_loss: 0.5871\n",
      "Epoch 10/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9081 - loss: 0.2333 - val_accuracy: 0.7876 - val_loss: 0.6009\n",
      "Epoch 11/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9553 - loss: 0.1420 - val_accuracy: 0.7883 - val_loss: 0.6265\n",
      "Epoch 12/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9598 - loss: 0.1232 - val_accuracy: 0.7889 - val_loss: 0.6716\n",
      "Epoch 13/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9661 - loss: 0.1096 - val_accuracy: 0.7870 - val_loss: 0.7236\n",
      "Epoch 14/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9676 - loss: 0.0999 - val_accuracy: 0.7833 - val_loss: 0.6662\n",
      "Epoch 15/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9692 - loss: 0.1014 - val_accuracy: 0.7816 - val_loss: 0.7717\n",
      "Epoch 16/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9722 - loss: 0.0873 - val_accuracy: 0.7804 - val_loss: 0.8158\n",
      "Epoch 17/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 19ms/step - accuracy: 0.9716 - loss: 0.0925 - val_accuracy: 0.6377 - val_loss: 0.7183\n",
      "Epoch 18/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.7759 - loss: 0.4766 - val_accuracy: 0.7776 - val_loss: 0.6318\n",
      "Epoch 19/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 21ms/step - accuracy: 0.9597 - loss: 0.1204 - val_accuracy: 0.7835 - val_loss: 0.7238\n",
      "Epoch 20/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9777 - loss: 0.0756 - val_accuracy: 0.7847 - val_loss: 0.8460\n",
      "Epoch 21/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 19ms/step - accuracy: 0.9848 - loss: 0.0564 - val_accuracy: 0.7824 - val_loss: 0.8455\n",
      "Epoch 22/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9850 - loss: 0.0542 - val_accuracy: 0.7817 - val_loss: 0.8955\n",
      "Epoch 23/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 19ms/step - accuracy: 0.9776 - loss: 0.0691 - val_accuracy: 0.7771 - val_loss: 0.9468\n",
      "Epoch 24/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9638 - loss: 0.1353 - val_accuracy: 0.7729 - val_loss: 0.8872\n",
      "Epoch 25/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9812 - loss: 0.0623 - val_accuracy: 0.7790 - val_loss: 0.9489\n",
      "Epoch 26/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9879 - loss: 0.0441 - val_accuracy: 0.7706 - val_loss: 1.1105\n",
      "Epoch 27/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9912 - loss: 0.0328 - val_accuracy: 0.7786 - val_loss: 1.0273\n",
      "Epoch 28/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9910 - loss: 0.0322 - val_accuracy: 0.7792 - val_loss: 1.1005\n",
      "Epoch 29/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m15s\u001b[0m 20ms/step - accuracy: 0.9577 - loss: 0.1139 - val_accuracy: 0.7803 - val_loss: 0.7637\n",
      "Epoch 30/30\n",
      "\u001b[1m782/782\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m16s\u001b[0m 20ms/step - accuracy: 0.9815 - loss: 0.0598 - val_accuracy: 0.7768 - val_loss: 0.9161\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "Description: Training or loading the model.\n",
    "\"\"\"\n",
    "\n",
    "if model_train:\n",
    "    # Setting up parameters for the IMDB dataset and model\n",
    "    vocab_size = 10000  # Number of words to keep in the vocabulary\n",
    "    max_length = 100    # Maximum length of each sequence\n",
    "    embedding_dim = 16  # Embedding dimensions\n",
    "    oov_tok = \"<OOV>\"   # Out of vocabulary token\n",
    "    \n",
    "    # Loading the IMDB dataset\n",
    "    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)\n",
    "\n",
    "    # Padding sequences to ensure uniform length\n",
    "    x_train = pad_sequences(x_train, maxlen=max_length, padding='post', truncating='post')\n",
    "    x_test = pad_sequences(x_test, maxlen=max_length, padding='post', truncating='post')\n",
    "    \n",
    "    # Creating word index for vocabulary\n",
    "    word_index = imdb.get_word_index()\n",
    "    word_index = {k: (v + 3) for k, v in word_index.items() if v < vocab_size}\n",
    "    word_index[\"<PAD>\"] = 0\n",
    "    word_index[\"<START>\"] = 1\n",
    "    word_index[\"<UNK>\"] = 2\n",
    "    word_index[\"<UNUSED>\"] = 3\n",
    "    \n",
    "    # Create an inverse word index to decode integer sequences back to words (if needed)\n",
    "    inverse_word_index = {v: k for k, v in word_index.items()}\n",
    "    \n",
    "    # creating the tokenizer\n",
    "    tokenizer = Tokenizer(num_words=vocab_size)\n",
    "    tokenizer.word_index = word_index\n",
    "    \n",
    "    # Defining the model architecture\n",
    "    model = Sequential([\n",
    "        Embedding(vocab_size, embedding_dim, input_length=max_length),\n",
    "        LSTM(32),\n",
    "        Dense(1, activation='sigmoid')\n",
    "    ])\n",
    "\n",
    "    # Compiling and training the model\n",
    "    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
    "    model.fit(x_train, y_train, epochs=30, validation_data=(x_test, y_test))\n",
    "\n",
    "else:\n",
    "    # Using a pre-trained model from Hugging Face\n",
    "    model_name = \"finiteautomata/bertweet-base-sentiment-analysis\"\n",
    "\n",
    "    # Load the model\n",
    "    model = TFAutoModelForSequenceClassification.from_pretrained(model_name)\n",
    "    \n",
    "    # Load the tokenizer\n",
    "    tokenizer = AutoTokenizer.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2096e6b0-cbdb-40af-bb4b-49d4aa3f6867",
   "metadata": {},
   "source": [
    "# Step 3: Create the Text-Attack classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "de8e1990-bb40-4154-af2f-4a728945a893",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-30T06:48:25.812810Z",
     "iopub.status.busy": "2024-07-30T06:48:25.811814Z",
     "iopub.status.idle": "2024-07-30T06:48:25.828265Z",
     "shell.execute_reply": "2024-07-30T06:48:25.827447Z",
     "shell.execute_reply.started": "2024-07-30T06:48:25.812810Z"
    }
   },
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "Description: create class to design architecture of model wrapper for text-attack\n",
    "\"\"\"\n",
    "\n",
    "class CustomTensorFlowModelWrapper(ModelWrapper):\n",
    "    def __init__(self, model,tokenizer,model_type,max_length = None,preprocess_text = None):\n",
    "        self.model = model\n",
    "        self.tokenizer = tokenizer\n",
    "        self.max_length = max_length\n",
    "        self.preprocess_text = preprocess_text\n",
    "        self.model_type = model_type\n",
    "\n",
    "    def __call__(self, text_list):\n",
    "        for idx,text in enumerate(text_list):\n",
    "            if self.model_type.lower() == \"transformer\":\n",
    "                # Preprocessing for transformer models\n",
    "                preprocessed_text = self.tokenizer.encode(text,return_tensors=\"tf\")\n",
    "                preds = self.model(preprocessed_text).logits\n",
    "                logits = tf.nn.sigmoid(preds)\n",
    "                final_preds = np.stack(logits, axis=0)\n",
    "            else:\n",
    "                # Preprocessing for Other models\n",
    "                sequences = self.tokenizer.texts_to_sequences([text])\n",
    "                preprocessed_text = pad_sequences(sequences, maxlen=self.max_length, padding='post', truncating='post')\n",
    "                preds = self.model(preprocessed_text).numpy()\n",
    "                logits = np.array(preds[0])\n",
    "                final_preds = np.stack((1 - logits, logits), axis=1)\n",
    "                \n",
    "            if idx == 0:\n",
    "                all_preds = final_preds\n",
    "            else:\n",
    "                all_preds = np.concatenate((all_preds, final_preds), axis=0)\n",
    "        return all_preds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29bfd315-1b01-414b-ae8d-421aad21767f",
   "metadata": {},
   "source": [
    "# Step 4: Creating The attack Vectors on benign test examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "b1c3280d-03b0-4c06-bdb8-1e5e57700bb2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-31T05:10:49.360446Z",
     "iopub.status.busy": "2024-07-31T05:10:49.359376Z",
     "iopub.status.idle": "2024-07-31T05:11:32.103320Z",
     "shell.execute_reply": "2024-07-31T05:11:32.103320Z",
     "shell.execute_reply.started": "2024-07-31T05:10:49.360446Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[nltk_data] Error loading omw-1.4: <urlopen error [Errno 11001]\n",
      "[nltk_data]     getaddrinfo failed>\n",
      "textattack: Unknown if model of class <class 'keras.src.models.sequential.Sequential'> compatible with goal function <class 'textattack.goal_functions.classification.untargeted_classification.UntargetedClassification'>.\n",
      "textattack: Attempting to attack 10 samples when only 2 are available.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Attack(\n",
      "  (search_method): GreedyWordSwapWIR(\n",
      "    (wir_method):  weighted-saliency\n",
      "  )\n",
      "  (goal_function):  UntargetedClassification\n",
      "  (transformation):  WordSwapWordNet\n",
      "  (constraints): \n",
      "    (0): RepeatModification\n",
      "    (1): StopwordModification\n",
      "  (is_black_box):  True\n",
      ") \n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Succeeded / Failed / Skipped / Total] 1 / 0 / 0 / 1:  10%|██▉                          | 1/10 [00:37<05:35, 37.25s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------- Result 1 ---------------------------------------------\n",
      "[[0 (96%)]] --> [[1 (91%)]]\n",
      "\n",
      "Don't [[waste]] your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
      "\n",
      "Don't [[desolate]] your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
      "\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Succeeded / Failed / Skipped / Total] 2 / 0 / 0 / 2:  20%|█████▊                       | 2/10 [00:41<02:47, 20.93s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------- Result 2 ---------------------------------------------\n",
      "[[1 (98%)]] --> [[0 (74%)]]\n",
      "\n",
      "I am happy as it was a [[wonderful]] experience\n",
      "\n",
      "I am happy as it was a [[marvellous]] experience\n",
      "\n",
      "\n",
      "\n",
      "+-------------------------------+--------+\n",
      "| Attack Results                |        |\n",
      "+-------------------------------+--------+\n",
      "| Number of successful attacks: | 2      |\n",
      "| Number of failed attacks:     | 0      |\n",
      "| Number of skipped attacks:    | 0      |\n",
      "| Original accuracy:            | 100.0% |\n",
      "| Accuracy under attack:        | 0.0%   |\n",
      "| Attack success rate:          | 100.0% |\n",
      "| Average perturbed word %:     | 6.72%  |\n",
      "| Average num. words per input: | 26.0   |\n",
      "| Avg num queries:              | 166.0  |\n",
      "+-------------------------------+--------+\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "Description: Generating text attack vector\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "# Wrapping the model for TextAttack\n",
    "model_wrapper = CustomTensorFlowModelWrapper(model,tokenizer,\"lstm\",max_length) \n",
    "                                            \n",
    "# if transformer no need to assign max length\n",
    "\n",
    "# Preparing input data for the attack\n",
    "input_data = [(\"\"\"Don't waste your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\"\"\", 0),\n",
    "             (\"I am happy as it was a wonderful experience\",1)]\n",
    "dataset = textattack.datasets.Dataset(input_data)\n",
    "\n",
    "# Setting up the attack\n",
    "attack = PWWSRen2019.build(model_wrapper)\n",
    "\n",
    "# Launching the attack\n",
    "attacker = Attacker(attack, dataset)\n",
    "attacked_data = attacker.attack_dataset()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7965f887-b4b4-4907-938c-08dbcbbf8f77",
   "metadata": {},
   "source": [
    "# Step 5: Result of Text-Attack on benign test examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "bef21ec2-d1d0-4752-9b0a-2c99c006291e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-07-31T05:11:32.105310Z",
     "iopub.status.busy": "2024-07-31T05:11:32.105310Z",
     "iopub.status.idle": "2024-07-31T05:11:32.120300Z",
     "shell.execute_reply": "2024-07-31T05:11:32.119361Z",
     "shell.execute_reply.started": "2024-07-31T05:11:32.105310Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original_text -> Don't waste your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
      "Original_text_Label -> 0\n",
      "\n",
      "Perturbed_text -> Don't desolate your time or money on this one. This book is terrible. Whatever happened to Amanda Quick writing great books. She used to be my favorite autor. It will be a long time before I ever purchase another one of her books.\n",
      "Perturbed_text_Label -> 1\n",
      "\n",
      "---------------------------------------------------------------------------\n",
      "Original_text -> I am happy as it was a wonderful experience\n",
      "Original_text_Label -> 1\n",
      "\n",
      "Perturbed_text -> I am happy as it was a marvellous experience\n",
      "Perturbed_text_Label -> 0\n",
      "\n",
      "---------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "Description: Displaying result of text attack\n",
    "\"\"\"\n",
    "# Displaying the results of the attack\n",
    "for data in attacked_data:\n",
    "    print(f\"Original_text -> {data.original_text()}\")\n",
    "    print(f\"Original_text_Label -> {data.original_result.ground_truth_output}\")\n",
    "    print()\n",
    "    print(f\"Perturbed_text -> {data.perturbed_text()}\")\n",
    "    print(f\"Perturbed_text_Label -> {data.perturbed_result.output}\")\n",
    "    print()\n",
    "    print('-'*75)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "062d4ac2-7d76-44ad-b9db-9da31a461ddb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "env_torch",
   "language": "python",
   "name": "env_torch"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}