Spaces:

raymondEDS
/

DS_webclass

raymondEDS committed on Apr 28, 2025

Commit

3215313

1 Parent(s): 63a7f01

removing files

Files changed (2)

Reference files/Week2_ref/Ch02-statlearn-lab.ipynb +0 -3229
Reference files/Week2_ref/Lecture_1_basics.ipynb +0 -0

Reference files/Week2_ref/Ch02-statlearn-lab.ipynb DELETED

@@ -1,3229 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "245f0c86",
-   "metadata": {},
-   "source": [
-    "\n",
-    "# Chapter 2\n",
-    "\n",
-    "# Lab: Introduction to Python\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5ab29948",
-   "metadata": {},
-   "source": [
-    "## Getting Started"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ed622870",
-   "metadata": {},
-   "source": [
-    "To run the labs in this book, you will need two things:\n",
-    "\n",
-    "* An installation of `Python3`, which is the specific version of `Python`  used in the labs. \n",
-    "* Access to  `Jupyter`, a very popular `Python` interface that runs code through a file called a *notebook*. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "844d37fc",
-   "metadata": {},
-   "source": [
-    "You can download and install  `Python3`   by following the instructions available at [anaconda.com](http://anaconda.com). "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "462ff1fe",
-   "metadata": {},
-   "source": [
-    " There are a number of ways to get access to `Jupyter`. Here are just a few:\n",
-    " \n",
-    " * Using Google's `Colaboratory` service: [colab.research.google.com/](https://colab.research.google.com/). \n",
-    " * Using `JupyterHub`, available at [jupyter.org/hub](https://jupyter.org/hub). \n",
-    " * Using your own `jupyter` installation. Installation instructions are available at [jupyter.org/install](https://jupyter.org/install). \n",
-    " \n",
-    "Please see the `Python` resources page on the book website [statlearning.com](https://www.statlearning.com) for up-to-date information about getting `Python` and `Jupyter` working on your computer. \n",
-    "\n",
-    "You will need to install the `ISLP` package, which provides access to the datasets and custom-built functions that we provide.\n",
-    "Inside a macOS or Linux terminal type `pip install ISLP`; this also installs most other packages needed in the labs. The `Python` resources page has a link to the `ISLP` documentation website.\n",
-    "\n",
-    "To run this lab, download the file `Ch02-statlearn-lab.ipynb` from the `Python` resources page. \n",
-    "Now run the following code at the command line: `jupyter lab Ch02-statlearn-lab.ipynb`.\n",
-    "\n",
-    "If you're using Windows, you can use the `start menu` to access `anaconda`, and follow the links. For example, to install `ISLP` and run this lab, you can run the same code above in an `anaconda` shell.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b46f9182",
-   "metadata": {},
-   "source": [
-    "## Basic Commands\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "54060fd9",
-   "metadata": {},
-   "source": [
-    "In this lab, we will introduce some simple `Python` commands. \n",
-    " For more resources about `Python` in general, readers may want to consult the tutorial at [docs.python.org/3/tutorial/](https://docs.python.org/3/tutorial/). \n",
-    "\n",
-    "\n",
-    " \n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d3dbd0e9",
-   "metadata": {},
-   "source": [
-    "Like most programming languages, `Python` uses *functions*\n",
-    "to perform operations.   To run a\n",
-    "function called `fun`, we type\n",
-    "`fun(input1,input2)`, where the inputs (or *arguments*)\n",
-    "`input1` and `input2` tell\n",
-    "`Python` how to run the function.  A function can have any number of\n",
-    "inputs. For example, the\n",
-    "`print()`  function outputs a text representation of all of its arguments to the console."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "9e8aa21f",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "fit a model with 11 variables\n"
-     ]
-    }
-   ],
-   "source": [
-    "print('fit a model with', 11, 'variables')\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "27d935f8",
-   "metadata": {},
-   "source": [
-    " The following command will provide information about the `print()` function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d62ec119",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "print?\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "04b3e2a3",
-   "metadata": {},
-   "source": [
-    "Adding two integers in `Python` is pretty intuitive."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c64e9f4d",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "3 + 5\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "cd754cba",
-   "metadata": {},
-   "source": [
-    "In `Python`, textual data is handled using\n",
-    "*strings*. For instance, `\"hello\"` and\n",
-    "`'hello'`\n",
-    "are strings. \n",
-    "We can concatenate them using the addition `+` symbol."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9abccc1f",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "\"hello\" + \" \" + \"world\"\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c28db903",
-   "metadata": {},
-   "source": [
-    " A string is actually a type of *sequence*: this is a generic term for an ordered list. \n",
-    " The three most important types of sequences are lists, tuples, and strings.  \n",
-    "We introduce lists now. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5fdcc5a1",
-   "metadata": {},
-   "source": [
-    "The following command instructs `Python` to join together\n",
-    "the numbers 3, 4, and 5, and to save them as a\n",
-    "*list* named `x`. When we\n",
-    "type `x`, it gives us back the list."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "802ca33c",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "x = [3, 4, 5]\n",
-    "x\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5492ecd1",
-   "metadata": {},
-   "source": [
-    "Note that we used the brackets\n",
-    "`[]` to construct this list. \n",
-    "\n",
-    "We will often want to add two sets of numbers together. It is reasonable to try the following code,\n",
-    "though it will not produce the desired results."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a8c72744",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "y = [4, 9, 7]\n",
-    "x + y\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b84f9d0e",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "x[2]  # third (last) element; with 0-based indexing, x[3] would raise an IndexError"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8f42ea1d",
-   "metadata": {},
-   "source": [
-    "The result may appear slightly counterintuitive: why did `Python` not add the entries of the lists\n",
-    "element-by-element? \n",
-    " In `Python`, lists hold *arbitrary* objects, and  are added using  *concatenation*. \n",
-    " In fact, concatenation is the behavior that we saw earlier when we entered `\"hello\" + \" \" + \"world\"`. \n",
-    " "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "69015df5",
-   "metadata": {},
-   "source": [
-    "This example reflects the fact that \n",
-    " `Python` is a general-purpose programming language. Much of `Python`'s  data-specific\n",
-    "functionality comes from other packages, notably `numpy`\n",
-    "and `pandas`. \n",
-    "In the next section, we will introduce the  `numpy` package. \n",
-    "See [docs.scipy.org/doc/numpy/user/quickstart.html](https://docs.scipy.org/doc/numpy/user/quickstart.html) for more information about `numpy`.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "16bfc4a2",
-   "metadata": {},
-   "source": [
-    "## Introduction to Numerical Python\n",
-    "\n",
-    "As mentioned earlier, this book makes use of functionality   that is contained in the `numpy` \n",
-    " *library*, or *package*. A package is a collection of modules that are not necessarily included in \n",
-    " the base `Python` distribution. The name `numpy` is an abbreviation for *numerical Python*. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f5bed3f0",
-   "metadata": {},
-   "source": [
-    "  To access `numpy`, we must first `import` it."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f1c7d1db",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "import numpy as np "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5c8614e7",
-   "metadata": {},
-   "source": [
-    "In the previous line, we named the `numpy` *module* `np`, an abbreviation for easier referencing."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ba1224a6",
-   "metadata": {},
-   "source": [
-    "In `numpy`, an *array* is  a generic term for a multidimensional\n",
-    "set of numbers.\n",
-    "We use the `np.array()` function to define   `x` and `y`, which are one-dimensional arrays, i.e. vectors."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e2ea2bfd",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x = np.array([3, 4, 5])\n",
-    "y = np.array([4, 9, 7])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a977e05a",
-   "metadata": {},
-   "source": [
-    "Note that if you forgot to run the `import numpy as np` command earlier, then\n",
-    "you will encounter an error in calling the `np.array()` function in the previous line. \n",
-    " The syntax `np.array()` indicates that the function being called\n",
-    "is part of the `numpy` package, which we have abbreviated as `np`. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "742431b6",
-   "metadata": {},
-   "source": [
-    "Since `x` and `y` have been defined using `np.array()`, we get a sensible result when we add them together. Compare this to our results in the previous section,\n",
-    " when we tried to add two lists without using `numpy`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "59fbf9fd",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x + y"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2ceccc2b",
-   "metadata": {},
-   "source": [
-    "    \n",
-    " \n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "74be6d74",
-   "metadata": {},
-   "source": [
-    "In `numpy`, matrices are typically represented as two-dimensional arrays, and vectors as one-dimensional arrays. (While it is also possible to create matrices using  `np.matrix()`, we will use `np.array()` throughout the labs in this book.)\n",
-    "We can create a two-dimensional array as follows. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2279437e",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x = np.array([[1, 2], [3, 4]])\n",
-    "x"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f96f304d",
-   "metadata": {},
-   "source": [
-    "    \n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f764f7d1",
-   "metadata": {},
-   "source": [
-    "The object `x` has several \n",
-    "*attributes*, or associated objects. To access an attribute of `x`, we type `x.attribute`, where we replace `attribute`\n",
-    "with the name of the attribute. \n",
-    "For instance, we can access the `ndim` attribute of  `x` as follows. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "75bf1b1e",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "x.ndim"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4e3b83bf",
-   "metadata": {},
-   "source": [
-    "The output indicates that `x` is a two-dimensional array.  \n",
-    "Similarly, `x.dtype` is the *data type* attribute of the object `x`. This indicates that `x` is \n",
-    "comprised of 64-bit integers (the default integer type on most platforms):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "58292240",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x.dtype"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "cf9cf94b",
-   "metadata": {},
-   "source": [
-    "Why is `x` comprised of integers? This is because we created `x` by passing in exclusively integers to the `np.array()` function.\n",
-    "  If\n",
-    "we had passed in any decimals, then we would have obtained an array of\n",
-    "*floating point numbers* (i.e. real-valued numbers). "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fc5fff57",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "np.array([[1, 2], [3.0, 4]]).dtype\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "41a79641",
-   "metadata": {},
-   "source": [
-    "Typing `fun?` in `Jupyter` will display \n",
-    "documentation associated with the function `fun`, if it exists.\n",
-    "We can try this for `np.array()`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "762562a6",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "np.array?\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d4d82167",
-   "metadata": {},
-   "source": [
-    "This documentation indicates that we could create a floating point array by passing a `dtype` argument into `np.array()`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "66d2b82a",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "np.array([[1, 2], [3, 4]], float).dtype\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1e3ba5be",
-   "metadata": {},
-   "source": [
-    "The array `x` is two-dimensional. We can find out the number of rows and columns by looking\n",
-    "at its `shape` attribute."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "89881402",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "x.shape\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2967b644",
-   "metadata": {},
-   "source": [
-    "A *method* is a function that is associated with an\n",
-    "object. \n",
-    "For instance, given an array `x`, the expression\n",
-    "`x.sum()` sums all of its elements, using the `sum()`\n",
-    "method for arrays. \n",
-    "The call `x.sum()` automatically provides `x` as the\n",
-    "first argument to its `sum()` method."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0572d3f6",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x = np.array([1, 2, 3, 4])\n",
-    "x.sum()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e3f49995",
-   "metadata": {},
-   "source": [
-    "We could also sum the elements of `x` by passing in `x` as an argument to the `np.sum()` function. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "33b10a6f",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x = np.array([1, 2, 3, 4])\n",
-    "np.sum(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2f3dd2c3",
-   "metadata": {},
-   "source": [
-    " As another example, the\n",
-    "`reshape()` method returns a new array with the same elements as\n",
-    "`x`, but a different shape.\n",
-    " We do this by passing in a `tuple` in our call to\n",
-    " `reshape()`, in this case `(2, 3)`.  This tuple specifies that we would like to create a two-dimensional array with \n",
-    "$2$ rows and $3$ columns. (Like lists, tuples represent a sequence of objects. Why do we need more than one way to create a sequence? There are a few differences between tuples and lists, but perhaps the most important is that elements of a tuple cannot be modified, whereas elements of a list can be.)\n",
-    " \n",
-    "In what follows, the\n",
-    "`\\n` character creates a *new line*."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a32716db",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "x = np.array([1, 2, 3, 4, 5, 6])\n",
-    "print('beginning x:\\n', x)\n",
-    "x_reshape = x.reshape((2, 3))\n",
-    "print('reshaped x:\\n', x_reshape)\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2483179e",
-   "metadata": {},
-   "source": [
-    "The previous output reveals that `numpy` arrays are specified as a sequence\n",
-    "of *rows*. This is  called *row-major ordering*, as opposed to *column-major ordering*. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e256575f",
-   "metadata": {},
-   "source": [
-    "`Python` (and hence `numpy`) uses 0-based\n",
-    "indexing. This means that to access the top left element of `x_reshape`, \n",
-    "we type in `x_reshape[0,0]`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3db6e1cf",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x_reshape[0, 0] "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0e10119e",
-   "metadata": {},
-   "source": [
-    "Similarly, `x_reshape[1,2]` yields the element in the second row and the third column \n",
-    "of `x_reshape`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e15c753f",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "x_reshape[1, 2] "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f9c55622",
-   "metadata": {},
-   "source": [
-    "Similarly, `x[2]` yields the\n",
-    "third entry of `x`. \n",
-    "\n",
-    "Now, let's modify the top left element of `x_reshape`.  To our surprise, we discover that the first element of `x` has been modified as well!\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "91c6e7d8",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "print('x before we modify x_reshape:\\n', x)\n",
-    "print('x_reshape before we modify x_reshape:\\n', x_reshape)\n",
-    "x_reshape[0, 0] = 5\n",
-    "print('x_reshape after we modify its top left element:\\n', x_reshape)\n",
-    "print('x after we modify top left element of x_reshape:\\n', x)\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8a840507",
-   "metadata": {},
-   "source": [
-    "Modifying `x_reshape` also modified `x` because the two objects occupy the same space in memory.\n",
-    " \n",
-    "\n",
-    "    "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ec551f3e",
-   "metadata": {},
-   "source": [
-    "We just saw that we can modify an element of an array. Can we also modify a tuple? It turns out that we cannot, and trying to do so raises\n",
-    "an *exception*, or error."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "59d95dce",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "my_tuple = (3, 4, 5)\n",
-    "my_tuple[0] = 2\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d594f1af",
-   "metadata": {},
-   "source": [
-    "We now briefly mention some attributes of arrays that will come in handy. An array's `shape` attribute contains its dimensions, i.e. its size along each axis; this is always a tuple.\n",
-    "The  `ndim` attribute yields the number of dimensions, and `T` provides its transpose. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a6fde9af",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "x_reshape.shape, x_reshape.ndim, x_reshape.T\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "76d20b98",
-   "metadata": {},
-   "source": [
-    "Notice that the three individual outputs `(2,3)`, `2`, and `array([[5, 4],[2, 5], [3,6]])` are themselves output as a tuple. \n",
-    " \n",
-    "We will often want to apply functions to arrays. \n",
-    "For instance, we can compute the\n",
-    "square root of the entries using the `np.sqrt()` function: "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fadb6b45",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "np.sqrt(x)\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "22fab2ce",
-   "metadata": {},
-   "source": [
-    "We can also square the elements:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fda3134b",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "x**2\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1278f26b",
-   "metadata": {},
-   "source": [
-    "We can compute the square roots using the same notation, raising to the power of $1/2$ instead of 2."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "52eb335b",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "x**0.5\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "299a5a85",
-   "metadata": {},
-   "source": [
-    "Throughout this book, we will often want to generate random data. \n",
-    "The `np.random.normal()`  function generates a vector of random\n",
-    "normal variables. We can learn more about this function by looking at the help page, via a call to `np.random.normal?`.\n",
-    "The first line of the help page  reads `normal(loc=0.0, scale=1.0, size=None)`. \n",
-    " This  *signature* line tells us that the function's arguments are  `loc`, `scale`, and `size`. These are *keyword* arguments, which means that when they are passed into\n",
-    " the function, they can be referred to by name (in any order). (`Python` also uses *positional* arguments. Positional arguments do not need to use a keyword. To see an example, type in `np.sum?`. We see that `a` is a positional argument, i.e. this function assumes that the first unnamed argument that it receives is the array to be summed. By contrast, `axis` and `dtype` are keyword arguments: the position in which these arguments are entered into `np.sum()` does not matter.)\n",
-    " By default, this function will generate random normal variable(s) with mean (`loc`) $0$ and standard deviation (`scale`) $1$; furthermore, \n",
-    " a single random variable will be generated unless the argument to `size` is changed. \n",
-    "\n",
-    "We now generate 50 independent random variables from a $N(0,1)$ distribution. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ac5e9d29",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "x = np.random.normal(size=50)\n",
-    "x\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d77cf45a",
-   "metadata": {},
-   "source": [
-    "We create an array `y` by adding an independent $N(50,1)$ random variable to each element of `x`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "55fa905e",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "y = x + np.random.normal(loc=50, scale=1, size=50)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "eacfecc9",
-   "metadata": {},
-   "source": [
-    "The `np.corrcoef()` function computes the correlation matrix between `x` and `y`. The off-diagonal elements give the \n",
-    "correlation between `x` and `y`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fde0dc19",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "np.corrcoef(x, y)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8a594218",
-   "metadata": {},
-   "source": [
-    "If you're following along in your own `Jupyter` notebook, then you probably noticed that you got a different set of results when you ran the past few \n",
-    "commands. In particular, \n",
-    " each\n",
-    "time we call `np.random.normal()`, we will get a different answer, as shown in the following example."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5099cf54",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "print(np.random.normal(scale=5, size=2))\n",
-    "print(np.random.normal(scale=5, size=2)) \n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2e209118",
-   "metadata": {},
-   "source": [
-    "    "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ed7697a4",
-   "metadata": {},
-   "source": [
-    "In order to ensure that our code provides exactly the same results\n",
-    "each time it is run, we can set a *random seed* \n",
-    "using the \n",
-    "`np.random.default_rng()` function.\n",
-    "This function takes an arbitrary, user-specified integer argument. If we set a random seed before \n",
-    "generating random data, then re-running our code will yield the same results. The\n",
-    "object `rng` has essentially all the random number generating methods found in `np.random`. Hence, to\n",
-    "generate normal data we use `rng.normal()`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9d8074e5",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "rng = np.random.default_rng(1303)\n",
-    "print(rng.normal(scale=5, size=2))\n",
-    "rng2 = np.random.default_rng(1303)\n",
-    "print(rng2.normal(scale=5, size=2)) "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "93f826ef",
-   "metadata": {},
-   "source": [
-    "Throughout the labs in this book, we use `np.random.default_rng()`  whenever we\n",
-    "perform calculations involving random quantities within `numpy`.  In principle, this\n",
-    "should enable the reader to exactly reproduce the stated results. However, as new versions of `numpy` become available, it is possible\n",
-    "that some small discrepancies may occur between the output\n",
-    "in the labs and the output\n",
-    "from `numpy`.\n",
-    "\n",
-    "The `np.mean()`,  `np.var()`, and `np.std()`  functions can be used\n",
-    "to compute the mean, variance, and standard deviation of arrays.  These functions are also\n",
-    "available as methods on the arrays."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e98472df",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "rng = np.random.default_rng(3)\n",
-    "y = rng.standard_normal(10)\n",
-    "np.mean(y), y.mean()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2870d61f",
-   "metadata": {},
-   "source": [
-    "    \n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8c2784fd",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "np.var(y), y.var(), np.mean((y - y.mean())**2)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "86261a69",
-   "metadata": {},
-   "source": [
-    "Notice that by default `np.var()` divides by the sample size $n$ rather\n",
-    "than $n-1$; see the `ddof` argument in `np.var?`.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7e7205f2",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "np.sqrt(np.var(y)), np.std(y)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d4faf901",
-   "metadata": {},
-   "source": [
-    "The `np.mean()`,  `np.var()`, and `np.std()` functions can also be applied to the rows and columns of a matrix. \n",
-    "To see this, we construct a $10 \\times 3$ matrix of $N(0,1)$ random variables, and consider computing the mean of each of its columns. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fce06849",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "X = rng.standard_normal((10, 3))\n",
-    "X"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6cc355d2",
-   "metadata": {},
-   "source": [
-    "Since arrays are row-major ordered, the first axis, i.e. `axis=0`, refers to its rows. We pass this argument into the `mean()` method for the object `X`, which averages over the rows and returns one mean for each of the three columns. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1403ff7a",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "X.mean(axis=0)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6785c0ec",
-   "metadata": {},
-   "source": [
-    "The following yields the same result."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7e9255ba",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "X.mean(0)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5de246dc",
-   "metadata": {},
-   "source": [
-    "    "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "30b002fa",
-   "metadata": {},
-   "source": [
-    "## Graphics\n",
-    "In `Python`, common practice is to use  the library\n",
-    "`matplotlib` for graphics.\n",
-    "However, since `Python` was not written with data analysis in mind,\n",
-    "  the notion of plotting is not intrinsic to the language. \n",
-    "We will use the `subplots()` function\n",
-    "from `matplotlib.pyplot` to create a figure and the\n",
-    "axes onto which we plot our data.\n",
-    "For many more examples of how to make plots in `Python`,\n",
-    "readers are encouraged to visit [matplotlib.org/stable/gallery/](https://matplotlib.org/stable/gallery/index.html).\n",
-    "\n",
-    "In `matplotlib`, a plot consists of a *figure* and one or more *axes*. You can think of the figure as the blank canvas upon which \n",
-    "one or more plots will be displayed: it is the entire plotting window. \n",
-    "The *axes* contain important information about each plot, such as its $x$- and $y$-axis labels,\n",
-    "title,  and more. (Note that in `matplotlib`, the word *axes* is not the plural of *axis*: a plot's *axes* contains much more information \n",
-    "than just the $x$-axis and  the $y$-axis.)\n",
-    "\n",
-    "We begin by importing the `subplots()` function\n",
-    "from `matplotlib.pyplot`. We use this function\n",
-    "throughout when creating figures.\n",
-    "The function returns a tuple of length two: a figure\n",
-    "object as well as the relevant axes object. We will typically\n",
-    "pass `figsize` as a keyword argument.\n",
-    "Having created our axes, we attempt our first plot using its  `plot()` method.\n",
-    "To learn more about it, \n",
-    "type `ax.plot?`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8236e5f7",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "from matplotlib.pyplot import subplots\n",
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "x = rng.standard_normal(100)\n",
-    "y = rng.standard_normal(100)\n",
-    "ax.plot(x, y);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bbef67e6",
-   "metadata": {},
-   "source": [
-    "We pause here to note that we have *unpacked* the tuple of length two returned by `subplots()` into the two distinct\n",
-    "variables `fig` and `ax`. Unpacking\n",
-    "is typically preferred to the following equivalent but slightly more verbose code:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ddc9ed4f",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "output = subplots(figsize=(8, 8))\n",
-    "fig = output[0]\n",
-    "ax = output[1]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "104d6b8f",
-   "metadata": {},
-   "source": [
-    "We see that our earlier cell produced a line plot, which is the default. To create a scatterplot, we provide an additional argument to `ax.plot()`, indicating that circles should be displayed."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c64ed600",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.plot(x, y, 'o');"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "840be2a9",
-   "metadata": {},
-   "source": [
-    "Different values\n",
-    "of this additional argument can be used to produce different colored lines\n",
-    "as well as different linestyles. \n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "971b98bd",
-   "metadata": {},
-   "source": [
-    "As an alternative, we could use the `ax.scatter()` method to create a scatterplot."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bc6245e2",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.scatter(x, y, marker='o');"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "97f36df0",
-   "metadata": {},
-   "source": [
-    "Notice that in the code blocks above, we have ended\n",
-    "the last line with a semicolon. This prevents `ax.plot(x, y)` from printing\n",
-    "text  to the notebook. However, it does not prevent a plot from being produced. \n",
-    "If we omit the trailing semicolon, then we obtain the following output:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2454807b",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.scatter(x, y, marker='o')\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1230c0a6",
-   "metadata": {},
-   "source": [
-    "In what follows, we will use\n",
-    " trailing semicolons whenever the text that would be output is not\n",
-    "germane to the discussion at hand."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0ccb9964",
-   "metadata": {},
-   "source": [
-    "To label our plot, we  make use of the `set_xlabel()`,  `set_ylabel()`, and  `set_title()` methods\n",
-    "of `ax`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1e18a793",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.scatter(x, y, marker='o')\n",
-    "ax.set_xlabel(\"this is the x-axis\")\n",
-    "ax.set_ylabel(\"this is the y-axis\")\n",
-    "ax.set_title(\"Plot of X vs Y\");"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f2d818ee",
-   "metadata": {},
-   "source": [
-    " Having access to the figure object `fig` itself means that we can go in and change some aspects and then redisplay it. Here, we change\n",
-    "  the size from `(8, 8)` to `(12, 3)`.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "aec3f009",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig.set_size_inches(12,3)\n",
-    "fig"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "011bf802",
-   "metadata": {},
-   "source": [
-    "Occasionally we will want to create several plots within a figure. This can be\n",
-    "achieved by passing additional arguments to `subplots()`. \n",
-    "Below, we create a  $2 \\times 3$ grid of plots\n",
-    "in a figure of size determined by the `figsize` argument. In such\n",
-    "situations, there is often a relationship between the axes in the plots. For example,\n",
-    "all plots may have a common $x$-axis. The `subplots()` function can automatically handle\n",
-    "this situation when passed the keyword argument `sharex=True`.\n",
-    "The `axes` object below is an array pointing to different plots in the figure. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2cbc7fd4",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, axes = subplots(nrows=2,\n",
-    "                     ncols=3,\n",
-    "                     figsize=(15, 5))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b8ff2e6d",
-   "metadata": {},
-   "source": [
-    "We now produce a scatter plot with `'o'` in the second column of the first row and\n",
-    "a scatter plot with `'+'` in the third column of the second row."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "702f80d9",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "axes[0,1].plot(x, y, 'o')\n",
-    "axes[1,2].scatter(x, y, marker='+')\n",
-    "fig"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5b265f8b",
-   "metadata": {},
-   "source": [
-    "Type  `subplots?` to learn more about \n",
-    "`subplots()`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1bd7e707",
-   "metadata": {},
-   "source": [
-    "To save the output of `fig`, we call its `savefig()`\n",
-    "method. The argument `dpi` is the dots per inch, used\n",
-    "to determine how large the figure will be in pixels."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5493d229",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "fig.savefig(\"Figure.png\", dpi=400)\n",
-    "fig.savefig(\"Figure.pdf\", dpi=200);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7152d0c7",
-   "metadata": {},
-   "source": [
-    "We can continue to modify `fig` using step-by-step updates; for example, we can modify the range of the $x$-axis, re-save the figure, and even re-display it. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bd07af12",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "axes[0,1].set_xlim([-1,1])\n",
-    "fig.savefig(\"Figure_updated.jpg\")\n",
-    "fig"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b5278857",
-   "metadata": {},
-   "source": [
-    "We now create some more sophisticated plots. The \n",
-    "`ax.contour()` method  produces a  *contour plot* \n",
-    "in order to represent three-dimensional data, similar to a\n",
-    "topographical map.  It takes three arguments:\n",
-    "\n",
-    "* A vector of `x` values (the first dimension),\n",
-    "* A vector of `y` values (the second dimension), and\n",
-    "* A matrix whose elements correspond to the `z` value (the third\n",
-    "dimension) for each pair of `(x,y)` coordinates.\n",
-    "\n",
-    "To create `x` and `y`, we’ll use the command  `np.linspace(a, b, n)`, \n",
-    "which returns a vector of `n` evenly spaced numbers starting at `a` and ending at `b` (both endpoints included)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "01019508",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "x = np.linspace(-np.pi, np.pi, 50)\n",
-    "y = x\n",
-    "f = np.multiply.outer(np.cos(y), 1 / (1 + x**2))\n",
-    "ax.contour(x, y, f);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9ef3c475",
-   "metadata": {},
-   "source": [
-    "We can increase the resolution by adding more levels to the image."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7d08992f",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.contour(x, y, f, levels=45);"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8e1d37a2",
-   "metadata": {},
-   "source": [
-    "To fine-tune the output of the\n",
-    "`ax.contour()` method, take a\n",
-    "look at the help file by typing `ax.contour?`.\n",
-    " \n",
-    "The `ax.imshow()`  method is similar to \n",
-    "`ax.contour()`, except that it produces a color-coded plot\n",
-    "whose colors depend on the `z` value. This is known as a\n",
-    "*heatmap*, and is sometimes used to plot temperature in\n",
-    "weather forecasts."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1f89d704",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.imshow(f);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2500a6ec",
-   "metadata": {},
-   "source": [
-    "## Sequences and Slice Notation"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "07001b88",
-   "metadata": {},
-   "source": [
-    "As seen above, the\n",
-    "function `np.linspace()`  can be used to create a sequence\n",
-    "of numbers."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "cd971131",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "seq1 = np.linspace(0, 10, 11)\n",
-    "seq1\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "926f96fc",
-   "metadata": {},
-   "source": [
-    "The function `np.arange()`\n",
-    " returns a sequence of numbers spaced out by `step`. If `step` is not specified, then a default value of $1$ is used. Let's create a sequence\n",
-    " that starts at $0$ and ends at $10$."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "aa630d16",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "seq2 = np.arange(0, 10)\n",
-    "seq2\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6908bad7",
-   "metadata": {},
-   "source": [
-    "Why isn't $10$ output above? This has to do with *slice* notation in `Python`. \n",
-    "Slice notation  \n",
-    "is used to index sequences such as lists, tuples and arrays.\n",
-    "Suppose we want to retrieve the fourth through sixth (inclusive) entries\n",
-    "of a string. We obtain a slice of the string using the indexing  notation  `[3:6]`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "89955ee2",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "\"hello world\"[3:6]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "17d73e4d",
-   "metadata": {},
-   "source": [
-    "In the code block above, the notation `3:6` is shorthand for  `slice(3,6)` when used inside\n",
-    "`[]`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "517f592d",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "\"hello world\"[slice(3,6)]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "680fe656",
-   "metadata": {},
-   "source": [
-    "You might have expected  `slice(3,6)` to output the fourth through seventh characters in the text string (recalling that  `Python` begins its indexing at zero),  but instead it output  the fourth through sixth. \n",
-    " This also explains why the earlier `np.arange(0, 10)` command output only the integers from $0$ to $9$. \n",
-    "See the documentation `slice?` for useful options in creating slices."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "522a2761",
-   "metadata": {},
-   "source": [
-    "## Indexing Data\n",
-    "To begin, we  create a two-dimensional `numpy` array."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "35927abd",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A = np.arange(16).reshape((4, 4))\n",
-    "A\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "27c88984",
-   "metadata": {},
-   "source": [
-    "Typing `A[1,2]` retrieves the element corresponding to the second row and third\n",
-    "column. (As usual, `Python` indexes from $0.$)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "78ee7f5b",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[1,2]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "dd65ec1c",
-   "metadata": {},
-   "source": [
-    "The first number after the open-bracket symbol `[`\n",
-    " refers to the row, and the second number refers to the column. \n",
-    "\n",
-    "### Indexing Rows, Columns, and Submatrices\n",
-    " To select multiple rows at a time, we can pass in a list\n",
-    "  specifying our selection. For instance, `[1,3]` will retrieve the second and fourth rows:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "16212696",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[[1,3]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0b8b3ce3",
-   "metadata": {},
-   "source": [
-    "To select the first and third columns, we pass in  `[0,2]` as the second argument in the square brackets.\n",
-    "In this case we need to supply the first argument `:` \n",
-    "which selects all rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d5f473d2",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[:,[0,2]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "471ed1b4",
-   "metadata": {},
-   "source": [
-    "Now, suppose that we want to select the submatrix made up of the second and fourth \n",
-    "rows as well as the first and third columns. This is where\n",
-    "indexing gets slightly tricky. It is natural to try  to use lists to retrieve the rows and columns:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c89646d6",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[[1,3],[0,2]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9cbf1ff9",
-   "metadata": {},
-   "source": [
-    "Oops, what happened? We got a one-dimensional array of length two identical to"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "87f6b4f2",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "np.array([A[1,0],A[3,2]])\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9a93dc96",
-   "metadata": {},
-   "source": [
-    "Similarly, the following code fails to extract the submatrix consisting of the second and fourth rows and the first, third, and fourth columns:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5da5bda8",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[[1,3],[0,2,3]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f4fd2f83",
-   "metadata": {},
-   "source": [
-    "We can see what has gone wrong here. When supplied with two indexing lists, the `numpy` interpretation is that these provide pairs of $i,j$ indices for a series of entries. That is why the pair of lists must have the same length. However, that was not our intent, since we are looking for a submatrix.\n",
-    "\n",
-    "One easy way to do this is as follows. We first create a submatrix by subsetting the rows of `A`, and then on the fly we make a further submatrix by subsetting its columns.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ac48a95b",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "A[[1,3]][:,[0,2]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a09467cd",
-   "metadata": {},
-   "source": [
-    "There are more efficient ways of achieving the same result.\n",
-    "\n",
-    "The *convenience function* `np.ix_()` allows us  to extract a submatrix\n",
-    "using lists, by creating an intermediate *mesh* object."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ee195cc4",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "idx = np.ix_([1,3],[0,2,3])\n",
-    "A[idx]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b7177cb9",
-   "metadata": {},
-   "source": [
-    "Alternatively, we can subset matrices efficiently using slices.\n",
-    "  \n",
-    "The slice\n",
-    "`1:4:2` captures the second and fourth items of a sequence, while the slice `0:3:2` captures\n",
-    "the first and third items (the third element in a slice sequence is the step size)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "48917bb5",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "A[1:4:2,0:3:2]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c647dbf0",
-   "metadata": {},
-   "source": [
-    "Why are we able to retrieve a submatrix directly using slices but not using lists?\n",
-    "It's because they are different `Python` types, and\n",
-    "are treated differently by `numpy`.\n",
-    "Slices can be used to extract objects from arbitrary sequences, such as strings, lists, and tuples, while the use of lists for indexing is more limited."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2dce8961",
-   "metadata": {},
-   "source": [
-    "### Boolean Indexing\n",
-    "In `numpy`, a *Boolean* is a type  that equals either   `True` or  `False` (also represented as $1$ and $0$, respectively).\n",
-    "The next line creates a vector of $0$'s, represented as Booleans, of length equal to the first dimension of `A`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5d4caf22",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "keep_rows = np.zeros(A.shape[0], bool)\n",
-    "keep_rows"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d83fadb5",
-   "metadata": {},
-   "source": [
-    "We now set two of the elements to `True`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "348820e3",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "keep_rows[[1,3]] = True\n",
-    "keep_rows\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a0fb487d",
-   "metadata": {},
-   "source": [
-    "Note that the elements of `keep_rows`, when viewed as integers, are the same as the\n",
-    "values of `np.array([0,1,0,1])`. Below, we use  `==` to verify their equality. When\n",
-    "applied to two arrays, the `==`   operation is applied elementwise."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4aafe45b",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "np.all(keep_rows == np.array([0,1,0,1]))\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "603c0c53",
-   "metadata": {},
-   "source": [
-    "(Here, the function `np.all()` has checked whether\n",
-    "all entries of an array are `True`. A similar function, `np.any()`, can be used to check whether any entries of an array are `True`.)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b0a449d1",
-   "metadata": {},
-   "source": [
-    "   However, even though `np.array([0,1,0,1])`  and `keep_rows` are equal according to `==`, they index different sets of rows!\n",
-    "The former retrieves the first, second, first, and second rows of `A`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1be6a588",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[np.array([0,1,0,1])]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e45bbebe",
-   "metadata": {},
-   "source": [
-    "By contrast, `keep_rows` retrieves only the second and fourth rows of `A`, i.e. the rows for which the Boolean equals `True`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e83da57b",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "A[keep_rows]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "374d34a7",
-   "metadata": {},
-   "source": [
-    "This example shows that Booleans and integers are treated differently by `numpy`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "25db74bf",
-   "metadata": {},
-   "source": [
-    "We again make use of the `np.ix_()` function\n",
-    " to create a mesh containing the second and fourth rows, and the first,  third, and fourth columns. This time, we apply the function to Booleans,\n",
-    " rather than lists."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "09675294",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "keep_cols = np.zeros(A.shape[1], bool)\n",
-    "keep_cols[[0, 2, 3]] = True\n",
-    "idx_bool = np.ix_(keep_rows, keep_cols)\n",
-    "A[idx_bool]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0166c179",
-   "metadata": {},
-   "source": [
-    "We can also mix a list with an array of Booleans in the arguments to `np.ix_()`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a85614e4",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "idx_mixed = np.ix_([1,3], keep_cols)\n",
-    "A[idx_mixed]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b3541e0c",
-   "metadata": {},
-   "source": [
-    "For more details on indexing in `numpy`, readers are referred\n",
-    "to the `numpy` tutorial mentioned earlier.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ab75f168",
-   "metadata": {},
-   "source": [
-    "## Loading Data\n",
-    "\n",
-    "Data sets often contain different types of data, and may have names associated with the rows or columns. \n",
-    "For these reasons, they typically are best accommodated using a\n",
-    " *data frame*. \n",
-    " We can think of a data frame  as a sequence\n",
-    "of arrays of identical length; these are the columns. Entries in the\n",
-    "different arrays can be combined to form a row.\n",
-    " The `pandas`\n",
-    "library can be used to create and work with data frame objects."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ca018d13",
-   "metadata": {},
-   "source": [
-    "### Reading in a Data Set\n",
-    "\n",
-    "The first step of most analyses involves importing a data set into\n",
-    "`Python`.  \n",
-    " Before attempting to load\n",
-    "a data set, we must make sure that `Python` knows where to find the file containing it. \n",
-    "If the\n",
-    "file is in the same location\n",
-    "as this notebook file, then we are all set. \n",
-    "Otherwise, \n",
-    "the command\n",
-    "`os.chdir()`  can be used to *change directory*. (You will need to call `import os` before calling `os.chdir()`.) "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b76342df",
-   "metadata": {},
-   "source": [
-    "We will begin by reading in `Auto.csv`, available on the book website. This is a comma-separated file, and can be read in using `pd.read_csv()`: "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ff81e644",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "Auto = pd.read_csv('Auto.csv')\n",
-    "Auto\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "42d6a799",
-   "metadata": {},
-   "source": [
-    "The book website also has a whitespace-delimited version of this data, called `Auto.data`. This can be read in as follows:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5b45aa7f",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "# Split fields on runs of whitespace; `delim_whitespace` is deprecated in recent pandas\n",
-    "Auto = pd.read_csv('Auto.data', sep=r'\\s+')\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f942c457",
-   "metadata": {},
-   "source": [
-    " Both `Auto.csv` and `Auto.data` are simply text\n",
-    "files. Before loading data into `Python`, it is a good idea to view it using\n",
-    "a text editor or other software, such as Microsoft Excel.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1aceff38",
-   "metadata": {},
-   "source": [
-    "We now take a look at the column of `Auto` corresponding to the variable `horsepower`: "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "413f626a",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto['horsepower']\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fd11e757",
-   "metadata": {},
-   "source": [
-    "We see that the `dtype` of this column is `object`. \n",
-    "It turns out that all values of the `horsepower` column were interpreted as strings when reading\n",
-    "in the data. \n",
-    "We can find out why by looking at the unique values."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "57b86346",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "np.unique(Auto['horsepower'])\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f0aee233",
-   "metadata": {},
-   "source": [
-    "We see the culprit is the value `?`, which is being used to encode missing values.\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b7b032d4",
-   "metadata": {},
-   "source": [
-    "To fix the problem, we must provide `pd.read_csv()` with an argument called `na_values`.\n",
-    "Now,  each instance of  `?` in the file is replaced with the\n",
-    "value `np.nan`, which means *not a number*:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a9698b26",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "Auto = pd.read_csv('Auto.data',\n",
-    "                   na_values=['?'],\n",
-    "                   sep=r'\\s+')\n",
-    "Auto['horsepower'].sum()\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "13cb364e",
-   "metadata": {},
-   "source": [
-    "The `Auto.shape`  attribute tells us that the data has 397\n",
-    "observations, or rows, and nine variables, or columns."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4877cb2c",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "Auto.shape\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3fdc6f47",
-   "metadata": {},
-   "source": [
-    "There are\n",
-    "various ways to deal with  missing data. \n",
-    "In this case, since only five of the rows contain missing\n",
-    "observations,  we choose to use the `Auto.dropna()` method to simply remove these rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2ba1d33d",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "Auto_new = Auto.dropna()\n",
-    "Auto_new.shape\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ac9748d9",
-   "metadata": {},
-   "source": [
-    "### Basics of Selecting Rows and Columns\n",
-    " \n",
-    "We can use `Auto.columns`  to check the variable names."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3d03baab",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "Auto = Auto_new # overwrite the previous value\n",
-    "Auto.columns\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d24d4d42",
-   "metadata": {},
-   "source": [
-    "Accessing the rows and columns of a data frame is similar, but not identical, to accessing the rows and columns of an array. \n",
-    "Recall that the first argument to the `[]` method\n",
-    "is always applied to the rows of the array.  \n",
-    "Similarly, \n",
-    "passing in a slice to the `[]` method creates a data frame whose *rows* are determined by the slice:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "410b4dd7",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto[:3]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4ea0be7b",
-   "metadata": {},
-   "source": [
-    "Similarly, an array of Booleans can be used to subset the rows:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3540804d",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "idx_80 = Auto['year'] > 80\n",
-    "Auto[idx_80]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a02221a2",
-   "metadata": {},
-   "source": [
-    "However, if we pass  in a list of strings to the `[]` method, then we obtain a data frame containing the corresponding set of *columns*. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "66d174f1",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto[['mpg', 'horsepower']]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "54bef6a3",
-   "metadata": {},
-   "source": [
-    "Since we did not specify an *index* column when we loaded our data frame, the rows are labeled using integers\n",
-    "from 0 to 396 (with gaps where `dropna()` removed rows)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "52789c77",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto.index\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3f5fcb26",
-   "metadata": {},
-   "source": [
-    "We can use the\n",
-    "`set_index()` method to rename the rows using the contents of `Auto['name']`. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d83650bf",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re = Auto.set_index('name')\n",
-    "Auto_re\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "880d79d9",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.columns\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "dbee53b8",
-   "metadata": {},
-   "source": [
-    "We see that the column `'name'` is no longer there.\n",
-    " \n",
-    "Now that the index has been set to `name`, we can access rows of the data \n",
-    "frame by `name` using the `loc[]` method of\n",
-    "`Auto_re`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c01f4095",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "rows = ['amc rebel sst', 'ford torino']\n",
-    "Auto_re.loc[rows]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "29688cab",
-   "metadata": {},
-   "source": [
-    "As an alternative to using the index name, we could retrieve the 4th and 5th rows of `Auto_re` using the `iloc[]` method:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a4202eb8",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.iloc[[3,4]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5427ede0",
-   "metadata": {},
-   "source": [
-    "We can also use it to retrieve the 1st, 3rd and 4th columns of `Auto_re`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "948b2d07",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.iloc[:,[0,2,3]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b83d56eb",
-   "metadata": {},
-   "source": [
-    "We can extract the 4th and 5th rows, as well as the 1st, 3rd and 4th columns, using\n",
-    "a single call to `iloc[]`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1cfdcc5c",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.iloc[[3,4],[0,2,3]]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2bde6514",
-   "metadata": {},
-   "source": [
-    "Index entries need not be unique: there are several cars  in the data frame named `ford galaxie 500`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fd9c5cda",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.loc['ford galaxie 500', ['mpg', 'origin']]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4d097282",
-   "metadata": {},
-   "source": [
-    "### More on Selecting Rows and Columns\n",
-    "Suppose now that we want to create a data frame consisting of the  `weight` and `origin`  of the subset of cars with \n",
-    "`year` greater than 80 --- i.e. those built after 1980.\n",
-    "To do this, we first create a Boolean array that indexes the rows.\n",
-    "The `loc[]` method allows for Boolean entries as well as strings:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6d431cb5",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "idx_80 = Auto_re['year'] > 80\n",
-    "Auto_re.loc[idx_80, ['weight', 'origin']]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "838a03e0",
-   "metadata": {},
-   "source": [
-    "To do this more concisely, we can use an anonymous function called a `lambda`: "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fac41ce1",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.loc[lambda df: df['year'] > 80, ['weight', 'origin']]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "08e61254",
-   "metadata": {},
-   "source": [
-    "The `lambda` call creates a function that takes a single\n",
-    "argument, here `df`, and returns `df['year']>80`.\n",
-    "Since it is created inside the `loc[]` method for the\n",
-    "dataframe `Auto_re`, that dataframe will be the argument supplied.\n",
-    "As another example of using a `lambda`, suppose that\n",
-    "we want all cars built after 1980 that achieve greater than 30 miles per gallon:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b0885654",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.loc[lambda df: (df['year'] > 80) & (df['mpg'] > 30),\n",
-    "            ['weight', 'origin']\n",
-    "           ]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d87fc459",
-   "metadata": {},
-   "source": [
-    "The symbol `&` computes an element-wise *and* operation.\n",
-    "As another example, suppose that we want to retrieve all `Ford` and `Datsun`\n",
-    "cars with `displacement` less than 300. We check whether each `name` entry contains either the string `ford` or `datsun` using the `str.contains()` method of the `index` attribute\n",
-    "of the dataframe:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "213945a6",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto_re.loc[lambda df: (df['displacement'] < 300)\n",
-    "                       & (df.index.str.contains('ford')\n",
-    "                       | df.index.str.contains('datsun')),\n",
-    "            ['weight', 'origin']\n",
-    "           ]\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8a940fd1",
-   "metadata": {},
-   "source": [
-    "Here, the symbol `|` computes an element-wise *or* operation.\n",
-    " \n",
-    "In summary, a powerful set of operations is available to index the rows and columns of data frames. For integer-based queries, use the `iloc[]` method. For string and Boolean\n",
-    "selections, use the `loc[]` method. For functional queries that filter rows, use the `loc[]` method\n",
-    "with a function (typically a `lambda`) in the rows argument.\n",
-    "\n",
-    "## For Loops\n",
-    "A `for` loop is a standard tool in many languages that\n",
-    "repeatedly evaluates some chunk of code while\n",
-    "varying different values inside the code.\n",
-    "For example, suppose we loop over elements of a list and compute their sum."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a3c4060a",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "total = 0\n",
-    "for value in [3,2,19]:\n",
-    "    total += value\n",
-    "print('Total is: {0}'.format(total))\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9117e3a1",
-   "metadata": {},
-   "source": [
-    "The indented code beneath the line with the `for` statement is run\n",
-    "for each value in the sequence\n",
-    "specified in the `for` statement. The loop ends either\n",
-    "when the cell ends or when code is indented at the same level\n",
-    "as the original `for` statement.\n",
-    "We see that the final line above which prints the total is executed\n",
-    "only once after the for loop has terminated. Loops\n",
-    "can be nested by additional indentation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f2bffb69",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "total = 0\n",
-    "for value in [2,3,19]:\n",
-    "    for weight in [3, 2, 1]:\n",
-    "        total += value * weight\n",
-    "print('Total is: {0}'.format(total))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9f99e85b",
-   "metadata": {},
-   "source": [
-    "Above, we summed over each combination of `value` and `weight`.\n",
-    "We also took advantage of the *increment* notation\n",
-    "in `Python`: the expression `a += b` is equivalent\n",
-    "to `a = a + b`. Besides\n",
-    "being a convenient notation, this can save time in computationally\n",
-    "heavy tasks in which the intermediate value of `a+b` need not\n",
-    "be explicitly created.\n",
-    "\n",
-    "Perhaps a more\n",
-    "common task would be to sum over `(value, weight)` pairs. For instance,\n",
-    "to compute the average value of a random variable that takes on\n",
-    "possible values 2, 3 or 19 with probability 0.2, 0.3, 0.5 respectively\n",
-    "we would compute the weighted sum. Tasks such as this\n",
-    "can often be accomplished using the `zip()` function, which pairs up the\n",
-    "elements of several sequences so that a loop can visit them as tuples."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ee827a53",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "total = 0\n",
-    "for value, weight in zip([2,3,19],\n",
-    "                         [0.2,0.3,0.5]):\n",
-    "    total += weight * value\n",
-    "print('Weighted average is: {0}'.format(total))\n"
-   ]
-  },
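The remark above about the increment notation avoiding an intermediate value can be illustrated with a minimal sketch. For plain numbers, `a += b` and `a = a + b` give the same result. For mutable containers such as lists (and, similarly, NumPy arrays), `a += b` updates the existing object in place, whereas `a = a + b` builds a new object and rebinds the name. The lists and the name `alias` below are hypothetical and chosen only for illustration.

```python
# Hypothetical list used only for illustration.
a = [1, 2]
alias = a                      # a second name for the same list object

a += [3]                       # in place: the existing list is extended
shared_after_increment = a is alias

a = a + [4]                    # a new list is built, then bound to the name a
shared_after_rebinding = a is alias

print(alias, a)
print(shared_after_increment, shared_after_rebinding)
```

After the in-place update both names still refer to one list; after `a = a + [4]` they no longer do, because a separate list had to be created to hold the sum.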
-  {
-   "cell_type": "markdown",
-   "id": "dec18466",
-   "metadata": {},
-   "source": [
-    "### String Formatting\n",
-    "In the code chunk above we also printed a string\n",
-    "displaying the total. However, the object `total`\n",
-    "is a number and not a string.\n",
-    "Inserting the value of something into\n",
-    "a string is a common task, made\n",
-    "simple using\n",
-    "some of the powerful string formatting\n",
-    "tools in `Python`.\n",
-    "Many data cleaning tasks involve\n",
-    "manipulating and programmatically\n",
-    "producing strings.\n",
-    "\n",
-    "For example we may want to loop over the columns of a data frame and\n",
-    "print the percent missing in each column.\n",
-    "Let’s create a data frame `D` with columns in which roughly 20% of the entries are missing, i.e. set\n",
-    "to `np.nan`.  We’ll create the\n",
-    "values in `D` from a normal distribution with mean 0 and variance 1 using `rng.standard_normal()`\n",
-    "and then overwrite some random entries using `rng.choice()`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3a097fbc",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 2
-   },
-   "outputs": [],
-   "source": [
-    "rng = np.random.default_rng(1)\n",
-    "A = rng.standard_normal((127, 5))\n",
-    "M = rng.choice([0, np.nan], p=[0.8,0.2], size=A.shape)\n",
-    "A += M\n",
-    "D = pd.DataFrame(A, columns=['food',\n",
-    "                             'bar',\n",
-    "                             'pickle',\n",
-    "                             'snack',\n",
-    "                             'popcorn'])\n",
-    "D[:3]\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e064e170",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "for col in D.columns:\n",
-    "    template = 'Column \"{0}\" has {1:.2%} missing values'\n",
-    "    print(template.format(col,\n",
-    "          np.isnan(D[col]).mean()))\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7a3e4dd8",
-   "metadata": {},
-   "source": [
-    "We see that the `template.format()` method expects two arguments `{0}`\n",
-    "and `{1:.2%}`, and the latter includes some formatting\n",
-    "information. In particular, it specifies that the second argument should be expressed as a percent with two decimal digits.\n",
-    "\n",
-    "The reference\n",
-    "[docs.python.org/3/library/string.html](https://docs.python.org/3/library/string.html)\n",
-    "includes many helpful and more complex examples."
-   ]
-  },
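As a small illustration of the format specification described above, the sketch below applies the same template to a single made-up value, and then repeats it with an f-string, which accepts the same specification after the colon. The number `fraction_missing` is a hypothetical value, not one computed from `D`.

```python
# Hypothetical value chosen for illustration; not computed from the lab's data.
fraction_missing = 0.2047

# {0} is filled with the first argument, {1:.2%} with the second,
# shown as a percent with two decimal digits.
template = 'Column "{0}" has {1:.2%} missing values'
message = template.format('food', fraction_missing)
print(message)

# An f-string takes the same format specification after the colon.
same_message = f'Column "food" has {fraction_missing:.2%} missing values'
print(same_message)
```

Both lines print `Column "food" has 20.47% missing values`.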
-  {
-   "cell_type": "markdown",
-   "id": "d8fd496a",
-   "metadata": {},
-   "source": [
-    "## Additional Graphical and Numerical Summaries\n",
-    "We can use the `ax.plot()` or  `ax.scatter()`  functions to display the quantitative variables. However, simply typing the variable names will produce an error message,\n",
-    "because `Python` does not know to look in the  `Auto`  data set for those variables."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c915ca52",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.plot(horsepower, mpg, 'o');"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "63d47021",
-   "metadata": {},
-   "source": [
-    "We can address this by accessing the columns directly:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "65cd6d02",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "ax.plot(Auto['horsepower'], Auto['mpg'], 'o');\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "726836f0",
-   "metadata": {},
-   "source": [
-    "Alternatively, we can use the `plot()` method with the call `Auto.plot()`.\n",
-    "Using this method,\n",
-    "the variables  can be accessed by name.\n",
-    "The plot methods of a data frame return a familiar object:\n",
-    "an axes. We can use it to update the plot as we did previously: "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "76b5c0b1",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "ax = Auto.plot.scatter('horsepower', 'mpg')\n",
-    "ax.set_title('Horsepower vs. MPG');"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "69c46251",
-   "metadata": {},
-   "source": [
-    "If we want to save\n",
-    "the figure that contains a given axes, we can find the relevant figure\n",
-    "by accessing the `figure` attribute:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "183a2c2b",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "fig = ax.figure\n",
-    "fig.savefig('horsepower_mpg.png');"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6f10cb46",
-   "metadata": {},
-   "source": [
-    "We can further instruct the data frame to plot to a particular axes object. In this\n",
-    "case the corresponding `plot()` method will return the\n",
-    "modified axes we passed in as an argument. Note that\n",
-    "when we request a one-dimensional grid of plots, the object `axes` is similarly\n",
-    "one-dimensional. We place our scatter plot in the middle plot of a row of three plots\n",
-    "within a figure."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "75fbb981",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "fig, axes = subplots(ncols=3, figsize=(15, 5))\n",
-    "Auto.plot.scatter('horsepower', 'mpg', ax=axes[1]);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "53ffc0da",
-   "metadata": {},
-   "source": [
-    "Note also that the columns of a data frame can be accessed as attributes: try typing in `Auto.horsepower`. "
-   ]
-  },
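The attribute-style access mentioned above can be sketched on a tiny stand-in data frame. The data frame `cars` and its values are hypothetical and used only so that the example runs without the `Auto` data. The sketch also shows one situation in which bracket access is still required.

```python
import pandas as pd

# Hypothetical stand-in for the Auto data; the values are made up.
cars = pd.DataFrame({'horsepower': [130, 165, 150],
                     'mpg': [18.0, 15.0, 18.0]})

# Attribute access and bracket access return the same column.
same_column = cars.horsepower.equals(cars['horsepower'])
print(same_column)

# Bracket access is still needed when a column name is not a valid
# Python identifier (here it contains a space).
cars['model year'] = [70, 70, 70]
print(cars['model year'].tolist())
```

Attribute access is convenient for interactive work, while bracket access works for any column name.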
-  {
-   "cell_type": "markdown",
-   "id": "1c4705e0",
-   "metadata": {},
-   "source": [
-    "We now consider the `cylinders` variable. Typing in `Auto.cylinders.dtype` reveals that it is being treated as a quantitative variable. \n",
-    "However, since there is only a small number of possible values for this variable, we may wish to treat it as \n",
-    " qualitative.  Below, we replace\n",
-    "the `cylinders` column with a categorical version of `Auto.cylinders`. The function `pd.Series()`  owes its name to the fact that `pandas` is often used in time series applications."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "55b3a1cc",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto.cylinders = pd.Series(Auto.cylinders, dtype='category')\n",
-    "Auto.cylinders.dtype\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "adc75408",
-   "metadata": {},
-   "source": [
-    " Now that `cylinders` is qualitative, we can display it using\n",
-    " the `boxplot()` method."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f3d88794",
-   "metadata": {
-    "execution": {}
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "Auto.boxplot('mpg', by='cylinders', ax=ax);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "62d6582f",
-   "metadata": {},
-   "source": [
-    "The `hist()`  method can be used to plot a *histogram*."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "eea49f5b",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "Auto.hist('mpg', ax=ax);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c5a5933c",
-   "metadata": {},
-   "source": [
-    "The color of the bars and the number of bins can be changed:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d5bcfff8",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "fig, ax = subplots(figsize=(8, 8))\n",
-    "Auto.hist('mpg', color='red', bins=12, ax=ax);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "60c36b6c",
-   "metadata": {},
-   "source": [
-    " See `Auto.hist?` for more plotting\n",
-    "options.\n",
-    " \n",
-    "We can use the `pd.plotting.scatter_matrix()`   function to create a *scatterplot matrix* to visualize all of the pairwise relationships between the columns in\n",
-    "a data frame."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "edb66cae",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "pd.plotting.scatter_matrix(Auto);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0b162bd9",
-   "metadata": {},
-   "source": [
-    " We can also produce scatterplots\n",
-    "for a subset of the variables."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4f5d25d9",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "pd.plotting.scatter_matrix(Auto[['mpg',\n",
-    "                                 'displacement',\n",
-    "                                 'weight']]);\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8cae5dfc",
-   "metadata": {},
-   "source": [
-    "The `describe()`  method produces a numerical summary of each column in a data frame."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ce7b23e2",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto[['mpg', 'weight']].describe()\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d5042294",
-   "metadata": {},
-   "source": [
-    "We can also produce a summary of just a single column."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a6545d2f",
-   "metadata": {
-    "execution": {},
-    "lines_to_next_cell": 0
-   },
-   "outputs": [],
-   "source": [
-    "Auto['cylinders'].describe()\n",
-    "Auto['mpg'].describe()\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c2ea7f81",
-   "metadata": {},
-   "source": [
-    "To exit `Jupyter`,  select `File / Close and Halt`.\n",
-    "\n",
-    " \n",
-    "\n"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "cell_metadata_filter": "-all",
-   "formats": "Rmd,ipynb",
-   "main_language": "python"
-  },
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}

Reference files/Week2_ref/Lecture_1_basics.ipynb DELETED Viewed

The diff for this file is too large to render.