{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "## **Handling HTML Tables with `pd.read_html()`**\n",
        "\n",
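        "- `pd.read_html()` reads every `<table>` element from an HTML file, a URL, or HTML text and returns a **list of DataFrames**, one per table.\n",
        "- The `match` parameter **filters** the tables: only those whose text contains a match for the given string or regular expression are returned.\n",
        "- This is useful when an HTML file contains multiple tables but you want to extract only specific ones.\n",
        "- To parse HTML held in a Python string, wrap it in `io.StringIO`; passing the raw string directly is deprecated since pandas 2.1."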
      ],
      "metadata": {
        "id": "8qRQ6CcLQHE9"
      }
    },
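    {
      "cell_type": "markdown",
      "source": [
        "As a self-contained sketch of the behavior described above, the next cell parses a small, made-up HTML snippet containing two tables instead of reading a file from disk. It assumes `lxml` is installed (the parser `pd.read_html()` tries first by default) and wraps the text in `io.StringIO`. The `html` variable and its contents are purely illustrative."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from io import StringIO\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "# Hypothetical HTML text with two tables; only the first one mentions 'Age'\n",
        "html = '''\n",
        "<table>\n",
        "  <tr><th>Name</th><th>Age</th></tr>\n",
        "  <tr><td>Alice</td><td>30</td></tr>\n",
        "  <tr><td>Bob</td><td>25</td></tr>\n",
        "</table>\n",
        "<table>\n",
        "  <tr><th>Product</th><th>Price</th></tr>\n",
        "  <tr><td>Pen</td><td>1.5</td></tr>\n",
        "</table>\n",
        "'''\n",
        "\n",
        "# With the default match, every <table> is returned\n",
        "all_tables = pd.read_html(StringIO(html))\n",
        "\n",
        "# With match='Age', only tables whose text contains 'Age' are kept\n",
        "age_tables = pd.read_html(StringIO(html), match='Age')\n",
        "\n",
        "print(len(all_tables), len(age_tables))\n",
        "print(age_tables[0])"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },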
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
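        "# Read only the tables in sample.html whose text matches 'Age'\n",
        "tables = pd.read_html('sample.html', match='Age')\n",
        "\n",
        "# read_html always returns a list, even when only one table matches\n",
        "print(f'{len(tables)} matching table(s) found')\n",
        "\n",
        "# Display the first matching table\n",
        "print(tables[0])"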
      ],
      "metadata": {
        "id": "lHrp3X_cQHQO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Code Explanation:**\n",
        "\n",
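        "- `pd.read_html()` parses `sample.html` and keeps only the tables whose text contains a match for `'Age'`. The match is a case-sensitive regular-expression search, so a table with a header such as `Agent` or `Ages` is kept too.\n",
        "- The `match` parameter accepts either a plain **string** or a compiled regular-expression **pattern** (from `re.compile()`).\n",
        "- The result is always a **list of DataFrames**, one per matching table, so `tables[0]` is the first match. If no table matches, `pd.read_html()` raises a `ValueError` instead of returning an empty list."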
      ],
      "metadata": {
        "id": "wPXZOFF5QHVY"
      }
    }
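    ,
    {
      "cell_type": "markdown",
      "source": [
        "To illustrate the regular-expression matching described above, the next cell is a minimal sketch on made-up HTML (again assuming `lxml` is installed). The second table has an `Agent` column, whose text also contains `Age`, so a plain `match='Age'` keeps both tables. In this example, anchoring the pattern as `'^Age$'` keeps only the table with a cell whose text is exactly `Age`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "from io import StringIO\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "# Hypothetical HTML: one table has an 'Age' column, the other an 'Agent' column\n",
        "html = '''\n",
        "<table>\n",
        "  <tr><th>Name</th><th>Age</th></tr>\n",
        "  <tr><td>Alice</td><td>30</td></tr>\n",
        "</table>\n",
        "<table>\n",
        "  <tr><th>Agent</th><th>Region</th></tr>\n",
        "  <tr><td>Bob</td><td>North</td></tr>\n",
        "</table>\n",
        "'''\n",
        "\n",
        "# 'Age' is searched for anywhere in the text, so 'Agent' matches as well\n",
        "loose = pd.read_html(StringIO(html), match='Age')\n",
        "\n",
        "# Anchored pattern: the matched text must be exactly 'Age'\n",
        "strict = pd.read_html(StringIO(html), match='^Age$')\n",
        "\n",
        "print(len(loose), len(strict))\n",
        "print(strict[0])"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    }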
  ]
}