{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "## **Handling HTML Tables with `pd.read_html()`**\n", "\n", "- `pd.read_html()` reads tables from an HTML file, a URL, or a string containing HTML content (recent pandas versions expect literal HTML to be wrapped in `io.StringIO`).\n", "- The `match` parameter **filters** tables: only tables containing text that matches the given string or regular expression are returned.\n", "- This is useful when an HTML file contains multiple tables but you want to extract only specific ones." ], "metadata": { "id": "8qRQ6CcLQHE9" } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "\n", "# Read only the tables in sample.html whose text matches 'Age'\n", "tables = pd.read_html('sample.html', match='Age')\n", "\n", "# read_html returns a list of DataFrames; display the first matching table\n", "print(tables[0])" ], "metadata": { "id": "lHrp3X_cQHQO" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## **Code Explanation:**\n", "\n", "- `pd.read_html()` scans the HTML file and extracts the tables whose text contains \"Age\".\n", "- The `match` parameter accepts a plain **string** or a **regular expression pattern**. A string is treated as a pattern, so `'Age'` also matches text such as \"Ages\".\n", "- The output is a **list of DataFrames**, one per matching table. If no table matches, `pd.read_html()` raises a `ValueError`." ], "metadata": { "id": "wPXZOFF5QHVY" } } ] }