{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"## **Handling HTML Tables with `pd.read_html()`**\n",
"\n",
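"- `pd.read_html()` parses every `<table>` element in an HTML file, a URL, or a file-like object and returns the results as a list of DataFrames. In pandas 2.1 and later, passing a literal HTML string is deprecated; wrap the string in `io.StringIO` instead.\n",
"- The `match` parameter **filters** the results: only tables containing text that matches the given string or regular expression are returned. The default, `'.+'`, matches any non-empty table.\n",
"- This is useful when an HTML page contains multiple tables but you want to extract only specific ones."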
],
"metadata": {
"id": "8qRQ6CcLQHE9"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"# Read only the tables in sample.html whose text matches 'Age'.\n",
"# read_html raises ValueError if no table matches.\n",
"tables = pd.read_html('sample.html', match='Age')\n",
"\n",
"# Display the first matching table\n",
"print(tables[0])"
],
"metadata": {
"id": "lHrp3X_cQHQO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## **Code Explanation:**\n",
"\n",
"- `pd.read_html()` parses `sample.html` and returns only the tables whose text matches `Age`.\n",
"- The `match` parameter accepts a plain **string** or a compiled **regular expression**, so tables can be selected by specific words or by patterns.\n",
"- The result is always a **list of DataFrames**, one per matching table, even when only one table matches. That is why the code selects the first match with `tables[0]`."
],
"metadata": {
"id": "wPXZOFF5QHVY"
}
}
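,
{
"cell_type": "markdown",
"source": [
"## **Self-Contained Example: Filtering with `match`**\n",
"\n",
"- The cell below is a minimal sketch of the filtering described above that does not depend on `sample.html`.\n",
"- The inline HTML, its two tables, and the variable names `html`, `all_tables`, and `age_tables` are hypothetical and exist only for illustration.\n",
"- It assumes an HTML parser that pandas can use (for example `lxml`) is installed."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"from io import StringIO\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical inline HTML with two tables; only the first one mentions 'Age'.\n",
"html = '''\n",
"<table>\n",
"  <tr><th>Name</th><th>Age</th></tr>\n",
"  <tr><td>Alice</td><td>30</td></tr>\n",
"</table>\n",
"<table>\n",
"  <tr><th>City</th><th>Country</th></tr>\n",
"  <tr><td>Paris</td><td>France</td></tr>\n",
"</table>\n",
"'''\n",
"\n",
"# Wrap the string in StringIO, since recent pandas versions deprecate literal HTML input.\n",
"all_tables = pd.read_html(StringIO(html))\n",
"age_tables = pd.read_html(StringIO(html), match='Age')\n",
"\n",
"# Compare how many tables each call returns, then show the filtered table.\n",
"print(len(all_tables), len(age_tables))\n",
"print(age_tables[0])"
],
"metadata": {},
"execution_count": null,
"outputs": []
}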
]
}