{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"## **Reading JSON File**"
],
"metadata": {
"id": "4L2BwncXK7Uv"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"pd.read_json(\"sample.json\")"
],
"metadata": {
"id": "sVqhFskxK-r5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## **Handling Structured JSON**\n",
"\n",
"- The `orient` parameter in `pd.read_json()` tells pandas which JSON layout to expect, so it must match the layout of the data being read:\n",
"  - **\"split\"**: Dictionary with the keys \"index\", \"columns\", and \"data\".\n",
"  - **\"records\"**: List of dictionaries, where each dictionary represents one row.\n",
"  - **\"index\"**: Dictionary keyed by row index, where each value is a dictionary mapping column names to values.\n",
"  - **\"columns\"**: Default for DataFrames. Dictionary keyed by column name, where each value holds that column's data keyed by row index."
],
"metadata": {
"id": "clN_z7L8K-zD"
}
},
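{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"# Sample structured JSON (column-oriented)\n",
"structured_json = {\n",
"    \"name\": [\"John\", \"Doe\", \"Jane\"],\n",
"    \"age\": [30, 25, 28],\n",
"    \"city\": [\"New York\", \"Los Angeles\", \"Chicago\"]\n",
"}\n",
"\n",
"# Write the same data once per layout so each read below has a matching file\n",
"source_df = pd.DataFrame(structured_json)\n",
"for orient in ('columns', 'split', 'index'):\n",
"    source_df.to_json(f'structured_{orient}.json', orient=orient)\n",
"\n",
"# Reading JSON with different 'orient' values (each must match the file's layout)\n",
"df_default = pd.read_json('structured_columns.json')  # Default (columns)\n",
"df_split = pd.read_json('structured_split.json', orient='split')\n",
"df_index = pd.read_json('structured_index.json', orient='index')\n",
"\n",
"print(df_default)\n",
"print(df_split)\n",
"print(df_index)"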
],
"metadata": {
"id": "yFW5vsHkK-6Q"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## **Handling Semi-Structured JSON**\n",
"\n",
"- `pandas.json_normalize()` flattens nested JSON objects into a flat DataFrame.\n",
"  - **`record_path`**: Path to the nested list whose items become the rows of the result.\n",
"  - **`meta`**: Fields from the parent record to repeat alongside each extracted row. Nested fields are given as a list of keys, such as `['address', 'city']`.\n",
"  - **`max_level`**: Maximum number of nesting levels to flatten. By default, all levels are flattened."
],
"metadata": {
"id": "-xtBajp5K_Ge"
}
},
{
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"# Sample semi-structured JSON: nested objects plus a nested list per record\n",
"semi_structured_json = [\n",
"    {\n",
"        \"name\": \"John\",\n",
"        \"age\": 30,\n",
"        \"address\": {\"city\": \"New York\", \"zip\": \"10001\"},\n",
"        \"skills\": [\"Python\", \"SQL\"]\n",
"    },\n",
"    {\n",
"        \"name\": \"Jane\",\n",
"        \"age\": 28,\n",
"        \"address\": {\"city\": \"Chicago\", \"zip\": \"60601\"},\n",
"        \"skills\": [\"Java\", \"C++\"]\n",
"    }\n",
"]\n",
"\n",
"# Flatten: one row per skill, with the parent fields repeated as metadata\n",
"df = pd.json_normalize(\n",
"    semi_structured_json,\n",
"    record_path=['skills'],\n",
"    meta=['name', 'age', ['address', 'city'], ['address', 'zip']],\n",
"    max_level=1\n",
")\n",
"\n",
"# 'skills' holds plain strings, so the extracted column is named 0; give it a readable name\n",
"df = df.rename(columns={0: 'skill'})\n",
"\n",
"print(df)"
],
"metadata": {
"id": "7tfcZuiRK_Ln"
},
"execution_count": null,
"outputs": []
}
]
}