{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "## **Reading JSON File**" ], "metadata": { "id": "4L2BwncXK7Uv" } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "\n", "# Assumes sample.json exists in the working directory\n", "pd.read_json(\"sample.json\")" ], "metadata": { "id": "sVqhFskxK-r5" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## **Handling Structured JSON**\n", "\n", "- The `orient` parameter in `pd.read_json()` tells pandas how the JSON being read is laid out; it must match the actual layout of the data:\n", " - **\"split\"**: Dictionary with the keys \"index\", \"columns\", and \"data\".\n", " - **\"records\"**: List of dictionaries, where each dictionary represents a row.\n", " - **\"index\"**: Dictionary with row indices as keys and dictionaries of column data as values.\n", " - **\"columns\"**: Default for DataFrames. Dictionary with column names as keys and each column's data as values." ], "metadata": { "id": "clN_z7L8K-zD" } }, { "cell_type": "code", "source": [ "import json\n", "from io import StringIO\n", "\n", "import pandas as pd\n", "\n", "# Sample structured JSON in the column-oriented layout (column name -> values)\n", "structured_json = {\n", "    \"name\": [\"John\", \"Doe\", \"Jane\"],\n", "    \"age\": [30, 25, 28],\n", "    \"city\": [\"New York\", \"Los Angeles\", \"Chicago\"]\n", "}\n", "\n", "# Write the sample to disk so the reads below have a file to load\n", "with open('structured.json', 'w') as f:\n", "    json.dump(structured_json, f)\n", "\n", "# The default orient ('columns') matches the layout of the file\n", "df_default = pd.read_json('structured.json')\n", "\n", "# 'split' and 'index' expect different layouts, so re-serialize the same data\n", "# in each layout and read it back with the matching orient\n", "df_split = pd.read_json(StringIO(df_default.to_json(orient='split')), orient='split')\n", "df_index = pd.read_json(StringIO(df_default.to_json(orient='index')), orient='index')\n", "\n", "print(df_default)\n", "print(df_split)\n", "print(df_index)" ], "metadata": { "id": "yFW5vsHkK-6Q" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## **Handling Semi-Structured JSON**\n", "\n", "- `pandas.json_normalize()` is used to flatten nested JSON objects into a DataFrame.\n", " - **`record_path`**: Specifies the path in the JSON to 
extract records from nested lists.\n", " - **`meta`**: Includes additional metadata fields from parent records.\n", " - **`max_level`**: Limits the number of levels to flatten." ], "metadata": { "id": "-xtBajp5K_Ge" } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "\n", "# Sample semi-structured JSON: a nested object (address) and a nested list (skills)\n", "semi_structured_json = [\n", "    {\n", "        \"name\": \"John\",\n", "        \"age\": 30,\n", "        \"address\": {\"city\": \"New York\", \"zip\": \"10001\"},\n", "        \"skills\": [\"Python\", \"SQL\"]\n", "    },\n", "    {\n", "        \"name\": \"Jane\",\n", "        \"age\": 28,\n", "        \"address\": {\"city\": \"Chicago\", \"zip\": \"60601\"},\n", "        \"skills\": [\"Java\", \"C++\"]\n", "    }\n", "]\n", "\n", "# Flatten to one row per entry in 'skills'; the parent fields listed in meta\n", "# are repeated on each row (nested ones appear as 'address.city', 'address.zip')\n", "df = pd.json_normalize(\n", "    semi_structured_json,\n", "    record_path=['skills'],\n", "    meta=['name', 'age', ['address', 'city'], ['address', 'zip']],\n", "    max_level=1\n", ")\n", "\n", "print(df)" ], "metadata": { "id": "7tfcZuiRK_Ln" }, "execution_count": null, "outputs": [] } ] }