{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "## **Reading JSON File**"
      ],
      "metadata": {
        "id": "4L2BwncXK7Uv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "pd.read_json(\"sample.json\")"
      ],
      "metadata": {
        "id": "sVqhFskxK-r5"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Handling Structured JSON**\n",
        "\n",
        "- The `orient` parameter of `pd.read_json()` tells pandas how the JSON text is laid out:\n",
        "    - **\"split\"**: an object with the keys `\"index\"`, `\"columns\"`, and `\"data\"`, where `\"data\"` holds the rows as lists.\n",
        "    - **\"records\"**: a list of objects, one per row, each mapping column names to values.\n",
        "    - **\"index\"**: an object whose keys are row labels and whose values are objects mapping column names to values.\n",
        "    - **\"columns\"**: the default for DataFrames; an object whose keys are column names and whose values map row labels to values. A plain list of values per column is also accepted."
      ],
      "metadata": {
        "id": "clN_z7L8K-zD"
      }
    },
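    {
      "cell_type": "markdown",
      "source": [
        "To compare the four layouts listed above, this sketch serializes one small DataFrame with `DataFrame.to_json()` and prints the text written for each `orient` value. `df_demo` and its values are hypothetical, chosen only for illustration; the exact formatting of the printed JSON may vary between pandas versions, so read it as a guide to the shape of each layout."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Hypothetical two-row DataFrame, used only to illustrate the layouts\n",
        "df_demo = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 28]})\n",
        "\n",
        "# to_json() writes the same data in each layout that read_json() can parse back\n",
        "for orient in ['split', 'records', 'index', 'columns']:\n",
        "    print(f'{orient:>8}:', df_demo.to_json(orient=orient))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },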
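    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "from io import StringIO\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "# Sample structured JSON: column names mapped to lists of values\n",
        "structured_json = {\n",
        "    \"name\": [\"John\", \"Doe\", \"Jane\"],\n",
        "    \"age\": [30, 25, 28],\n",
        "    \"city\": [\"New York\", \"Los Angeles\", \"Chicago\"]\n",
        "}\n",
        "\n",
        "# Each orient expects a different layout, so serialize the same data once per layout\n",
        "df = pd.DataFrame(structured_json)\n",
        "json_columns = json.dumps(structured_json)  # columns layout (the default)\n",
        "json_split = df.to_json(orient='split')\n",
        "json_records = df.to_json(orient='records')\n",
        "json_index = df.to_json(orient='index')\n",
        "\n",
        "# Reading JSON with different 'orient' values; literal JSON strings are wrapped\n",
        "# in StringIO because passing them directly is deprecated since pandas 2.1\n",
        "df_default = pd.read_json(StringIO(json_columns))  # Default (columns)\n",
        "df_split = pd.read_json(StringIO(json_split), orient='split')\n",
        "df_records = pd.read_json(StringIO(json_records), orient='records')\n",
        "df_index = pd.read_json(StringIO(json_index), orient='index')\n",
        "\n",
        "print(df_default)\n",
        "print(df_split)\n",
        "print(df_records)\n",
        "print(df_index)"
      ],
      "metadata": {
        "id": "yFW5vsHkK-6Q"
      },
      "execution_count": null,
      "outputs": []
    },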
    {
      "cell_type": "markdown",
      "source": [
        "## **Handling Semi-Structured JSON**\n",
        "\n",
        "- `pandas.json_normalize()` flattens nested JSON objects into a DataFrame.\n",
        "    - **`record_path`**: the key, or list of keys, leading to a nested list whose items become the rows of the result.\n",
        "    - **`meta`**: fields from the parent record to repeat on every extracted row; a nested field is given as a list of keys, such as `['address', 'city']`.\n",
        "    - **`max_level`**: the maximum depth of nested dictionaries to flatten; the default, `None`, flattens every level."
      ],
      "metadata": {
        "id": "-xtBajp5K_Ge"
      }
    },
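    {
      "cell_type": "markdown",
      "source": [
        "To see what `max_level` controls, this minimal sketch normalizes a hypothetical record (`data`, with made-up field names) whose `info` dictionary is nested two levels deep, and prints the resulting column names for `max_level=0`, `max_level=1`, and the default `None`. Dictionaries below the chosen depth are not expanded; they stay whole in a single column."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Hypothetical record with dictionaries nested two levels deep\n",
        "data = [{'id': 1, 'info': {'contact': {'email': 'john@example.com'}, 'city': 'New York'}}]\n",
        "\n",
        "# max_level=0: 'info' is not flattened and stays a dict in one column\n",
        "print(pd.json_normalize(data, max_level=0).columns.tolist())\n",
        "\n",
        "# max_level=1: 'info' is flattened one level; 'info.contact' still holds a dict\n",
        "print(pd.json_normalize(data, max_level=1).columns.tolist())\n",
        "\n",
        "# max_level=None (default): every level is flattened, down to 'info.contact.email'\n",
        "print(pd.json_normalize(data).columns.tolist())"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },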
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Sample Semi-Structured JSON\n",
        "semi_structured_json = [\n",
        "    {\n",
        "        \"name\": \"John\",\n",
        "        \"age\": 30,\n",
        "        \"address\": {\"city\": \"New York\", \"zip\": \"10001\"},\n",
        "        \"skills\": [\"Python\", \"SQL\"]\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Jane\",\n",
        "        \"age\": 28,\n",
        "        \"address\": {\"city\": \"Chicago\", \"zip\": \"60601\"},\n",
        "        \"skills\": [\"Java\", \"C++\"]\n",
        "    }\n",
        "]\n",
        "\n",
        "# Flatten the nested JSON: one row per skill, with the parent fields repeated on each row\n",
        "df = pd.json_normalize(\n",
        "    semi_structured_json,\n",
        "    record_path=['skills'],  # each item in 'skills' becomes a row\n",
        "    meta=['name', 'age', ['address', 'city'], ['address', 'zip']],  # nested keys become 'address.city', 'address.zip'\n",
        "    max_level=1  # flatten at most one level of nested dicts (no visible effect on this shallow data)\n",
        ")\n",
        "\n",
        "# 'skills' holds plain strings, so json_normalize puts them in a column named 0\n",
        "df = df.rename(columns={0: 'skill'})\n",
        "\n",
        "print(df)"
      ],
      "metadata": {
        "id": "7tfcZuiRK_Ln"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}