{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "## **Reading JSON File**"
      ],
      "metadata": {
        "id": "4L2BwncXK7Uv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "pd.read_json(\"sample.json\")"
      ],
      "metadata": {
        "id": "sVqhFskxK-r5"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Handling Structured JSON**\n",
        "\n",
        "- The `orient` parameter of `pd.read_json()` tells pandas how the JSON being read is laid out:\n",
        "    - **\"split\"**: Dictionary with the keys \"index\", \"columns\", and \"data\".\n",
        "    - **\"records\"**: List of dictionaries, where each dictionary represents one row.\n",
        "    - **\"index\"**: Dictionary with row labels as keys and dictionaries of column values as values.\n",
        "    - **\"columns\"**: Default for DataFrames; dictionary with column names as keys and that column's data as values."
      ],
      "metadata": {
        "id": "clN_z7L8K-zD"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "from io import StringIO\n",
        "\n",
        "# Sample structured JSON (column-oriented: keys are column names)\n",
        "structured_json = {\n",
        "    \"name\": [\"John\", \"Doe\", \"Jane\"],\n",
        "    \"age\": [30, 25, 28],\n",
        "    \"city\": [\"New York\", \"Los Angeles\", \"Chicago\"]\n",
        "}\n",
        "df = pd.DataFrame(structured_json)\n",
        "\n",
        "# Serialize the same data in each layout, then read it back with the matching 'orient'\n",
        "df_default = pd.read_json(StringIO(df.to_json()))  # Default (columns)\n",
        "df_split = pd.read_json(StringIO(df.to_json(orient='split')), orient='split')\n",
        "df_index = pd.read_json(StringIO(df.to_json(orient='index')), orient='index')\n",
        "\n",
        "print(df_default)\n",
        "print(df_split)\n",
        "print(df_index)"
      ],
      "metadata": {
        "id": "yFW5vsHkK-6Q"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Handling Semi-Structured JSON**\n",
        "\n",
        "- `pandas.json_normalize()` is used to flatten nested JSON objects into a DataFrame.\n",
        "    - **`record_path`**: Specifies the path in the JSON to extract records from nested lists.\n",
        "    - **`meta`**: Includes additional metadata fields from parent records.\n",
        "    - **`max_level`**: Limits the number of levels to flatten."
      ],
      "metadata": {
        "id": "-xtBajp5K_Ge"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Sample Semi-Structured JSON\n",
        "semi_structured_json = [\n",
        "    {\n",
        "        \"name\": \"John\",\n",
        "        \"age\": 30,\n",
        "        \"address\": {\"city\": \"New York\", \"zip\": \"10001\"},\n",
        "        \"skills\": [\"Python\", \"SQL\"]\n",
        "    },\n",
        "    {\n",
        "        \"name\": \"Jane\",\n",
        "        \"age\": 28,\n",
        "        \"address\": {\"city\": \"Chicago\", \"zip\": \"60601\"},\n",
        "        \"skills\": [\"Java\", \"C++\"]\n",
        "    }\n",
        "]\n",
        "\n",
        "# Flattening nested JSON: one row per skill, with parent fields repeated as metadata\n",
        "df = pd.json_normalize(\n",
        "    semi_structured_json,\n",
        "    record_path=['skills'],\n",
        "    meta=['name', 'age', ['address', 'city'], ['address', 'zip']],\n",
        "    max_level=1\n",
        ")\n",
        "\n",
        "# 'skills' holds plain strings, so the record column is named 0; give it a readable name\n",
        "df = df.rename(columns={0: 'skill'})\n",
        "\n",
        "print(df)"
      ],
      "metadata": {
        "id": "7tfcZuiRK_Ln"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}