Spaces: LakshmiHarika/MachineLearning

LakshmiHarika committed on Dec 28, 2024

Commit 68524c0

1 Parent(s): 96130b6

Update pages/Data Collection.py

Files changed (1)

pages/Data Collection.py +309 -5

pages/Data Collection.py CHANGED

@@ -1979,16 +1979,320 @@ elif st.session_state.current_page == "explore_csv":
 #--------------------------------------------------------- Json --------------------------------------------------------------------------------
 elif st.session_state.current_page == "explore_json":
     st.markdown("""
-        <h3 style="color: #e25822;">Exploring JSON</h3>
+        <h2 style="color: #BB3385;">JavaScript Object Notation (JSON)</h2>
     """, unsafe_allow_html=True)
     st.write("""
-    JSON is a semi-structured format used for APIs and data exchange.
+    - **JSON (JavaScript Object Notation)** is a lightweight data-interchange format.
+    - It is easy for humans to read and write, and easy for machines to parse and generate.
+    - JSON is used to represent data as key-value pairs and supports hierarchical structures.
+    - Commonly used for:
+        - Web APIs for sending and receiving data.
+        - Configuration files.
+        - Storing structured and semi-structured data.
     """)
-    if st.button("Go Back"):
-        navigate_to("main")
+    st.markdown("""
+    <h3 style="color: #5b2c6f;">Default JSON Format</h3>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - JSON format is similar to a Python dictionary with key-value pairs.
+    - The main differences between JSON and a Python dictionary are:
+        - **In JSON**:
+            - Keys must be strings enclosed in double quotes.
+            - Values can be strings, numbers, booleans (`true`/`false`), `null`, arrays, or objects.
+        - **In Python Dictionary**:
+            - Keys can be any hashable type (e.g., strings, numbers, tuples).
+            - Values can be any Python object, and the literals are written `True`, `False`, and `None`.
+    """)
+    st.markdown("""
+        <h4 style="color: #2a52be;">Example</h4>
+    """, unsafe_allow_html=True)
+    st.code("""
+    # JSON Format
+    {
+        "name": ["a", "b", "c"],
+        "age": [11, 12, 13]
+    }
+    """, language="json")
+    st.code("""
+    # Python Dictionary
+    {
+        "name": ["a", "b", "c"],
+        "age": [11, 12, 13]
+    }
+    """, language="python")
+    st.markdown("""
+    <h3 style="color: #5b2c6f;">JSON in Structured Data</h3>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - JSON is considered structured when it has a consistent format with uniform key-value pairs for all entries.
+    - This allows direct conversion into a tabular format, such as a DataFrame.
+    """)
+    st.code("""
+    # Example of Structured JSON
+    [
+        { "Id": 100, "Name": "Lakshmi Harika", "Age": 22, "Gender": "Female" },
+        { "Id": 101, "Name": "Varshitha", "Age": 23, "Gender": "Female" },
+        { "Id": 102, "Name": "Hari Chandan", "Age": 22, "Gender": "Male" },
+        { "Id": 103, "Name": "Shamitha", "Age": 23, "Gender": "Female" }
+    ]
+    """, language="json")
+    st.code("""
+    import pandas as pd
+    # Reading a structured JSON file
+    df = pd.read_json('structured_data.json')
+    print(df)
+    """, language="python")
+    st.markdown("""
+    <h3 style="color: #5b2c6f;">JSON Orientations in Structured Data</h3>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - JSON can represent data in various orientations using the `orient` parameter of `DataFrame.to_json()` and `pd.read_json()`.""")
+    st.markdown("""
+        <h4 style="color: #2a52be;">JSON with Orient = 'index'</h4>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - When **`orient='index'`**:
+        - In this format, keys represent row indices, and the values are dictionaries of column names and their respective data.
+        - It is useful when each row has a meaningful, unique label; `to_json()` raises a `ValueError` for this orientation if the index contains duplicates.
+    """)
+    st.code("""
+    # Example of JSON with orient='index'
+    {
+        "0": { "Id": 100, "Name": "Lakshmi Harika", "Age": 22, "Gender": "Female" },
+        "1": { "Id": 101, "Name": "Varshitha", "Age": 23, "Gender": "Female" },
+        "2": { "Id": 102, "Name": "Hari Chandan", "Age": 22, "Gender": "Male" },
+        "3": { "Id": 103, "Name": "Shamitha", "Age": 23, "Gender": "Female" }
+    }
+    """, language="json")
+    st.code("""
+    from io import StringIO
+    import pandas as pd
+    # Creating a DataFrame
+    data = pd.DataFrame({
+        "Id": [100, 101, 102, 103],
+        "Name": ["Lakshmi Harika", "Varshitha", "Hari Chandan", "Shamitha"],
+        "Age": [22, 23, 22, 23],
+        "Gender": ["Female", "Female", "Male", "Female"]
+    })
+    # Exporting to JSON with orient='index'
+    json_data = data.to_json(orient='index')
+    print(json_data)
+    # Reading back from JSON with orient='index'
+    # (wrapping the string in StringIO avoids the pandas 2.1+ deprecation of literal JSON input)
+    df = pd.read_json(StringIO(json_data), orient='index')
+    print(df)
+    """, language="python")
+    st.markdown("""
+        <h4 style="color: #2a52be;">JSON with Orient = 'columns'</h4>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - When **`orient='columns'`**:
+        - Keys are column names, and each value is a dictionary that maps a row index to the value in that row.
+        - This is the default orientation when exporting DataFrames to JSON.
+    """)
+    st.code("""
+    # Example of JSON with orient='columns'
+    {
+        "Id": { "0": 100, "1": 101, "2": 102, "3": 103 },
+        "Name": { "0": "Lakshmi Harika", "1": "Varshitha", "2": "Hari Chandan", "3": "Shamitha" },
+        "Age": { "0": 22, "1": 23, "2": 22, "3": 23 },
+        "Gender": { "0": "Female", "1": "Female", "2": "Male", "3": "Female" }
+    }
+    """, language="json")
+    st.code("""
+    from io import StringIO
+    import pandas as pd
+    # Creating a DataFrame
+    data = pd.DataFrame({
+        "Id": [100, 101, 102, 103],
+        "Name": ["Lakshmi Harika", "Varshitha", "Hari Chandan", "Shamitha"],
+        "Age": [22, 23, 22, 23],
+        "Gender": ["Female", "Female", "Male", "Female"]
+    })
+    # Exporting to JSON with orient='columns'
+    json_data = data.to_json(orient='columns')
+    print(json_data)
+    # Reading back from JSON with orient='columns'
+    # (wrapping the string in StringIO avoids the pandas 2.1+ deprecation of literal JSON input)
+    df = pd.read_json(StringIO(json_data), orient='columns')
+    print(df)
+    """, language="python")
+    st.markdown("""
+    <h4 style="color: #2a52be;">JSON with Orient = 'values'</h4>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - When **`orient='values'`**:
+        - The JSON represents the data as an array of arrays.
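+        - Each inner array corresponds to a row of data, and the order matches the DataFrame’s column order.
+        - Column names and the index are not stored, so reading the JSON back produces integer column labels (`0`, `1`, `2`, `3`).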
+    """)
+    st.code("""
+    # Example of JSON with orient='values'
+    [
+        [100, "Lakshmi Harika", 22, "Female"],
+        [101, "Varshitha", 23, "Female"],
+        [102, "Hari Chandan", 22, "Male"],
+        [103, "Shamitha", 23, "Female"]
+    ]
+    """, language="json")
+    st.code("""
+    from io import StringIO
+    import pandas as pd
+    # Creating a DataFrame
+    data = pd.DataFrame({
+        "Id": [100, 101, 102, 103],
+        "Name": ["Lakshmi Harika", "Varshitha", "Hari Chandan", "Shamitha"],
+        "Age": [22, 23, 22, 23],
+        "Gender": ["Female", "Female", "Male", "Female"]
+    })
+    # Exporting to JSON with orient='values'
+    json_data = data.to_json(orient='values')
+    print(json_data)
+    # Reading back from JSON with orient='values'
+    # (wrapping the string in StringIO avoids the pandas 2.1+ deprecation of literal JSON input)
+    df = pd.read_json(StringIO(json_data), orient='values')
+    print(df)
+    """, language="python")
+    st.markdown("""
+    <h4 style="color: #2a52be;">JSON with Orient = 'split'</h4>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - When **`orient='split'`**:
+        - The JSON structure splits the data into three parts:
+            1. `index`: Contains the row indices.
+            2. `columns`: Contains the column names.
+            3. `data`: Contains the actual data as a 2D array.
+        - Because the index, column names, and values are stored separately, this orientation can rebuild the original DataFrame layout without repeating column names for every row.
+    """)
+    st.code("""
+    # Example of JSON with orient='split'
+    {
+        "index": [0, 1, 2, 3],
+        "columns": ["Id", "Name", "Age", "Gender"],
+        "data": [
+            [100, "Lakshmi Harika", 22, "Female"],
+            [101, "Varshitha", 23, "Female"],
+            [102, "Hari Chandan", 22, "Male"],
+            [103, "Shamitha", 23, "Female"]
+        ]
+    }
+    """, language="json")
+    st.code("""
+    from io import StringIO
+    import pandas as pd
+    # Creating a DataFrame
+    data = pd.DataFrame({
+        "Id": [100, 101, 102, 103],
+        "Name": ["Lakshmi Harika", "Varshitha", "Hari Chandan", "Shamitha"],
+        "Age": [22, 23, 22, 23],
+        "Gender": ["Female", "Female", "Male", "Female"]
+    })
+    # Exporting to JSON with orient='split'
+    json_data = data.to_json(orient='split')
+    print(json_data)
+    # Reading back from JSON with orient='split'
+    # (wrapping the string in StringIO avoids the pandas 2.1+ deprecation of literal JSON input)
+    df = pd.read_json(StringIO(json_data), orient='split')
+    print(df)
+    """, language="python")
+    st.markdown("""
+    <h3 style="color: #5b2c6f;">JSON in Semi-Structured Data</h3>
+    """, unsafe_allow_html=True)
+    st.write("""
+    - If the JSON data is semi-structured, we can use `pd.json_normalize()` to flatten it into a DataFrame.
+    - **Semi-Structured JSON**:
+        - A JSON structure is considered semi-structured when one or more fields hold nested objects or lists of dictionaries instead of single values.
+        - This format must be flattened (normalized) before it can be represented as a table.
+    - `pd.json_normalize()` expects already-parsed Python data, i.e., a dictionary or a list of dictionaries (for example, the result of `json.load()`); passing a raw JSON string raises an error.
+    """)
+    st.code("""
+    # Example Nested JSON
+    x = [
+        {"name": "Lakshmi Harika", "age": 23, "gender": "f", "marks": [{"maths": 75, "English": 82}]},
+        {"name": "Varshitha", "age": 43, "gender": "f", "marks": [{"maths": 65, "English": 72}]},
+        {"name": "Hari Chandan", "age": 28, "gender": "m", "marks": [{"maths": 85, "English": 92}]},
+        {"name": "Shamitha", "age": 21, "gender": "f", "marks": [{"maths": 90, "English": 88}]}
+    ]
+    """, language="python")
+    st.markdown("""
+        <h4 style="color: #5b2c6f;">Parameters to Understand</h4>
+    """, unsafe_allow_html=True)
+    st.write("""
+    1. **record_path**:
+        - Specifies the path to the nested list of records to flatten; each record in that list becomes a row.
+        - Example: For the key `marks`, the path would be `'marks'` (or `['marks']`).
+    2. **meta**:
+        - Specifies fields from the parent record to include as extra columns in the resulting DataFrame.
+        - Their values are copied as-is and repeated for every row produced from that parent record.
+    3. **max_level**:
+        - Controls how many levels of nested dictionaries are flattened.
+        - Default is `None` (flattens all levels); setting it to a number such as `1` limits the depth.
+    """)
+    st.code("""
+    import pandas as pd
+    # Flattening the semi-structured JSON list `x`
+    df = pd.json_normalize(
+        x,
+        record_path=['marks'],  # Specifies the nested list to flatten
+        meta=['name', 'age', 'gender'],  # Fields to include as metadata
+        max_level=1  # Specifies the depth of flattening
+    )
+    print(df)
+    """, language="python")
+    st.write("""
+    The resulting DataFrame will look like this:
+    """)
+    st.table({
+        "maths": [75, 65, 85, 90],
+        "English": [82, 72, 92, 88],
+        "name": ["Lakshmi Harika", "Varshitha", "Hari Chandan", "Shamitha"],
+        "age": [23, 43, 28, 21],
+        "gender": ["f", "f", "m", "f"]
+    })
 #--------------------------------------------------------- XML -------------------------------------------------------------------------------
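The JSON-versus-dictionary differences described in the diff can be checked with Python's standard `json` module. The snippet below is a minimal sketch added for illustration, not code from this commit, and the sample dictionary is made up. It shows that `json.dumps()` turns an integer key into a string and `True`/`None` into `true`/`null`, and that a tuple key, which is valid in a Python dictionary, cannot be encoded as a JSON key at all.

```python
import json

# Illustrative dict (made up): an int key plus Python's True and None literals.
py_dict = {1: "one", "flag": True, "missing": None}

# Encoding to JSON text converts the int key to a string and True/None to true/null.
text = json.dumps(py_dict)
print(text)  # {"1": "one", "flag": true, "missing": null}

# Decoding gives back string keys only, so the original int key 1 is now "1".
round_trip = json.loads(text)
print(round_trip)  # {'1': 'one', 'flag': True, 'missing': None}

# Tuple keys are hashable (valid in a dict) but have no JSON representation.
tuple_key_rejected = False
try:
    json.dumps({(1, 2): "pair"})
except TypeError as err:
    tuple_key_rejected = True
    print("TypeError:", err)
```

This is also why a dictionary with non-string keys does not survive a JSON round trip unchanged.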
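The structured-JSON example in the diff reads `structured_data.json`, which assumes that file already exists. As a self-contained sketch (not part of the commit), the same four records can be written to a temporary file first and then loaded with `pd.read_json()`; the temporary directory is an assumption made only so the snippet runs anywhere.

```python
import json
import os
import tempfile

import pandas as pd

# Structured JSON: every record has the same keys, so it maps directly to a table.
records = [
    {"Id": 100, "Name": "Lakshmi Harika", "Age": 22, "Gender": "Female"},
    {"Id": 101, "Name": "Varshitha", "Age": 23, "Gender": "Female"},
    {"Id": 102, "Name": "Hari Chandan", "Age": 22, "Gender": "Male"},
    {"Id": 103, "Name": "Shamitha", "Age": 23, "Gender": "Female"},
]

# Write the records to a temporary file so read_json has a real path to load.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "structured_data.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    # Each JSON object becomes one row; the shared keys become the columns.
    df = pd.read_json(path)

print(df)
```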
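To compare what each `orient` value in the diff preserves, the four round trips can be run in one loop. This is a hedged sketch, not part of the commit: it reuses the same sample data, wraps the JSON text in `io.StringIO` because pandas 2.1 and later deprecate passing a literal JSON string to `pd.read_json()`, and records only the column labels and cell values after reading back.

```python
from io import StringIO

import pandas as pd

# Same sample data as the orient examples in the diff.
data = pd.DataFrame({
    "Id": [100, 101, 102, 103],
    "Name": ["Lakshmi Harika", "Varshitha", "Hari Chandan", "Shamitha"],
    "Age": [22, 23, 22, 23],
    "Gender": ["Female", "Female", "Male", "Female"],
})

summary = {}
for orient in ["index", "columns", "values", "split"]:
    # Export with the given orientation, then read it back with the same one.
    text = data.to_json(orient=orient)
    restored = pd.read_json(StringIO(text), orient=orient)
    summary[orient] = {
        # Which column labels survived the round trip.
        "columns": list(restored.columns),
        # Whether the cell values, row by row, are unchanged.
        "values_match": restored.values.tolist() == data.values.tolist(),
    }
    print(orient, summary[orient])
```

In this sketch all four orientations keep the cell values, but only `orient='values'` loses the column names, which come back as `0` to `3`.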
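The `max_level` parameter of `pd.json_normalize()` matters most when records contain nested dictionaries, which the flat `marks` entries in the diff do not really exercise. The sketch below uses hypothetical `people` records (names, keys, and values are invented for illustration, not taken from the commit) to compare the columns produced with the default `max_level=None`, with `max_level=1`, and with `max_level=0`.

```python
import pandas as pd

# Hypothetical records: "address" is a nested dict that itself contains a dict.
people = [
    {"name": "A", "address": {"city": "X", "geo": {"lat": 1.0, "lon": 2.0}}},
    {"name": "B", "address": {"city": "Y", "geo": {"lat": 3.0, "lon": 4.0}}},
]

# Default max_level=None: every nested level becomes a dotted column name.
full = pd.json_normalize(people)

# max_level=1: "address" is flattened one level; "address.geo" stays a dict.
one_level = pd.json_normalize(people, max_level=1)

# max_level=0: nothing below the top level is flattened; "address" stays a dict.
top_only = pd.json_normalize(people, max_level=0)

for label, frame in [("None", full), ("1", one_level), ("0", top_only)]:
    print(f"max_level={label}:", list(frame.columns))
```

Limiting the depth can be handy when a deeply nested sub-object should stay together in one column instead of being spread across many.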