trohith89 committed · Commit cb23e1c · verified · 1 Parent(s): c0164c0

Update pages/3_EDA_and_Feature_Engineering.py

pages/3_EDA_and_Feature_Engineering.py CHANGED
@@ -280,18 +280,18 @@ if df is not None:
280
  plt.tight_layout()
281
  st.pyplot(fig)
282
 
283
- st.markdown('''**Insights :**
284
 
285
- - **Category Distribution** : The distribution of products across categories (Smartphones, Smart Watches, Tablets, Laptops, Headphones) is relatively uniform, with slight variations. This suggests a diverse product catalog.
286
- Purchase Intent: It appears that "Purchase Intent = 1" (meaning intent to purchase is present) is fairly consistent across categories, with no category showing a significantly higher or lower proportion of purchase intent.
287
 
288
- - **Brand Distribution** :The distribution of brands is less uniform. "Other Brands" seems to have the highest representation, followed by Samsung, Sony, HP, and then Apple.
289
- Purchase Intent: Observe if there are any notable differences in the proportion of "Purchase Intent = 1" between different brands. This could indicate if certain brands are more desirable or effective at converting interest into purchases.
290
 
291
  - **Price Distribution**: The price histogram indicates a wide range of product prices, likely spanning from near 0 to 3000 (assuming the x-axis represents price).
292
- Purchase Intent: Examine how purchase intent varies across different price points. Are there price ranges where purchase intent is higher or lower? This could reveal price sensitivity or the effectiveness of pricing strategies.''')
293
-
294
- # Set up the subplots grid: 1 row and 3 columns
295
  fig, axs = plt.subplots(1, 3, figsize=(18*0.7, 6*0.7))
296
  axs = axs.flatten() # Flatten the 2D array of axes to easily index
297
 
@@ -317,10 +317,13 @@ if df is not None:
317
  plt.tight_layout() # Correct method name here
318
  st.pyplot(fig)
319
 
320
- st.markdown('''**Insights :**
 
 
321
 
322
- - **Uneven Distribution:** There's a significant difference in the number of customers in each gender category. The category represented by '1' (likely female) has a much higher count than the category represented by '0' (likely male). This indicates that your customer base is skewed towards one gender.
323
- - **Purchase Intent:** The proportion of "Purchase Intent = 1" (meaning the intent to purchase is present) appears to be relatively similar between the two genders. The purple bars (Purchase Intent = 1) are proportionally similar in height for both genders.''')
 
324
 
325
  st.write("### PRODUCT VS BRANDS")
326
  # Create the plot
@@ -362,17 +365,20 @@ if df is not None:
362
 
363
  # Render the plot in Streamlit
364
  st.plotly_chart(fig)
365
- st.markdown('''**Insights :**
366
- - **Price Range:** The x-axis shows a price range likely from 0 to 3000 (units unspecified, but presumably currency).
 
 
367
 
368
- - **Category Distribution Across Price:** The stacked areas illustrate how the proportion of each product category varies across the price spectrum.
369
- 1. .**Smartphones (Black):** Appear to be concentrated in the lower to mid-price ranges, with fewer smartphones at the higher price points.
370
- 2. **Smart Watches (Red):** Show a relatively consistent distribution across the price range, though perhaps slightly more prevalent in the mid-range.
371
- 3. **Tablets (Yellow):** Seem to be more common in the mid-price range, with fewer tablets at both the low and high ends.
372
- 4. **Laptops (White):** Tend to dominate the higher price ranges, as expected. There are very few laptops at the lower price points.
373
- 5. **Headphones (Light Blue):** Have a fairly even distribution across the price range, although there's a slight increase in the mid-to-high price range.
374
 
375
- - **Overlapping Areas:** The stacked nature of the chart allows you to see the total number of products at each price point by summing the heights of the stacked areas.''')
 
 
376
 
377
  st.write("### BRANDS VS PRICE")
378
  # Create the histogram plot
@@ -381,20 +387,24 @@ if df is not None:
381
 
382
  # Render the plot in Streamlit
383
  st.plotly_chart(fig)
384
- st.markdown('''**Insights :**
385
-
386
- - **Price Range:** The x-axis covers a price range, likely from 0 to 3000 (currency unspecified).
387
-
388
- - **Brand Distribution Across Price:** The stacked bars show the count of products from each brand within different price intervals.
389
 
390
- 1. **Apple (Darkest Purple/Blue):** Appears to have a significant presence across most of the price range, though perhaps slightly less so at the very lowest end.
391
 
392
- 2. **HP (Medium Purple):** Also has a fairly broad distribution across price points, with a noticeable presence in the mid-range.
393
 
394
- 3. **Sony (Lighter Purple):** Seems to be more concentrated in the mid-to-high price range.
395
- 4. **Samsung (Lightest Purple/Pink):** Has a presence across the price range, but seems to be more prominent in the mid-range and slightly lower-mid range.
396
- 5. **Other Brands (Darkest Purple/Blue, sometimes hard to distinguish from Apple):** This category seems to have a substantial presence across all price points, particularly at the lower end. This suggests a large variety of less prominent brands catering to different price segments.
397
- - **Overlapping Areas/Stacked Bars:** The stacked nature of the chart shows the total number of products at each price point by adding up the heights of the different brand segments.''')
398
 
399
  st.write("### AGE vs PRODUCT CATEGORY and PRICE")
400
  # Create the histogram plot
@@ -402,20 +412,26 @@ if df is not None:
402
 
403
  # Render the plot in Streamlit
404
  st.plotly_chart(fig)
405
- st.markdown('''**Insights :**
406
-
407
- - **Category Distribution Across Age:** The stacked bars illustrate how the proportion of each product category contributes to the total orders within each age group.
408
 
409
- 1. **Smartphones (Blue):** Appear to have a fairly consistent demand across all age groups, forming the base of most stacks. This suggests smartphones are a popular category regardless of age.
410
- 2. **Smart Watches (Red):** Show a notable presence, with potentially higher contributions in the younger and middle-age groups. This could indicate that smartwatches are more popular among these demographics.
411
- 3. **Tablets (Green):** Have a somewhat consistent demand across age groups, similar to smartphones but with a smaller overall contribution to total orders.
412
- 4. **Laptops (Purple):** Appear to have a strong presence across all age groups, often rivaling or exceeding smartphones in contribution. This suggests laptops are essential for a wide range of ages.
413
- 5. **Headphones (Orange):** Show a relatively consistent pattern across age groups, with a moderate contribution to total orders.
414
 
415
- - Insights:
416
 
417
- 1. **Age-Related Preferences:** While some categories like smartphones and laptops seem to have broad appeal, there are hints of age-related preferences. For example, smartwatches might be more popular among younger demographics.
418
- 2. **Dominant Categories:** Smartphones and laptops appear to be the most consistently popular categories across most age groups.''')
 
 
 
419
 
420
  st.write("### HEATMAP | CORRELATION MATRIX")
421
  st.write("#### Label Encoding")
@@ -465,12 +481,13 @@ if df is not None:
465
 
466
  # Display insights in Streamlit
467
  st.markdown('''**Insights:**
468
-
469
  Correlation is a statistical measure that indicates the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 to 1, with the following interpretations:
470
 
471
- - -1: Perfect negative correlation (as one variable increases, the other decreases)
472
- - 0: No correlation (the variables are independent)
473
- - 1: Perfect positive correlation (as one variable increases, the other increases)''')
 
 
474
 
475
  else:
476
  st.error("No dataset found. Please upload a dataset on the main page first.")
 
280
  plt.tight_layout()
281
  st.pyplot(fig)
282
 
283
+ st.markdown('''**Insights:**
284
 
285
+ - **Category Distribution**: The distribution of products across categories (Smartphones, Smart Watches, Tablets, Laptops, Headphones) is relatively uniform, with slight variations. This suggests a diverse product catalog.
286
+ - **Purchase Intent**: It appears that "Purchase Intent = 1" (meaning intent to purchase is present) is fairly consistent across categories, with no category showing a significantly higher or lower proportion of purchase intent.
287
 
288
+ - **Brand Distribution**: The distribution of brands is less uniform. "Other Brands" seems to have the highest representation, followed by Samsung, Sony, HP, and then Apple.
289
+ - **Purchase Intent**: Compare the proportion of "Purchase Intent = 1" across brands. A notable difference could indicate that certain brands are more desirable or more effective at converting interest into purchases.
290
 
291
  - **Price Distribution**: The price histogram indicates a wide range of product prices, likely spanning from near 0 to 3000 (assuming the x-axis represents price).
292
+ - **Purchase Intent**: Examine how purchase intent varies across price points. Price ranges with noticeably higher or lower intent could reveal price sensitivity or the effectiveness of pricing strategies.
293
+ ''')
294
+
295
  fig, axs = plt.subplots(1, 3, figsize=(18*0.7, 6*0.7))
296
  axs = axs.flatten() # Flatten the 2D array of axes to easily index
297
 
 
317
  plt.tight_layout() # Correct method name here
318
  st.pyplot(fig)
319
 
320
+ st.markdown('''**Insights:**
321
+
322
+ - **Uneven Distribution**: The two gender categories differ significantly in customer count. The category encoded as '1' (likely female) has a much higher count than the category encoded as '0' (likely male), so the customer base is skewed toward category '1'.
323
 
324
+ - **Purchase Intent**: The proportion of "Purchase Intent = 1" (meaning the intent to purchase is present) appears to be relatively similar between the two genders. The purple bars (Purchase Intent = 1) are proportionally similar in height for both genders.
325
+ ''')
326
+
327
 
328
  st.write("### PRODUCT VS BRANDS")
329
  # Create the plot
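
Reviewer note on the gender insight in the hunk above: the claim that the share of "Purchase Intent = 1" is similar across genders is read off bar heights, and it could be checked numerically. The following is a minimal sketch in plain Python, assuming hypothetical column names `Gender` and `PurchaseIntent` and made-up rows; the real dataset and its column names are not shown in this commit.

```python
from collections import defaultdict

# Hypothetical rows standing in for the app's DataFrame; the column names
# "Gender" and "PurchaseIntent" are assumptions, not taken from the commit.
rows = [
    {"Gender": 0, "PurchaseIntent": 1},
    {"Gender": 0, "PurchaseIntent": 0},
    {"Gender": 1, "PurchaseIntent": 1},
    {"Gender": 1, "PurchaseIntent": 1},
    {"Gender": 1, "PurchaseIntent": 0},
    {"Gender": 1, "PurchaseIntent": 0},
]


def intent_rate_by_group(records, group_key, target_key):
    """Return {group: (row count, share of rows where target == 1)}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        positives[group] += int(record[target_key] == 1)
    return {g: (totals[g], positives[g] / totals[g]) for g in sorted(totals)}


rates = intent_rate_by_group(rows, "Gender", "PurchaseIntent")
for group, (count, share) in rates.items():
    print(f"Gender={group}: n={count}, intent share={share:.2f}")
```

In this toy data the group sizes differ (2 vs 4) while the intent share is the same, which is the pattern the insight describes. With pandas, and under the same column-name assumption, `df.groupby("Gender")["PurchaseIntent"].mean()` would give the per-group shares.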
 
365
 
366
  # Render the plot in Streamlit
367
  st.plotly_chart(fig)
368
+ st.markdown('''**Insights:**
369
+ - **Price Range**: The x-axis shows a price range likely from 0 to 3000 (units unspecified, but presumably currency).
370
+
371
+ - **Category Distribution Across Price**: The stacked areas illustrate how the proportion of each product category varies across the price spectrum.
372
 
373
+ 1. **Smartphones (Black)**: Appear to be concentrated in the lower to mid-price ranges, with fewer smartphones at the higher price points.
374
+ 2. **Smart Watches (Red)**: Show a relatively consistent distribution across the price range, though perhaps slightly more prevalent in the mid-range.
375
+ 3. **Tablets (Yellow)**: Seem to be more common in the mid-price range, with fewer tablets at both the low and high ends.
376
+ 4. **Laptops (White)**: Tend to dominate the higher price ranges, as expected. There are very few laptops at the lower price points.
377
+ 5. **Headphones (Light Blue)**: Have a fairly even distribution across the price range, although there's a slight increase in the mid-to-high price range.
 
378
 
379
+ - **Overlapping Areas**: The stacked nature of the chart allows you to see the total number of products at each price point by summing the heights of the stacked areas.
380
+ ''')
381
+
382
 
383
  st.write("### BRANDS VS PRICE")
384
  # Create the histogram plot
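
Reviewer note on the category-versus-price insights in the hunk above: statements such as "laptops dominate the higher price ranges" could be backed by a small count table rather than by colors in the chart. A rough sketch follows, using invented (category, price) pairs and an assumed bin width of 1000; the chart's actual binning and the dataset's column names are not visible in the diff.

```python
from collections import Counter

# Hypothetical (category, price) pairs; the real values in the app's
# dataset are not shown in this commit.
products = [
    ("Smartphones", 450.0),
    ("Smartphones", 900.0),
    ("Laptops", 1800.0),
    ("Laptops", 2600.0),
    ("Headphones", 120.0),
    ("Tablets", 1100.0),
]

BIN_WIDTH = 1000  # assumed bin width; the chart's binning may differ


def count_by_price_bin(items, bin_width):
    """Count products per (price-bin start, category) pair."""
    counts = Counter()
    for category, price in items:
        bin_start = int(price // bin_width) * bin_width
        counts[(bin_start, category)] += 1
    return counts


bin_counts = count_by_price_bin(products, BIN_WIDTH)
for (bin_start, category), n in sorted(bin_counts.items()):
    print(f"{bin_start}-{bin_start + BIN_WIDTH}: {category} = {n}")
```

Summing the counts that share a bin start gives the total height of that stack, which is what the "Overlapping Areas" bullet describes.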
 
387
 
388
  # Render the plot in Streamlit
389
  st.plotly_chart(fig)
390
+ st.markdown('''**Insights:**
391
+ - **Price Range**: The x-axis covers a price range, likely from 0 to 3000 (currency unspecified).
 
 
 
392
 
393
+ - **Brand Distribution Across Price**: The stacked bars show the count of products from each brand within different price intervals.
394
 
395
+ 1. **Apple (Darkest Purple/Blue)**: Appears to have a significant presence across most of the price range, though perhaps slightly less so at the very lowest end.
396
+
397
+ 2. **HP (Medium Purple)**: Also has a fairly broad distribution across price points, with a noticeable presence in the mid-range.
398
+
399
+ 3. **Sony (Lighter Purple)**: Seems to be more concentrated in the mid-to-high price range.
400
+
401
+ 4. **Samsung (Lightest Purple/Pink)**: Has a presence across the price range, but seems to be more prominent in the mid-range and slightly lower-mid range.
402
+
403
+ 5. **Other Brands (Darkest Purple/Blue, sometimes hard to distinguish from Apple)**: This category seems to have a substantial presence across all price points, particularly at the lower end. This suggests a large variety of less prominent brands catering to different price segments.
404
 
405
+ - **Overlapping Areas/Stacked Bars**: The stacked nature of the chart shows the total number of products at each price point by adding up the heights of the different brand segments.
406
+ ''')
407
+
 
408
 
409
  st.write("### AGE vs PRODUCT CATEGORY and PRICE")
410
  # Create the histogram plot
 
412
 
413
  # Render the plot in Streamlit
414
  st.plotly_chart(fig)
415
+ st.markdown('''**Insights:**
416
+ - **Category Distribution Across Age**: The stacked bars illustrate how the proportion of each product category contributes to the total orders within each age group.
 
417
 
418
+ 1. **Smartphones (Blue)**: Appear to have a fairly consistent demand across all age groups, forming the base of most stacks. This suggests smartphones are a popular category regardless of age.
419
+
420
+ 2. **Smart Watches (Red)**: Show a notable presence, with potentially higher contributions in the younger and middle-age groups. This could indicate that smartwatches are more popular among these demographics.
421
+
422
+ 3. **Tablets (Green)**: Have a somewhat consistent demand across age groups, similar to smartphones but with a smaller overall contribution to total orders.
423
+
424
+ 4. **Laptops (Purple)**: Appear to have a strong presence across all age groups, often rivaling or exceeding smartphones in contribution. This suggests laptops are essential for a wide range of ages.
425
+
426
+ 5. **Headphones (Orange)**: Show a relatively consistent pattern across age groups, with a moderate contribution to total orders.
427
 
428
+ - **Key Takeaways**:
429
 
430
+ 1. **Age-Related Preferences**: While some categories like smartphones and laptops seem to have broad appeal, there are hints of age-related preferences. For example, smartwatches might be more popular among younger demographics.
431
+
432
+ 2. **Dominant Categories**: Smartphones and laptops appear to be the most consistently popular categories across most age groups.
433
+ ''')
434
+
435
 
436
  st.write("### HEATMAP | CORRELATION MATRIX")
437
  st.write("#### Label Encoding")
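
Reviewer note on the age-group insights in the hunk above: the text talks about the proportion each category contributes within an age group, but stacked counts make proportions hard to compare when group sizes differ. The sketch below normalizes counts within each age group. The age-group labels and orders are invented for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical (age group, category) orders; the labels are assumptions.
orders = [
    ("18-25", "Smartphones"),
    ("18-25", "Smart Watches"),
    ("18-25", "Smartphones"),
    ("18-25", "Laptops"),
    ("46-60", "Laptops"),
    ("46-60", "Smartphones"),
]


def category_share_by_age(records):
    """Return {age group: {category: share of that group's orders}}."""
    per_age = defaultdict(Counter)
    for age_group, category in records:
        per_age[age_group][category] += 1
    shares = {}
    for age_group, counter in per_age.items():
        total = sum(counter.values())
        shares[age_group] = {c: n / total for c, n in counter.items()}
    return shares


shares = category_share_by_age(orders)
for age_group in sorted(shares):
    print(age_group, shares[age_group])
```

Because each group's shares sum to 1, a claim like "smartwatches are more popular among younger demographics" can be compared across groups of different sizes.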
 
481
 
482
  # Display insights in Streamlit
483
  st.markdown('''**Insights:**
 
484
  Correlation is a statistical measure that indicates the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 to 1, with the following interpretations:
485
 
486
+ - **-1**: Perfect negative correlation (as one variable increases, the other decreases)
487
+ - **0**: No linear correlation (the variables may still be related in a nonlinear way)
488
+ - **1**: Perfect positive correlation (as one variable increases, the other increases)
489
+ ''')
490
+
491
 
492
  else:
493
  st.error("No dataset found. Please upload a dataset on the main page first.")
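
Reviewer note on the correlation text in the last hunk: the three reference values can be reproduced with a few lines of plain Python. This is a sketch of the standard Pearson coefficient on made-up series, not the app's heatmap code (which is outside the visible hunks). The third series shows that a coefficient of 0 only rules out a linear relationship: `squared` is fully determined by `xs`, yet its coefficient is 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative series
xs = [-2, -1, 0, 1, 2]
linear_up = [2 * x + 1 for x in xs]   # rises with xs
linear_down = [-3 * x for x in xs]    # falls as xs rises
squared = [x * x for x in xs]         # depends on xs, but not linearly

r_up = pearson_r(xs, linear_up)
r_down = pearson_r(xs, linear_down)
r_squared = pearson_r(xs, squared)
print(r_up, r_down, r_squared)
```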