Jonathandav committed on
Commit b7b05df · verified · 1 parent: 2418ff9

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -32,7 +32,7 @@ Our raw data was not model ready. Our first mission was to ensure every row was
  * **The Logarithmic Shift:** `rating_count` exhibited a "Long Tail" distribution. We applied a **Log Transformation** (`rating_count_log`) to normalize this scale, preventing high-popularity outliers from overwhelming the model's weight distribution.
  * **Impossible Values:** We filtered out "impossible" entries (e.g., 0-page books) and extreme edge cases (10,000+ page box sets) to focus the model on the standard retail book market.
 
- ![Outlier Detection Boxplot](PASTE_LINK_OR_FILENAME_HERE)
+ ![Outlier Detection Boxplot](Outliers.png)
  *Figure 2: Boxplot analysis identifying and filtering statistical outliers.*
 
  ### 2. Exploratory Data Analysis (EDA)
@@ -40,22 +40,22 @@ We began by uncovering the natural relationships in the data. Our analysis revea
 
  **Question 1: How are the book ratings distributed?**
 
- ![EDA Q1](PASTE_LINK_OR_FILENAME_HERE)
+ ![EDA Q1](Q1.png)
  *Figure 1: Identifying the "center" of the data to justify our classification threshold.*
 
  **Question 2: Does the "Hype" (number of reviews) correlate with the Score?**
 
- ![EDA Q2](PASTE_LINK_OR_FILENAME_HERE)
+ ![EDA Q2](Q2.png)
  *Figure 2: Checking if high-volume books (popular) are rated better than niche books.*
 
  **Question 3: Are longer books rated higher or lower?**
 
- ![EDA Q3](PASTE_LINK_OR_FILENAME_HERE)
+ ![EDA Q3](Q3.png)
  *Figure 3: Investigating if "Epic" length contributes to higher perceived quality.*
 
  **Question 4: Which genres dominate the high-rating charts?**
 
- ![EDA Q4](PASTE_LINK_OR_FILENAME_HERE)
+ ![EDA Q4](Q4.png)
  *Figure 4: Determining if 'Genre' is a strong predictor of success.
 
  ### 3. Feature Engineering: The "Author Reputation" Signal
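The README context in this diff describes two cleaning steps, log-scaling `rating_count` into `rating_count_log` and dropping implausible page counts, but the commit itself contains no code for them. The following is a minimal sketch in plain Python, not the repository's implementation. It assumes each book is a dict with a hypothetical `num_pages` key, assumes `log1p` (rather than a plain log) for the transform, and assumes a page cutoff of 10,000 based only on the "10,000+ page box sets" wording.

```python
import math
from typing import Iterable

# Assumed bounds: the README mentions dropping 0-page books and 10,000+ page
# box sets; the exact cutoff used in the project is not shown in this commit.
MIN_PAGES = 1
MAX_PAGES = 10_000  # rows with num_pages >= MAX_PAGES are dropped


def clean_books(rows: Iterable[dict]) -> list[dict]:
    """Drop impossible page counts and add a log-scaled rating_count column."""
    cleaned = []
    for row in rows:
        pages = row["num_pages"]  # hypothetical column name
        if pages < MIN_PAGES or pages >= MAX_PAGES:
            continue  # skip 0-page entries and oversized box sets
        out = dict(row)  # copy so the caller's rows are not mutated
        # log1p(x) = ln(1 + x), so a book with zero ratings maps to 0.0
        out["rating_count_log"] = math.log1p(row["rating_count"])
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"title": "A", "num_pages": 0, "rating_count": 12},
        {"title": "B", "num_pages": 320, "rating_count": 0},
        {"title": "C", "num_pages": 450, "rating_count": 2_500_000},
        {"title": "D", "num_pages": 12_000, "rating_count": 40},
    ]
    for book in clean_books(sample):
        print(book["title"], round(book["rating_count_log"], 2))
```

`log1p` is used here only because it keeps zero-rating books finite; `math.log(0)` would raise a `ValueError`.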