Update README.md
README.md
CHANGED
|
@@ -32,7 +32,7 @@ Our raw data was not model ready. Our first mission was to ensure every row was
|
|
| 32 |
* **The Logarithmic Shift:** `rating_count` exhibited a "Long Tail" distribution. We applied a **Log Transformation** (`rating_count_log`) to normalize this scale, preventing high-popularity outliers from overwhelming the model's weight distribution.
|
| 33 |
* **Impossible Values:** We filtered out "impossible" entries (e.g., 0-page books) and extreme edge cases (10,000+ page box sets) to focus the model on the standard retail book market.
|
| 34 |
|
| 35 |
-

|
|
@@ -40,22 +40,22 @@ We began by uncovering the natural relationships in the data. Our analysis revea
|
|
| 40 |
|
| 41 |
**Question 1: How are the book ratings distributed?**
|
| 42 |
|
| 43 |
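-

|
| 44 |
*Figure 1: Identifying the "center" of the data to justify our classification threshold.*
|
| 45 |
|
| 46 |
**Question 2: Does the "Hype" (number of reviews) correlate with the Score?**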
|
| 47 |
|
| 48 |
-

|
| 49 |
*Figure 2: Checking if high-volume books (popular) are rated better than niche books.*
|
| 50 |
|
| 51 |
**Question 3: Are longer books rated higher or lower?**
|
| 52 |
|
| 53 |
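-

|
| 32 |
* **The Logarithmic Shift:** `rating_count` exhibited a "Long Tail" distribution. We applied a **Log Transformation** (`rating_count_log`) to normalize this scale, preventing high-popularity outliers from overwhelming the model's weight distribution.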
|
| 33 |
* **Impossible Values:** We filtered out "impossible" entries (e.g., 0-page books) and extreme edge cases (10,000+ page box sets) to focus the model on the standard retail book market.
|
| 34 |
|
| 35 |
+

|
| 36 |
*Figure 2: Boxplot analysis identifying and filtering statistical outliers.*
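
The two cleaning steps described above (dropping impossible page counts and log-scaling `rating_count`) could look roughly like the sketch below. This is an illustration, not the project's actual code: the `num_pages` column name, the `clean_books` helper, and the sample rows are hypothetical, and `np.log1p` is an assumed choice of log transform (the README only says "Log Transformation"). The page bounds follow the "0-page" and "10,000+ page" examples in the text.

```python
import numpy as np
import pandas as pd


def clean_books(df, min_pages=1, max_pages=10_000):
    """Drop implausible page counts and add a log-scaled rating count.

    `num_pages` is an assumed column name; `rating_count` and
    `rating_count_log` are the names used in the README.
    """
    # Keep books with at least `min_pages` pages and fewer than `max_pages`.
    mask = (df["num_pages"] >= min_pages) & (df["num_pages"] < max_pages)
    out = df.loc[mask].copy()
    # log1p computes log(1 + x), so a rating_count of 0 maps to 0
    # instead of negative infinity. This choice is an assumption.
    out["rating_count_log"] = np.log1p(out["rating_count"])
    return out


# Hypothetical sample rows, for illustration only.
books = pd.DataFrame(
    {
        "num_pages": [0, 320, 12_000, 450],
        "rating_count": [15, 0, 900, 2_000_000],
    }
)
cleaned = clean_books(books)
```

Here the 0-page row and the 12,000-page row are removed, and the remaining rows gain a `rating_count_log` column whose range is far narrower than the raw counts.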
|
| 37 |
|
| 38 |
### 2. Exploratory Data Analysis (EDA)
|
|
|
|
| 40 |
|
| 41 |
**Question 1: How are the book ratings distributed?**
|
| 42 |
|
| 43 |
+

|
| 44 |
*Figure 1: Identifying the "center" of the data to justify our classification threshold.*
|
| 45 |
|
| 46 |
**Question 2: Does the "Hype" (number of reviews) correlate with the Score?**
|
| 47 |
|
| 48 |
+

|
| 49 |
*Figure 2: Checking if high-volume books (popular) are rated better than niche books.*
|
| 50 |
|
| 51 |
**Question 3: Are longer books rated higher or lower?**
|
| 52 |
|
| 53 |
+

|
| 54 |
*Figure 3: Investigating if "Epic" length contributes to higher perceived quality.*
|
| 55 |
|
| 56 |
**Question 4: Which genres dominate the high-rating charts?**
|
| 57 |
|
| 58 |
+

|
| 59 |
*Figure 4: Determining if 'Genre' is a strong predictor of success.*
|
| 60 |
|
| 61 |
### 3. Feature Engineering: The "Author Reputation" Signal
|