Jonathandav/GoodReads-Rating-Predictor

Tabular Classification


Jonathandav committed 26 days ago

Commit b7b05df · verified · 1 Parent(s): 2418ff9

Update README.md

Files changed (1)

README.md +5 -5

README.md CHANGED

@@ -32,7 +32,7 @@ Our raw data was not model ready. Our first mission was to ensure every row was
 * **The Logarithmic Shift:** `rating_count` exhibited a "Long Tail" distribution. We applied a **Log Transformation** (`rating_count_log`) to normalize this scale, preventing high-popularity outliers from overwhelming the model's weight distribution.
 * **Impossible Values:** We filtered out "impossible" entries (e.g., 0-page books) and extreme edge cases (10,000+ page box sets) to focus the model on the standard retail book market.
-![Outlier Detection Boxplot](PASTE_LINK_OR_FILENAME_HERE)
 *Figure 2: Boxplot analysis identifying and filtering statistical outliers.*
 ### 2. Exploratory Data Analysis (EDA)
@@ -40,22 +40,22 @@ We began by uncovering the natural relationships in the data. Our analysis revea
 **Question 1: How are the book ratings distributed?**
-![EDA Q1](PASTE_LINK_OR_FILENAME_HERE)
 *Figure 1: Identifying the "center" of the data to justify our classification threshold.*
 **Question 2: Does the "Hype" (number of reviews) correlate with the Score?**
-![EDA Q2](PASTE_LINK_OR_FILENAME_HERE)
 *Figure 2: Checking if high-volume books (popular) are rated better than niche books.*
 **Question 3: Are longer books rated higher or lower?**
-![EDA Q3](PASTE_LINK_OR_FILENAME_HERE)
 *Figure 3: Investigating if "Epic" length contributes to higher perceived quality.*
 **Question 4: Which genres dominate the high-rating charts?**
-![EDA Q4](PASTE_LINK_OR_FILENAME_HERE)
 *Figure 4: Determining if 'Genre' is a strong predictor of success.
 ### 3. Feature Engineering: The "Author Reputation" Signal

 * **The Logarithmic Shift:** `rating_count` exhibited a "Long Tail" distribution. We applied a **Log Transformation** (`rating_count_log`) to normalize this scale, preventing high-popularity outliers from overwhelming the model's weight distribution.
 * **Impossible Values:** We filtered out "impossible" entries (e.g., 0-page books) and extreme edge cases (10,000+ page box sets) to focus the model on the standard retail book market.
+![Outlier Detection Boxplot](Outliers.png)
 *Figure 2: Boxplot analysis identifying and filtering statistical outliers.*
 ### 2. Exploratory Data Analysis (EDA)
 **Question 1: How are the book ratings distributed?**
+![EDA Q1](Q1.png)
 *Figure 1: Identifying the "center" of the data to justify our classification threshold.*
 **Question 2: Does the "Hype" (number of reviews) correlate with the Score?**
+![EDA Q2](Q2.png)
 *Figure 2: Checking if high-volume books (popular) are rated better than niche books.*
 **Question 3: Are longer books rated higher or lower?**
+![EDA Q3](Q3.png)
 *Figure 3: Investigating if "Epic" length contributes to higher perceived quality.*
 **Question 4: Which genres dominate the high-rating charts?**
+![EDA Q4](Q4.png)
 *Figure 4: Determining if 'Genre' is a strong predictor of success.
 ### 3. Feature Engineering: The "Author Reputation" Signal
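
The cleaning steps described in the README context lines above (dropping 0-page and 10,000+ page entries, then log-scaling `rating_count` into `rating_count_log`) could look roughly like the sketch below. It is not the repository's actual code: the `num_pages` column name, the `clean_books` helper, and the use of `log1p` rather than a plain `log` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_books(df, min_pages=1, max_pages=10_000):
    """Drop implausible page counts and add a log-scaled rating count.

    Hypothetical helper: `num_pages` is an assumed column name; only
    `rating_count` and `rating_count_log` appear in the README.
    """
    # Keep books with at least `min_pages` pages and fewer than `max_pages`.
    mask = (df["num_pages"] >= min_pages) & (df["num_pages"] < max_pages)
    out = df.loc[mask].copy()
    # log1p(x) = log(1 + x), so a book with zero ratings maps to 0.0
    # instead of -inf (an assumption; the README only says "log").
    out["rating_count_log"] = np.log1p(out["rating_count"])
    return out


# Tiny made-up frame, for illustration only.
books = pd.DataFrame(
    {
        "num_pages": [0, 320, 12_000, 450],
        "rating_count": [10, 0, 5_000, 999],
    }
)
cleaned = clean_books(books)
print(cleaned)
```

Filtering before transforming keeps the extreme box-set rows from influencing any later scaling step, and `log1p` avoids special-casing books with no ratings.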