Jonathandav/GoodReads-Rating-Predictor

Tabular Classification


Jonathandav committed 19 days ago

Commit 3468fd2 (verified) · 1 parent: e39a23a

Update README.md

Files changed (1)

README.md +14 -0

README.md CHANGED

@@ -43,21 +43,35 @@ We began by uncovering the natural relationships in the data. Our analysis revea
 ![EDA Q1](Q1.png)
 *Figure 1: Identifying the "center" of the data to justify our classification threshold.*
+**Insights:** The most common average rating among the books in the dataset is about 4.0, so the distribution is centered there.
+This is plausible, since people tend to pick up and finish books they already expect to like.
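The center can be located in code by binning the average ratings and taking the most populated bin. This is a minimal sketch in plain Python on made-up toy ratings (not the project's data); the 0.1 bin width, the helper names `rating_mode` and `label_high_rated`, and the choice to use the center directly as the classification threshold are all assumptions for illustration.

```python
from collections import Counter

def rating_mode(ratings, bin_width=0.1):
    """Return the center of the most populated rating bin (hypothetical helper)."""
    # Snap each average rating to the nearest bin center, e.g. 3.96 -> 4.0 for width 0.1;
    # the outer round() removes floating-point noise such as 3.9000000000000004
    bins = Counter(round(round(r / bin_width) * bin_width, 2) for r in ratings)
    return bins.most_common(1)[0][0]

def label_high_rated(ratings, threshold):
    """Binary target: 1 if a book's average rating reaches the threshold, else 0."""
    return [1 if r >= threshold else 0 for r in ratings]

# Toy average ratings, invented for illustration
ratings = [3.62, 3.83, 3.96, 3.98, 4.01, 4.02, 4.04, 4.12, 4.31, 4.47]

center = rating_mode(ratings)              # the peak of the histogram
labels = label_high_rated(ratings, center)
print(center, sum(labels), len(labels))
```

Binning matters here: average ratings are continuous, so exact repeated values are rare and a mode over raw values would be unstable.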
 **Question 2: Does the "Hype" (number of reviews) correlate with the Score?**
 ![EDA Q2](Q2.png)
 *Figure 2: Checking if high-volume books (popular) are rated better than niche books.*
+**Insights:** The cloud of points thickens as ratings increase. Books with few reviews can have extreme average ratings, both high and low, because each individual review carries more weight, while popular books with many reviews cluster between 3.8 and 4.2.
+**Important note:** The cloud is spread fairly evenly, with no clear upward or downward trend, which suggests little correlation between a book's popularity and its rating.
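Both observations in Question 2 can be reproduced on synthetic data. The sketch below is not the project's analysis: it draws made-up 1 to 5 star reviews from an assumed distribution (`STAR_WEIGHTS`), so every simulated book shares the same underlying quality, and then compares books with 5, 50, and 500 reviews. Under these assumptions the spread of average ratings shrinks as the review count grows (the standard deviation of a mean of n independent reviews scales as 1/√n), while the Pearson correlation between the review count and the average rating stays near zero. Using log10 of the review count is also an assumption, chosen because popularity spans several orders of magnitude.

```python
import math
import random
import statistics

# Assumed probability of a single review giving 1, 2, 3, 4, or 5 stars (illustrative only)
STAR_WEIGHTS = [0.02, 0.05, 0.15, 0.38, 0.40]

def simulate_book_average(n_reviews, rng):
    """Average of n_reviews synthetic star ratings, all drawn from STAR_WEIGHTS."""
    stars = rng.choices(range(1, 6), weights=STAR_WEIGHTS, k=n_reviews)
    return statistics.fmean(stars)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is reproducible
review_counts, averages, spread = [], [], {}
for n in (5, 50, 500):
    group = [simulate_book_average(n, rng) for _ in range(200)]
    spread[n] = statistics.pstdev(group)  # how far book averages scatter at this popularity
    review_counts += [n] * len(group)
    averages += group

# Correlate log10(review count) with average rating across all simulated books
r = pearson([math.log10(n) for n in review_counts], averages)
print({n: round(s, 3) for n, s in spread.items()}, round(r, 3))
```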
 **Question 3: Are longer books rated higher or lower?**
 ![EDA Q3](Q3.png)
 *Figure 3: Investigating if "Epic" length contributes to higher perceived quality.*
+**Insights:** The black trend line tilts slightly upward, indicating a weak positive correlation between a book's length and its average rating.
+This is plausible, as readers of long books (800+ pages) tend to be more invested and more enthusiastic about the book.
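The slope of the trend line is what quantifies the "slight upward tilt". A minimal sketch, assuming the black line is an ordinary least-squares fit of average rating on page count (the plotting code is not shown, so this is an assumption); the helper `trend_line` is hypothetical and the page counts and ratings are invented toy values.

```python
import statistics

def trend_line(xs, ys):
    """Ordinary least-squares slope and intercept for ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Toy page counts and average ratings, invented for illustration
pages = [150, 220, 310, 380, 450, 520, 640, 800, 950, 1100]
ratings = [3.85, 3.92, 3.88, 3.95, 4.00, 3.93, 4.05, 4.02, 4.12, 4.15]

slope, intercept = trend_line(pages, ratings)
# A positive slope is the "upward tilt"; scaling per 100 pages makes its size easier to read
print(f"{slope * 100:+.3f} rating points per 100 pages")
```

On Python 3.10+, `statistics.linear_regression(pages, ratings)` returns the same least-squares slope and intercept.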
 **Question 4: Which genres dominate the high-rating charts?**
 ![EDA Q4](Q4.png)
 *Figure 4: Determining if 'Genre' is a strong predictor of success.*
+**Insights:** The best-rated genres are the sequential books and, surprisingly, "Unknown", the label we gave to rows where the genre was missing.
+This plot suggests that genre has a strong influence on a book's rating.
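The genre ranking, including the "Unknown" label for missing values, amounts to a group-by-and-average. This plain-Python sketch uses invented `(genre, rating)` pairs and a hypothetical helper `mean_rating_by_genre`; treating both `None` and empty strings as missing is an assumption.

```python
import statistics
from collections import defaultdict

def mean_rating_by_genre(books, missing_label="Unknown"):
    """Return (genre, mean rating) pairs sorted from best- to worst-rated."""
    groups = defaultdict(list)
    for genre, rating in books:
        # None or "" means the genre was missing; pool those rows under one label
        groups[genre or missing_label].append(rating)
    means = {genre: statistics.fmean(rs) for genre, rs in groups.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# Toy (genre, average rating) pairs, invented for illustration
books = [
    ("Fantasy", 4.10), ("Fantasy", 3.96),
    ("Romance", 3.88), ("Romance", 3.94),
    (None, 4.21), ("", 4.05),
    ("Mystery", 3.99),
]

for genre, mean in mean_rating_by_genre(books):
    print(f"{genre:<8} {mean:.2f}")
```

Small groups deserve caution for the same reason as in Question 2: a genre with only a handful of books can post an extreme mean, so it may be worth reporting group sizes next to the means.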
 ### 3. Feature Engineering: The "Author Reputation" Signal
 The most significant breakthrough came from engineering the **Author Reputation Score**. By calculating the historical average rating for each author, we gave the model a "human" insight into quality that raw metadata lacks.
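A minimal sketch of the Author Reputation Score, assuming "historical average rating" means the mean `average_rating` of all of an author's books in the dataset. The dict keys `author`, `average_rating`, and `author_reputation` and both helper functions are hypothetical names, and the books are toy records.

```python
import statistics
from collections import defaultdict

def author_reputation(books):
    """Map each author to the mean average_rating of their books in the dataset."""
    ratings_by_author = defaultdict(list)
    for book in books:
        ratings_by_author[book["author"]].append(book["average_rating"])
    return {author: statistics.fmean(rs) for author, rs in ratings_by_author.items()}

def add_reputation_feature(books):
    """Return copies of the book records with an extra author_reputation field."""
    reputation = author_reputation(books)
    return [{**book, "author_reputation": reputation[book["author"]]} for book in books]

# Toy book records, invented for illustration
books = [
    {"title": "Book A1", "author": "Author X", "average_rating": 4.2},
    {"title": "Book A2", "author": "Author X", "average_rating": 4.0},
    {"title": "Book B1", "author": "Author Y", "average_rating": 3.7},
]

for book in add_reputation_feature(books):
    print(book["title"], round(book["author_reputation"], 2))
```

Because each book's own rating is included in its author's mean, this version of the feature partly contains the value being predicted; computing the averages on the training split only, or excluding the book itself, may give a less optimistic evaluation.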