Jonathandav/GoodReads-Rating-Predictor

Tabular Classification


Jonathandav committed 21 days ago

Commit d6d1551 · verified · 1 Parent(s): bab0a76

Update README.md

Files changed (1)

README.md +4 -4


@@ -80,10 +80,10 @@ The most significant breakthrough came from engineering the **Author Reputation
 ### 4. Unsupervised Learning: Discovering Book "Personas"
 Using **K-Means Clustering**, we identified four distinct "Personas" within the dataset.
-* **The Classics:** High-age, stable-rating books.
-* **The Modern Epics:** High page count, high popularity.
-* **The Niche Gems:** Low review count, extremely high ratings.
-* **The Everyman Read:** Standard length and average popularity.
+* **Cluster 0, The Modern Epics:** The "Mainstream Blockbusters": long books with very high popularity.
+* **Cluster 1, The Standard Read:** Defined by average length and average popularity. This is likely the largest group, made up of standard fiction.
+* **Cluster 2, The Purple Legacy:** "The Classics," much older than the rest of the dataset. Because a modern dataset contains few 70-year-old books, they appear only as "specks" in the PCA plot, but their age sets them apart as a distinct group.
+* **Cluster 3, The High-Quality Hidden Gems:** Defined by a high `author_rep_score` and the highest `average_rating`, but the lowest `rating_count_log`. They have very few ratings (low hype), but the readers who do find them rate them highly, and they are written by authors with strong reputations.
 ![PCA Cluster Visualization](PCA.png)
 *Figure 6: PCA projection of the 4-cluster K-Means model.*
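
For readers who want to reproduce this kind of persona analysis, the following is a minimal sketch, not the repository's actual code. It assumes scikit-learn and pandas, runs on synthetic stand-in data, and treats `num_pages` and `book_age` as hypothetical column names; only `average_rating`, `rating_count_log`, and `author_rep_score` appear in the README. Cluster numbering from K-Means is arbitrary, so the persona names have to be assigned by reading the per-cluster feature means.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the book table (assumption: the real data differs).
books = pd.DataFrame({
    "num_pages": rng.normal(350, 120, n).clip(50),       # hypothetical column
    "book_age": rng.exponential(15, n),                   # hypothetical column
    "average_rating": rng.normal(3.9, 0.3, n).clip(1, 5),
    "rating_count_log": rng.normal(8, 2, n).clip(0),
    "author_rep_score": rng.normal(3.9, 0.25, n).clip(1, 5),
})

# Standardize so that no single feature dominates the Euclidean distances.
X = StandardScaler().fit_transform(books)

# Fit a 4-cluster K-Means model and attach the labels to each book.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
books["cluster"] = kmeans.fit_predict(X)

# Per-cluster feature means: the table used to name each persona.
profile = books.groupby("cluster").mean().round(2)
print(profile)

# Project the scaled features to 2D for a scatter plot like Figure 6.
coords = PCA(n_components=2, random_state=0).fit_transform(X)
books["pc1"], books["pc2"] = coords[:, 0], coords[:, 1]
```

Plotting `pc1` against `pc2`, colored by `cluster`, gives a projection of the kind shown in the figure. A small, old cluster such as "The Classics" would be expected to show up as a handful of isolated points.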