Jonathandav committed on
Commit
d6d1551
·
verified ·
1 Parent(s): bab0a76

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -80,10 +80,10 @@ The most significant breakthrough came from engineering the **Author Reputation
 
  ### 4. Unsupervised Learning: Discovering Book "Personas"
  Using **K-Means Clustering**, we identified four distinct "Personas" within the dataset.
- * **The Classics:** High-age, stable-rating books.
- * **The Modern Epics:** High page count, high popularity.
- * **The Niche Gems:** Low review count, extremely high ratings.
- * **The Everyman Read:** Standard length and average popularity.
+ * **Cluster 0 The Modern Epics:** These are the "Mainstream Blockbusters." They are thick books with massive popularity.
+ * **Cluster 1 The Standard Read:** Defined by average length and average popularity. This is likely the largest group of standard fiction.
+ * **Cluster 2 The Purple Legacy:** These are "The Classics." They are much older than the rest of the dataset. Because there aren't many 70-year-old books in a modern dataset, they appeared as "specks" in the PCA, but mathematically, their age makes them a very distinct, elite group.
+ * **Cluster 3 The High-Quality Hidden Gems:** Defined by high author_rep_score and the highest average_rating, but the lowest rating_count_log. They have very few ratings (low hype), but the people who do read them love them, and they are written by top-tier authors.
 
  ![PCA Cluster Visualization](PCA.png)
  *Figure 6: PCA projection of the 4-cluster K-Means model.*
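
The persona descriptions in this change read like summaries of cluster centroids: each label names the features on which a cluster sits far from the average. The README does not show the clustering code, so the following is only a minimal sketch of that idea, assuming a plain Lloyd's-algorithm K-Means over standardized features. The toy rows in `BOOKS`, the `page_count` and `book_age` column names, and the farthest-point seeding are illustrative assumptions; only `rating_count_log` and `average_rating` are names taken from the text above.

```python
import math

# Feature order for every row below. "page_count" and "book_age" are
# hypothetical column names chosen for this sketch.
FEATURES = ["page_count", "book_age", "rating_count_log", "average_rating"]

# Hypothetical, already-standardized rows (z-scores), two books per group.
BOOKS = [
    (2.0, -0.3, 2.0, 0.2), (1.8, -0.2, 2.2, 0.1),     # long and very popular
    (0.0, 0.0, 0.0, 0.0), (0.1, -0.1, 0.1, -0.1),     # average on every axis
    (0.0, 3.0, 0.5, 0.3), (-0.1, 3.2, 0.4, 0.2),      # much older than the rest
    (-0.2, -0.1, -2.0, 2.0), (-0.3, 0.0, -2.2, 2.1),  # few ratings, high rating
]


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    # Seed with the first point, then keep adding the point that is
    # farthest from every seed chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


labels, centroids = kmeans(BOOKS, k=4)

# Print each centroid as a feature profile; a persona name is a human
# reading of which features stand out in that profile.
for cluster_id, centroid in enumerate(centroids):
    profile = ", ".join(f"{name}={value:+.2f}" for name, value in zip(FEATURES, centroid))
    print(f"Cluster {cluster_id}: {profile}")
```

Cluster indices from K-Means are arbitrary, so the numbers printed here need not match the Cluster 0 to 3 numbering in the bullets; only the grouping carries meaning. On a real dataset a library implementation with standardized inputs would be the usual choice, and this sketch does not attempt the PCA projection shown in Figure 6.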