Update README.md
README.md
CHANGED
@@ -80,10 +80,10 @@ The most significant breakthrough came from engineering the **Author Reputation
### 4. Unsupervised Learning: Discovering Book "Personas"
Using **K-Means Clustering**, we identified four distinct "Personas" within the dataset.
-* **The
-* **The
-* **The
-* **The
+* **Cluster 0 (The Modern Epics):** These are the "Mainstream Blockbusters": long books with very high popularity.
+* **Cluster 1 (The Standard Read):** Defined by average length and average popularity. This is likely the largest group, covering standard fiction.
+* **Cluster 2 (The Purple Legacy):** These are "The Classics," much older than the rest of the dataset. Because a modern dataset contains few 70-year-old books, they appear only as "specks" in the PCA projection, but their age makes them a clearly distinct group.
+* **Cluster 3 (The High-Quality Hidden Gems):** Defined by a high `author_rep_score` and the highest `average_rating`, but the lowest `rating_count_log`. These books have very few ratings (low hype), but the readers who do rate them love them, and they are written by top-tier authors.

*Figure 6: PCA projection of the 4-cluster K-Means model.*
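
For readers who want to see how personas like these can be derived, here is a minimal sketch using scikit-learn. It is not the project's actual code. The data is synthetic, the column names `num_pages` and `book_age_years` are hypothetical, and only `author_rep_score`, `average_rating`, and `rating_count_log` are taken from the README text. The sketch standardizes the features, fits a 4-cluster K-Means model, projects the points to two dimensions with PCA, and computes per-cluster feature means, which are the kind of summary a persona description is based on.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The last three names appear in the README; the first two are hypothetical.
FEATURES = ["num_pages", "book_age_years", "author_rep_score",
            "average_rating", "rating_count_log"]


def cluster_books(X, n_clusters=4, seed=42):
    """Standardize the features, fit K-Means, and project to 2-D with PCA."""
    # Scaling matters: K-Means uses Euclidean distance, so unscaled page
    # counts would otherwise dominate ratings on a 1-5 scale.
    X_scaled = StandardScaler().fit_transform(X)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X_scaled)
    # The 2-D coordinates are for plotting only; clustering used all features.
    coords = PCA(n_components=2).fit_transform(X_scaled)
    return labels, coords


def cluster_profiles(X, labels):
    """Return the mean of each raw feature per cluster."""
    return {int(c): X[labels == c].mean(axis=0) for c in np.unique(labels)}


# Synthetic stand-in data: four made-up groups, NOT the real dataset.
rng = np.random.default_rng(0)
centers = np.array([
    [900, 15, 0.6, 4.0, 12.0],  # long and very popular
    [350, 15, 0.5, 3.8, 8.0],   # average length and popularity
    [400, 70, 0.6, 3.9, 9.0],   # much older books
    [300, 15, 0.9, 4.5, 4.0],   # highly rated, few ratings
])
scales = [40, 3, 0.05, 0.1, 0.5]
sizes = [150, 400, 30, 120]
X = np.vstack([rng.normal(c, scales, size=(n, len(FEATURES)))
               for c, n in zip(centers, sizes)])

labels, coords = cluster_books(X)
for cluster_id, means in cluster_profiles(X, labels).items():
    summary = ", ".join(f"{name}={value:.2f}"
                        for name, value in zip(FEATURES, means))
    print(f"Cluster {cluster_id}: {summary}")
```

Cluster IDs from K-Means are arbitrary, so the numbering here will not necessarily match the README's Cluster 0 to 3. A small group that is extreme on one feature, such as the older books, can look like a few scattered points in a PCA plot while still being well separated in the full feature space.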