Update README.md
README.md
CHANGED
|
@@ -59,11 +59,11 @@ there are several limitations to outline
|
|
| 59 |
- When reviewing records that were ambiguous or that the classifier incorrectly predicted, it was clear that the labeling scheme is fuzzy in
|
| 60 |
some instances. For instance, many "Opinion" comments can be viewed as "Expressive" "Arguments", leading to ambiguous labeling from models.
|
| 61 |
It would be worth exploring a more nuanced labeling scheme, perhaps splitting "Expressive" into 2-3 labels and "Opinion" into another 1 or 2.
|
| 62 |
-
- Due to the nature of the project, the commentary data used for training
|
| 63 |
- Queries were isolated to "politics" or "US politics"
|
| 64 |
-
-
|
| 65 |
-
- We set a ceiling and a floor for number of comments per post. No posts with under 10 comments were used, and
|
| 66 |
-
|
| 67 |
|
| 68 |
## Training and evaluation data
|
| 69 |
|
|
|
|
| 59 |
- When reviewing records that were ambiguous or that the classifier incorrectly predicted, it was clear that the labeling scheme is fuzzy in
|
| 60 |
some instances. For instance, many "Opinion" comments can be viewed as "Expressive" "Arguments", leading to ambiguous labeling from models.
|
| 61 |
It would be worth exploring a more nuanced labeling scheme, perhaps splitting "Expressive" into 2-3 labels and "Opinion" into another 1 or 2.
|
| 62 |
+
- Due to the nature of the project, the commentary data used for training is subject to the following limitations:
|
| 63 |
- Queries were isolated to "politics" or "US politics"
|
| 64 |
+
- All comment data is dated from Jan 1, 2025 to Feb 12, 2026, with the majority originating in 2026
|
| 65 |
+
- We set a ceiling and a floor for the number of comments per post. No posts with under 10 comments were used, and the number of comments scraped
|
| 66 |
+
was capped at 300.
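
The comment-count floor and ceiling and the date window listed above could be applied roughly as in the sketch below. This is only an illustration, not the project's actual scraping code: the `select_comments` function, the `posts` structure (a list of dicts with a `"comments"` list whose items carry a `"date"`), and the order in which the cap and the date filter are applied are all assumptions. Only the thresholds (10, 300) and the date range come from the README.

```python
from datetime import date

# Values stated in the README limitations.
MIN_COMMENTS = 10                 # posts with fewer comments are skipped
MAX_COMMENTS = 300                # at most this many comments kept per post
START = date(2025, 1, 1)          # earliest comment date
END = date(2026, 2, 12)           # latest comment date


def select_comments(posts):
    """Return comments from qualifying posts (hypothetical helper).

    Assumes each post is a dict with a "comments" list, and each
    comment is a dict with a "date" field holding a datetime.date.
    """
    selected = []
    for post in posts:
        comments = post["comments"]
        # Floor: drop posts with under MIN_COMMENTS comments.
        if len(comments) < MIN_COMMENTS:
            continue
        # Ceiling: keep only the first MAX_COMMENTS comments
        # (assumption: the cap is applied before the date filter).
        capped = comments[:MAX_COMMENTS]
        # Date window: keep comments dated within [START, END].
        selected.extend(c for c in capped if START <= c["date"] <= END)
    return selected
```

A sketch like this makes the resulting sampling bias easy to see: low-engagement posts contribute nothing, and very active posts are represented only by the first 300 comments retrieved.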
|
| 67 |
|
| 68 |
## Training and evaluation data
|
| 69 |
|