ADS509/BERTweet-large-self-labeling

Text Classification

Generated from Trainer

multi_label_classification

text-embeddings-inference

tkbarb10 committed on Feb 21

Commit 6a01b67 · verified · 1 Parent(s): 0140c3c

Update README.md

README.md +4 -4

@@ -59,11 +59,11 @@ there are several limitations to outline
   - When reviewing records were ambiguous or that the classifier incorrectly predicted, it was clear that the labeling scheme is fuzzy in
     some instances. For instance, many "Opinion" comments can be viewed as "Expressive" "Arguments", leading to ambiguous labeling from models.
     It would be worth exploring a more nuanced labeling scheme, perhaps splitting "Expressive" into 2-3 labels and Opinion into another 1 or 2
-  - Due to the nature of the project, the commentary data used for training was subject to the following limitations
+  - Due to the nature of the project, the commentary data used for training is subject to the following limitations
     - Queries were isolated to "politics" or "US politics"
-    - With one exception, all comment data is dated from Jan 1, 2026 to Feb 12, 2026
-    - We set a ceiling and a floor for number of comments per post. No posts with under 10 comments were used, and for posts with
-      several comments, we only pulled the most recent 300
+    - All comment data is dated from Jan 1, 2025 to Feb 12, 2026, with the majority originating in 2026
+    - We set a ceiling and a floor for number of comments per post. No posts with under 10 comments were used, and number of comments scraped
+      were capped at 300
 ## Training and evaluation data
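
The data limits listed in the updated README (date window, floor of 10 comments per post, cap of 300) could be applied roughly as in the sketch below. This is not the project's actual scraping code. The function name `select_comments`, the `(date, text)` tuple shape, the choice to apply the floor to a post's raw comment count, and the choice to keep the most recent comments when capping (as the earlier wording described) are all assumptions made for illustration.

```python
from datetime import date

# Limits taken from the README text; the names are hypothetical.
MIN_COMMENTS = 10                  # posts with fewer comments are skipped
MAX_COMMENTS = 300                 # cap on comments kept per post
START, END = date(2025, 1, 1), date(2026, 2, 12)


def select_comments(comments):
    """Return the comments to keep for a single post.

    `comments` is a list of (date, text) tuples (assumed shape).
    """
    # Assumption: the floor applies to the post's raw comment count.
    if len(comments) < MIN_COMMENTS:
        return []
    # Keep only comments inside the stated date window.
    in_range = [c for c in comments if START <= c[0] <= END]
    # Assumption: when capping, keep the most recent comments.
    in_range.sort(key=lambda c: c[0], reverse=True)
    return in_range[:MAX_COMMENTS]
```

Applying the floor before the date filter means a post is judged by its overall activity; applying it afterwards would instead drop posts with few in-window comments. The README does not say which was used.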