ernchern committed 06568e8 (verified; 1 parent: c6581cd): Create README.md

---
language:
- en
license: openrail
base_model: bigscience/bloom-560m
tags:
- text-classification
- aging
- social-media
- reddit
- generationing
metrics:
- f1
model-index:
- name: BLOOM-560m-Personal-Sharing-Classification
  results:
  - task:
      type: text-classification
    metrics:
    - type: f1
      value: 0.9599
---

# Model Card: BLOOM-560m for Personal Sharing Classification

This model is a fine-tuned version of [BLOOM-560m](https://huggingface.co/bigscience/bloom-560m) designed to classify personal experience sharing in social media text. It was developed to explore how different generations (Baby Boomers and Gen X) express themselves on pseudonymous platforms such as Reddit.

## Model Details

- **Model Type:** Decoder-only large language model fine-tuned for sequence classification.
- **Language:** English.
- **Finetuned from model:** `bigscience/bloom-560m`.
- **Application:** Sociotechnical research on digital aging and online self-disclosure.

## Intended Use

### Primary Task
The model classifies individual sentences into one of four categories to analyze the domains of self-disclosure in online forums.

### Categories
* **Health and Wellness (Label 0):** Personal experiences with physical or mental health, treatments, or aging-related bodily changes.
* **Personal Relationships and Identity (Label 1):** Sentences describing social ties, family, friendships, or social identities.
* **Professional and Financial (Label 2):** Reflections on work, career history, retirement planning, and financial management.
* **Not Related to Personal Sharing (Label 3):** Non-reflective content, general information, or social pleasantries (excluded from analysis).
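
Because the model labels one sentence at a time, a post- or user-level analysis has to map each predicted label back to its category and set Label 3 aside before counting domains. The following is a minimal sketch, assuming the checkpoint reports the generic `LABEL_0` to `LABEL_3` names that `transformers` uses when a config defines no custom label names (the card does not say which naming this checkpoint uses). `category_name` and `domain_shares` are hypothetical helpers, not part of the model.

```python
from collections import Counter

# Label ids and names as listed in this card.
CATEGORIES = {
    0: "Health and Wellness",
    1: "Personal Relationships and Identity",
    2: "Professional and Financial",
    3: "Not Related to Personal Sharing",
}
NOT_RELATED = CATEGORIES[3]


def category_name(label: str) -> str:
    """Map a label such as 'LABEL_2' (assumed default naming) to its category name."""
    if label.startswith("LABEL_"):
        return CATEGORIES[int(label[len("LABEL_"):])]
    if label in CATEGORIES.values():
        # The config may already store readable names; pass them through.
        return label
    raise ValueError(f"Unrecognized label: {label!r}")


def domain_shares(predictions):
    """Share of each personal-sharing domain, ignoring Label 3 sentences.

    `predictions` is a list of dicts with a 'label' key, the shape a
    transformers text-classification pipeline returns for a list of
    sentences with default settings.
    """
    names = [category_name(p["label"]) for p in predictions]
    counts = Counter(n for n in names if n != NOT_RELATED)
    total = sum(counts.values())
    return {
        name: counts[name] / total if total else 0.0
        for name in CATEGORIES.values()
        if name != NOT_RELATED
    }


# Hypothetical predictions for four sentences, for illustration only.
preds = [{"label": "LABEL_0"}, {"label": "LABEL_2"},
         {"label": "LABEL_2"}, {"label": "LABEL_3"}]
print(domain_shares(preds))
```

Returning all three domains, including those with a zero share, keeps the output shape stable when comparing groups such as Baby Boomers and Gen X.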

## Training Data

* **Source:** Publicly available posts and comments from the subreddit `r/AskOldPeople`.
* **Size:** 2,000 manually labeled sentences (stratified sampling: 500 per category).
* **Data Split:** 80% training, 10% validation, 10% test.
* **Preprocessing:** Posts and comments were segmented into sentences with the Punkt sentence tokenizer.
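
The card gives the class balance and split ratios but not the splitting procedure. The sketch below shows one way to reproduce those proportions, assuming the 80/10/10 split was also drawn separately within each category; `stratified_split` is a hypothetical helper and the sentences are placeholders. With 500 sentences per category it yields 400/50/50 per category, or 1,600/200/200 overall.

```python
import random
from collections import defaultdict


def stratified_split(examples, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Split (sentence, label) pairs into train/validation/test within each label.

    Hypothetical helper: the card states an 80/10/10 split of 2,000
    sentences (500 per category) but not how the split was drawn.
    """
    by_label = defaultdict(list)
    for sentence, label in examples:
        by_label[label].append((sentence, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test


# Placeholder data with the card's class balance: 4 labels x 500 sentences.
data = [(f"sentence {label}-{i}", label) for label in range(4) for i in range(500)]
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))  # 1600 200 200
```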

## Performance

On the held-out test set, the model achieved the following score:

| Metric | Value |
| :--- | :--- |
| **F1 Score** | **0.9599** |
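
The card does not state how the F1 score is averaged across the four categories. Assuming macro averaging (the unweighted mean of per-class F1), the metric can be computed as in this sketch; the labels are toy values, not the card's test data, and `macro_f1` is a hypothetical helper.

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2, 3)):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); treat a class that never occurs as 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8667
```

With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity when all four categories occur. If the test split is balanced across categories, as the 500-per-category sampling suggests, macro and support-weighted F1 coincide.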

## Usage

You can use this model directly with the Hugging Face `transformers` library:

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="ernchern/personal_info_classification")

# The model was trained on single sentences, so pass one sentence (or a list of sentences).
text = "I am 67, retired in August, and most basic expenses are covered by Social Security."
result = classifier(text)
print(result)
```