jeergrvgreg committed on
Commit 01a7afa · verified · 1 Parent(s): 2e0b38d

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +103 -91
README.md CHANGED
@@ -1,137 +1,149 @@
- ---
- license: mit
- language:
- - en
- - fr
- - es
- - de
- - nl
- - it
- tags:
- - base_model:adapter:Qwen/Qwen2.5-1.5B
- - lora
- - transformers
- library_name: peft
- base_model: Qwen/Qwen2.5-1.5B
- pipeline_tag: text-classification
- ---
-
- # Uplifting Content Filter v5

  ## Model Description

- A fine-tuned **Qwen2.5-1.5B** model with LoRA adapters for multi-dimensional uplifting content scoring.
-
- This model evaluates news articles across **6 orthogonal dimensions** to identify genuinely uplifting content with documented positive outcomes - not just feel-good stories or speculation.
-
- **Key Innovation**: Uses an orthogonal dimension framework (inspired by LCSA methodology) to avoid the high correlation issues found in previous versions.
-
- ## Dimensions
-
- The model scores articles on 6 dimensions:
-
- ### Impact Domains (WHAT kind of uplift)
- | Dimension | Weight | Question |
- |-----------|--------|----------|
- | **Human Wellbeing Impact** | 25% | Health, safety, livelihoods improved? |
- | **Social Cohesion Impact** | 15% | Communities strengthened, solidarity built? |
- | **Justice & Rights Impact** | 10% | Wrongs addressed, rights expanded? |
-
- ### Assessment Dimensions (HOW real/accessible)
- | Dimension | Weight | Question |
- |-----------|--------|----------|
- | **Evidence Level** | 20% | Documented outcomes or speculation? |
- | **Benefit Distribution** | 20% | Who benefits? Elite → Universal? |
- | **Change Durability** | 10% | Temporary relief → Systemic change? |

  ## Performance

  | Metric | Value |
  |--------|-------|
- | **Validation MAE** | **0.681** |
- | Training MAE | 0.637 |
- | Validation RMSE | 0.880 |
-
- ### Per-Dimension MAE (Validation)
  | Dimension | MAE |
  |-----------|-----|
- | Human Wellbeing Impact | 0.686 |
- | Social Cohesion Impact | 0.704 |
- | Justice Rights Impact | 0.619 |
- | Evidence Level | 0.636 |
- | Benefit Distribution | 0.792 |
- | Change Durability | 0.648 |
-
- ## Training Details
-
- - **Base Model**: Qwen/Qwen2.5-1.5B
- - **Training Mode**: Knowledge Distillation (from Gemini Flash oracle)
- - **Adapter**: LoRA (18.5M trainable params, 1.2% of model)
- - **Training Samples**: 7,999
- - **Validation Samples**: 1,000
- - **Epochs**: 3
- - **Batch Size**: 8
- - **Learning Rate**: 2e-5
- - **Max Length**: 512 tokens

  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
- from peft import PeftModel
  import torch

- # Load base model and LoRA adapter
- base_model = AutoModelForSequenceClassification.from_pretrained(
-     "Qwen/Qwen2.5-1.5B",
-     num_labels=6,
-     problem_type="regression"
- )
- model = PeftModel.from_pretrained(base_model, "nexusmind/uplifting-filter-v5")
- tokenizer = AutoTokenizer.from_pretrained("nexusmind/uplifting-filter-v5")
-
- # Score an article
- article = "Title: Community garden feeds 500 families\n\nA new community garden..."
- inputs = tokenizer(article, return_tensors="pt", max_length=512, truncation=True)

  with torch.no_grad():
      outputs = model(**inputs)
      scores = outputs.logits[0].numpy()

- dimensions = ["human_wellbeing_impact", "social_cohesion_impact", "justice_rights_impact",
-               "evidence_level", "benefit_distribution", "change_durability"]

  for dim, score in zip(dimensions, scores):
-     print(f"{dim}: {score:.1f}")
  ```
-
- ## Gatekeeper Rule
-
- **Evidence Level < 3 → Overall score capped at 3.0**
-
- Speculation without documented outcomes cannot be truly uplifting.
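The gatekeeper rule above is a simple cap on the overall score. A minimal sketch, assuming scores on the card's 0-10 scale (the function name and example values are hypothetical):

```python
# Hypothetical helper illustrating the gatekeeper rule:
# if Evidence Level < 3, the overall score is capped at 3.0.
def apply_gatekeeper(overall: float, evidence_level: float) -> float:
    if evidence_level < 3:
        return min(overall, 3.0)
    return overall

print(apply_gatekeeper(7.2, 2.5))  # speculation: capped to 3.0
print(apply_gatekeeper(7.2, 4.0))  # documented outcomes: passes through as 7.2
```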

  ## Limitations

- - Trained on multilingual news articles (61% English, 31% French, 7% Spanish, <1% German/Dutch/Italian)
- - MAE of ~0.68 means predictions deviate from the oracle by roughly ±0.7 points on average
- - `benefit_distribution` dimension has the highest error (0.79 MAE)
- - Model focuses on documented outcomes, not emotional tone
-
- ## License
-
- MIT

  ## Citation

  ```bibtex
- @misc{uplifting_filter_v5,
-   title={Uplifting Content Filter v5},
-   author={NexusMind},
    year={2025},
-   url={https://huggingface.co/nexusmind/uplifting-filter-v5}
  }
  ```
- ### Framework versions
-
- - PEFT 0.17.1
+ ---
+ license: mit
+ language: en
+ tags:
+ - text-classification
+ - content-filtering
+ - multi-dimensional-scoring
+ - knowledge-distillation
+ library_name: transformers
+ pipeline_tag: text-classification
+ ---
+
+ # jeergrvgreg/uplifting-filter-v5

  ## Model Description

+ This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) for multi-dimensional content scoring using the **uplifting** filter.
+
+ The model was trained using **knowledge distillation** from Gemini Flash, learning to replicate its judgment patterns on content evaluation.
+
+ **Filter Focus**: DOCUMENTED OUTCOMES for human/planetary wellbeing, not emotional tone or speculation.
+
+ ## Intended Use
+
+ This model scores articles across 6 semantic dimensions:
+
+ - **Human Wellbeing Impact** (weight: 0.25): Improvement in health, safety, livelihoods, or basic needs
+ - **Social Cohesion Impact** (weight: 0.15): Communities strengthened, solidarity built, connections across groups
+ - **Justice Rights Impact** (weight: 0.10): Wrongs addressed, accountability achieved, rights expanded
+ - **Evidence Level** (weight: 0.20): How verified are the claimed outcomes?
+ - **Benefit Distribution** (weight: 0.20): Who benefits? How accessible is the benefit?
+ - **Change Durability** (weight: 0.10): How lasting is the change?
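The per-dimension weights above sum to 1.0, so a single overall score is just the weighted sum of the six dimension scores. A minimal sketch (the helper name and example scores are hypothetical; the weights come from the list above):

```python
# Weights as documented in the model card (sum to 1.0).
WEIGHTS = {
    "human_wellbeing_impact": 0.25,
    "social_cohesion_impact": 0.15,
    "justice_rights_impact": 0.10,
    "evidence_level": 0.20,
    "benefit_distribution": 0.20,
    "change_durability": 0.10,
}

def overall_score(scores: dict) -> float:
    """Weighted sum of the six dimension scores (each on a 0-10 scale)."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# Hypothetical per-dimension scores for one article.
example = {
    "human_wellbeing_impact": 7.0,
    "social_cohesion_impact": 5.0,
    "justice_rights_impact": 4.0,
    "evidence_level": 6.0,
    "benefit_distribution": 5.0,
    "change_durability": 4.0,
}
print(round(overall_score(example), 2))  # → 5.5
```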
 
+
+ ## Training Data
+
+ - **Training samples**: 7,999
+ - **Validation samples**: 1,000
+ - **Oracle**: Gemini Flash (for ground truth generation)
+ - **Quality threshold**: Articles with quality_score >= 0.7
+
+ ## Training Procedure
+
+ ### Model Architecture
+
+ - **Base model**: Qwen/Qwen2.5-1.5B
+ - **Parameters**: 1,562,197,504
+ - **Task**: Multi-dimensional regression (6 outputs)
+ - **Input**: Article title + content (max 512 tokens)
+ - **Output**: 6 continuous scores (0-10 range)
+
+ ### Training Configuration
+
+ - **Epochs**: 3
+ - **Batch size**: 8
+ - **Learning rate**: 2e-05
+ - **Optimizer**: AdamW
+ - **Loss function**: Mean Squared Error (MSE)
+ - **Gradient checkpointing**: Enabled
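The loss function above is plain MSE against the oracle's six dimension scores. A dependency-free sketch of the per-example objective (all values are hypothetical):

```python
# Per-example MSE between student predictions and oracle scores.
def mse_loss(pred, target):
    assert len(pred) == len(target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

student = [6.5, 5.0, 4.2, 6.0, 5.1, 4.0]  # model predictions (hypothetical)
oracle = [7.0, 5.0, 4.0, 6.0, 5.0, 4.5]   # Gemini Flash scores (hypothetical)
print(round(mse_loss(student, oracle), 4))  # → 0.0917
```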
 

  ## Performance

+ ### Overall Metrics
+
  | Metric | Value |
  |--------|-------|
+ | Validation MAE | 0.6807 |
+ | Training MAE | 0.6368 |
+ | Validation RMSE | 0.8799 |
+ | Training RMSE | 0.8215 |
+
+ ### Per-Dimension Performance (Validation MAE)

  | Dimension | MAE |
  |-----------|-----|
+ | Human Wellbeing Impact | 0.6857 |
+ | Social Cohesion Impact | 0.7040 |
+ | Justice Rights Impact | 0.6188 |
+ | Evidence Level | 0.6363 |
+ | Benefit Distribution | 0.7922 |
+ | Change Durability | 0.6475 |
+
  ## Usage

  ```python
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch

+ # Load model and tokenizer
+ model_name = "jeergrvgreg/uplifting-filter-v5"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Prepare input
+ article = {
+     "title": "Example Article Title",
+     "content": "Article content here..."
+ }
+
+ text = f"{article['title']}\n\n{article['content']}"
+ inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
+
+ # Get predictions
  with torch.no_grad():
      outputs = model(**inputs)
      scores = outputs.logits[0].numpy()

+ # Dimension names
+ dimensions = ['human_wellbeing_impact', 'social_cohesion_impact', 'justice_rights_impact',
+               'evidence_level', 'benefit_distribution', 'change_durability']
+
+ # Print scores
  for dim, score in zip(dimensions, scores):
+     print(f"{dim}: {score:.2f}")
  ```
  ## Limitations

+ - Model was trained on English news articles
+ - Performance may vary on other content types
+ - Validation MAE of 0.6807 indicates a ~0.7-point average error on the 0-10 scale
+ - Some overfitting observed (train/val MAE gap: 0.04)
+
+ ## Ethical Considerations
+
+ This model evaluates content based on specific semantic dimensions. Users should:
+ - Understand the filter's focus and biases
+ - Not use it as the sole decision-maker for content moderation
+ - Regularly evaluate model performance on their specific use case
+ - Be aware that automated scoring may miss nuance

  ## Citation

+ If you use this model, please cite:
+
  ```bibtex
+ @misc{uplifting_filter_v5,
+   title={Uplifting Content Filter},
+   author={Your Name},
    year={2025},
+   url={https://huggingface.co/jeergrvgreg/uplifting-filter-v5}
  }
  ```

+ ## Model Card Contact
+
+ For questions or feedback about this model, please open an issue in the repository.