jeergrvgreg committed
Commit 2e0b38d · verified · 1 Parent(s): aed9c90

Upload folder using huggingface_hub

Files changed (3):
  1. README.md +123 -124
  2. adapter_config.json +5 -3
  3. tokenizer.json +2 -2
README.md CHANGED
@@ -8,131 +8,130 @@ language:
  - nl
  - it
 tags:
- - multilingual
- - text-classification
- - content-filtering
- - multi-dimensional-scoring
- - knowledge-distillation
- - uplifting-content
- - news-analysis
+ - base_model:adapter:Qwen/Qwen2.5-1.5B
+ - lora
+ - transformers
 library_name: peft
 base_model: Qwen/Qwen2.5-1.5B
 pipeline_tag: text-classification
 ---

 # Uplifting Content Filter v5

 ## Model Description

 A fine-tuned **Qwen2.5-1.5B** model with LoRA adapters for multi-dimensional uplifting content scoring.

 This model evaluates news articles across **6 orthogonal dimensions** to identify genuinely uplifting content with documented positive outcomes - not just feel-good stories or speculation.

 **Key Innovation**: Uses an orthogonal dimension framework (inspired by LCSA methodology) to avoid the high correlation issues found in previous versions.

 ## Dimensions

 The model scores articles on 6 dimensions:

 ### Impact Domains (WHAT kind of uplift)
 | Dimension | Weight | Question |
 |-----------|--------|----------|
 | **Human Wellbeing Impact** | 25% | Health, safety, livelihoods improved? |
 | **Social Cohesion Impact** | 15% | Communities strengthened, solidarity built? |
 | **Justice & Rights Impact** | 10% | Wrongs addressed, rights expanded? |

 ### Assessment Dimensions (HOW real/accessible)
 | Dimension | Weight | Question |
 |-----------|--------|----------|
 | **Evidence Level** | 20% | Documented outcomes or speculation? |
 | **Benefit Distribution** | 20% | Who benefits? Elite → Universal? |
 | **Change Durability** | 10% | Temporary relief → Systemic change? |

 ## Performance

 | Metric | Value |
 |--------|-------|
 | **Validation MAE** | **0.681** |
 | Training MAE | 0.637 |
 | Validation RMSE | 0.880 |

 ### Per-Dimension MAE (Validation)
 | Dimension | MAE |
 |-----------|-----|
 | Human Wellbeing Impact | 0.686 |
 | Social Cohesion Impact | 0.704 |
 | Justice Rights Impact | 0.619 |
 | Evidence Level | 0.636 |
 | Benefit Distribution | 0.792 |
 | Change Durability | 0.648 |

 ## Training Details

 - **Base Model**: Qwen/Qwen2.5-1.5B
 - **Training Mode**: Knowledge Distillation (from Gemini Flash oracle)
 - **Adapter**: LoRA (18.5M trainable params, 1.2% of model)
 - **Training Samples**: 7,999
 - **Validation Samples**: 1,000
 - **Epochs**: 3
 - **Batch Size**: 8
 - **Learning Rate**: 2e-5
 - **Max Length**: 512 tokens

 ## Usage

 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 from peft import PeftModel
 import torch

 # Load base model and LoRA adapter
 base_model = AutoModelForSequenceClassification.from_pretrained(
     "Qwen/Qwen2.5-1.5B",
     num_labels=6,
     problem_type="regression"
 )
 model = PeftModel.from_pretrained(base_model, "nexusmind/uplifting-filter-v5")
 tokenizer = AutoTokenizer.from_pretrained("nexusmind/uplifting-filter-v5")

 # Score an article
 article = "Title: Community garden feeds 500 families\n\nA new community garden..."
 inputs = tokenizer(article, return_tensors="pt", max_length=512, truncation=True)

 with torch.no_grad():
     outputs = model(**inputs)
 scores = outputs.logits[0].numpy()

 dimensions = ["human_wellbeing_impact", "social_cohesion_impact", "justice_rights_impact",
               "evidence_level", "benefit_distribution", "change_durability"]

 for dim, score in zip(dimensions, scores):
     print(f"{dim}: {score:.1f}")
 ```

 ## Gatekeeper Rule

 **Evidence Level < 3 → Overall score capped at 3.0**

 Speculation without documented outcomes cannot be truly uplifting.

 ## Limitations

 - Trained on multilingual news articles (61% English, 31% French, 7% Spanish, <1% German/Dutch/Italian)
 - MAE of ~0.68 means predictions deviate from the oracle scores by roughly 0.7 points on average
 - `benefit_distribution` dimension has highest error (0.79 MAE)
 - Model focuses on documented outcomes, not emotional tone

 ## License

 MIT

 ## Citation

 ```bibtex
 @misc{uplifting_filter_v5,
   title={Uplifting Content Filter v5},
   author={NexusMind},
   year={2025},
   url={https://huggingface.co/nexusmind/uplifting-filter-v5}
 }
 ```
+ ### Framework versions
+
+ - PEFT 0.17.1
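Editorial aside on the card above: the README lists per-dimension weights and a gatekeeper cap but never shows how they combine into one number. A minimal sketch, assuming a simple weighted mean (the `overall_score` helper and the example scores are illustrative assumptions, not part of the published model):

```python
# Weights come from the card's Dimensions tables; aggregating them as a
# weighted mean is an assumption -- the card does not document the formula.
WEIGHTS = {
    "human_wellbeing_impact": 0.25,
    "social_cohesion_impact": 0.15,
    "justice_rights_impact": 0.10,
    "evidence_level": 0.20,
    "benefit_distribution": 0.20,
    "change_durability": 0.10,
}

def overall_score(scores: dict) -> float:
    """Weighted mean of the six dimension scores, with the gatekeeper cap."""
    total = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    # Gatekeeper rule from the card: Evidence Level < 3 caps the result at 3.0.
    if scores["evidence_level"] < 3:
        total = min(total, 3.0)
    return total

# Hypothetical article scores: strong impact but weak evidence.
example = {
    "human_wellbeing_impact": 8.0,
    "social_cohesion_impact": 6.0,
    "justice_rights_impact": 4.0,
    "evidence_level": 2.0,
    "benefit_distribution": 5.0,
    "change_durability": 3.0,
}
print(overall_score(example))  # weighted mean is 5.0, but the cap yields 3.0
```

With `evidence_level` raised to 4, the same scores would pass through uncapped, illustrating why the card calls Evidence Level a gatekeeper rather than just another weighted term.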
adapter_config.json CHANGED
@@ -19,6 +19,8 @@
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": [
+ "classifier",
+ "score",
  "classifier",
  "score"
  ],
@@ -28,13 +30,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "o_proj",
- "down_proj",
  "k_proj",
  "q_proj",
+ "down_proj",
  "v_proj",
  "up_proj",
- "gate_proj"
+ "gate_proj",
+ "o_proj"
  ],
  "target_parameters": null,
  "task_type": "SEQ_CLS",
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85acc0ed1a93f8b0e6c803b53edf0fe4898ac19a3fff657f21020d280364a0cf
- size 11422174
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896