language:
- nl
- it
tags:
- base_model:adapter:Qwen/Qwen2.5-1.5B
- lora
- transformers
library_name: peft
base_model: Qwen/Qwen2.5-1.5B
pipeline_tag: text-classification
---

# Uplifting Content Filter v5

## Model Description

A fine-tuned **Qwen2.5-1.5B** model with LoRA adapters for multi-dimensional uplifting content scoring.

This model evaluates news articles across **6 orthogonal dimensions** to identify genuinely uplifting content with documented positive outcomes - not just feel-good stories or speculation.

**Key Innovation**: Uses an orthogonal dimension framework (inspired by LCSA methodology) to avoid the high correlation issues found in previous versions.

## Dimensions

The model scores articles on 6 dimensions:

### Impact Domains (WHAT kind of uplift)

| Dimension | Weight | Question |
|-----------|--------|----------|
| **Human Wellbeing Impact** | 25% | Health, safety, livelihoods improved? |
| **Social Cohesion Impact** | 15% | Communities strengthened, solidarity built? |
| **Justice & Rights Impact** | 10% | Wrongs addressed, rights expanded? |

### Assessment Dimensions (HOW real/accessible)

| Dimension | Weight | Question |
|-----------|--------|----------|
| **Evidence Level** | 20% | Documented outcomes or speculation? |
| **Benefit Distribution** | 20% | Who benefits? Elite → Universal? |
| **Change Durability** | 10% | Temporary relief → Systemic change? |

## Performance

| Metric | Value |
|--------|-------|
| **Validation MAE** | **0.681** |
| Training MAE | 0.637 |
| Validation RMSE | 0.880 |

### Per-Dimension MAE (Validation)

| Dimension | MAE |
|-----------|-----|
| Human Wellbeing Impact | 0.686 |
| Social Cohesion Impact | 0.704 |
| Justice Rights Impact | 0.619 |
| Evidence Level | 0.636 |
| Benefit Distribution | 0.792 |
| Change Durability | 0.648 |

## Training Details

- **Base Model**: Qwen/Qwen2.5-1.5B
- **Training Mode**: Knowledge Distillation (from a Gemini Flash oracle)
- **Adapter**: LoRA (18.5M trainable params, 1.2% of model)
- **Training Samples**: 7,999
- **Validation Samples**: 1,000
- **Epochs**: 3
- **Batch Size**: 8
- **Learning Rate**: 2e-5
- **Max Length**: 512 tokens

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

# Load the base model with a 6-output regression head, then attach the LoRA adapter
base_model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-1.5B",
    num_labels=6,
    problem_type="regression",
)
model = PeftModel.from_pretrained(base_model, "nexusmind/uplifting-filter-v5")
tokenizer = AutoTokenizer.from_pretrained("nexusmind/uplifting-filter-v5")
model.eval()

# Score an article (inputs longer than 512 tokens are truncated, matching training)
article = "Title: Community garden feeds 500 families\n\nA new community garden..."
inputs = tokenizer(article, return_tensors="pt", max_length=512, truncation=True)

with torch.no_grad():
    outputs = model(**inputs)
scores = outputs.logits[0].numpy()

dimensions = ["human_wellbeing_impact", "social_cohesion_impact", "justice_rights_impact",
              "evidence_level", "benefit_distribution", "change_durability"]

for dim, score in zip(dimensions, scores):
    print(f"{dim}: {score:.1f}")
```

## Gatekeeper Rule

**Evidence Level < 3 → Overall score capped at 3.0**

Speculation without documented outcomes cannot be truly uplifting.
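
Combining the documented dimension weights with this cap gives an overall score. A minimal sketch - the `overall_score` helper, the linear aggregation, and the toy score values are assumptions; the card specifies only the per-dimension weights and the cap:

```python
# Weighted aggregation of the 6 dimension scores (weights from the tables above),
# with the gatekeeper rule: evidence_level < 3 caps the overall score at 3.0.
WEIGHTS = {
    "human_wellbeing_impact": 0.25,
    "social_cohesion_impact": 0.15,
    "justice_rights_impact": 0.10,
    "evidence_level": 0.20,
    "benefit_distribution": 0.20,
    "change_durability": 0.10,
}

def overall_score(scores: dict) -> float:
    total = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    if scores["evidence_level"] < 3:   # gatekeeper: speculation is capped
        total = min(total, 3.0)
    return round(total, 2)

scores = {
    "human_wellbeing_impact": 8.0, "social_cohesion_impact": 6.0,
    "justice_rights_impact": 5.0, "evidence_level": 2.0,
    "benefit_distribution": 7.0, "change_durability": 4.0,
}
print(overall_score(scores))  # weighted sum is 5.6, but evidence_level < 3 caps it at 3.0
```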

## Limitations

- Trained on multilingual news articles (61% English, 31% French, 7% Spanish, <1% German/Dutch/Italian)
- An MAE of ~0.68 means predictions fall within about ±0.7 points of the oracle's scores on average
- The `benefit_distribution` dimension has the highest error (0.79 MAE)
- The model scores documented outcomes, not emotional tone
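
As a quick illustration of what the MAE figure means (toy numbers, not drawn from the evaluation set):

```python
# Mean absolute error: the average absolute gap between model and oracle scores
def mae(predictions, oracle):
    return sum(abs(p - o) for p, o in zip(predictions, oracle)) / len(predictions)

predictions = [7.2, 5.1, 3.8]   # model's scores (illustrative values)
oracle      = [8.0, 5.0, 3.0]   # oracle's scores (illustrative values)
print(round(mae(predictions, oracle), 2))  # 0.57
```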

## License

MIT

## Citation

```bibtex
@misc{uplifting_filter_v5,
  title={Uplifting Content Filter v5},
  author={NexusMind},
  year={2025},
  url={https://huggingface.co/nexusmind/uplifting-filter-v5}
}
```

### Framework versions

- PEFT 0.17.1

adapter_config.json

```diff
@@ -19,6 +19,8 @@
 "megatron_config": null,
 "megatron_core": "megatron.core",
 "modules_to_save": [
+    "classifier",
+    "score",
     "classifier",
     "score"
 ],
@@ -28,13 +30,13 @@
 "rank_pattern": {},
 "revision": null,
 "target_modules": [
-    "o_proj",
-    "down_proj",
     "k_proj",
     "q_proj",
+    "down_proj",
     "v_proj",
     "up_proj",
-    "gate_proj"
+    "gate_proj",
+    "o_proj"
 ],
 "target_parameters": null,
 "task_type": "SEQ_CLS",
```

tokenizer.json

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+size 11421896
```