haukelicht committed on
Commit 4bf3de8 · verified · 1 Parent(s): 795254c

used native setfit head

Files changed (2):
  1. README.md +124 -87
  2. model_head.pkl +1 -1
README.md CHANGED
@@ -1,130 +1,167 @@
  ---
- tags:
- - setfit
- - sentence-transformers
- - text-classification
- - generated_from_setfit_trainer
- widget: []
- metrics:
- - accuracy
- pipeline_tag: text-classification
- library_name: setfit
- inference: true
  ---

- # SetFit

- This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. A SetFitHeadWithClassWeights instance is used for classification.

- The model has been trained using an efficient few-shot learning technique that involves:

- 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
- 2. Training a classification head with features from the fine-tuned Sentence Transformer.

  ## Model Details

  ### Model Description
- - **Model Type:** SetFit
- <!-- - **Sentence Transformer:** [Unknown](https://huggingface.co/unknown) -->
- - **Classification head:** a SetFitHeadWithClassWeights instance
- - **Maximum Sequence Length:** 384 tokens
- - **Number of Classes:** 6 classes
- <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->

  ### Model Sources

- - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

  ## Uses

- ### Direct Use for Inference

- First install the SetFit library:

- ```bash
- pip install setfit
- ```

- Then you can load this model and run inference.

  ```python
  from setfit import SetFitModel

- # Download from the 🤗 Hub
- model = SetFitModel.from_pretrained("setfit_model_id")
- # Run inference
- preds = model("I loved the spiderman movie!")
  ```

- <!--
- ### Downstream Use

- *List how someone could finetune this model on their own dataset.*
- -->

- <!--
- ### Out-of-Scope Use

- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->

- <!--
- ## Bias, Risks and Limitations

- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->

- <!--
- ### Recommendations

- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->

- ## Training Details

- ### Framework Versions
- - Python: 3.11.11
- - SetFit: 1.1.2
- - Sentence Transformers: 5.1.0
- - Transformers: 4.57.1
- - PyTorch: 2.6.0+cu124
- - Datasets: 3.5.0
- - Tokenizers: 0.22.1

- ## Citation

- ### BibTeX
- ```bibtex
- @article{https://doi.org/10.48550/arxiv.2209.11055,
-   doi = {10.48550/ARXIV.2209.11055},
-   url = {https://arxiv.org/abs/2209.11055},
-   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
-   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
-   title = {Efficient Few-Shot Learning Without Prompts},
-   publisher = {arXiv},
-   year = {2022},
-   copyright = {Creative Commons Attribution 4.0 International}
- }
- ```

- <!--
- ## Glossary

- *Clearly define terms in order to be accessible across audiences.*
- -->

- <!--
- ## Model Card Authors

- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->

- <!--
  ## Model Card Contact

- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
  ---
+ base_model: sentence-transformers/all-mpnet-base-v2
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - economic-attributes
+ - mention-classification
+ - mpnet-base-v2
+ - setfit
+ - multi-label-classification
+ model-index:
+ - name: all-mpnet-base-v2_economic-attributes-classifier
+   results:
+   - task:
+       type: multi-label-classification
+       name: Multi-label classification
+     metrics:
+     - type: _tba_
+       value: -1.0
+     dataset:
+       type: custom
+       name: custom human-labeled multi-label annotation dataset
  ---

+ # Group mention economic attributes classifier

+ A multi-label classifier for detecting the **economic attribute** categories referred to in a social group mention, trained with [`setfit`](https://github.com/huggingface/setfit) on top of the lightweight [`sentence-transformers/all-mpnet-base-v2`](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) sentence embedding model.

+ The economic attributes classified are:

+ | attribute | definition |
+ |:---|:---|
+ | class membership | People described by their membership in or belonging to a social class, such as the upper class, the middle class, the lower class, or the working class. |
+ | employment status | People described or categorized by their employment status, such as employers, employees, self-employed, or unemployed people. |
+ | education level | People described or categorized by their education level, such as students, apprentices, people in higher or tertiary education, vocational trainees, or graduates. |
+ | income/wealth/economic status | People defined or categorized by their income, wealth, or economic status, such as high/medium/low-income groups, rich/poor people, or homeowners/tenants/homeless people. |
+ | occupation/profession | People referred to or categorized according to their occupation or profession, such as teachers, farmers, public servants, or police officers. |
+ | ecology of group | People categorized by their relation to the ecology of society, such as carbon emitters, coal miners, green employers, green workers, sustainable farmers, or those working in the fossil-fuel sector. |

  ## Model Details

  ### Model Description

+ Group mention economic attributes classifier.

+ - **Developed by:** Hauke Licht
+ - **Model type:** SetFit text classifier (MPNet sentence-embedding body)
+ - **Language(s) (NLP):** English
+ - **License:** apache-2.0
+ - **Finetuned from model:** sentence-transformers/all-mpnet-base-v2
+ - **Funded by:** The *Deutsche Forschungsgemeinschaft* (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866

  ### Model Sources

+ - **Repository:** _tba_
+ - **Paper:** _tba_
+ - **Demo:** [More Information Needed]

  ## Uses

+ ### Bias, Risks, and Limitations

+ - Evaluation of the classifier on held-out data shows that it makes mistakes.
+ - The model has been finetuned only on human-annotated social group mentions recorded in sentences sampled from the manifestos of European parties (mostly far-right and Green parties). Applying the classifier in other domains can lead to higher error rates.
+ - The data used to finetune the model come from human annotators. Annotators can be biased, and factors like gender and social background can influence their annotation judgments. This may lead to bias in the detection of specific social groups.

+ #### Recommendations

+ - Users who want to apply the model outside its training data domain should evaluate its performance on the target data.
+ - Users who want to apply the model outside its training data domain should continue finetuning it on labeled data from that domain.

+ ### How to Get Started with the Model

+ Use the code below to get started with the model.

+ ## Usage

+ You can use the model with the [`setfit` python library](https://github.com/huggingface/setfit) (>=1.1.0):

+ *Note:* For compatibility, it is recommended to use `transformers` >=4.55.0,<=5.0.0 and `sentence-transformers` >=4.0.1,<=5.1.0.
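A matching install command (the version pins mirror the card's recommendation; `setfit` pulls in `transformers` and `sentence-transformers` as dependencies):

```shell
pip install "setfit>=1.1.0" "sentence-transformers>=4.0.1,<=5.1.0"
```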

+ ### Classification

  ```python
+ import torch
  from setfit import SetFitModel

+ model_name = "haukelicht/all-mpnet-base-v2_economic-attributes-classifier"
+ device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
+ classifier = SetFitModel.from_pretrained(model_name)
+ classifier.to(device)
+
+ # Example mentions
+ mentions = ["working class people", "highly-educated professionals", "people without a stable job"]
+
+ # Get predictions (one binary indicator per label and mention)
+ predictions = classifier.predict(mentions)
+ print(predictions)
+
+ # Map predictions to label names
+ labels = [
+     [classifier.id2label[i] for i, p in enumerate(pred) if p == 1]
+     for pred in predictions
+ ]
+ print(labels)
  ```

+ ### Mention embedding

+ ```python
+ import torch
+ from sentence_transformers import SentenceTransformer
+
+ model_name = "haukelicht/all-mpnet-base-v2_economic-attributes-classifier"
+ device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
+
+ # Load the sentence transformer component of the pre-trained classifier
+ model = SentenceTransformer(model_name, device=device)
+
+ # Example mentions
+ mentions = ["working class people", "highly-educated professionals", "people without a stable job"]
+
+ # Compute mention embeddings
+ embeddings = model.encode(mentions)
+ ```

+ ## Training Details

+ ### Training Data

+ The train, dev, and test splits used for model finetuning and evaluation will be made available on GitHub upon publication of the associated research paper.

+ ### Training Procedure

+ #### Training Hyperparameters

+ - num epochs: (1, 4)
+ - train batch sizes: (16, 4)
+ - body train max steps: 100
+ - head learning rate: 0.030
+ - L2 weight: 0.015
+ - warmup proportion: 0.10

+ ## Evaluation

+ ### Testing Data, Factors & Metrics

+ #### Testing Data

+ The train, dev, and test splits used for model finetuning and evaluation will be made available on GitHub upon publication of the associated research paper.

+ ## Citation
 
 
 
 
+ **BibTeX:**

+ [More Information Needed]

+ **APA:**

+ [More Information Needed]

  ## Model Card Contact

+ hauke.licht@uibk.ac.at
 
model_head.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f807412922211387c9a5a35f52871c550b700f3e14d1a5b58d383ef4b56eaea8
+ oid sha256:9109b9e144d648da1fbf0bc4024c61799e143a852eed3d19a1f51e45878803fc
  size 19966