msullivan committed
Commit 81dad17 · verified · 1 Parent(s): 3cb3aa1

Push model using huggingface_hub.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
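
This pooling config enables masked mean pooling (`pooling_mode_mean_tokens`) over the token embeddings. A minimal sketch of what that computes, in illustrative pure Python with a hypothetical `mean_pool` helper (the real implementation is `sentence_transformers.models.Pooling`):

```python
# Illustrative mean pooling: average token embeddings, ignoring padded positions.
# token_embeddings is [seq_len][dim] (dim is 768 for this model), attention_mask is [seq_len] of 0/1.

def mean_pool(token_embeddings, attention_mask):
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, v in enumerate(vec):
                totals[i] += v
    return [t / count for t in totals]

# Two real tokens plus one padded position that must not affect the mean.
emb = [[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]
mask = [1, 1, 0]
print(mean_pool(emb, mask))  # [2.0, 3.0]
```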
README.md ADDED
@@ -0,0 +1,208 @@
+ ---
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: '"But PBMs operate with little to no transparency within the drug pricing
+     system, and they often take advantage of their opaque position at the expense
+     of patients. Their work includes establishing formularies, contracting with pharmacies,
+     and negotiating rebates and discounts with drug manufacturers. But instead of
+     passing these savings on to consumers, PBMs retain these costs, and the patients
+     do not benefit at the pharmacy counter. But it''s actually worse than that. Just
+     as a rising tide lifts all boats, PBMs'' rebate manipulation inflates health care
+     prices generally and that ultimately increases the cost of patients'' medications."'
+ - text: '"That''s why our state''s local pharmacies are so essential. They provide
+     people access to the care they need when they need it. But now, many pharmacies
+     are under serious threat, and our most vulnerable patients along with them. Over
+     the past 14 years, the number of Oregon pharmacies has decreased more than 26%.
+     Accessing medications or treatments should be simple, but unfortunately it''s
+     only becoming more difficult. Why is this happening? One reason involves middlemen
+     insurers called pharmacy benefit managers (PBMs)."'
+ - text: '"But more often, insurers and PBMs have implemented schemes called \"copay
+     accumulator adjustment programs\" that prevent the value of the copay assistance
+     from counting toward a patient''s deductible. Faced with unexpectedly high costs
+     at the pharmacy counter, patients impacted by these policies are less likely to
+     adhere to treatment which can lead to worsened health outcomes, increased hospitalizations,
+     and greater costs to the health care system. Copay accumulator policies disproportionately
+     impact communities of color."'
+ - text: '"PBMs also compile lists of drugs, called formularies, that providers of
+     health benefits agree to cover; establish pharmacy networks that patients can
+     access; and run their own mail-order pharmacies. Although PBMs are supposed to
+     help lower costs, some of their practices may well do the opposite. PBMs often
+     keep a portion of the rebates they negotiate, which can incentivize them to favor
+     more expensive drugs on their formularies. (A $1 million drug, for example, would
+     fetch a bigger fee than a $100 one."'
+ - text: '"This secrecy raises challenging questions. Do PBMs use their size and negotiating
+     power to win lower net prices from drugmakers? Or do PBMs use their dominant market
+     position and opaque business practices to enrich themselves at the expense of
+     their customers and the rest of society? The answer to both these questions is,
+     surprisingly, yes. If the contest for formulary placement works as it should,
+     competition compels drugmakers to offer substantial discounts off the published
+     list price. As a result, insurers and consumers benefit from a reduced net price
+     for drugs. However, formulary competition can be undermined in various ways."'
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: setfit
+ inference: true
+ base_model: sentence-transformers/all-mpnet-base-v2
+ ---
+
+ # SetFit with sentence-transformers/all-mpnet-base-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
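The contrastive step works by expanding the few labeled texts into many sentence pairs: same-label texts become positive pairs, different-label texts negative pairs. A minimal sketch of that pair construction, in illustrative pure Python with a hypothetical `make_pairs` helper (not the library's actual sampler):

```python
from itertools import combinations

# Toy labeled set mirroring this card's two classes. Label 1.0 marks a
# positive pair (same class), 0.0 a negative pair (different classes).
examples = [
    ("PBMs inflate drug prices.", "Critical"),
    ("PBMs keep rebates for themselves.", "Critical"),
    ("PBM negotiations save patients money.", "Supportive"),
]

def make_pairs(examples):
    pairs = []
    for (a, la), (b, lb) in combinations(examples, 2):
        pairs.append((a, b, 1.0 if la == lb else 0.0))
    return pairs

pairs = make_pairs(examples)
# 3 texts -> 3 pairs: one positive (Critical/Critical), two negative.
print(pairs)
```

Quadratic pair growth is why SetFit gets useful training signal from only 19 labeled examples.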
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 384 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ### Model Labels
+ | Label | Examples |
+ |:-----------|:---------|
+ | Critical | <ul><li>'"That\'s why our state\'s local pharmacies are so essential. They provide people access to the care they need when they need it. But now, many pharmacies are under serious threat, and our most vulnerable patients along with them. Over the past 14 years, the number of Oregon pharmacies has decreased more than 26%. Accessing medications or treatments should be simple, but unfortunately it\'s only becoming more difficult. Why is this happening? One reason involves middlemen insurers called pharmacy benefit managers (PBMs)."'</li><li>'"Unfortunately, anti-patient policies practiced by health insurance companies and health care middlemen known as pharmacy benefit managers (PBMs) impose unnecessary access and affordability barriers for epilepsy patients: things like fail first or step therapy requirement, prior authorization, and pocketing billions in discounts without passing savings onto patients. Many patients benefit from copay coupons and copay assistance, which often come in the form of discounts from drug manufacturers and charitable organizations to help patients afford their medicine."'</li><li>'"But PBMs operate with little to no transparency within the drug pricing system, and they often take advantage of their opaque position at the expense of patients. Their work includes establishing formularies, contracting with pharmacies, and negotiating rebates and discounts with drug manufacturers. But instead of passing these savings on to consumers, PBMs retain these costs, and the patients do not benefit at the pharmacy counter. But it\'s actually worse than that. Just as a rising tide lifts all boats, PBMs\' rebate manipulation inflates health care prices generally and that ultimately increases the cost of patients\' medications."'</li></ul> |
+ | Supportive | <ul><li>'"Supporters of these bills claim they are about \\"protecting patient choice,\\" but there\'s not much of a choice when you can\'t afford your medication to begin with. Patients don\'t need laws that make it easier for Big Pharma to charge more. They need laws that encourage competition and lower prices. The average patient saves over $1,000 a year thanks to PBM negotiations. Take that away, and the only winner is the pharmaceutical industry. These bills don\'t lower drug prices, they just shift the cost burden onto families, employers, and taxpayers. That\'s not reform."'</li><li>'"This legislation, meant to punish a Pharmacy Benefit Manager, is driving up the cost of drugs for hard-working Tennesseans who were receiving their drugs at little to no cost. Not only is this in-house pharmacy losing business, but the school system is also having to include additional funding into its health insurance plan to cover additional pharmacy costs, costs which were completely imposed by government action and not the rising cost of insurance. Remarkably, this means that the state government\'s actions are now being paid for by a local government."'</li><li>'"PBMs are third-party administrators of prescription medicine plans for insurance companies, businesses large and small, and government health plans. They administer the plan\'s drug formulary, process prescription claims and negotiate discounts with drug manufacturers. Basically, PBMs act as a check and balance like in our system of government on pharmaceutical companies, obtaining price discounts for the consumer in the form of rebates. Sanders\' bill would gut their ability to negotiate, under the mistaken assumption that they are the \\"bad guy,\\" and it sailed through the Senate health committee by a terrifying 18-3 vote."'</li></ul> |
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("setfit_model_id")
+ # Run inference
+ preds = model("\"PBMs also compile lists of drugs, called formularies, that providers of health benefits agree to cover; establish pharmacy networks that patients can access; and run their own mail-order pharmacies. Although PBMs are supposed to help lower costs, some of their practices may well do the opposite. PBMs often keep a portion of the rebates they negotiate, which can incentivize them to favor more expensive drugs on their formularies. (A $1 million drug, for example, would fetch a bigger fee than a $100 one.\"")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:--------|:----|
+ | Word count | 74 | 88.9474 | 100 |
+
+ | Label | Training Sample Count |
+ |:-----------|:----------------------|
+ | Supportive | 8 |
+ | Critical | 11 |
+
+ ### Training Hyperparameters
+ - batch_size: (8, 8)
+ - num_epochs: (2, 2)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
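The CosineSimilarityLoss listed in the hyperparameters fits the embedding body so that each pair's cosine similarity matches its 0/1 pair label, via mean squared error. A minimal sketch of that objective on plain Python vectors (illustrative only, not the sentence-transformers implementation):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product over the product of Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cosine_similarity_loss(pairs):
    # pairs: list of (embedding_a, embedding_b, label) with label 1.0 or 0.0.
    # Loss is the mean squared error between cosine similarity and label.
    errs = [(cosine(a, b) - label) ** 2 for a, b, label in pairs]
    return sum(errs) / len(errs)

# A perfectly aligned positive pair and an orthogonal negative pair give zero loss.
pairs = [
    ([1.0, 0.0], [2.0, 0.0], 1.0),  # same direction, label 1.0
    ([1.0, 0.0], [0.0, 3.0], 0.0),  # orthogonal, label 0.0
]
print(cosine_similarity_loss(pairs))  # 0.0
```

Minimizing this pulls same-label texts together and pushes different-label texts apart in embedding space, which is what lets the small logistic-regression head separate the two classes.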
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0385 | 1 | 0.201 | - |
+ | 1.9231 | 50 | 0.1192 | - |
+
+ ### Framework Versions
+ - Python: 3.10.6
+ - SetFit: 1.1.1
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.50.1
+ - PyTorch: 2.6.0
+ - Datasets: 3.4.1
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.50.1",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.50.1",
+     "pytorch": "2.6.0"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
config_setfit.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "normalize_embeddings": false,
+   "labels": [
+     "Supportive",
+     "Critical"
+   ]
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1774a473bd01df210c7f8a2b73f0e2d955730f92963ab913e46e156feb78a2ad
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f24fe88b86152368b768cada4ac4e5ae525bf24ca4b6e742899786ace9e2804
+ size 7007
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
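
modules.json wires three modules in sequence: the MPNet transformer, the pooling layer from `1_Pooling`, and a final `Normalize` step, so every sentence embedding comes out unit-length. A minimal sketch of that last step, in illustrative pure Python with a hypothetical `l2_normalize` helper:

```python
import math

def l2_normalize(vec):
    # Divide by the Euclidean norm so the embedding has length 1,
    # which makes dot product and cosine similarity coincide.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

v = l2_normalize([3.0, 4.0])
print(v)  # [0.6, 0.8]
```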
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff