Add SetFit model
- README.md +43 -43
- config_setfit.json +2 -2
- model.safetensors +1 -1
- model_head.pkl +1 -1
README.md
CHANGED
@@ -1,37 +1,41 @@
 ---
-library_name: setfit
-tags:
-- setfit
-- sentence-transformers
-- text-classification
-- generated_from_setfit_trainer
 base_model: sentence-transformers/paraphrase-mpnet-base-v2
 metrics:
 - accuracy
 - precision
 - recall
 - f1
-widget:
-- text: table drinking water threatsresource5c5007c2 56ef 4fb8 a37b 4f6dc9e37295
-- text: physical profile data collected oregon coast provide observation seasoar tow
-    support globec northeast pacific mesoscale survey august 2000 ncei accession 0001050
-- text: wood production intensification area aipl area intensification wood production
-    aipl territory mainly intended wood production silvicultural work aim increase
-    value per unit area volume per stem quality stem production desired specie combination
-    various production objective official register aipl applicable according article
-    69 sustainable forest management act ladtfthis third party metadata element translated
-    using automated translation tool amazon translate formdescriptors natureandenvironment
-    scienceandtechnology aipl area high forest potential sustainable forest management
-    forest development public forest silvicultural investment forest territorial limit
-    wood production stf forest territorial subdivision government information
-- text: shelter profile information
-- text: incident cerregulated pipeline facility canada energy regulator provides data
-    incident cerregulated pipeline facility defined onshore pipeline regulation processing
-    plant regulation data updated quarterlythe data provided 2008 current used visualization
-    tool canada energy regulator website source code visualization tool also available
-    economicsandindustry healthandsafety natureandenvironment cer pipeline safety
-    incident incident pipeline incident pipeline spill
 pipeline_tag: text-classification
 inference: false
 model-index:
 - name: SetFit with sentence-transformers/paraphrase-mpnet-base-v2
@@ -45,16 +49,16 @@ model-index:
       split: test
     metrics:
     - type: accuracy
-      value: 0.
       name: Accuracy
     - type: precision
-      value:
       name: Precision
     - type: recall
-      value: 0.
       name: Recall
     - type: f1
-      value: 0.
       name: F1
 ---
 
@@ -88,9 +92,9 @@ The model has been trained using an efficient few-shot learning technique that i
 ## Evaluation
 
 ### Metrics
-| Label | Accuracy | Precision | Recall | F1
-
-| **all** | 0.
 
 ## Uses
 
@@ -110,7 +114,7 @@ from setfit import SetFitModel
 # Download from the 🤗 Hub
 model = SetFitModel.from_pretrained("lgd/setfit-multilabel")
 # Run inference
-preds = model("
 ```
 
 <!--
@@ -142,13 +146,14 @@ preds = model("shelter profile information")
 ### Training Set Metrics
 | Training set | Min | Median | Max |
 |:-------------|:----|:-------|:----|
-| Word count |
 
 ### Training Hyperparameters
-- batch_size: (
-- num_epochs: (
 - max_steps: -1
 - sampling_strategy: oversampling
 - body_learning_rate: (2e-05, 1e-05)
 - head_learning_rate: 0.01
 - loss: CosineSimilarityLoss
@@ -164,12 +169,7 @@ preds = model("shelter profile information")
 ### Training Results
 | Epoch | Step | Training Loss | Validation Loss |
 |:-----:|:----:|:-------------:|:---------------:|
-| 0.
-| 6.25 | 50 | 0.0408 | - |
-| 0.125 | 1 | 0.0929 | - |
-| 6.25 | 50 | 0.0393 | - |
-| 0.125 | 1 | 0.0134 | - |
-| 6.25 | 50 | 0.0165 | - |
 
 ### Framework Versions
 - Python: 3.10.12

 ---
 base_model: sentence-transformers/paraphrase-mpnet-base-v2
+library_name: setfit
 metrics:
 - accuracy
 - precision
 - recall
 - f1
 pipeline_tag: text-classification
+tags:
+- setfit
+- sentence-transformers
+- text-classification
+- generated_from_setfit_trainer
+widget:
+- text: topographic mapping physical location tree
+- text: forest tenure harvesting authority polygon spatial layer reflects operational
+    activity harvesting authority harvesting authority legal area cleared ministry
+    forest range claim land exist harvesting purpose corresponds outlined area exhibit
+    map forest tenure section ft responsible creation maintenance digital forest atlas
+    file province british columbia encompassing forest range act tenure also support
+    forest resource program delivered mofr feature contains ministry forest range
+    mofr featureclassskey number column defines type feature layer contains harvesting
+    authority boundary following feature class christmas tree permit 489 forest licence
+    cutting permit 556 licence cut 615 timber licence cutting permit 811 timber sale
+    licence major cutting permit 818 timber sale licence minor 819 tree farm licence
+    cutting permit 834 woodlot licence cutting permit 864 timber sale licence minor
+    cutting permit 917 community forest cutting permit 2401 harvesting authority life
+    cycle status code either pendingthe harvesting authority submitted new harvesting
+    authority amendment yet approved rejected activethe harvesting authority approved
+    activity may taking place harvesting authority retiredall activity completed harvesting
+    authority formdescriptors natureandenvironment scienceandtechnology canada fta
+    cut permit forest forest tenure harvest government information
+- text: concentration ultrapure water soluble aerosol trace element collected bulk
+    aerosol sample 2015 u geotraces western arctic transect uscgc healy hly1502 august
+    october 2015 ncei accession 0277333
+- text: death registration ontario location
+- text: aquatic resource area survey point
 inference: false
 model-index:
 - name: SetFit with sentence-transformers/paraphrase-mpnet-base-v2

       split: test
     metrics:
     - type: accuracy
+      value: 0.125
       name: Accuracy
     - type: precision
+      value: 0.6666666666666666
       name: Precision
     - type: recall
+      value: 0.2222222222222222
       name: Recall
     - type: f1
+      value: 0.3333333333333333
       name: F1
 ---
 

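The new precision, recall and F1 values in the `model-index` metadata are mutually consistent. F1 is the harmonic mean 2PR/(P + R) of the reported precision and recall. The card does not say how scores were averaged across labels. A match like this is what micro-averaging would produce, but that reading is an assumption. Below is a quick check using the full-precision values from the YAML:

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the model-index metadata added in this commit.
precision = 0.6666666666666666
recall = 0.2222222222222222
reported_f1 = 0.3333333333333333

computed = f1_score(precision, recall)
print(round(computed, 4))  # 0.3333
assert math.isclose(computed, reported_f1, rel_tol=1e-9)
```

In exact fractions this is 2 · (2/3) · (2/9) / (2/3 + 2/9) = (8/27) / (8/9) = 1/3.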
 ## Evaluation
 
 ### Metrics
+| Label | Accuracy | Precision | Recall | F1 |
+|:--------|:---------|:----------|:-------|:-------|
+| **all** | 0.125 | 0.6667 | 0.2222 | 0.3333 |
 
 ## Uses
 

 # Download from the 🤗 Hub
 model = SetFitModel.from_pretrained("lgd/setfit-multilabel")
 # Run inference
+preds = model("aquatic resource area survey point")
 ```
 
 <!--

 ### Training Set Metrics
 | Training set | Min | Median | Max |
 |:-------------|:----|:-------|:----|
+| Word count | 3 | 5.75 | 7 |
 
 ### Training Hyperparameters
+- batch_size: (16, 16)
+- num_epochs: (1, 1)
 - max_steps: -1
 - sampling_strategy: oversampling
+- num_iterations: 20
 - body_learning_rate: (2e-05, 1e-05)
 - head_learning_rate: 0.01
 - loss: CosineSimilarityLoss

 ### Training Results
 | Epoch | Step | Training Loss | Validation Loss |
 |:-----:|:----:|:-------------:|:---------------:|
+| 0.05 | 1 | 0.0058 | - |
 
 ### Framework Versions
 - Python: 3.10.12

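In the Training Results tables, Epoch is the fraction of training epochs completed at the logged Step. Each (Epoch, Step) row therefore implies a number of optimizer steps per epoch. The sketch below assumes a constant step count per epoch (epoch = step / steps_per_epoch). Under that assumption, the new run's single row gives 20 steps per epoch, and the old run's rows give 8. The truncated first row of the old table is left out.

```python
def steps_per_epoch(epoch: float, step: int) -> float:
    """Assuming epoch = step / steps_per_epoch, solve for steps_per_epoch."""
    return step / epoch


# (Epoch, Step) pairs copied from the two Training Results tables.
new_run = [(0.05, 1)]
old_run = [(0.125, 1), (6.25, 50)]

for name, rows in (("new", new_run), ("old", old_run)):
    print(name, [round(steps_per_epoch(epoch, step)) for epoch, step in rows])
# new [20]
# old [8, 8]
```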
config_setfit.json
CHANGED
@@ -1,4 +1,4 @@
 {
-"
-"
 }

 {
+"normalize_embeddings": false,
+"labels": null
 }

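After this commit, `config_setfit.json` stores `"labels": null`. The checkpoint therefore carries no label names, and its predictions are not mapped to strings. The repository name `setfit-multilabel` suggests a multi-label head that returns one 0/1 indicator per label, but that is an assumption. On that basis, the sketch below turns such a vector into label indices, or into names when a label list is supplied. `indicators_to_labels` and the example names `"a"`, `"b"`, `"c"` are hypothetical and not part of SetFit.

```python
import json
from typing import List, Optional, Sequence, Union

# config_setfit.json as it reads after this commit.
CONFIG_TEXT = '{"normalize_embeddings": false, "labels": null}'


def indicators_to_labels(
    indicators: Sequence[int], labels: Optional[List[str]] = None
) -> List[Union[int, str]]:
    """Hypothetical helper: map a 0/1 multi-label vector to the active label
    indices, or to label names when a label list is available."""
    active = [i for i, flag in enumerate(indicators) if flag]
    if labels is None:
        return active
    return [labels[i] for i in active]


config = json.loads(CONFIG_TEXT)
print(config["labels"])  # None
print(indicators_to_labels([0, 1, 1], config["labels"]))  # [1, 2]
print(indicators_to_labels([0, 1, 1], ["a", "b", "c"]))  # ['b', 'c']
```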
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 437967672

 version https://git-lfs.github.com/spec/v1
+oid sha256:5f76e6d6812f4692b2394f819983598b48718cd9713ce9bc4bc704ddc0e65c9a
 size 437967672

model_head.pkl
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 26916

 version https://git-lfs.github.com/spec/v1
+oid sha256:092c0421b2b598ae0eae80b8867445970f46b4bc4448e929b8aba5d135959d78
 size 26916
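`model.safetensors` and `model_head.pkl` are stored as Git LFS pointer files. The repository tracks only the spec version, the SHA-256 `oid` of the content, and its `size` in bytes. That is why only the `oid` line changes in these diffs while the sizes stay the same. Below is a minimal standard-library sketch for parsing such a pointer and checking a downloaded file against it. `parse_lfs_pointer` and `matches_pointer` are illustrative names, not part of any LFS tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict entry."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: Dict[str, str]) -> bool:
    """Return True if the file's byte size and SHA-256 digest match the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so large files such as model.safetensors
        # (437967672 bytes) are never loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer for model_head.pkl as committed here.
HEAD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:092c0421b2b598ae0eae80b8867445970f46b4bc4448e929b8aba5d135959d78
size 26916
"""

pointer = parse_lfs_pointer(HEAD_POINTER)
print(pointer["size"])  # 26916
```

Once the real file has been downloaded, `matches_pointer(Path("model_head.pkl"), pointer)` should return `True` only for the content this commit points to.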