metadata
library_name: setfit
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget:
  - text: >-
      (a) The enterprise fund may be used to cover closure costs only for major
      waste tire facilities operated by government agencies. (b) The enterprise
      fund shall dedicate its revenue exclusively or with exclusive first
      priority to financing closure activities. (c) The enterprise fund shall be
      established and the documents shall be worded as specified by using form
      CalRecycle 144 "Enterprise Fund for Financial Assurances" (03/17), which
      is incorporated herein by reference. (See Appendix A.) The wording,
      however, may be modified to accommodate special circumstances on a
      case-by-case basis, as approved by the Board or its designee. (d) Revenue
      generated by an enterprise fund shall be deposited into a financial
      assurance mechanism which: (1) Provides equivalent protection to a trust
      fund as described in section 18474 of this Article; (2) Shall be funded
      within five years as described in Section 18474 of this Article; (3) Is
      used exclusively to finance closure activities and shall remain inviolate
      against all other claims, including any claims by the operator, the
      operator's governing body, and the creditors of the operator and its
      governing body; (4) Authorizes the Board or its designee to direct the
      provider of financial assurance to pay closure costs if the Board or its
      designee determines that the operator has failed to perform closure
      activities covered by the mechanism; (5) Is maintained by a provider whose
      financial operations are regulated by a federal or state agency, or the
      provider is otherwise certain to maintain and disburse the assured funds
      properly; (6) Is maintained by a provider who has authority to invest
      revenue deposited into the mechanism. (7) Meets other requirements that
      the Board determines are necessary to ensure that the assured amount of
      funds shall be available for closure activities in a timely manner.
  - text: >-
      (a) Various laws provide for the issuance of certifications by the state
      board or regional boards. These regulations specify how the state board
      and the regional boards implement various certification programs and how
      the state board acts on petitions for reconsideration of certification
      actions or failures to act by the executive director, regional boards, and
      executive officers. (b) Within five years from the effective date of these
      regulations, the state board, in consultation with the Secretary for
      Environmental Protection, shall review the provisions of this Chapter to
      determine whether they should be retained, revised, or repealed.
  - text: >-
      The Tax Reform Act of 1986, as amended, (the "act") establishes a Federal
      tax credit ("low- income housing credit," "LIHTC" or "credit")
      administered by state housing agencies for owners of housing for persons
      of low-income. The act authorizes the governor of each state to allocate
      the low-income housing credit ceiling among governmental units and other
      issuing authorities in the state. The act requires that the allocation of
      credit to owners of low-income housing be coordinated by a single state
      housing credit agency. The act further requires each agency allocating
      credits to adopt a qualified allocation plan (the "plan" or the "QAP")
      which sets forth the criteria and preferences by which credit will be
      allocated to projects. By Executive Order, the New York State Division of
      Housing and Community Renewal has been designated as the State Housing
      Credit Agency to allocate the credit in a manner which maximizes the
      public benefit by addressing the State's need for low-income housing and
      community revitalization incentives. In order to provide for the effective
      coordination of the State's low-income housing credit program with section
      42 of the United States Internal Revenue Code (the "code"), this plan
      shall be construed and administered in a manner consistent with the code
      and regulations promulgated thereunder.
  - text: >-
      (1) The purpose of these rules is to provide administrative procedures for
      fetal, infant, and maternal death reviews, and maternal and family
      interviews, or both. (2) The program brings together key members of the
      community to review cases of fetal, infant, and maternal deaths in order
      to identify the factors associated with those deaths, to determine if
      those deaths represent system issues that require change, to develop
      recommendations for change, and to assist in the implementation of change.
      (3) The program's goal is to enhance the health and well-being of women,
      infants, and families by improving the community resources and service
      delivery systems available to them. The programs are operated under the
      auspices of the Alabama Department of Public Health (ADPH), Bureau of
      Family Health Services, State Perinatal Program.
  - text: >-
      The regulations contained in this article govern procedures affecting the
      appeal to the Board of orders to comply with the Surface Mining and
      Reclamation Act of 1975 (SMARA) issued by the supervisor of the Division
      of Mine Reclamation (DMR), or by the Board when acting in the capacity of
      lead agency pursuant to Public Resources Code Section 2774.4 or 2774.5.
inference: true

SetFit

This is a SetFit model that can be used for text classification. A LogisticRegression instance is used as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
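Step 1 trains on pairs of sentences rather than on single labelled examples. The sketch below is a simplified illustration, not SetFit's actual sampler: it assumes a plain list of (text, label) examples and builds every same-label pair with target 1.0 and every cross-label pair with target 0.0. The helper name `make_contrastive_pairs` is hypothetical, and the labels attached to the example texts are invented for illustration.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Turn (text, label) examples into (text_a, text_b, target) pairs.

    Same-label pairs get target 1.0 (pushed towards similar embeddings);
    different-label pairs get target 0.0 (pushed towards dissimilar ones).
    This enumerates every pair; SetFit's own sampler decides how many to draw.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Illustrative examples; the label assignments are made up.
examples = [
    ("The purpose of these rules is to provide administrative procedures.", "purpose-administrative"),
    ("These regulations specify how the boards implement certification programs.", "purpose-administrative"),
    ("Revenue shall be deposited into a financial assurance mechanism.", "non-purpose"),
]
for text_a, text_b, target in make_contrastive_pairs(examples):
    print(target, "|", text_a[:30], "|", text_b[:30])
```

Fine-tuning the Sentence Transformer on such pairs is what step 1 refers to; step 2 then fits the LogisticRegression head on embeddings from the fine-tuned body.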

Model Details

Model Description

  • Model Type: SetFit
  • Classification head: a LogisticRegression instance
  • Maximum Sequence Length: 512 tokens
  • Number of Classes: 32

Model Sources

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("rkoh/setfit-bert-a6-8per")
# Run inference
preds = model("The regulations contained in this article govern procedures affecting the appeal to the Board of orders to comply with the Surface Mining and Reclamation Act of 1975 (SMARA) issued by the supervisor of the Division of Mine Reclamation (DMR), or by the Board when acting in the capacity of lead agency pursuant to Public Resources Code Section 2774.4 or 2774.5.")
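The training texts run up to 4,265 words (see Training Set Metrics), while the model reads at most 512 tokens, so the tokenizer truncates anything longer and the prediction depends only on the opening portion of the text. One possible workaround, which this card does not prescribe, is to split a long document into overlapping word windows, classify each window, and take a majority vote. The sketch below assumes a 300-word window (a guess intended to stay under 512 tokens for typical prose, not a measured limit) and takes the predictor as a parameter; `chunk_words`, `predict_long` and `fake_predict` are hypothetical names.

```python
from collections import Counter
from typing import Callable, List, Sequence


def chunk_words(text: str, window: int = 300, stride: int = 250) -> List[str]:
    """Split text into overlapping windows of at most `window` words."""
    if not 0 < stride <= window:
        raise ValueError("stride must be in (0, window] so no words are skipped")
    words = text.split()
    if len(words) <= window:
        return [" ".join(words)]
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def predict_long(text: str, predict: Callable[[Sequence[str]], Sequence], **chunk_kwargs):
    """Classify every window with `predict` and return the majority label."""
    chunks = chunk_words(text, **chunk_kwargs)
    # Convert tensor/NumPy elements to plain Python values so they hash by value.
    labels = [p.item() if hasattr(p, "item") else p for p in predict(chunks)]
    # most_common keeps first-seen order on ties, so the earliest window wins a tie.
    return Counter(labels).most_common(1)[0][0]


def fake_predict(chunks):
    """Stand-in predictor for illustration only."""
    return ["purpose-regulatory" if i != 1 else "non-purpose" for i in range(len(chunks))]


long_text = " ".join(f"word{i}" for i in range(600))
print(len(chunk_words(long_text)), predict_long(long_text, fake_predict))
```

With the model loaded as above, `predict_long(long_text, model.predict)` would return one label per document. Whether voting over windows agrees with labels assigned to whole documents has not been evaluated here.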

Training Details

Training Set Metrics

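| Training set | Min | Median   | Max  |
|:-------------|:----|:---------|:-----|
| Word count   | 31  | 329.9688 | 4265 |

| Label                  | Training Sample Count |
|:-----------------------|:----------------------|
| non-purpose            | 0                     |
| purpose-administrative | 0                     |
| purpose-regulatory     | 0                     |
| purpose-with-authority | 0                     |
| purpose-with-scope     | 0                     |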
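Statistics like these can be recomputed for any dataset with a few lines. This sketch assumes words are counted by splitting on whitespace; the card does not state which counting rule produced the figures above, so results may differ slightly. `word_count_stats` and `label_counts` are hypothetical helpers.

```python
from collections import Counter
from statistics import median


def word_count_stats(texts):
    """Return (min, median, max) word counts using whitespace splitting."""
    counts = [len(t.split()) for t in texts]
    return min(counts), median(counts), max(counts)


def label_counts(labels):
    """Return the number of training samples per label."""
    return dict(Counter(labels))


texts = [
    "The regulations contained in this article govern procedures.",
    "Various laws provide for the issuance of certifications.",
    "The program brings together key members of the community.",
]
print(word_count_stats(texts))
print(label_counts(["non-purpose", "purpose-regulatory", "non-purpose"]))
```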

Training Hyperparameters

  • batch_size: (32, 32)
  • num_epochs: (1, 1)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 20
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: True
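The `loss: CosineSimilarityLoss` setting refers to the Sentence Transformers loss of that name: it scores each sentence pair by the cosine similarity of the two embeddings and minimises the mean squared error against the pair's target (1.0 for same-label pairs, 0.0 for different-label pairs in SetFit). According to the SetFit documentation, `distance_metric` and `margin` are used by the triplet losses and are ignored for CosineSimilarityLoss. The dependency-free sketch below reproduces the loss computation on plain Python lists for illustration only; real training works on batched tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cos(u, v) and each pair's target."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # identical embeddings, same label: zero error
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal embeddings, different label: zero error
    ([1.0, 0.0], [1.0, 1.0], 0.0),  # similar embeddings, different label: penalised
]
print(cosine_similarity_loss(pairs))
```

Only the third pair contributes: its cosine is 1/√2, so its squared error is 0.5 and the mean over three pairs is 1/6.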

Training Results

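| Epoch | Step | Training Loss | Validation Loss |
|:-----:|:----:|:-------------:|:---------------:|
| 0.025 | 1    | 0.478         | -               |
| 0.25  | 10   | 0.3818        | -               |
| 0.5   | 20   | 0.3011        | -               |
| 0.75  | 30   | 0.2555        | -               |
| 1.0   | 40   | 0.1937        | 0.2208          |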
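The Epoch column is the step number divided by the number of optimiser steps per epoch. Since step 40 is logged at epoch 1.0, one epoch here is 40 steps; assuming the first `batch_size` value (32) counts sentence pairs per step in the embedding phase, that is at most 40 × 32 = 1,280 pairs per epoch (fewer if the last batch is partial). The snippet below only checks this arithmetic against the table.

```python
STEPS_PER_EPOCH = 40  # step 40 is logged at epoch 1.0
BATCH_SIZE = 32       # first value of batch_size: (32, 32), assumed to count pairs

logged_steps = [1, 10, 20, 30, 40]
epochs = [step / STEPS_PER_EPOCH for step in logged_steps]
print(epochs)

max_pairs_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE
print(max_pairs_per_epoch)
```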

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.1.0
  • Sentence Transformers: 3.2.1
  • Transformers: 4.44.2
  • PyTorch: 2.5.0+cu121
  • Datasets: 3.0.2
  • Tokenizers: 0.19.1
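When trying to reproduce these results, it can help to compare the local environment with the versions listed above. This sketch uses only the standard library (`importlib.metadata`); the mapping from the names above to PyPI distribution names (for example `torch` for PyTorch) is an assumption, and local build tags such as `+cu121` are ignored when comparing. `check_versions` and `base_version` are hypothetical helpers.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions from the Framework Versions list, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "setfit": "1.1.0",
    "sentence-transformers": "3.2.1",
    "transformers": "4.44.2",
    "torch": "2.5.0+cu121",
    "datasets": "3.0.2",
    "tokenizers": "0.19.1",
}


def base_version(v):
    """Drop a local build tag, e.g. '2.5.0+cu121' -> '2.5.0'."""
    return v.split("+", 1)[0]


def check_versions(expected=EXPECTED):
    """Map each name to (installed version or None, expected version)."""
    report = {"python": (platform.python_version(), "3.10.12")}
    for dist, want in expected.items():
        try:
            have = version(dist)
        except PackageNotFoundError:
            have = None
        report[dist] = (have, want)
    return report


for name, (have, want) in check_versions().items():
    ok = have is not None and base_version(have) == base_version(want)
    print(f"{name:24} installed={have} expected={want} {'OK' if ok else 'DIFF'}")
```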

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}