Fine-tuned reranker for financial chatbot
- README.md +316 -0
- config.json +35 -0
- model.safetensors +3 -0
- special_tokens_map.json +37 -0
- tokenizer.json +0 -0
- tokenizer_config.json +59 -0
- training_config.json +10 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,316 @@
---
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:2000
- loss:BinaryCrossEntropyLoss
base_model: cross-encoder/ms-marco-MiniLM-L6-v2
pipeline_tag: text-ranking
library_name: sentence-transformers
---

# CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) <!-- at revision c5ee24cb16019beea0893ab7796b1df96625c6b8 -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("cross_encoder_model_id")
# Get scores for pairs of texts
pairs = [
['(I) Designation The taxpayer shall designate the imputed income limitation of each unit taken into account under such clause?', '(I) Designation\nThe taxpayer shall designate the imputed income limitation of each unit taken into\naccount under such clause.\n(II) Average test\nThe average of the imputed income limitations designated under subclause (I) shall not\nexceed 60 percent of area median gross income.\n(III) 10-percent increments\nThe designated imputed income limitation of any unit under subclause (I) shall be 20\npercent, 30 percent, 40 percent, 50 percent, 60 percent, 70 percent, or 80 percent of area\nmedian gross income.\nAny election under this paragraph, once made, shall be irrevocable. For purposes of this\nparagraph, any property shall not be treated as failing to be residential rental property merely\nbecause part of the building in which such property is located is used for purposes other than\nresidential rental purposes.\n(2) Rent-restricted units\n(A) In general\nFor purposes of paragraph (1), a residential unit is rent-restricted if the gross rent with respect'],
['Summarize the key points from usc26@118-78.pdf.', 'turn or other schedules. List the type and \namount of tax.\nOther taxes to be listed include the \nfollowing.\nForm 8978 adjustment. Complete the \nNegative Form 8978 Adjustment Work-\nsheet—Schedule 2 (Line 17z) if you are \nfiling Form 8978 and completed the \nworksheet in the Schedule 3, line 6l, in-\nstructions and the amount on line 3 of \nthat worksheet is negative.\n100'],
['Summarize the key points from usc26@118-78.pdf.', 'program who are included in a unit of employees covered by an agreement which the Secretary of Labor finds\nto be a collective bargaining agreement between employee representatives and one or more employers, if there\nis evidence that educational assistance benefits were the subject of good faith bargaining between such\nemployee representatives and such employer or employers."\nPub. L. 99–514, §1114(b)(4), substituted "highly compensated employees (within the meaning of section\n414(q))" for "officers, owners, or highly compensated,".\nSubsec. (b)(6). Pub. L. 99–514, §1151(c)(4)(B), struck out par. (6) which read as follows: "\n.—Reasonable notification of the availability and terms of the program\nNOTIFICATION OF EMPLOYEES\nmust be provided to eligible employees."\nSubsec. (d). Pub. L. 99–514, §1162(a)(1), substituted "December 31, 1987" for "December 31, 1985".\n1984—Subsec. (a). Pub. L. 98–611, §1(b), amended subsec. generally, substituting "Exclusion from gross'],
['Summarize the key points from usc26@118-78.pdf.', 'the taxpayer or by law. Taxpayers have the right to expect \nappropriate action will be taken against employees, return \npreparers, and others who wrongfully use or disclose taxpayer \nreturn information.\n9. The Right to Retain Representation\nTaxpayers have the right to retain an authorized representative \nof their choice to represent them in their dealings with the \nIRS. Taxpayers have the right to seek assistance from a Low \nIncome Taxpayer Clinic if they cannot afford representation.\n10. The Right to a Fair and Just Tax System\nTaxpayers have the right to expect the tax system to consider \nfacts and circumstances that might affect their underlying \nliabilities, ability to pay, or ability to provide information timely. \nTaxpayers have the right to receive assistance from the \nTaxpayer Advocate Service if they are experiencing financial \ndifficulty or if the IRS has not resolved their tax issues properly \nand timely through its normal channels. \n113'],
['Summarize the key points from usc26@118-78.pdf.', 'ceived, include the amount withheld in \nthe total on line 25b. This should be \nshown in box 4 of Form 1099, box 6 of \nForm SSA-1099, or box 10 of Form \nRRB-1099.\nLine 25c—Other Forms\nInclude on line 25c any federal income \ntax withheld on your Form(s) W-2G. \nThe amount withheld should be shown \nin box 4. Attach Form(s) W-2G to the \nfront of your return if federal income tax \nwas withheld.\nIf you had Additional Medicare Tax \nwithheld, include the amount shown on \nForm 8959, line 24, in the total on \nline 25c. Attach Form 8959.\nInclude on line 25c any federal in-\ncome tax withheld that is shown on a \nSchedule K-1.\nAlso include on line 25c any tax \nwithheld that is shown on Form 1042-S, \nForm 8805, or Form 8288-A. You \nshould attach the form to your return to \nclaim a credit for the withholding.\nLine 26\n2023 Estimated Tax \nPayments\nEnter any estimated federal income tax \npayments you made for 2023. Include \nany overpayment that you applied to \nyour 2023 estimated tax from your 2022'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    '(I) Designation The taxpayer shall designate the imputed income limitation of each unit taken into account under such clause?',
    [
'(I) Designation\nThe taxpayer shall designate the imputed income limitation of each unit taken into\naccount under such clause.\n(II) Average test\nThe average of the imputed income limitations designated under subclause (I) shall not\nexceed 60 percent of area median gross income.\n(III) 10-percent increments\nThe designated imputed income limitation of any unit under subclause (I) shall be 20\npercent, 30 percent, 40 percent, 50 percent, 60 percent, 70 percent, or 80 percent of area\nmedian gross income.\nAny election under this paragraph, once made, shall be irrevocable. For purposes of this\nparagraph, any property shall not be treated as failing to be residential rental property merely\nbecause part of the building in which such property is located is used for purposes other than\nresidential rental purposes.\n(2) Rent-restricted units\n(A) In general\nFor purposes of paragraph (1), a residential unit is rent-restricted if the gross rent with respect',
'turn or other schedules. List the type and \namount of tax.\nOther taxes to be listed include the \nfollowing.\nForm 8978 adjustment. Complete the \nNegative Form 8978 Adjustment Work-\nsheet—Schedule 2 (Line 17z) if you are \nfiling Form 8978 and completed the \nworksheet in the Schedule 3, line 6l, in-\nstructions and the amount on line 3 of \nthat worksheet is negative.\n100',
'program who are included in a unit of employees covered by an agreement which the Secretary of Labor finds\nto be a collective bargaining agreement between employee representatives and one or more employers, if there\nis evidence that educational assistance benefits were the subject of good faith bargaining between such\nemployee representatives and such employer or employers."\nPub. L. 99–514, §1114(b)(4), substituted "highly compensated employees (within the meaning of section\n414(q))" for "officers, owners, or highly compensated,".\nSubsec. (b)(6). Pub. L. 99–514, §1151(c)(4)(B), struck out par. (6) which read as follows: "\n.—Reasonable notification of the availability and terms of the program\nNOTIFICATION OF EMPLOYEES\nmust be provided to eligible employees."\nSubsec. (d). Pub. L. 99–514, §1162(a)(1), substituted "December 31, 1987" for "December 31, 1985".\n1984—Subsec. (a). Pub. L. 98–611, §1(b), amended subsec. generally, substituting "Exclusion from gross',
'the taxpayer or by law. Taxpayers have the right to expect \nappropriate action will be taken against employees, return \npreparers, and others who wrongfully use or disclose taxpayer \nreturn information.\n9. The Right to Retain Representation\nTaxpayers have the right to retain an authorized representative \nof their choice to represent them in their dealings with the \nIRS. Taxpayers have the right to seek assistance from a Low \nIncome Taxpayer Clinic if they cannot afford representation.\n10. The Right to a Fair and Just Tax System\nTaxpayers have the right to expect the tax system to consider \nfacts and circumstances that might affect their underlying \nliabilities, ability to pay, or ability to provide information timely. \nTaxpayers have the right to receive assistance from the \nTaxpayer Advocate Service if they are experiencing financial \ndifficulty or if the IRS has not resolved their tax issues properly \nand timely through its normal channels. \n113',
'ceived, include the amount withheld in \nthe total on line 25b. This should be \nshown in box 4 of Form 1099, box 6 of \nForm SSA-1099, or box 10 of Form \nRRB-1099.\nLine 25c—Other Forms\nInclude on line 25c any federal income \ntax withheld on your Form(s) W-2G. \nThe amount withheld should be shown \nin box 4. Attach Form(s) W-2G to the \nfront of your return if federal income tax \nwas withheld.\nIf you had Additional Medicare Tax \nwithheld, include the amount shown on \nForm 8959, line 24, in the total on \nline 25c. Attach Form 8959.\nInclude on line 25c any federal in-\ncome tax withheld that is shown on a \nSchedule K-1.\nAlso include on line 25c any tax \nwithheld that is shown on Form 1042-S, \nForm 8805, or Form 8288-A. You \nshould attach the form to your return to \nclaim a credit for the withholding.\nLine 26\n2023 Estimated Tax \nPayments\nEnter any estimated federal income tax \npayments you made for 2023. Include \nany overpayment that you applied to \nyour 2023 estimated tax from your 2022',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
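
A minimal sketch of post-processing the scores, assuming `model.predict` returns unbounded logits (the loss parameters and `config.json` in this commit both record an `Identity` activation). The helper names `sigmoid` and `rank_passages`, and the example logits, are hypothetical stand-ins and are not part of the Sentence Transformers API.

```python
import math


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def rank_passages(passages, logits):
    """Pair each passage with its probability, most relevant first."""
    scored = [(p, sigmoid(s)) for p, s in zip(passages, logits)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical logits standing in for the output of model.predict(pairs).
example_passages = ["passage A", "passage B", "passage C"]
example_logits = [4.2, -3.1, 0.0]

ranked = rank_passages(example_passages, example_logits)
for passage, prob in ranked:
    print(f"{prob:.3f}  {passage}")
```

Sorting by probability gives the same order as sorting by raw logit, so the conversion only matters when a fixed relevance threshold (for example 0.5) is applied downstream.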

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 2,000 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 | label |
|:--------|:-----------|:-----------|:------|
| type | string | string | float |
| details | <ul><li>min: 34 characters</li><li>mean: 60.95 characters</li><li>max: 160 characters</li></ul> | <ul><li>min: 109 characters</li><li>mean: 892.77 characters</li><li>max: 1000 characters</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.5</li><li>max: 1.0</li></ul> |

* Samples:

| sentence_0 | sentence_1 | label |
|:-----------|:-----------|:------|
| <code>(I) Designation The taxpayer shall designate the imputed income limitation of each unit taken into account under such clause?</code> | <code>(I) Designation<br>The taxpayer shall designate the imputed income limitation of each unit taken into<br>account under such clause.<br>(II) Average test<br>The average of the imputed income limitations designated under subclause (I) shall not<br>exceed 60 percent of area median gross income.<br>(III) 10-percent increments<br>The designated imputed income limitation of any unit under subclause (I) shall be 20<br>percent, 30 percent, 40 percent, 50 percent, 60 percent, 70 percent, or 80 percent of area<br>median gross income.<br>Any election under this paragraph, once made, shall be irrevocable. For purposes of this<br>paragraph, any property shall not be treated as failing to be residential rental property merely<br>because part of the building in which such property is located is used for purposes other than<br>residential rental purposes.<br>(2) Rent-restricted units<br>(A) In general<br>For purposes of paragraph (1), a residential unit is rent-restricted if the gross rent with respect</code> | <code>1.0</code> |
| <code>Summarize the key points from usc26@118-78.pdf.</code> | <code>turn or other schedules. List the type and <br>amount of tax.<br>Other taxes to be listed include the <br>following.<br>Form 8978 adjustment. Complete the <br>Negative Form 8978 Adjustment Work-<br>sheet—Schedule 2 (Line 17z) if you are <br>filing Form 8978 and completed the <br>worksheet in the Schedule 3, line 6l, in-<br>structions and the amount on line 3 of <br>that worksheet is negative.<br>100</code> | <code>0.0</code> |
| <code>Summarize the key points from usc26@118-78.pdf.</code> | <code>program who are included in a unit of employees covered by an agreement which the Secretary of Labor finds<br>to be a collective bargaining agreement between employee representatives and one or more employers, if there<br>is evidence that educational assistance benefits were the subject of good faith bargaining between such<br>employee representatives and such employer or employers."<br>Pub. L. 99–514, §1114(b)(4), substituted "highly compensated employees (within the meaning of section<br>414(q))" for "officers, owners, or highly compensated,".<br>Subsec. (b)(6). Pub. L. 99–514, §1151(c)(4)(B), struck out par. (6) which read as follows: "<br>.—Reasonable notification of the availability and terms of the program<br>NOTIFICATION OF EMPLOYEES<br>must be provided to eligible employees."<br>Subsec. (d). Pub. L. 99–514, §1162(a)(1), substituted "December 31, 1987" for "December 31, 1985".<br>1984—Subsec. (a). Pub. L. 98–611, §1(b), amended subsec. generally, substituting "Exclusion from gross</code> | <code>1.0</code> |

* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
```json
{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}
```
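
To make the loss concrete, here is a small worked sketch of binary cross-entropy computed directly on a logit, which is what an `Identity` activation with no `pos_weight` amounts to under the usual definition. The function names `bce_with_logits` and `mean_bce`, and the sample values, are illustrative assumptions and are not taken from the training code.

```python
import math


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy on a raw logit, in a numerically stable form.

    Equivalent to -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits, labels):
    """Average the per-pair loss over a batch."""
    losses = [bce_with_logits(z, y) for z, y in zip(logits, labels)]
    return sum(losses) / len(losses)


# Hypothetical batch: one confident correct score, one confident wrong score,
# and one uninformative score (logit 0 corresponds to probability 0.5).
logits = [3.0, 3.0, 0.0]
labels = [1.0, 0.0, 1.0]
for z, y in zip(logits, labels):
    print(f"logit={z:+.1f} label={y:.0f} loss={bce_with_logits(z, y):.4f}")
print(f"mean loss: {mean_bce(logits, labels):.4f}")
```

A logit of 0 always costs `log(2)`, roughly 0.693, so a training loss well below that value indicates the model separates the positive and negative pairs better than chance.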

### Training Hyperparameters

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch | Step | Training Loss |
|:-----:|:----:|:-------------:|
| 2.0 | 500 | 0.2146 |
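
The single logged row can be cross-checked against the dataset size and hyperparameters listed in this card. The sketch below is plain arithmetic on those numbers; `num_devices = 1` is an assumption (the card does not state the device count), and the variable names are illustrative.

```python
import math

train_samples = 2000   # "Size: 2,000 training samples"
batch_size = 8         # per_device_train_batch_size
grad_accum = 1         # gradient_accumulation_steps
epochs = 3             # num_train_epochs
num_devices = 1        # assumption: single-device training

# dataloader_drop_last is False, so a partial final batch would still count.
batches_per_epoch = math.ceil(train_samples / (batch_size * num_devices))
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs

logged_step = 500
epoch_at_logged_step = logged_step / steps_per_epoch

print(steps_per_epoch)        # 250
print(total_steps)            # 750
print(epoch_at_logged_step)   # 2.0
```

Under these assumptions step 500 falls exactly at the end of epoch 2, matching the table, and the run finishes at step 750. Only one row would appear if the loss is logged every 500 steps, which to my knowledge is the Trainer default; the logging interval is not listed among the hyperparameters above.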

### Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.1.2
- Transformers: 4.57.3
- PyTorch: 2.9.0+cu126
- Accelerate: 1.12.0
- Datasets: 4.0.0
- Tokenizers: 0.22.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json
ADDED
@@ -0,0 +1,35 @@
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.linear.Identity",
    "version": "5.1.2"
  },
  "transformers_version": "4.57.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ee5fb8931cb42590aab1cedb07759a5921f2f484cedfe67115ba9c76f1e647f
size 90866412
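
As a rough consistency check, the sizes in `config.json` can be turned into a parameter count and compared with the `size` recorded in the `model.safetensors` pointer. This is a sketch that assumes the standard `BertForSequenceClassification` layout (embeddings, encoder layers, pooler, one linear classifier) and 4 bytes per `float32` weight; the helper names are illustrative, and the explanation of the leftover bytes is a guess rather than something the commit states.

```python
# Values copied from config.json in this commit.
cfg = {
    "vocab_size": 30522,
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 6,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}
num_labels = 1                     # one entry in id2label
pointer_size_bytes = 90_866_412    # "size" in the model.safetensors pointer


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * n


h, ff = cfg["hidden_size"], cfg["intermediate_size"]

# Word, position and token-type embeddings, followed by a LayerNorm.
embeddings = (
    cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
) * h + layer_norm(h)

encoder_layer = (
    4 * linear(h, h)    # query, key, value and attention output projections
    + layer_norm(h)
    + linear(h, ff)     # feed-forward expansion
    + linear(ff, h)     # feed-forward projection
    + layer_norm(h)
)

pooler = linear(h, h)
classifier = linear(h, num_labels)

total_params = embeddings + cfg["num_hidden_layers"] * encoder_layer + pooler + classifier
weight_bytes = total_params * 4    # float32

print(total_params)                       # 22713601
print(weight_bytes)                       # 90854404
print(pointer_size_bytes - weight_bytes)  # 12008
```

The estimate lands within about 12 KB of the recorded file size. The remainder is plausibly the safetensors header that describes each tensor, though that is not verified here.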
special_tokens_map.json
ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,59 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation": true,
  "unk_token": "[UNK]"
}
training_config.json
ADDED
@@ -0,0 +1,10 @@
{
  "base_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
  "max_length": 512,
  "training_samples": 2000,
  "epochs": 3,
  "learning_rate": 2e-05,
  "warmup_steps": 37,
  "original_pairs": 2000,
  "enhanced_pairs": 11
}
vocab.txt
ADDED
The diff for this file is too large to render.