
How to use sujet-ai/Fin-ModernBERT-RAG-embed-base with sentence-transformers:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sujet-ai/Fin-ModernBERT-RAG-embed-base")

sentences = [
    "How can the company assess the financial viability of its gaming division in relation to its overall business strategy?",
    "In the event of a partial or total liquidation of the Partnership or in the event there were insufficient Partnership assets to satisfy the claims of its general creditors , the limited partners may not be entitled to receive their entire Capital Contribut ion amounts back. Limited partner capital ac counts are not guaranteed. However, as a class, the limit ed partners would be entitled to receive the return of their aggregate Capital Contri butions before the return of any capital contributions to the subordinated limited partners or the general partners. If the Partnership experiences losses in any year but liquidation procedures described above are not undertaken and the Partne rship continues, the amounts of such losses would be absorbed in the capital accounts of the partners as described in the Partnership Agreement, and each limited partner in any event remains entitled to receive the 7½% Payments under t he terms of the Partnership Agreement. However, as there would be no accumulated profits in such a year, limited partner s would not receive any sums representing participation in net income of the Partnership. In addition, although the amount of the 7½% Payments to limited partners are charged as an expense to the Partnership and are pay able whether or not the Partnership ear ns any accumulated profits during any given period, no reserve fund has been set aside to enable the Partnership to make such payments. Therefore, such payments to the limited partners are subject to the Partnership’s ability to service the 7½% Payment, of which there is no assurance.",
    "10. Compliance of Award Agreement and Plan with Section 409A . The provisions of this Paragraph 10 apply to you only if you are a U.S. taxpayer. (a) This Award Agreement and the Plan provisions that apply to this Award are intended and will be construed to comply with Section 409A (including the requirements applicable to, or the conditions for exemption from treatment as, 409A Deferred Compensation), whether by reason of short-term deferral treatment or other exceptions or provisions. The Committee will have full authority to give effect to this intent. To the extent necessary to give effect to this intent, in the case of any conflict or potential inconsistency between the provisions of the Plan (including Sections 1.3.2 and 2.1 thereof) and this Award Agreement, the provisions of this Award Agreement will govern, and in the case of any conflict or potential inconsistency between this Paragraph 10 and the other provisions of this Award Agreement, this Paragraph 10 will govern. (b) Delivery of RSU Shares will not be delayed beyond the date on which all applicable conditions or restrictions on delivery of RSU Shares required by this Agreement (including those specified in Paragraphs 4, 6(b) and 7 and the consents and other items specified in Section 3.3 of the Plan) are satisfied, and will occur by December 31 of the calendar year in which the Delivery Date occurs unless, in order to permit such conditions or restrictions to be satisfied, the Committee elects, pursuant to Reg. 1.409A-1(b)(4)(i)(D) or otherwise as may be permitted in accordance with Section 409A, to delay delivery of RSU Shares to a later date as may be permitted under Section 409A, including Reg. 1.409A-3(d). For the avoidance of doubt, if the Award includes a “series of installment payments” as described in Reg. 1.409A-2(b)(2)(iii), your right to the series of installment payments will be treated as a right to a series of separate payments and not as a right to a single payment. "
    "(c) Notwithstanding the provisions of Paragraph 7(b) and Section 1.3.2(i) of the Plan, to the extent necessary to comply with Section 409A, any securities, other Awards or other property that the Firm may deliver in respect of your RSUs will not have the effect of deferring delivery or payment, income inclusion, or a substantial risk of forfeiture, beyond the date on which such delivery, payment or inclusion would occur or such risk of forfeiture would lapse, with respect to the RSU Shares that would otherwise have been deliverable (unless the Committee elects a later date for this purpose pursuant to Reg. 1.409A-1(b)(4)(i)(D) or otherwise as may be permitted under Section 409A, including and to the extent applicable, the subsequent election provisions of Section 409A(a)(4)(C) of the Code and Reg. 1.409A-2(b)). (d) Notwithstanding the timing provisions of Paragraph 6(b), the delivery of RSU Shares referred to therein will be made after the date of death and during the calendar year that includes the date of death (or on such later date as may be permitted under Section 409A). (e) Notwithstanding any provision of Paragraph 5 or Section 2.8.2 of the Plan to the contrary, the Dividend Equivalent Rights with respect to each of your Outstanding RSUs will be paid to you within the calendar year that includes the date of distribution of any corresponding regular cash dividends paid by GS Inc. in respect of a share of Common Stock the record date for which occurs on or after the Date of Grant. The payment will be in an amount (less applicable withholding) equal to such regular dividend payment as would have been made in respect of the RSU Shares underlying such Outstanding RSUs. "
    "(f) The timing of delivery or payment referred to in Paragraph 6(a)(i) will be the earlier of (i) the Delivery Date or (ii) within the calendar year in which the Committee receives satisfactory documentation relating to your Conflicted Employment, provided that such delivery or payment will be made, and any Committee action referred to in Paragraph 6(a)(i) will be taken, only at such time as, and if and to the extent that it, as reasonably determined by the Firm, would not result in the imposition of any additional tax to you under Section 409A.",
    "PART I Item 1 15 OPERATIONS We have regional operations service centers that support our operations, including customer contract and order processing, billing, credit and collections, information processing, and vendor management and logistics. The center in Ireland supports the African, Asia -Pacific, European, and Middle East regions ; and the centers in Arlington, Virginia, Atlanta, Georgia , Charlotte, North Carolina, Fargo, North Dakota, Fort Lauderdale, Florida, Redmond, Washington, Reno, Nevada , and Puerto Rico support the America n region s. In addition to our operations centers, we also operate datacenters throughout each of these regions . We continue to identify and evaluate opportunities to expand our datacenter locations and increase our server capacity to me et the evolving needs of our customers, particularly given the growing demand for AI services . Our datacenters depend on the availability of permitted and buildable land, predictable energy, networking supplies, and servers, including graphics processing units (“ GPUs ”) and other components. Our devices are primarily manufactured by third -party contract manufacturers. For the majority of our products, we have the ability to use other manufacturers if a current vendor becomes unavailable or unable to meet our requirements. However, some of our products contain certain components for which there are very few qualified suppliers. Extended disruptions at these suppliers could impact our ability to manufacture devices on time to meet consumer demand. RESEARCH AND DEVELOPMENT Product and Service Development, and Intellectual Property We develop most of our products and services internally through the following engineering groups. "
    "• Cloud and AI – focuses on making IT professionals, developers, partners, independent software vendors, and their systems more productive and efficient through development of Azure AI platform and cloud infrastructure, server, database, CRM, ERP, software development tools and services (including GitHub), AI cognitive services, and other business process applications and services for enterprises. • Strategic Missions and Technologies – focuses on incubating technical products and support solutions with transformative potential for the future of cloud computing and continued company growth across quantum computing, Azure Space & Missions Engineering, telecommunications, and Microsoft F ederal Sales and Delivery. • Experiences and Devices – focuses on delivering high value end -user experiences across our products, services, and devices, including Microsoft 365, Windows, Microsoft Teams, Search (including Microsoft Edge and Bing Chat) and other advertising -based services, and the Surface line of devices. • Microsoft Security – focuses on delivering a comprehensive portfolio of services that protect our customers’ digital infrastructure through cloud platform and application security, data protection and governance, identity and network access, and device management . • Technology and Research – focuses on fundamental research, product and business incubations , and forward -looking AI innovations that span infrastructure, services, and applications. • LinkedIn – focuses on our services that transform the way professionals grow their network and find jobs and the way businesses hire, market, sell, and learn. • Gaming – focuses on developing hardware, content, and services across a large range of platforms to help grow our user base through game experiences and social interaction. Internal development allows us to maintain competitive advantages that come from product differentiation and closer technical control over our products and services. "
    "It also gives us the freedom to decide which modifications and enhancements are most impor tant and when they should be implemented. We strive to obtain information as early as possible about changing usage patterns and hardware advances that may affect software and hardware design. Before releasing new software platforms, and as we make signifi cant modifications to existing platforms, we provide application vendors with a range of resources and guidelines for development, training, and testing. Generally, we also create product documentation internally. We protect our intellectual property investments in a variety of ways. We work actively in the U.S. and internationally to ensure the enforcement of copyright, trademark, trade secret, and other protections that apply to our software and hardware products, services, business plans, and branding. We are a leader among technology companies in pursuing patents and currently have a portfolio of over 70,000 U.S. and international patents issued and over 19,000 pending"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]


SentenceTransformer based on nomic-ai/modernbert-embed-base

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the sujet-financial-rag-en-dataset dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: nomic-ai/modernbert-embed-base
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- sujet-financial-rag-en-dataset
Language: en

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
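Modules (1) and (2) above average the token vectors over non-padding positions and then scale each sentence vector to unit length. The NumPy sketch below illustrates that computation for intuition only: the random arrays stand in for ModernBertModel hidden states, and `mean_pool` and `l2_normalize` are hypothetical helper names, not library functions.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, mirroring what Normalize() does."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

# Toy batch: 2 sequences, 4 token positions, 768-dim hidden states (random stand-ins).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0],   # last position is padding
                 [1, 1, 1, 1]])
emb = l2_normalize(mean_pool(hidden, mask))

# With unit-norm embeddings, the dot product equals cosine similarity.
dot = emb @ emb.T
norms = np.linalg.norm(emb, axis=1)
cos = dot / np.outer(norms, norms)
print(emb.shape)              # (2, 768)
print(np.allclose(dot, cos))  # True
```

Because the final Normalize() step yields unit vectors, the model's cosine similarity reduces to a plain inner product, which is convenient for vector stores that only support dot-product search.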

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sujet-ai/Fin-ModernBERT-RAG-embed-base")
# Run inference
sentences = [
    'How does the diversification of investments across different currencies impact financial risk?',
    '20/9/2023 4,504 0.00% GBP 305,720 USD (385,212) JPMorgan Chase Bank 20/9/2023 3,544 0.00% EUR 602,840 USD (659,854) State Street Bank & Trust Co. 20/9/2023 435 0.00% JPY 67,590,000 USD (473,571) JPMorgan Chase Bank 20/9/2023 (176) (0.00%) GBP 378,925 USD (483,052) State Street Bank & Trust Co. 20/9/2023 (1,208) (0.00%) GBP 382,825 USD (488,055) BNP Paribas 20/9/2023 (1,251) (0.00%) EUR 480,370 USD (528,752) State Street Bank & Trust Co. 20/9/2023 (2,604) (0.00%) JPY 68,925,000 USD (489,188) State Street Bank & Trust Co. 20/9/2023 (6,443) (0.00%) JPY 43,800,000 USD (319,166) JPMorgan Chase Bank 20/9/2023 (12,395) (0.00%) JPY 91,700,000 USD (657,807) JPMorgan Chase Bank 20/9/2023 (15,547) (0.00%) JPY 639,066,394 USD (4,648,059) JPMorgan Chase Bank 20/9/2023 (172,087) (0.00%) Total OTC Financial Derivative Instruments 545,977 0.00% Total Investments 17,991,067,179 98.73% Fair Value US Dollars ($)% of Total Net Assets Other Assets and Liabilities 232,296,305 1.27% Net Assets 18,223,363,484 100.00%',
    'In addition, the restriction on liens in the GSFC 2008 Indenture applies only to liens that secure debt for borrowed money. For example, liens imposed by operation of law, such as liens to secure statutory obligations for taxes or workers’ compensation benefits, or liens the Company creates to secure obligations to pay legal judgments or surety bonds, would not be covered by this restriction. Modification of the Debt Indenture and Waiver of Covenants There are four types of changes GSFC and the Company can make to the GSFC 2008 Indenture and the debt securities or series of debt securities and related guarantees issued under the GSFC 2008 Indenture. Changes Requiring Each Holder’s Approval First, there are changes that cannot be made without the approval of the holder of each debt security affected by the change under the GSFC 2008 Indenture. Here is a list of those types of changes: • change the stated maturity for any principal or interest payment on a debt security; • reduce the principal amount, the amount payable on acceleration of the stated maturity after a default, the interest rate or the redemption price for a debt security; • permit redemption of a debt security if not previously permitted; • impair any right a holder may have to require repayment of its debt security; • change the currency of any payment on a debt security; • change the place of payment on a debt security; • impair a holder’s right to sue for payment of any amount due on its debt security; • reduce the percentage in principal amount of the debt securities of any one or more affected series, taken • separately or together, as applicable, and whether comprising the same or different series or less than all of the debt securities of a series, the approval of whose holders is needed to change the applicable debt indenture or those debt securities; • reduce the percentage in principal amount of the debt securities of any one or more affected series, taken separately or together, as '
    'applicable, and whether comprising the same or different series or less than all of the debt securities of a series, the consent of whose holders is needed to waive GSFC’s compliance with the applicable debt indenture or to waive defaults; and • change the provisions of the applicable debt indenture dealing with modification and waiver in any other respect, except to increase any required percentage referred to above or to add to -59-',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
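For retrieval, each query's row of similarity scores is usually sorted into a ranked list of passages. The sketch below assumes embeddings shaped like the output of `model.encode` (random vectors stand in for them so it runs offline); `top_k` is a hypothetical helper, not part of the library.

```python
import numpy as np

def top_k(query_embs: np.ndarray, doc_embs: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k most cosine-similar documents per query.

    query_embs: (n_queries, dim); doc_embs: (n_docs, dim). Rows need not be normalized.
    """
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = q @ d.T                              # (n_queries, n_docs) cosine matrix
    k = min(k, scores.shape[1])
    idx = np.argsort(-scores, axis=1)[:, :k]      # highest score first
    return idx, np.take_along_axis(scores, idx, axis=1)

# Stand-in embeddings; with the real model these would come from model.encode(...).
rng = np.random.default_rng(42)
docs = rng.normal(size=(5, 768))
query = docs[[2]] + 0.01 * rng.normal(size=(1, 768))  # a query very close to document 2
idx, scores = top_k(query, docs, k=3)
print(idx[0][0])  # 2
```

Sentence Transformers also provides `sentence_transformers.util.semantic_search`, which performs the same kind of top-k cosine search over tensors.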

Evaluation

Metrics

Information Retrieval

Dataset: ModernFinBERT-RAG-embed-base
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.3813
cosine_accuracy@3	0.6329
cosine_accuracy@5	0.7124
cosine_accuracy@10	0.7919
cosine_precision@1	0.3813
cosine_precision@3	0.211
cosine_precision@5	0.1425
cosine_precision@10	0.0792
cosine_recall@1	0.3813
cosine_recall@3	0.6329
cosine_recall@5	0.7124
cosine_recall@10	0.7919
cosine_ndcg@10	0.5892
cosine_mrr@10	0.5239
cosine_map@100	0.5298
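The table is consistent with every query having exactly one relevant passage: recall@k equals accuracy@k, and precision@k is accuracy@k divided by k (for example, 0.6329 / 3 ≈ 0.211). The sketch below computes these metrics under that single-relevant-passage assumption; `ir_metrics` is a hypothetical helper and the ranks passed to it are made up for illustration, not taken from the evaluation run.

```python
import math
from typing import Optional

def ir_metrics(rank_of_relevant: list[Optional[int]], k: int) -> dict[str, float]:
    """IR metrics when each query has exactly one relevant document.

    rank_of_relevant[i] is the 1-based rank of query i's relevant document,
    or None if it was not retrieved at all.
    """
    n = len(rank_of_relevant)
    in_top_k = [r for r in rank_of_relevant if r is not None and r <= k]
    accuracy = len(in_top_k) / n        # share of queries with the relevant doc in the top k
    precision = accuracy / k            # at most 1 relevant hit among k results
    recall = accuracy                   # with one relevant doc, recall is 0 or 1 per query
    mrr = sum(1 / r for r in in_top_k) / n
    # With one relevant doc the ideal DCG is 1, so NDCG = 1 / log2(rank + 1).
    ndcg = sum(1 / math.log2(r + 1) for r in in_top_k) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg}

# Hypothetical ranks for four queries.
print(ir_metrics([1, 3, 12, None], k=10))

# Cross-check against the reported table: precision@k should be about accuracy@k / k.
reported = {3: (0.6329, 0.211), 5: (0.7124, 0.1425), 10: (0.7919, 0.0792)}
for k, (acc, prec) in reported.items():
    print(k, round(acc / k, 4), prec)
```

For the four hypothetical queries, accuracy@10 is 0.5 and precision@10 is 0.05, since only two relevant passages land in the top 10.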

Training Details

Training Dataset

sujet-financial-rag-en-dataset

Dataset: sujet-financial-rag-en-dataset at ec52315
Size: 104,601 training samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 13 tokens mean: 24.56 tokens max: 50 tokens	min: 23 tokens mean: 647.39 tokens max: 1165 tokens

Samples:

anchor	positive
`How does the Compensation Committee's role influence the stock awards granted to executive officers?`	PART II Item 8 88 Stock Plans Stock awards entitle the holder to receive shares of Microsoft common stock as the award vests. Stock awards generally vest over a service period of four years or five years. Executive Incentive Plan Under the Executive Incentive Plan, the Compensation Committee approves stock awards to executive officers and certain senior executives. RSUs generally vest ratably over a service period of four years. PSUs generally vest over a performance period of thre e years. The number of shares the PSU holder receives is based on the extent to which the corresponding performance goals have been achieved. Activity for All Stock Plans The fair value of stock awards was estimated on the date of grant using the following assumptions: Year ended June 30, 2023 2022 2021 Dividends per share (quarterly amounts) $ 0.62 – 0.68 $ 0.56 – 0.62 $ 0.51 – 0.56 Interest rates 2.0% – 5.4% 0.03% – 3.6% 0.01% – 1.5% During fiscal year 2023 , the following activity occurred under our stock...
`What is the fair value of the bond issued by CVS Health Corp., and how does it compare to the fair value of the bond issued by Walt Disney Co.?`	445 Vanguard ESG Global Corporate Bond UCITS ETF Principal CouponMaturity DateFair Value US Dollars ($)% of Total Net Assets State Street Corp. $50,000 4.82% 26/1/2034 48,557 0.01% Baxalta, Inc. $50,000 4.00% 23/6/2025 48,515 0.01% Starbucks Corp. $50,000 3.80% 15/8/2025 48,426 0.01% Citigroup, Inc. $50,000 4.60% 9/3/2026 48,387 0.01% Athene Global Funding CAD70,000 2.10% 24/9/2025 48,344 0.01% Bank of America Corp. $50,000 4.25% 22/10/2026 48,257 0.01% PepsiCo, Inc. $50,000 3.60% 18/2/2028 48,191 0.01% Charles Schwab Corp. $50,000 3.85% 21/5/2025 48,183 0.01% JPMorgan Chase & Co. $50,000 4.13% 15/12/2026 48,165 0.01% Charter Communications Operating LLC/Charter Communications Operating Capital $60,000 5.50% 1/4/2063 48,151 0.01% US Bancorp $60,000 2.68% 27/1/2033 48,106 0.01% Chubb INA Holdings, Inc. $50,000 3.35% 3/5/2026 48,074 0.01% Bank of New York Mellon Corp. $50,000 3.00% 24/2/2025 48,071 0.01% Truist Financial Corp. $50,000 4.87% 26/1/2029 48,042 0.01% Truist Financial Corp. $...
`Analyze the impact of currency fluctuations on the unrealized gains and losses reported in the forward currency exchange contracts.`	15,216 141,230 0.01% Samsung Fire & Marine Insurance Co., Ltd. - Preference Shares 1,056 137,365 0.01% Samsung SDI Co., Ltd. - Preference Shares 546 133,014 0.01% NHN Corp. 7,096 132,480 0.01% Hanwha Corp. - Preference Shares 10,137 114,475 0.01% Amorepacific Corp. - Preference Shares 4,230 101,123 0.01% CJ CheilJedang Corp. - Preference Shares 576 59,276 0.00% Hanwha Galleria Corp. 47,521 54,711 0.00% - - 386,394,890 29.25% Total Equities 1,291,387,033 97.75% Total Transferable Securities 1,291,387,033 97.75% Number of Contracts Long/ (Short)Notional Amount Unrealised Gain/(Loss) US Dollar s ($)% of Total Net Assets Financial Derivative Instruments Dealt in on a Regulated Market (0.02%) (30 June 2022: (0.00%)) Futures (0.02%) (30 June 2022: (0.00%)) MSCI Pacific Ex-Japan Index September 2023 283 $20,595,251 (131,521) (0.01%) KOSPI 200 Index September 2023 138 KRW11,933,318,478 (141,212) (0.01%) Total Financial Derivative Instruments Dealt in on a Regulated Market (272,733) (0.02%) OTC...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
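MultipleNegativesRankingLoss treats every other positive in a batch as a negative for a given anchor and applies cross-entropy over the scaled cosine similarities, with the matching positive as the target. A forward-pass-only NumPy sketch of that idea, using the scale of 20.0 listed above; `mnrl_loss` is a hypothetical name, and the library's PyTorch implementation differs in details (for example, it can also take an extra hard-negative column, which this dataset does not use).

```python
import numpy as np

def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss (illustrative sketch).

    anchors, positives: (batch, dim). Row i of `positives` matches row i of `anchors`;
    every other positive in the batch acts as a negative for anchor i.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch) scaled cosine scores
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for anchor i is column i.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
pos = rng.normal(size=(64, 768))
aligned = pos + 0.1 * rng.normal(size=pos.shape)  # anchors close to their own positives
shuffled = rng.permutation(aligned)               # pairing broken by shuffling rows
print(mnrl_loss(aligned, pos) < mnrl_loss(shuffled, pos))  # True
```

Because identical texts in one batch would act as false negatives under this loss, the `batch_sampler: no_duplicates` setting listed under the hyperparameters keeps duplicates out of the same batch.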

Evaluation Dataset

sujet-financial-rag-en-dataset

Dataset: sujet-financial-rag-en-dataset at ec52315
Size: 1,057 evaluation samples
Columns: anchor and positive
Approximate statistics based on the first 1000 samples:
	anchor	positive
type	string	string
details	min: 13 tokens mean: 24.64 tokens max: 52 tokens	min: 26 tokens mean: 647.51 tokens max: 1081 tokens

Samples:

anchor	positive
`What was the net asset value per share for the EUR Distributing class as of 30 June 2022?`	The accompanying notes form an integral part of the financial statements.559 Vanguard EUR Eurozone Government Bond UCITS ETFStatement of Assets and Liabilities EUR (€) EUR (€) As at 30 June As at 30 June Note 2023 2022 Current Assets Financial Assets at Fair Value Through Profit or Loss: Transferable Securities 3,17 1,719,130,585 1,249,469,080 Financial Derivative Instruments 3,17 — 23,742 Cash 3 11,990,422 14,558,520 Receivables: Interest and Dividends 12,715,254 5,193,434 Capital Shares Issued 27 9,190,562 Investments Sold 6,621,764 499,630 Margin Cash Due from Broker 3 3 56,198 Total Current Assets 1,750,458,055 1,278,991,166 Current Liabilities Financial Liabilities at Fair Value Through Profit or Loss: Financial Derivative Instruments 3,17 — 17,321 Bank Overdraft — 6,668 Payables and Other Liabilities: Capital Shares Redeemed 5,790,847 6,811,068 Investments Purchased 8,942,689 15,381,189 Management Fees Payable 12 99,689 69,769 Total Current Liabilities 14,833,225 22,286,015 Net A...
`What factors could lead the Committee to determine that an employee's actions have resulted in a "material adverse impact" on the broader financial system?`	Definitions Appendix The following capitalized terms are used in this Award Agreement with the following meanings: (a)“409A Deferred Compensation ” means a “deferral of compensation” or “deferred compensation” as those terms are defined in the regulations under Section 409A. (b)“Conflicted Employment ” means your employment at any U.S. Federal, state or local government, any non-U.S. government, any supranational or international organization, any self- regulatory organization, or any agency or instrumentality of any such government or organization, or any other employer (other than an “Accounting Firm” within the meaning of SEC Rule 2-01(f)(2) of Regulation S-X or any successor thereto) determined by the Committee, if, as a result of such employment, your continued holding of any Outstanding Short-Term RSUs would result in an actual or perceived conflict of interest. (c)“Failed to Consider Risk ” means that you participated (or otherwise oversaw or were responsible for, depending on t...
`What financial implications could arise from a decrease in the pool of qualified drivers for a ridesharing platform?`	In addition, changes in certain laws and regulations, including immigration, labor and employment laws or background check requirements, may result in a shift or decrease in the pool of qualified drivers, which may result in increased competition for qualified drivers or higher costs of recruitment, operation and retention. As part of our business operations or research and development efforts, data on the vehicle may be collected and drivers may be uncomfortable or unwilling to drive knowing that data is being collected. Other factors outside of our control, such as concerns about personal health and safety, increases in the price of gasoline, vehicles or insurance, or concerns about the availability of government or other assistance programs if drivers continue to drive on our platform, may also reduce the number of drivers on our platform or their utilization of our platform, or impact our ability to onboard new drivers. If we fail to attract qualified drivers on favorable terms, fa...

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
gradient_accumulation_steps: 8
learning_rate: 0.0002
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: True
tf32: True
load_best_model_at_end: True
optim: adamw_torch_fused
batch_sampler: no_duplicates
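These settings determine the optimizer-step budget. The arithmetic below assumes training ran on a single device (an assumption, not stated in the card); under it, the logged epoch fractions line up, e.g. step 200 is logged at epoch 0.9786 ≈ (200 × 8) / 1635. The `cosine_lr` helper is a hypothetical sketch of the usual linear-warmup, half-cosine-decay shape, not the Trainer's own code.

```python
import math

# Values from the hyperparameter list above; a single training device is assumed.
num_samples = 104_601
per_device_batch = 64
grad_accum = 8
epochs = 2
warmup_ratio = 0.1
peak_lr = 2e-4

effective_batch = per_device_batch * grad_accum                       # 512 examples per optimizer step
micro_batches_per_epoch = math.ceil(num_samples / per_device_batch)   # 1635
steps_per_epoch = micro_batches_per_epoch // grad_accum               # 204 optimizer steps
total_steps = steps_per_epoch * epochs                                # 408
warmup_steps = math.ceil(total_steps * warmup_ratio)                  # 41

def cosine_lr(step: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 512 204 408 41
```

Gradient accumulation enlarges the optimizer batch to 512, but, as far as we understand the standard MultipleNegativesRankingLoss, the pool of in-batch negatives still comes from each 64-example micro-batch.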

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 8
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.0002
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: True
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	ModernFinBERT-RAG-embed-base_cosine_ndcg@10
0	0	-	-	0.2812
0.0489	10	1.8949	-	-
0.0979	20	1.0738	-	-
0.1468	30	0.9147	-	-
0.1957	40	0.8194	-	-
0.2446	50	0.7847	-	-
0.2936	60	0.7428	-	-
0.3425	70	0.7587	-	-
0.3914	80	0.7769	-	-
0.4404	90	0.7319	-	-
0.4893	100	0.7199	0.7262	0.5395
0.5382	110	0.7085	-	-
0.5872	120	0.6726	-	-
0.6361	130	0.6954	-	-
0.6850	140	0.65	-	-
0.7339	150	0.6207	-	-
0.7829	160	0.6518	-	-
0.8318	170	0.6227	-	-
0.8807	180	0.6285	-	-
0.9297	190	0.6235	-	-
0.9786	200	0.6183	0.6158	0.5546
1.0294	210	0.6036	-	-
1.0783	220	0.5818	-	-
1.1272	230	0.5445	-	-
1.1761	240	0.5115	-	-
1.2251	250	0.4712	-	-
1.2740	260	0.449	-	-
1.3229	270	0.4457	-	-
1.3719	280	0.4763	-	-
1.4208	290	0.449	-	-
1.4697	300	0.4352	0.5674	0.5797
1.5187	310	0.4173	-	-
1.5676	320	0.4198	-	-
1.6165	330	0.3901	-	-
1.6654	340	0.4066	-	-
1.7144	350	0.3802	-	-
1.7633	360	0.3712	-	-
1.8122	370	0.3983	-	-
1.8612	380	0.3886	-	-
1.9101	390	0.4027	-	-
1.959	400	0.398	0.5435	0.5892

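The saved checkpoint is the final evaluated row (step 400), which has both the lowest validation loss and the highest NDCG@10; its NDCG@10 of 0.5892 matches the evaluation table above.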
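With `load_best_model_at_end: True`, the Trainer reloads the best evaluated checkpoint when training finishes. If `metric_for_best_model` was left at its default, the comparison is by evaluation loss; the sketch below, using only the four evaluation rows from the log, shows that loss and NDCG@10 both select step 400.

```python
# Evaluation rows copied from the training log: (step, validation loss, NDCG@10).
eval_rows = [
    (100, 0.7262, 0.5395),
    (200, 0.6158, 0.5546),
    (300, 0.5674, 0.5797),
    (400, 0.5435, 0.5892),
]

best_by_loss = min(eval_rows, key=lambda row: row[1])  # lower validation loss is better
best_by_ndcg = max(eval_rows, key=lambda row: row[2])  # higher NDCG@10 is better
print(best_by_loss[0], best_by_ndcg[0])  # 400 400
```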

Framework Versions

Python: 3.10.13
Sentence Transformers: 3.3.1
Transformers: 4.48.0.dev0
PyTorch: 2.5.1+cu124
Accelerate: 1.0.1
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}