---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:23003
- loss:TripletLoss
base_model: intfloat/multilingual-e5-large-instruct
widget:
- source_sentence: The Merlyn.AI SectorSurfer Momentum ETF is designed to dynamically
    shift its investment strategy based on market conditions, tracking an index that
    utilizes an algorithmic Bull/Bear indicator assessing U.S. equity markets for
    advancing trends or elevated decline risk using factors like price-trend, momentum,
    value sentiment, and volatility. In Bull markets, it targets approximately a 70/30
    domestic/foreign aggressive equity allocation by selecting six thematic ETFs (four
    sectors, two geopolitical), while in Bear markets, it seeks safety by choosing
    at least four momentum-leading bond, treasury, and gold safe-harbor ETFs, explicitly
    avoiding inverse and leveraged funds. The index is typically evaluated monthly,
    though the indicator can trigger strategy changes anytime during excessive market
    volatility. Under normal circumstances, at least 80% of the fund's assets are
    invested in the index's component securities; the fund is non-diversified. Please
    be aware this fund is closing, with its last day of trading scheduled for November
    10, 2023.
  sentences:
  - The BlackRock Future Climate and Sustainable Economy ETF (BECO) is an actively
    managed equity fund focused on the transition to a lower carbon economy and future
    climate themes. It seeks a relatively concentrated, non-diversified portfolio
    of globally-listed companies of any market capitalization, investing across multiple
    subthemes such as sustainable energy, resource efficiency, future transport, sustainable
    nutrition, and biodiversity. The fund utilizes proprietary environmental criteria,
    including carbon metrics, and aims to align with the Paris Climate Agreement goals
    for net-zero emissions by 2050, while excluding certain high-emission industries
    and companies violating the UN Global Compact. It also attempts to achieve a better
    aggregate environmental and ESG score than its benchmark, the MSCI ACWI Multiple
    Industries Select Index. Note that BECO is being delisted, with its last day of
    trading on an exchange scheduled for August 12, 2024.
  - The Direxion Daily Semiconductor Bull 3X Shares (SOXL) seeks daily investment
    results, before fees and expenses, of 300% of the daily performance of the ICE
    Semiconductor Index. To achieve this bullish, leveraged exposure, the fund invests
    at least 80% of its net assets in financial instruments, such as swap agreements,
    securities of the index, and ETFs that track the index. The underlying ICE Semiconductor
    Index is a rules-based, modified float-adjusted market capitalization-weighted
    index that tracks the performance of the thirty largest U.S. listed semiconductor
    companies. As a daily leveraged fund, SOXL rebalances daily, meaning results over
    periods longer than one day can differ significantly from 300% of the index's
    performance due to the effects of compounding; the fund is also non-diversified.
  - The KraneShares Trust ETF seeks investment results corresponding generally to
    the price and yield performance of the Solactive Global Luxury Index. Under normal
    circumstances, the fund invests at least 80% of its net assets in instruments
    in the underlying index or those with similar economic characteristics. This index
    is a modified, free float adjusted market capitalization weighted index designed
    to measure the equity performance of companies from global luxury-related sectors,
    such as travel & leisure, premium ware, and apparel, located in developed markets.
    The index selects the top 25 companies based on criteria including size, trading
    volume, and country of listing, applying a modified weighting approach where the
    top 5 securities receive higher allocations (with the largest capped at 10%) while
    others are capped at 4.5%. The index is rebalanced semi-annually. The fund is
    non-diversified and while targeting US investments, it maintains at least 40%
    of its assets in foreign entities or those with significant business activities
    outside the United States.
- source_sentence: The Xtrackers MSCI Emerging Markets Climate Selection ETF seeks
    to track an emerging markets index focused on companies meeting specific climate
    criteria. Derived from the MSCI ACWI Select Climate 500 methodology, the underlying
    index selects eligible emerging market stocks using an optimization process designed
    to reduce greenhouse gas emission intensity (targeting 10% revenue-related and
    7% financing-related reductions) and increase exposure to companies with SBTi-approved
    targets. The strategy also excludes controversial companies and evaluates companies
    based on broader ESG considerations. The fund is non-diversified and invests at
    least 80% of its assets in the component securities of this climate-focused emerging
    markets index.
  sentences:
  - The First Trust Indxx NextG UCITS ETF seeks investment results that generally
    correspond to the price and yield of the Indxx 5G & NextG Thematic Index. This
    tiered-weighted index of global mid- and large-cap equities tracks companies dedicating
    significant resources to the research, development, and application of fifth generation
    (5G) and emerging next generation digital cellular technologies. The fund normally
    invests at least 90% of its net assets in the index's securities, which are primarily
    drawn from themes including 5G infrastructure and hardware (such as data/cell
    tower REITs and equipment manufacturers) and telecommunication service providers
    operating relevant cellular and wireless networks.
  - The iPath S&P MLP ETN tracks an S&P Dow Jones index designed to provide exposure
    to leading partnerships listed on major U.S. exchanges. Comprising master limited
    partnerships (MLPs) and similar publicly traded limited liability companies, these
    constituents are primarily classified within the GICS Energy Sector and GICS Gas
    Utilities Industry.
  - The First Trust NASDAQ ABA Community Bank Index Fund (QABA) seeks investment results
    corresponding generally to the NASDAQ OMX® ABA Community Bank TM Index, normally
    investing at least 90% of its net assets in the index's securities. The index
    tracks NASDAQ-listed US banks and thrifts of small, mid, and large capitalization,
    designed to capture the community banking industry. Uniquely, it deliberately
    excludes the 50 largest banks by asset size, banks with significant international
    operations, and those specializing in credit cards, specifically targeting true
    community banks and avoiding larger "mega-money centers." The index is market-cap-weighted
    and undergoes regular rebalancing and reconstitution, subject to certain issuer
    weight caps.
- source_sentence: The VanEck Morningstar Wide Moat ETF (MOAT) seeks to replicate
    the performance of the Morningstar® Wide Moat Focus IndexSM by investing at least
    80% of its assets in the index's securities. The fund targets US companies that
    Morningstar identifies as having sustainable competitive advantages ("wide moat
    companies") based on a proprietary methodology considering quantitative and qualitative
    factors. Specifically, the index focuses on companies determined to have the highest
    fair value among these wide moat firms. MOAT holds a concentrated, equal-weighted
    portfolio, which typically involves around 40 names but can hold more, featuring
    a staggered rebalance schedule and potential sector biases. The fund is non-diversified
    and employs caps on turnover and sector exposure, resulting in a strategy that
    can significantly diverge from broader market coverage despite its focus on established
    companies with competitive advantages.
  sentences:
  - The Fidelity MSCI Industrials Index ETF (FIDU) aims to match the performance of
    the MSCI USA IMI Industrials 25/25 Index, which represents the broad U.S. industrial
    sector using a market-cap-weighted approach with a 25/25 capping methodology.
    The fund, launched in October 2013, provides plain-vanilla exposure and invests
    at least 80% of its assets in securities found within this index. It uses a representative
    sampling strategy rather than replicating the entire index, and the underlying
    index is rebalanced quarterly.
  - The KraneShares Electric Vehicles and Future Mobility Index ETF (KARS) seeks to
    track the price and yield performance of the Bloomberg Electric Vehicles Index
    by investing at least 80% of its net assets in corresponding instruments or those
    with similar economic characteristics. The underlying index is designed to measure
    the equity market performance of globally-listed companies significantly involved
    in the production of electric vehicles, components, or other initiatives enhancing
    future mobility, including areas like energy storage, autonomous navigation technology,
    lithium and copper mining, and hydrogen fuel cells. KARS holds a concentrated
    portfolio, typically around 32 companies, weighted by market capitalization subject
    to specific position caps, and is reconstituted and rebalanced quarterly.
  - The iPath S&P MLP ETN tracks an S&P Dow Jones index designed to provide exposure
    to leading partnerships listed on major U.S. exchanges. Comprising master limited
    partnerships (MLPs) and similar publicly traded limited liability companies, these
    constituents are primarily classified within the GICS Energy Sector and GICS Gas
    Utilities Industry.
- source_sentence: The Global X Clean Water ETF (AQWA) seeks to provide exposure to
    the global water industry by tracking the Solactive Global Clean Water Industry
    Index. The fund invests at least 80% of its assets in securities of this index,
    which targets companies deriving a significant portion (at least 50%) of their
    revenue from water infrastructure, equipment, and services, including treatment,
    purification, conservation, and management. The index selection process uses proprietary
    technology like NLP to identify eligible firms, incorporates minimum ESG standards
    based on UN Global Compact principles, and includes the 40 highest-ranking companies,
    weighted by market capitalization with specific caps. Reconstituted and rebalanced
    semi-annually, the fund is considered non-diversified.
  sentences:
  - The First Trust Nasdaq Transportation ETF aims to track the Nasdaq US Smart Transportation
    TM Index, investing at least 90% of its net assets in the index's securities.
    This non-diversified fund provides exposure to a concentrated portfolio of approximately
    30 highly liquid U.S. transportation companies across various segments such as
    delivery, shipping, marine, railroads, trucking, airports, airlines, bridges,
    tunnels, and automobiles. The index selects companies based on liquidity and then
    ranks and weights them according to factors reflecting growth (price returns),
    value (cash flow-to-price), and low volatility, ensuring no single constituent
    exceeds 8%. The index undergoes annual reconstitution and quarterly rebalancing.
  - The Direxion Daily Healthcare Bull 3X Shares (CURE) is an ETF that seeks daily
    investment results, before fees and expenses, of 300% (3X) of the daily performance
    of the Health Care Select Sector Index. It invests at least 80% of its net assets
    in financial instruments designed to provide this 3X daily leveraged exposure.
    The underlying index tracks US listed healthcare companies, including pharmaceuticals,
    health care equipment and supplies, providers and services, biotechnology, life
    sciences tools, and health care technology, covering major large-cap names. CURE
    is non-diversified and intended strictly as a short-term tactical instrument,
    as it delivers its stated 3X exposure only for a single day, and returns over
    longer periods can significantly differ from three times the index's performance.
  - The BlackRock Future Climate and Sustainable Economy ETF (BECO) is an actively
    managed equity fund focused on the transition to a lower carbon economy and future
    climate themes. It seeks a relatively concentrated, non-diversified portfolio
    of globally-listed companies of any market capitalization, investing across multiple
    subthemes such as sustainable energy, resource efficiency, future transport, sustainable
    nutrition, and biodiversity. The fund utilizes proprietary environmental criteria,
    including carbon metrics, and aims to align with the Paris Climate Agreement goals
    for net-zero emissions by 2050, while excluding certain high-emission industries
    and companies violating the UN Global Compact. It also attempts to achieve a better
    aggregate environmental and ESG score than its benchmark, the MSCI ACWI Multiple
    Industries Select Index. Note that BECO is being delisted, with its last day of
    trading on an exchange scheduled for August 12, 2024.
- source_sentence: The Horizon Kinetics Medical ETF (MEDX) is an actively-managed,
    non-diversified fund aiming for long-term capital growth by investing primarily
    in global companies (U.S. and foreign) within the medical research, pharmaceuticals,
    medical technology, and related industries. The fund typically focuses on companies
    generating at least 50% of their revenue from these areas and may include companies
    of any market capitalization, with an emphasis on those involved in cancer research
    and treatment. Under normal circumstances, at least 80% of assets are invested
    in equity securities, convertibles, and warrants of such companies. Portfolio
    selection and weighting are based on the adviser's evaluation and discretion.
    The fund may also temporarily invest up to 100% in US short-term debt or invest
    in non-convertible high-yield bonds.
  sentences:
  - The Fidelity MSCI Health Care Index ETF (FHLC) seeks to track the performance
    of the MSCI USA IMI Health Care 25/50 Index, which represents the broad U.S. health
    care sector. The ETF invests at least 80% of its assets in securities included
    in this market-cap-weighted index, which captures large, mid, and small-cap companies
    across over 10 subsectors. Employing a representative sampling strategy, the fund
    aims to correspond to the index's performance. The index incorporates a 25/50
    capping methodology, is rebalanced quarterly, and its broad reach offers diversification
    across cap sizes and subsectors, potentially reducing concentration in dominant
    large pharma names and increasing exposure to areas like drug retailers and insurance.
    The fund is classified as non-diversified.
  - The SPDR S&P Oil & Gas Equipment & Services ETF (XES) seeks investment results
    corresponding generally to the total return performance of the S&P Oil & Gas Equipment
    & Services Select Industry Index. This index represents companies in the oil and
    gas equipment and services segment of the broad U.S. S&P Total Market Index (S&P
    TMI), including those involved in activities like wildcatting, drilling hardware,
    and related services. The index utilizes an equal-weighting methodology for its
    constituent companies, which are selected based on market capitalization and liquidity
    requirements and undergo quarterly rebalancing. The fund itself employs a sampling
    strategy, aiming to invest at least 80% of its total assets in the securities
    that comprise its benchmark index.
  - The VanEck Biotech ETF (BBH) seeks to replicate the performance of the MVIS® US
    Listed Biotech 25 Index, which provides exposure to approximately 25 of the largest
    or leading U.S.-listed companies in the biotechnology industry. The fund normally
    invests at least 80% of its assets in securities comprising this market-cap-weighted
    index. The underlying index includes common stocks and depositary receipts of
    firms involved in the research, development, production, marketing, and sale of
    drugs based on genetic analysis and diagnostic equipment. While focusing on U.S.-listed
    companies, it may include foreign firms listed domestically, and medium-capitalization
    companies can be included. Reflecting the index's concentration, the fund is non-diversified
    and may have a top-heavy portfolio. The index is reviewed semi-annually.
datasets:
- hobbang/stage1-triplet-dataset-selected
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on intfloat/multilingual-e5-large-instruct
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) on the [stage1-triplet-dataset-selected](https://huggingface.co/datasets/hobbang/stage1-triplet-dataset-selected) dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
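As a rough illustration of the paraphrase-mining use case, the sketch below scores every pair of sentences by the dot product of their embeddings and keeps the pairs above a threshold. It assumes the vectors are already unit-length (this model's pipeline ends in a `Normalize()` module, so `model.encode` output should be), which makes the dot product equal to cosine similarity. The toy 2-dimensional vectors, the helper name `mine_paraphrases`, and the 0.9 threshold are hypothetical stand-ins; the library's own helper for this task is `sentence_transformers.util.paraphrase_mining`, which this sketch does not reproduce.

```python
import numpy as np


def mine_paraphrases(embeddings: np.ndarray, threshold: float = 0.9):
    """Return (score, i, j) for each pair i < j with cosine similarity >= threshold.

    Assumes every row of `embeddings` is L2-normalized, so the dot product
    equals cosine similarity. Results are sorted from most to least similar.
    """
    sims = embeddings @ embeddings.T
    # Upper triangle without the diagonal: each unordered pair appears once.
    i_idx, j_idx = np.triu_indices(len(embeddings), k=1)
    pairs = [
        (float(sims[i, j]), int(i), int(j))
        for i, j in zip(i_idx, j_idx)
        if sims[i, j] >= threshold
    ]
    return sorted(pairs, reverse=True)


# Hypothetical unit-length embeddings for three sentences.
emb = np.array([
    [1.0, 0.0],
    [0.96, 0.28],  # close to sentence 0
    [0.0, 1.0],
])
pairs = mine_paraphrases(emb, threshold=0.9)
print(pairs)
```

Here only sentences 0 and 1 clear the threshold (similarity 0.96); the other two pairs score 0.0 and 0.28.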
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) <!-- at revision 84344a23ee1820ac951bc365f1e91d094a911763 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [stage1-triplet-dataset-selected](https://huggingface.co/datasets/hobbang/stage1-triplet-dataset-selected)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
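The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`), and `Normalize()` then rescales each sentence vector to unit length. A minimal NumPy sketch of those two steps is below. It is an illustration under assumptions, not the library's code: random arrays stand in for the `XLMRobertaModel` token outputs, and the small epsilon clamps are our choice to avoid division by zero.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # (batch, 1), never zero
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


# Toy batch: 2 sequences, 4 token positions, 1024-dim vectors (random stand-ins
# for transformer outputs); the second sequence ends with 2 padding positions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 1024))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])

embeddings = l2_normalize(mean_pool(tokens, mask))
print(embeddings.shape)  # (2, 1024)

# On unit-length vectors, cosine similarity reduces to a plain dot product.
cos = embeddings @ embeddings.T
```

Because of the final normalization step, every output vector has length 1, which is why cosine similarity and dot product rank results identically for this model.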
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub (replace the placeholder with this model's repo id or a local path)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "The Horizon Kinetics Medical ETF (MEDX) is an actively-managed, non-diversified fund aiming for long-term capital growth by investing primarily in global companies (U.S. and foreign) within the medical research, pharmaceuticals, medical technology, and related industries. The fund typically focuses on companies generating at least 50% of their revenue from these areas and may include companies of any market capitalization, with an emphasis on those involved in cancer research and treatment. Under normal circumstances, at least 80% of assets are invested in equity securities, convertibles, and warrants of such companies. Portfolio selection and weighting are based on the adviser's evaluation and discretion. The fund may also temporarily invest up to 100% in US short-term debt or invest in non-convertible high-yield bonds.",
    "The VanEck Biotech ETF (BBH) seeks to replicate the performance of the MVIS® US Listed Biotech 25 Index, which provides exposure to approximately 25 of the largest or leading U.S.-listed companies in the biotechnology industry. The fund normally invests at least 80% of its assets in securities comprising this market-cap-weighted index. The underlying index includes common stocks and depositary receipts of firms involved in the research, development, production, marketing, and sale of drugs based on genetic analysis and diagnostic equipment. While focusing on U.S.-listed companies, it may include foreign firms listed domestically, and medium-capitalization companies can be included. Reflecting the index's concentration, the fund is non-diversified and may have a top-heavy portfolio. The index is reviewed semi-annually.",
    'The SPDR S&P Oil & Gas Equipment & Services ETF (XES) seeks investment results corresponding generally to the total return performance of the S&P Oil & Gas Equipment & Services Select Industry Index. This index represents companies in the oil and gas equipment and services segment of the broad U.S. S&P Total Market Index (S&P TMI), including those involved in activities like wildcatting, drilling hardware, and related services. The index utilizes an equal-weighting methodology for its constituent companies, which are selected based on market capitalization and liquidity requirements and undergo quarterly rebalancing. The fund itself employs a sampling strategy, aiming to invest at least 80% of its total assets in the securities that comprise its benchmark index.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
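The similarity scores above can drive semantic search directly: encode a query and a corpus, then rank corpus entries by score. The NumPy sketch below assumes both inputs are already L2-normalized (as this model's `Normalize()` output is), so the dot product equals cosine similarity. The helper name `top_k_search` and the hand-picked 3-dimensional vectors are hypothetical stand-ins for `model.encode(...)` output; for real workloads, `sentence_transformers.util.semantic_search` provides a chunked implementation.

```python
import numpy as np


def top_k_search(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Return (corpus_index, score) pairs for the k most similar corpus rows.

    Assumes query_emb (dim,) and corpus_emb (n_corpus, dim) are L2-normalized,
    so the dot product equals cosine similarity.
    """
    scores = corpus_emb @ query_emb   # (n_corpus,)
    order = np.argsort(-scores)[:k]   # highest scores first
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical unit-length embeddings, chosen so the ranking is easy to check.
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
])
query = np.array([0.8, 0.6, 0.0])

results = top_k_search(query, corpus, k=2)
print(results)
```

With these vectors the query scores 0.96 against corpus row 2 and 0.8 against row 0, so those two are returned in that order.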
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### stage1-triplet-dataset-selected
* Dataset: [stage1-triplet-dataset-selected](https://huggingface.co/datasets/hobbang/stage1-triplet-dataset-selected) at [18e0423](https://huggingface.co/datasets/hobbang/stage1-triplet-dataset-selected/tree/18e0423399bc6678e814264ca8c8acdf02dfce97)
* Size: 23,003 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
| type | string | string | string |
| details | <ul><li>min: 94 tokens</li><li>mean: 170.87 tokens</li><li>max: 224 tokens</li></ul> | <ul><li>min: 29 tokens</li><li>mean: 174.15 tokens</li><li>max: 261 tokens</li></ul> | <ul><li>min: 72 tokens</li><li>mean: 174.89 tokens</li><li>max: 261 tokens</li></ul> |
* Samples:
| anchor | positive | negative |
|:---------|:---------|:---------|
| <code>The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.</code> | <code>The FlexShares ESG & Climate Investment Grade Corporate Core Index Fund (FEIG) is a passively managed ETF designed to provide broad-market, core exposure to USD-denominated investment-grade corporate bonds. It seeks to track the performance of the Northern Trust ESG & Climate Investment Grade U.S. Corporate Core IndexSM, which selects bonds from a universe of USD-denominated, investment-grade corporate debt with maturities of at least one year. The index employs an optimization process to increase the aggregate ESG score and reduce aggregate climate-related risk among constituent companies, involving ranking firms on material ESG metrics, governance, and carbon risks, while excluding controversial companies and international initiative violators. Weights are also optimized to minimize systematic risk, and the index is rebalanced monthly. Under normal circumstances, the fund invests at least 80% of its assets in the index's securities.</code> | <code>The Viridi Bitcoin Miners ETF primarily invests in companies engaged in Bitcoin mining, aiming to allocate at least 80% of its net assets, plus borrowings for investment purposes, to securities of such companies under normal circumstances. The fund focuses on U.S. and non-U.S. equity securities in developed markets, which may include investments via depositary receipts. It also specifically targets common stock from newly listed IPOs, shares derived from SPAC IPOs, and securities resulting from reverse mergers. This ETF is non-diversified.</code> |
| <code>The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.</code> | <code>The Fidelity Sustainable High Yield ETF (FSYD) is an actively managed fund primarily seeking high income, and potentially capital growth, by investing at least 80% of its assets in global high-yield (below investment grade) debt securities. The fund focuses on issuers demonstrating proven or improving sustainability practices based on an evaluation of their individual environmental, social, and governance (ESG) profiles using a proprietary rating process. Its comprehensive selection approach also incorporates a multi-factor quantitative screening model and fundamental analysis of issuers, aiming to identify value and quality within the high-yield universe.</code> | <code>The ETFMG Prime Mobile Payments ETF seeks to track the performance of the Nasdaq CTA Global Digital Payments Index, which identifies companies engaged in the global digital payments industry across categories like card networks, infrastructure, software, processors, and solutions. Under normal circumstances, the fund invests at least 80% of its net assets in common stocks (including ADRs and GDRs) of these Mobile Payments Companies. It typically holds a narrow portfolio expected to contain up to 50 companies, weighted using a theme-adjusted market capitalization scheme, and is considered non-diversified.</code> |
| <code>The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.</code> | <code>The First Trust TCW Securitized Plus ETF (DEED) is an actively-managed fund focused on U.S. securitized debt securities, aiming to maximize long-term total return and outperform the Bloomberg US Mortgage-Backed Securities Index. Under normal market conditions, the fund allocates at least 80% of its net assets to securitized debt, including asset-backed securities, residential and commercial mortgage-backed securities, and collateralized loan obligations (CLOs). At least 50% of total assets are invested in securities issued or guaranteed by the U.S. government, its agencies, or government-sponsored entities, while the balance may include non-government and privately-issued securitized debt. The fund invests across various maturities and credit qualities (junk and investment-grade), using proprietary research to identify undervalued securities, and may utilize OTC derivatives for up to 25% of the portfolio.</code> | <code>The First Trust Growth Strength UCITS ETF aims to track the price and yield of The Growth Strength Index. Passively managed, the fund normally invests at least 80% of its assets in the index's common stocks and REIT components. The index selects 50 equal-weighted, well-capitalized, large-cap US companies from the top 500 US securities by market capitalization based on fundamental criteria such as return on equity, long-term debt levels, liquidity, positive shareholder equity, and a composite ranking based on 3-year revenue and cash flow growth. The resulting portfolio is non-diversified and rebalanced quarterly.</code> |
* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.COSINE",
"triplet_margin": 0.05
}
```
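With `TripletDistanceMetric.COSINE`, the distance is `d(u, v) = 1 - cos(u, v)`, and each (anchor, positive, negative) triplet contributes `max(d(a, p) - d(a, n) + 0.05, 0)`; the reported loss is the mean over the batch. The plain-Python sketch below illustrates that formula on toy 2-D vectors, which stand in for real model embeddings purely for readability (an assumption). In `sentence-transformers` itself, the configured object would be `losses.TripletLoss(model, distance_metric=losses.TripletDistanceMetric.COSINE, triplet_margin=0.05)`.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v), the COSINE triplet distance metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchors, positives, negatives, margin=0.05):
    """Batch mean of the hinge max(d(a, p) - d(a, n) + margin, 0)."""
    losses = [
        max(cosine_distance(a, p) - cosine_distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# Toy 2-D "embeddings" (hypothetical values, not produced by the model)
easy = triplet_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]])  # 0.0: negative already far enough away
hard = triplet_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]])  # ~1.05: ordering violated
print(easy, hard)
```

Easy triplets contribute nothing, so the loss only reflects triplets whose ordering is wrong or that fall within the 0.05 margin.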
### Evaluation Dataset
#### stage1-triplet-dataset-selected
* Dataset: [stage1-triplet-dataset-selected](https://huggingface.co/datasets/hobbang/stage1-triplet-dataset-selected) at [18e0423](https://huggingface.co/datasets/hobbang/stage1-triplet-dataset-selected/tree/18e0423399bc6678e814264ca8c8acdf02dfce97)
* Size: 388 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 388 samples:
| | anchor | positive | negative |
|:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
| type | string | string | string |
| details | <ul><li>min: 85 tokens</li><li>mean: 176.98 tokens</li><li>max: 271 tokens</li></ul> | <ul><li>min: 85 tokens</li><li>mean: 176.83 tokens</li><li>max: 271 tokens</li></ul> | <ul><li>min: 85 tokens</li><li>mean: 175.41 tokens</li><li>max: 271 tokens</li></ul> |
* Samples:
| anchor | positive | negative |
|:---|:---|:---|
| <code>The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.</code> | <code>The U.S. Global Technology and Aerospace & Defense ETF is an actively managed ETF seeking capital appreciation by investing in equity securities of companies expected to benefit from national defense efforts. These efforts include technological innovations and the development of products and services related to aerospace, physical, and cybersecurity defense, often in preparation for or in response to domestic, regional, or global conflicts. The fund is non-diversified.</code> | <code>The KraneShares Global Carbon Offset Strategy ETF (KSET) was the first US-listed ETF providing exposure to the global voluntary carbon market. It achieved this by investing primarily in liquid carbon offset credit futures, including CME-traded Global Emissions Offsets (GEOs) and Nature-Based Global Emission Offsets (N-GEOs), which are designed to help businesses meet greenhouse gas reduction goals. Tracking an index that weighted eligible futures based on liquidity, the fund sought exposure to the same carbon offset credit futures, typically those maturing within two years. The ETF was considered non-diversified and utilized a Cayman Island subsidiary. However, the fund was delisted, with its last day of trading on an exchange being March 14, 2024.</code> |
| <code>The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.</code> | <code>The JPMorgan Social Advancement ETF (UPWD) is an actively managed, non-diversified fund that seeks to invest globally in companies facilitating social and economic advancements and empowerment across the socioeconomic spectrum. Primarily holding common stocks, depositary receipts, and REITs, the fund targets themes including essential amenities, affordable housing, healthcare, education, attainable financing, and the digital ecosystem, potentially investing in companies of various sizes, including small-caps, across U.S., foreign, and emerging markets with possible concentration in specific sectors. Security selection follows a proprietary three-step process involving exclusions, thematic ranking using a ThemeBot, and a sustainable investment inclusion process combined with fundamental research. Please note that this security is being delisted, with its last day of trading scheduled for December 15, 2023.</code> | <code>The Direxion Daily Gold Miners Index Bull 2X Shares (NUGT) is designed to provide 200% of the daily performance of the NYSE Arca Gold Miners Index, before fees and expenses. This market-cap-weighted index comprises publicly traded global companies, primarily involved in gold mining and to a lesser extent silver mining, operating in both developed and emerging markets. NUGT achieves its objective by investing at least 80% of its net assets in financial instruments providing 2X daily leveraged exposure to the index. As a leveraged fund intended for daily results, NUGT is designed for short-term trading, typically held for only one trading day, and holding it for longer periods can lead to performance results that differ significantly from the stated daily target due to the effects of compounding. The fund is also non-diversified.</code> |
| <code>The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.</code> | <code>The Xtrackers MSCI Emerging Markets ESG Leaders Equity ETF tracks an index of large- and mid-cap emerging market stocks that emphasize strong environmental, social, and governance (ESG) characteristics. The index first excludes companies involved in specific controversial industries. From the remaining universe, it ranks stocks based on MSCI ESG scores, including a controversy component, to identify and select the highest-ranking ESG leaders, effectively screening out ESG laggards. To maintain market-like country and sector weights, the index selects the top ESG-scoring stocks within each sector until a specified market capitalization threshold is reached. Selected stocks are then weighted by market capitalization within their respective sectors. The fund typically invests over 80% of its assets in the securities of this underlying index.</code> | <code>The BlackRock Future Climate and Sustainable Economy ETF (BECO) is an actively managed equity fund focused on the transition to a lower carbon economy and future climate themes. It seeks a relatively concentrated, non-diversified portfolio of globally-listed companies of any market capitalization, investing across multiple subthemes such as sustainable energy, resource efficiency, future transport, sustainable nutrition, and biodiversity. The fund utilizes proprietary environmental criteria, including carbon metrics, and aims to align with the Paris Climate Agreement goals for net-zero emissions by 2050, while excluding certain high-emission industries and companies violating the UN Global Compact. It also attempts to achieve a better aggregate environmental and ESG score than its benchmark, the MSCI ACWI Multiple Industries Select Index. Note that BECO is being delisted, with its last day of trading on an exchange scheduled for August 12, 2024.</code> |
* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.COSINE",
"triplet_margin": 0.05
}
```
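The evaluation split's per-column token statistics (min, mean and max over the first 388 samples) are plain aggregates of tokenized lengths. A minimal sketch, where `token_length_stats` is a hypothetical helper and a whitespace split stands in for the base model's subword tokenizer, so its counts will not match the table:

```python
from statistics import mean


def token_length_stats(texts, tokenize, first_n=388):
    """Min / mean / max token counts over the first `first_n` texts of one column."""
    lengths = [len(tokenize(text)) for text in texts[:first_n]]
    return {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}


# Hypothetical stand-in tokenizer (whitespace split); the card's figures come from
# the model's own subword tokenizer, so real counts will be different.
column = ["a b c", "a b", "a b c d"]
print(token_length_stats(column, str.split))
```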
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-06
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `bf16`: True
- `dataloader_drop_last`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
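With `warmup_ratio` 0.1 and the default `linear` scheduler (`lr_scheduler_type` in the full list below), the learning rate ramps from 0 up to 2e-06 and then decays linearly back to 0. The plain-Python sketch below is intended to follow the same formula as `transformers.get_linear_schedule_with_warmup`, but it is an illustration, not the library code. The step count is derived from the dataset size (23,003 triplets) and batch size, assuming every batch is full; the resulting 718 steps per epoch agrees with the Epoch column of the training log (step 10 is 10 / 718 ≈ 0.0139 epochs).

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=2e-6, warmup_ratio=0.1):
    """LR at `step`: linear warmup from 0 to base_lr, then linear decay to 0."""
    # Warmup length is rounded up, e.g. ceil(718 * 0.1) = 72 steps
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Assumption: 23,003 training triplets, batch size 32, 1 epoch, every batch full
steps_per_epoch = 23003 // 32  # 718 full batches (23003 = 718 * 32 + 27)
total_steps = steps_per_epoch * 1  # num_train_epochs = 1

for step in (0, 36, 72, 400, 718):
    print(step, linear_warmup_lr(step, total_steps))
```

Because the peak learning rate is small (2e-06) and warmup covers only the first ~72 steps, most of the run is spent in the decay phase.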
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-06
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
</details>
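`batch_sampler: no_duplicates` keeps any text from appearing twice in the same batch, which is relevant here because one anchor can occur in several triplets (as in the evaluation samples above). The sketch below is a simplified illustration of the idea, not the library's `NoDuplicatesBatchSampler`: it omits random shuffling, and `no_duplicate_batches` is a hypothetical helper. Each pass builds one batch, skipping rows that would repeat a text already in it; with `drop_last=True`, a pass that cannot fill a batch discards it.

```python
def no_duplicate_batches(rows, batch_size, drop_last=True):
    """Group row indices into batches in which no text appears twice.

    `rows` is a list of (anchor, positive, negative) string tuples.
    """
    remaining = list(range(len(rows)))
    batches = []
    while remaining:
        batch, seen = [], set()
        for idx in remaining:
            texts = set(rows[idx])
            if texts & seen:
                continue  # would repeat a text in this batch; retry in a later pass
            batch.append(idx)
            seen |= texts
            if len(batch) == batch_size:
                break
        # The first remaining row always fits, so every pass makes progress
        if len(batch) == batch_size or not drop_last:
            batches.append(batch)
        taken = set(batch)
        remaining = [i for i in remaining if i not in taken]
    return batches


# Three triplets share anchor "A", mirroring the repeated anchor in the samples
rows = [("A", "P1", "N1"), ("A", "P2", "N2"), ("A", "P3", "N3"), ("B", "P4", "N4")]
print(no_duplicate_batches(rows, batch_size=2))  # [[0, 3]]
```

Rows that cannot be placed in a full batch are simply not used in that epoch when `drop_last` is on, which is the trade-off for duplicate-free batches.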
### Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|:----------:|:-------:|:-------------:|:---------------:|
| 0.0139 | 10 | 0.0367 | - |
| 0.0279 | 20 | 0.0378 | - |
| 0.0418 | 30 | 0.0346 | - |
| 0.0557 | 40 | 0.0337 | - |
| 0.0696 | 50 | 0.0328 | - |
| 0.0836 | 60 | 0.0291 | - |
| 0.0975 | 70 | 0.0257 | - |
| 0.1114 | 80 | 0.0206 | - |
| 0.1253 | 90 | 0.0201 | - |
| 0.1393 | 100 | 0.0208 | 0.0132 |
| 0.1532 | 110 | 0.0167 | - |
| 0.1671 | 120 | 0.0167 | - |
| 0.1811 | 130 | 0.0156 | - |
| 0.1950 | 140 | 0.0153 | - |
| 0.2089 | 150 | 0.0125 | - |
| 0.2228 | 160 | 0.0141 | - |
| 0.2368 | 170 | 0.0153 | - |
| 0.2507 | 180 | 0.0142 | - |
| 0.2646 | 190 | 0.0095 | - |
| 0.2786 | 200 | 0.0144 | 0.0111 |
| 0.2925 | 210 | 0.0132 | - |
| 0.3064 | 220 | 0.0107 | - |
| 0.3203 | 230 | 0.0116 | - |
| 0.3343 | 240 | 0.0134 | - |
| 0.3482 | 250 | 0.0112 | - |
| 0.3621 | 260 | 0.0115 | - |
| 0.3760 | 270 | 0.0124 | - |
| 0.3900 | 280 | 0.0126 | - |
| 0.4039 | 290 | 0.0105 | - |
| 0.4178 | 300 | 0.0111 | 0.0109 |
| 0.4318 | 310 | 0.0136 | - |
| 0.4457 | 320 | 0.0123 | - |
| 0.4596 | 330 | 0.0113 | - |
| 0.4735 | 340 | 0.0125 | - |
| 0.4875 | 350 | 0.0082 | - |
| 0.5014 | 360 | 0.0102 | - |
| 0.5153 | 370 | 0.0081 | - |
| 0.5292 | 380 | 0.0115 | - |
| 0.5432 | 390 | 0.0107 | - |
| 0.5571 | 400 | 0.012 | 0.0106 |
| 0.5710 | 410 | 0.0094 | - |
| 0.5850 | 420 | 0.0099 | - |
| 0.5989 | 430 | 0.0105 | - |
| 0.6128 | 440 | 0.0101 | - |
| 0.6267 | 450 | 0.0099 | - |
| 0.6407 | 460 | 0.0106 | - |
| 0.6546 | 470 | 0.0099 | - |
| 0.6685 | 480 | 0.0108 | - |
| 0.6825 | 490 | 0.01 | - |
| **0.6964** | **500** | **0.0084** | **0.0102** |
| 0.7103 | 510 | 0.0092 | - |
| 0.7242 | 520 | 0.0084 | - |
| 0.7382 | 530 | 0.0077 | - |
| 0.7521 | 540 | 0.0096 | - |
| 0.7660 | 550 | 0.0099 | - |
| 0.7799 | 560 | 0.0103 | - |
| 0.7939 | 570 | 0.0082 | - |
| 0.8078 | 580 | 0.009 | - |
| 0.8217 | 590 | 0.0078 | - |
| 0.8357 | 600 | 0.0091 | 0.0104 |
| 0.8496 | 610 | 0.0088 | - |
| 0.8635 | 620 | 0.0103 | - |
| 0.8774 | 630 | 0.0109 | - |
| 0.8914 | 640 | 0.0072 | - |
| 0.9053 | 650 | 0.0084 | - |
| 0.9192 | 660 | 0.0099 | - |
| 0.9331 | 670 | 0.008 | - |
| 0.9471 | 680 | 0.0081 | - |
| 0.9610 | 690 | 0.0075 | - |
| 0.9749 | 700 | 0.0096 | 0.0103 |
| 0.9889 | 710 | 0.0089 | - |
* The bold row denotes the saved checkpoint.
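With `load_best_model_at_end: True` and no `metric_for_best_model` listed, the Trainer's default is to keep the checkpoint with the lowest validation loss (assumed here, since the card does not name the metric). The sketch below applies that rule to the validation losses from the log and recovers the bolded row; the steps-per-epoch figure again assumes 23,003 triplets in full batches of 32.

```python
# Validation losses copied from the training log above (step -> loss)
eval_losses = {100: 0.0132, 200: 0.0111, 300: 0.0109, 400: 0.0106,
               500: 0.0102, 600: 0.0104, 700: 0.0103}

# Assumed default when metric_for_best_model is unset: lowest validation loss wins
best_step = min(eval_losses, key=eval_losses.get)

# Convert the step back to an epoch fraction (718 steps per epoch, assuming full batches)
steps_per_epoch = 23003 // 32
best_epoch = round(best_step / steps_per_epoch, 4)

print(best_step, best_epoch)  # 500 0.6964
```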
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.1.0+cu118
- Accelerate: 1.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### TripletLoss
```bibtex
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->