metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:23003
  - loss:TripletLoss
base_model: intfloat/multilingual-e5-large-instruct
widget:
  - source_sentence: >-
      The Merlyn.AI SectorSurfer Momentum ETF is designed to dynamically shift
      its investment strategy based on market conditions, tracking an index that
      utilizes an algorithmic Bull/Bear indicator assessing U.S. equity markets
      for advancing trends or elevated decline risk using factors like
      price-trend, momentum, value sentiment, and volatility. In Bull markets,
      it targets approximately a 70/30 domestic/foreign aggressive equity
      allocation by selecting six thematic ETFs (four sectors, two
      geopolitical), while in Bear markets, it seeks safety by choosing at least
      four momentum-leading bond, treasury, and gold safe-harbor ETFs,
      explicitly avoiding inverse and leveraged funds. The index is typically
      evaluated monthly, though the indicator can trigger strategy changes
      anytime during excessive market volatility. Under normal circumstances, at
      least 80% of the fund's assets are invested in the index's component
      securities; the fund is non-diversified. Please be aware this fund is
      closing, with its last day of trading scheduled for November 10, 2023.
    sentences:
      - >-
        The BlackRock Future Climate and Sustainable Economy ETF (BECO) is an
        actively managed equity fund focused on the transition to a lower carbon
        economy and future climate themes. It seeks a relatively concentrated,
        non-diversified portfolio of globally-listed companies of any market
        capitalization, investing across multiple subthemes such as sustainable
        energy, resource efficiency, future transport, sustainable nutrition,
        and biodiversity. The fund utilizes proprietary environmental criteria,
        including carbon metrics, and aims to align with the Paris Climate
        Agreement goals for net-zero emissions by 2050, while excluding certain
        high-emission industries and companies violating the UN Global Compact.
        It also attempts to achieve a better aggregate environmental and ESG
        score than its benchmark, the MSCI ACWI Multiple Industries Select
        Index. Note that BECO is being delisted, with its last day of trading on
        an exchange scheduled for August 12, 2024.
      - >-
        The Direxion Daily Semiconductor Bull 3X Shares (SOXL) seeks daily
        investment results, before fees and expenses, of 300% of the daily
        performance of the ICE Semiconductor Index. To achieve this bullish,
        leveraged exposure, the fund invests at least 80% of its net assets in
        financial instruments, such as swap agreements, securities of the index,
        and ETFs that track the index. The underlying ICE Semiconductor Index is
        a rules-based, modified float-adjusted market capitalization-weighted
        index that tracks the performance of the thirty largest U.S. listed
        semiconductor companies. As a daily leveraged fund, SOXL rebalances
        daily, meaning results over periods longer than one day can differ
        significantly from 300% of the index's performance due to the effects of
        compounding; the fund is also non-diversified.
      - >-
        The KraneShares Trust ETF seeks investment results corresponding
        generally to the price and yield performance of the Solactive Global
        Luxury Index. Under normal circumstances, the fund invests at least 80%
        of its net assets in instruments in the underlying index or those with
        similar economic characteristics. This index is a modified, free float
        adjusted market capitalization weighted index designed to measure the
        equity performance of companies from global luxury-related sectors, such
        as travel & leisure, premium ware, and apparel, located in developed
        markets. The index selects the top 25 companies based on criteria
        including size, trading volume, and country of listing, applying a
        modified weighting approach where the top 5 securities receive higher
        allocations (with the largest capped at 10%) while others are capped at
        4.5%. The index is rebalanced semi-annually. The fund is non-diversified
        and while targeting US investments, it maintains at least 40% of its
        assets in foreign entities or those with significant business activities
        outside the United States.
  - source_sentence: >-
      The Xtrackers MSCI Emerging Markets Climate Selection ETF seeks to track
      an emerging markets index focused on companies meeting specific climate
      criteria. Derived from the MSCI ACWI Select Climate 500 methodology, the
      underlying index selects eligible emerging market stocks using an
      optimization process designed to reduce greenhouse gas emission intensity
      (targeting 10% revenue-related and 7% financing-related reductions) and
      increase exposure to companies with SBTi-approved targets. The strategy
      also excludes controversial companies and evaluates companies based on
      broader ESG considerations. The fund is non-diversified and invests at
      least 80% of its assets in the component securities of this
      climate-focused emerging markets index.
    sentences:
      - >-
        The First Trust Indxx NextG UCITS ETF seeks investment results that
        generally correspond to the price and yield of the Indxx 5G & NextG
        Thematic Index. This tiered-weighted index of global mid- and large-cap
        equities tracks companies dedicating significant resources to the
        research, development, and application of fifth generation (5G) and
        emerging next generation digital cellular technologies. The fund
        normally invests at least 90% of its net assets in the index's
        securities, which are primarily drawn from themes including 5G
        infrastructure and hardware (such as data/cell tower REITs and equipment
        manufacturers) and telecommunication service providers operating
        relevant cellular and wireless networks.
      - >-
        The iPath S&P MLP ETN tracks an S&P Dow Jones index designed to provide
        exposure to leading partnerships listed on major U.S. exchanges.
        Comprising master limited partnerships (MLPs) and similar publicly
        traded limited liability companies, these constituents are primarily
        classified within the GICS Energy Sector and GICS Gas Utilities
        Industry.
      - >-
        The First Trust NASDAQ ABA Community Bank Index Fund (QABA) seeks
        investment results corresponding generally to the NASDAQ OMX® ABA
        Community Bank TM Index, normally investing at least 90% of its net
        assets in the index's securities. The index tracks NASDAQ-listed US
        banks and thrifts of small, mid, and large capitalization, designed to
        capture the community banking industry. Uniquely, it deliberately
        excludes the 50 largest banks by asset size, banks with significant
        international operations, and those specializing in credit cards,
        specifically targeting true community banks and avoiding larger
        "mega-money centers." The index is market-cap-weighted and undergoes
        regular rebalancing and reconstitution, subject to certain issuer weight
        caps.
  - source_sentence: >-
      The VanEck Morningstar Wide Moat ETF (MOAT) seeks to replicate the
      performance of the Morningstar® Wide Moat Focus IndexSM by investing at
      least 80% of its assets in the index's securities. The fund targets US
      companies that Morningstar identifies as having sustainable competitive
      advantages ("wide moat companies") based on a proprietary methodology
      considering quantitative and qualitative factors. Specifically, the index
      focuses on companies determined to have the highest fair value among these
      wide moat firms. MOAT holds a concentrated, equal-weighted portfolio,
      which typically involves around 40 names but can hold more, featuring a
      staggered rebalance schedule and potential sector biases. The fund is
      non-diversified and employs caps on turnover and sector exposure,
      resulting in a strategy that can significantly diverge from broader market
      coverage despite its focus on established companies with competitive
      advantages.
    sentences:
      - >-
        The Fidelity MSCI Industrials Index ETF (FIDU) aims to match the
        performance of the MSCI USA IMI Industrials 25/25 Index, which
        represents the broad U.S. industrial sector using a market-cap-weighted
        approach with a 25/25 capping methodology. The fund, launched in October
        2013, provides plain-vanilla exposure and invests at least 80% of its
        assets in securities found within this index. It uses a representative
        sampling strategy rather than replicating the entire index, and the
        underlying index is rebalanced quarterly.
      - >-
        The KraneShares Electric Vehicles and Future Mobility Index ETF (KARS)
        seeks to track the price and yield performance of the Bloomberg Electric
        Vehicles Index by investing at least 80% of its net assets in
        corresponding instruments or those with similar economic
        characteristics. The underlying index is designed to measure the equity
        market performance of globally-listed companies significantly involved
        in the production of electric vehicles, components, or other initiatives
        enhancing future mobility, including areas like energy storage,
        autonomous navigation technology, lithium and copper mining, and
        hydrogen fuel cells. KARS holds a concentrated portfolio, typically
        around 32 companies, weighted by market capitalization subject to
        specific position caps, and is reconstituted and rebalanced quarterly.
      - >-
        The iPath S&P MLP ETN tracks an S&P Dow Jones index designed to provide
        exposure to leading partnerships listed on major U.S. exchanges.
        Comprising master limited partnerships (MLPs) and similar publicly
        traded limited liability companies, these constituents are primarily
        classified within the GICS Energy Sector and GICS Gas Utilities
        Industry.
  - source_sentence: >-
      The Global X Clean Water ETF (AQWA) seeks to provide exposure to the
      global water industry by tracking the Solactive Global Clean Water
      Industry Index. The fund invests at least 80% of its assets in securities
      of this index, which targets companies deriving a significant portion (at
      least 50%) of their revenue from water infrastructure, equipment, and
      services, including treatment, purification, conservation, and management.
      The index selection process uses proprietary technology like NLP to
      identify eligible firms, incorporates minimum ESG standards based on UN
      Global Compact principles, and includes the 40 highest-ranking companies,
      weighted by market capitalization with specific caps. Reconstituted and
      rebalanced semi-annually, the fund is considered non-diversified.
    sentences:
      - >-
        The First Trust Nasdaq Transportation ETF aims to track the Nasdaq US
        Smart Transportation TM Index, investing at least 90% of its net assets
        in the index's securities. This non-diversified fund provides exposure
        to a concentrated portfolio of approximately 30 highly liquid U.S.
        transportation companies across various segments such as delivery,
        shipping, marine, railroads, trucking, airports, airlines, bridges,
        tunnels, and automobiles. The index selects companies based on liquidity
        and then ranks and weights them according to factors reflecting growth
        (price returns), value (cash flow-to-price), and low volatility,
        ensuring no single constituent exceeds 8%. The index undergoes annual
        reconstitution and quarterly rebalancing.
      - >-
        The Direxion Daily Healthcare Bull 3X Shares (CURE) is an ETF that seeks
        daily investment results, before fees and expenses, of 300% (3X) of the
        daily performance of the Health Care Select Sector Index. It invests at
        least 80% of its net assets in financial instruments designed to provide
        this 3X daily leveraged exposure. The underlying index tracks US listed
        healthcare companies, including pharmaceuticals, health care equipment
        and supplies, providers and services, biotechnology, life sciences
        tools, and health care technology, covering major large-cap names. CURE
        is non-diversified and intended strictly as a short-term tactical
        instrument, as it delivers its stated 3X exposure only for a single day,
        and returns over longer periods can significantly differ from three
        times the index's performance.
      - >-
        The BlackRock Future Climate and Sustainable Economy ETF (BECO) is an
        actively managed equity fund focused on the transition to a lower carbon
        economy and future climate themes. It seeks a relatively concentrated,
        non-diversified portfolio of globally-listed companies of any market
        capitalization, investing across multiple subthemes such as sustainable
        energy, resource efficiency, future transport, sustainable nutrition,
        and biodiversity. The fund utilizes proprietary environmental criteria,
        including carbon metrics, and aims to align with the Paris Climate
        Agreement goals for net-zero emissions by 2050, while excluding certain
        high-emission industries and companies violating the UN Global Compact.
        It also attempts to achieve a better aggregate environmental and ESG
        score than its benchmark, the MSCI ACWI Multiple Industries Select
        Index. Note that BECO is being delisted, with its last day of trading on
        an exchange scheduled for August 12, 2024.
  - source_sentence: >-
      The Horizon Kinetics Medical ETF (MEDX) is an actively-managed,
      non-diversified fund aiming for long-term capital growth by investing
      primarily in global companies (U.S. and foreign) within the medical
      research, pharmaceuticals, medical technology, and related industries. The
      fund typically focuses on companies generating at least 50% of their
      revenue from these areas and may include companies of any market
      capitalization, with an emphasis on those involved in cancer research and
      treatment. Under normal circumstances, at least 80% of assets are invested
      in equity securities, convertibles, and warrants of such companies.
      Portfolio selection and weighting are based on the adviser's evaluation
      and discretion. The fund may also temporarily invest up to 100% in US
      short-term debt or invest in non-convertible high-yield bonds.
    sentences:
      - >-
        The Fidelity MSCI Health Care Index ETF (FHLC) seeks to track the
        performance of the MSCI USA IMI Health Care 25/50 Index, which
        represents the broad U.S. health care sector. The ETF invests at least
        80% of its assets in securities included in this market-cap-weighted
        index, which captures large, mid, and small-cap companies across over 10
        subsectors. Employing a representative sampling strategy, the fund aims
        to correspond to the index's performance. The index incorporates a 25/50
        capping methodology, is rebalanced quarterly, and its broad reach offers
        diversification across cap sizes and subsectors, potentially reducing
        concentration in dominant large pharma names and increasing exposure to
        areas like drug retailers and insurance. The fund is classified as
        non-diversified.
      - >-
        The SPDR S&P Oil & Gas Equipment & Services ETF (XES) seeks investment
        results corresponding generally to the total return performance of the
        S&P Oil & Gas Equipment & Services Select Industry Index. This index
        represents companies in the oil and gas equipment and services segment
        of the broad U.S. S&P Total Market Index (S&P TMI), including those
        involved in activities like wildcatting, drilling hardware, and related
        services. The index utilizes an equal-weighting methodology for its
        constituent companies, which are selected based on market capitalization
        and liquidity requirements and undergo quarterly rebalancing. The fund
        itself employs a sampling strategy, aiming to invest at least 80% of its
        total assets in the securities that comprise its benchmark index.
      - >-
        The VanEck Biotech ETF (BBH) seeks to replicate the performance of the
        MVIS® US Listed Biotech 25 Index, which provides exposure to
        approximately 25 of the largest or leading U.S.-listed companies in the
        biotechnology industry. The fund normally invests at least 80% of its
        assets in securities comprising this market-cap-weighted index. The
        underlying index includes common stocks and depositary receipts of firms
        involved in the research, development, production, marketing, and sale
        of drugs based on genetic analysis and diagnostic equipment. While
        focusing on U.S.-listed companies, it may include foreign firms listed
        domestically, and medium-capitalization companies can be included.
        Reflecting the index's concentration, the fund is non-diversified and
        may have a top-heavy portfolio. The index is reviewed semi-annually.
datasets:
  - hobbang/stage1-triplet-dataset-selected
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on intfloat/multilingual-e5-large-instruct

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-large-instruct on the stage1-triplet-dataset-selected dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

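Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large-instruct
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Training Dataset: stage1-triplet-dataset-selected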

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
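The Pooling module above averages the token embeddings (`pooling_mode_mean_tokens: True`), and `Normalize()` then scales each sentence vector to unit length. The NumPy sketch below illustrates that arithmetic under stated assumptions: random arrays stand in for the XLMRobertaModel output, and `mean_pool_and_normalize` is a hypothetical helper written for illustration, not the library's internal implementation. Padding positions (mask value 0) are excluded from the average.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mean-pool over real tokens, then L2-normalize.

    token_embeddings: (batch, seq_len, dim) transformer output
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    pooled = summed / counts                                          # (batch, dim) mean vectors
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)                       # unit-length rows


# Toy input: 2 sequences, 3 token positions, 4-dim vectors (the real model uses 1024)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # the second sequence ends with one padding token
emb = mean_pool_and_normalize(tokens, mask)
print(emb.shape)                    # (2, 4)
print(np.linalg.norm(emb, axis=1))  # every row has norm 1
```

Because every output row has unit length, a dot product between two such vectors equals their cosine similarity.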

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "The Horizon Kinetics Medical ETF (MEDX) is an actively-managed, non-diversified fund aiming for long-term capital growth by investing primarily in global companies (U.S. and foreign) within the medical research, pharmaceuticals, medical technology, and related industries. The fund typically focuses on companies generating at least 50% of their revenue from these areas and may include companies of any market capitalization, with an emphasis on those involved in cancer research and treatment. Under normal circumstances, at least 80% of assets are invested in equity securities, convertibles, and warrants of such companies. Portfolio selection and weighting are based on the adviser's evaluation and discretion. The fund may also temporarily invest up to 100% in US short-term debt or invest in non-convertible high-yield bonds.",
    "The VanEck Biotech ETF (BBH) seeks to replicate the performance of the MVIS® US Listed Biotech 25 Index, which provides exposure to approximately 25 of the largest or leading U.S.-listed companies in the biotechnology industry. The fund normally invests at least 80% of its assets in securities comprising this market-cap-weighted index. The underlying index includes common stocks and depositary receipts of firms involved in the research, development, production, marketing, and sale of drugs based on genetic analysis and diagnostic equipment. While focusing on U.S.-listed companies, it may include foreign firms listed domestically, and medium-capitalization companies can be included. Reflecting the index's concentration, the fund is non-diversified and may have a top-heavy portfolio. The index is reviewed semi-annually.",
    'The SPDR S&P Oil & Gas Equipment & Services ETF (XES) seeks investment results corresponding generally to the total return performance of the S&P Oil & Gas Equipment & Services Select Industry Index. This index represents companies in the oil and gas equipment and services segment of the broad U.S. S&P Total Market Index (S&P TMI), including those involved in activities like wildcatting, drilling hardware, and related services. The index utilizes an equal-weighting methodology for its constituent companies, which are selected based on market capitalization and liquidity requirements and undergo quarterly rebalancing. The fund itself employs a sampling strategy, aiming to invest at least 80% of its total assets in the securities that comprise its benchmark index.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
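Because the final `Normalize()` module gives every embedding unit length, cosine similarity (assumed here to be the function behind `model.similarity`, since cosine is the Sentence Transformers default) reduces to a plain dot product. The sketch below ranks a small corpus against a query in NumPy. It is illustrative only: random unit vectors stand in for real `model.encode` output, and `top_k_search` is a hypothetical helper, not a library function.

```python
import numpy as np


def top_k_search(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Hypothetical helper: return (index, score) pairs for the k best corpus rows.

    Assumes both inputs are already L2-normalized, so the dot product
    equals cosine similarity.
    """
    scores = corpus_emb @ query_emb   # (n_corpus,) cosine scores
    order = np.argsort(-scores)[:k]   # indices of the highest scores first
    return [(int(i), float(scores[i])) for i in order]


# Stand-in embeddings: random unit vectors of the model's output size (1024)
rng = np.random.default_rng(42)
corpus = rng.normal(size=(5, 1024))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# Query: a slightly perturbed copy of document 3, re-normalized
query = corpus[3] + 0.01 * rng.normal(size=1024)
query /= np.linalg.norm(query)

for idx, score in top_k_search(query, corpus):
    print(idx, round(score, 3))  # document 3 should rank first
```

With real data, `query` and `corpus` would be the normalized arrays returned by `model.encode`.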

Training Details

Training Dataset

stage1-triplet-dataset-selected

  • Dataset: stage1-triplet-dataset-selected at 18e0423
  • Size: 23,003 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                anchor          positive        negative
    type        string          string          string
    min         94 tokens       29 tokens       72 tokens
    mean        170.87 tokens   174.15 tokens   174.89 tokens
    max         224 tokens      261 tokens      261 tokens
  • Samples:
    Sample 1
      anchor: The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.
      positive: The FlexShares ESG & Climate Investment Grade Corporate Core Index Fund (FEIG) is a passively managed ETF designed to provide broad-market, core exposure to USD-denominated investment-grade corporate bonds. It seeks to track the performance of the Northern Trust ESG & Climate Investment Grade U.S. Corporate Core IndexSM, which selects bonds from a universe of USD-denominated, investment-grade corporate debt with maturities of at least one year. The index employs an optimization process to increase the aggregate ESG score and reduce aggregate climate-related risk among constituent companies, involving ranking firms on material ESG metrics, governance, and carbon risks, while excluding controversial companies and international initiative violators. Weights are also optimized to minimize systematic risk, and the index is rebalanced monthly. Under normal circumstances, the fund invests at least 80% of its assets in the index's securities.
      negative: The Viridi Bitcoin Miners ETF primarily invests in companies engaged in Bitcoin mining, aiming to allocate at least 80% of its net assets, plus borrowings for investment purposes, to securities of such companies under normal circumstances. The fund focuses on U.S. and non-U.S. equity securities in developed markets, which may include investments via depositary receipts. It also specifically targets common stock from newly listed IPOs, shares derived from SPAC IPOs, and securities resulting from reverse mergers. This ETF is non-diversified.
    Sample 2
      anchor: The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.
      positive: The Fidelity Sustainable High Yield ETF (FSYD) is an actively managed fund primarily seeking high income, and potentially capital growth, by investing at least 80% of its assets in global high-yield (below investment grade) debt securities. The fund focuses on issuers demonstrating proven or improving sustainability practices based on an evaluation of their individual environmental, social, and governance (ESG) profiles using a proprietary rating process. Its comprehensive selection approach also incorporates a multi-factor quantitative screening model and fundamental analysis of issuers, aiming to identify value and quality within the high-yield universe.
      negative: The ETFMG Prime Mobile Payments ETF seeks to track the performance of the Nasdaq CTA Global Digital Payments Index, which identifies companies engaged in the global digital payments industry across categories like card networks, infrastructure, software, processors, and solutions. Under normal circumstances, the fund invests at least 80% of its net assets in common stocks (including ADRs and GDRs) of these Mobile Payments Companies. It typically holds a narrow portfolio expected to contain up to 50 companies, weighted using a theme-adjusted market capitalization scheme, and is considered non-diversified.
    Sample 3
      anchor: The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.
      positive: The First Trust TCW Securitized Plus ETF (DEED) is an actively-managed fund focused on U.S. securitized debt securities, aiming to maximize long-term total return and outperform the Bloomberg US Mortgage-Backed Securities Index. Under normal market conditions, the fund allocates at least 80% of its net assets to securitized debt, including asset-backed securities, residential and commercial mortgage-backed securities, and collateralized loan obligations (CLOs). At least 50% of total assets are invested in securities issued or guaranteed by the U.S. government, its agencies, or government-sponsored entities, while the balance may include non-government and privately-issued securitized debt. The fund invests across various maturities and credit qualities (junk and investment-grade), using proprietary research to identify undervalued securities, and may utilize OTC derivatives for up to 25% of the portfolio.
      negative: The First Trust Growth Strength UCITS ETF aims to track the price and yield of The Growth Strength Index. Passively managed, the fund normally invests at least 80% of its assets in the index's common stocks and REIT components. The index selects 50 equal-weighted, well-capitalized, large-cap US companies from the top 500 US securities by market capitalization based on fundamental criteria such as return on equity, long-term debt levels, liquidity, positive shareholder equity, and a composite ranking based on 3-year revenue and cash flow growth. The resulting portfolio is non-diversified and rebalanced quarterly.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.05
    }
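With `TripletDistanceMetric.COSINE`, the distance is 1 minus cosine similarity, and each triplet contributes max(d(anchor, positive) - d(anchor, negative) + margin, 0), averaged over the batch. The NumPy sketch below reproduces that objective with the margin of 0.05 used here. It is our reading of the loss, written for illustration rather than copied from the library, and the toy 2-D vectors are made up.

```python
import numpy as np


def cosine_distance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Row-wise cosine distance: 1 - cosine similarity."""
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_n = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - np.sum(x_n * y_n, axis=1)


def triplet_loss(anchor, positive, negative, margin: float = 0.05) -> float:
    """Mean over the batch of max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


a = np.array([[1.0, 0.0]])
p = np.array([[1.0, 0.1]])  # nearly parallel to the anchor
n = np.array([[0.0, 1.0]])  # orthogonal to the anchor

print(triplet_loss(a, p, n))  # 0.0: the negative is already farther by more than the margin
print(triplet_loss(a, n, p))  # roles swapped, so the loss is positive (about 1.045)
```

The small margin means a triplet stops contributing as soon as the negative is at least 0.05 farther (in cosine distance) from the anchor than the positive is.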
    

Evaluation Dataset

stage1-triplet-dataset-selected

  • Dataset: stage1-triplet-dataset-selected at 18e0423
  • Size: 388 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 388 samples:
                anchor          positive        negative
    type        string          string          string
    min         85 tokens       85 tokens       85 tokens
    mean        176.98 tokens   176.83 tokens   175.41 tokens
    max         271 tokens      271 tokens      271 tokens
  • Samples:
    anchor positive negative
    The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation. The U.S. Global Technology and Aerospace & Defense ETF is an actively managed ETF seeking capital appreciation by investing in equity securities of companies expected to benefit from national defense efforts. These efforts include technological innovations and the development of products and services related to aerospace, physical, and cybersecurity defense, often in preparation for or in response to domestic, regional, or global conflicts. The fund is non-diversified. The KraneShares Global Carbon Offset Strategy ETF (KSET) was the first US-listed ETF providing exposure to the global voluntary carbon market. It achieved this by investing primarily in liquid carbon offset credit futures, including CME-traded Global Emissions Offsets (GEOs) and Nature-Based Global Emission Offsets (N-GEOs), which are designed to help businesses meet greenhouse gas reduction goals. Tracking an index that weighted eligible futures based on liquidity, the fund sought exposure to the same carbon offset credit futures, typically those maturing within two years. The ETF was considered non-diversified and utilized a Cayman Island subsidiary. However, the fund was delisted, with its last day of trading on an exchange being March 14, 2024.
    The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation. The JPMorgan Social Advancement ETF (UPWD) is an actively managed, non-diversified fund that seeks to invest globally in companies facilitating social and economic advancements and empowerment across the socioeconomic spectrum. Primarily holding common stocks, depositary receipts, and REITs, the fund targets themes including essential amenities, affordable housing, healthcare, education, attainable financing, and the digital ecosystem, potentially investing in companies of various sizes, including small-caps, across U.S., foreign, and emerging markets with possible concentration in specific sectors. Security selection follows a proprietary three-step process involving exclusions, thematic ranking using a ThemeBot, and a sustainable investment inclusion process combined with fundamental research. Please note that this security is being delisted, with its last day of trading scheduled for December 15, 2023. The Direxion Daily Gold Miners Index Bull 2X Shares (NUGT) is designed to provide 200% of the daily performance of the NYSE Arca Gold Miners Index, before fees and expenses. This market-cap-weighted index comprises publicly traded global companies, primarily involved in gold mining and to a lesser extent silver mining, operating in both developed and emerging markets. NUGT achieves its objective by investing at least 80% of its net assets in financial instruments providing 2X daily leveraged exposure to the index. As a leveraged fund intended for daily results, NUGT is designed for short-term trading, typically held for only one trading day, and holding it for longer periods can lead to performance results that differ significantly from the stated daily target due to the effects of compounding. The fund is also non-diversified.
    The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation. The Xtrackers MSCI Emerging Markets ESG Leaders Equity ETF tracks an index of large- and mid-cap emerging market stocks that emphasize strong environmental, social, and governance (ESG) characteristics. The index first excludes companies involved in specific controversial industries. From the remaining universe, it ranks stocks based on MSCI ESG scores, including a controversy component, to identify and select the highest-ranking ESG leaders, effectively screening out ESG laggards. To maintain market-like country and sector weights, the index selects the top ESG-scoring stocks within each sector until a specified market capitalization threshold is reached. Selected stocks are then weighted by market capitalization within their respective sectors. The fund typically invests over 80% of its assets in the securities of this underlying index. The BlackRock Future Climate and Sustainable Economy ETF (BECO) is an actively managed equity fund focused on the transition to a lower carbon economy and future climate themes. It seeks a relatively concentrated, non-diversified portfolio of globally-listed companies of any market capitalization, investing across multiple subthemes such as sustainable energy, resource efficiency, future transport, sustainable nutrition, and biodiversity. The fund utilizes proprietary environmental criteria, including carbon metrics, and aims to align with the Paris Climate Agreement goals for net-zero emissions by 2050, while excluding certain high-emission industries and companies violating the UN Global Compact. It also attempts to achieve a better aggregate environmental and ESG score than its benchmark, the MSCI ACWI Multiple Industries Select Index. Note that BECO is being delisted, with its last day of trading on an exchange scheduled for August 12, 2024.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.05
    }
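Sentence Transformers' TripletLoss penalizes a triplet whenever the anchor is not closer to the positive than to the negative by at least triplet_margin. The per-triplet loss is max(d(a, p) - d(a, n) + margin, 0), and it is averaged over the batch. With the COSINE metric, the distance is d(x, y) = 1 - cosine_similarity(x, y). The sketch below reimplements only that formula in plain Python on toy 2-D vectors. The library itself operates on PyTorch embedding tensors, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """COSINE metric: 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def triplet_loss(anchors: List[Sequence[float]], positives: List[Sequence[float]],
                 negatives: List[Sequence[float]], margin: float = 0.05) -> float:
    """Batch mean of max(d(a, p) - d(a, n) + margin, 0) using cosine distance."""
    per_triplet = [
        max(cosine_distance(a, p) - cosine_distance(a, n) + margin, 0.0)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(per_triplet) / len(per_triplet)


# Toy embeddings: the first triplet is already well separated (zero loss);
# in the second, the negative is closer to the anchor than the positive is.
anchors = [[1.0, 0.0], [1.0, 0.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[0.0, 1.0], [1.0, 0.0]]
print(triplet_loss(anchors, positives, negatives))  # mean of 0.0 and 1.05
```

Cosine distance ranges over [0, 2], so a margin of 0.05 only asks the negative to be slightly farther from the anchor than the positive. Triplets that already satisfy this contribute zero loss and no gradient.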
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-06
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_drop_last: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
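The learning_rate, warmup_ratio and linear scheduler settings combine into a warmup-then-decay schedule. The sketch below mirrors the shape of the linear schedule in Hugging Face Transformers:

- warmup lasts ceil(total_steps * warmup_ratio) steps;
- the rate ramps linearly from 0 to the peak during warmup;
- it then decays linearly to 0.

It assumes a single device and that all 23,003 examples from the dataset_size tag are training data. With dataloader_drop_last: True, that gives 23003 // 32 = 718 optimizer steps, which is consistent with the logged epoch fractions. The helper is hypothetical, not the Trainer's own code.

```python
import math


def linear_warmup_lr(step: int, total_steps: int,
                     peak_lr: float = 2e-6, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr during warmup.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: 23,003 training triplets, batch size 32, one device, drop_last=True.
TOTAL_STEPS = 23003 // 32  # 718 optimizer steps for num_train_epochs=1

for s in (0, 36, 72, 400, TOTAL_STEPS):
    print(s, linear_warmup_lr(s, TOTAL_STEPS))
```

Under these assumptions, warmup covers the first 72 steps. The peak rate of 2e-06 is reached at about step 72 and decays to zero by step 718.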

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
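batch_sampler: no_duplicates builds batches so that no text value appears twice within the same batch. In this dataset the same anchor text can recur across several triplets; all of the sample rows above share one anchor. The function below is a simplified, hypothetical sketch of the idea. Sentence Transformers' NoDuplicatesBatchSampler is the real implementation and differs in details, such as which columns it compares and how it shuffles. The sketch scans the unused rows in order and skips any row that shares a text with the current batch. Skipped rows are left for later batches.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence, Set


def no_duplicate_batches(rows: Sequence[Dict[str, str]], batch_size: int,
                         drop_last: bool = True,
                         seed: Optional[int] = None) -> Iterator[List[int]]:
    """Yield batches of row indices in which no text value appears twice."""
    order = list(range(len(rows)))
    if seed is not None:
        random.Random(seed).shuffle(order)
    remaining = dict.fromkeys(order)  # insertion-ordered set of unused indices
    while remaining:
        batch: List[int] = []
        seen: Set[str] = set()
        for idx in remaining:
            values = set(rows[idx].values())
            if values & seen:
                continue  # shares a text with this batch; keep it for a later batch
            batch.append(idx)
            seen |= values
            if len(batch) == batch_size:
                break
        if len(batch) == batch_size or not drop_last:
            yield batch
        for idx in batch:  # consumed (or dropped) rows are never revisited
            del remaining[idx]


rows = [
    {"anchor": "A", "positive": "P1", "negative": "N1"},
    {"anchor": "A", "positive": "P2", "negative": "N2"},
    {"anchor": "B", "positive": "P3", "negative": "N3"},
    {"anchor": "C", "positive": "P4", "negative": "N4"},
]
print(list(no_duplicate_batches(rows, batch_size=2)))  # rows 0 and 1 never share a batch
```

With drop_last=True, an incomplete batch is discarded, analogous to dataloader_drop_last: True above.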

Training Logs

Epoch Step Training Loss Validation Loss
0.0139 10 0.0367 -
0.0279 20 0.0378 -
0.0418 30 0.0346 -
0.0557 40 0.0337 -
0.0696 50 0.0328 -
0.0836 60 0.0291 -
0.0975 70 0.0257 -
0.1114 80 0.0206 -
0.1253 90 0.0201 -
0.1393 100 0.0208 0.0132
0.1532 110 0.0167 -
0.1671 120 0.0167 -
0.1811 130 0.0156 -
0.1950 140 0.0153 -
0.2089 150 0.0125 -
0.2228 160 0.0141 -
0.2368 170 0.0153 -
0.2507 180 0.0142 -
0.2646 190 0.0095 -
0.2786 200 0.0144 0.0111
0.2925 210 0.0132 -
0.3064 220 0.0107 -
0.3203 230 0.0116 -
0.3343 240 0.0134 -
0.3482 250 0.0112 -
0.3621 260 0.0115 -
0.3760 270 0.0124 -
0.3900 280 0.0126 -
0.4039 290 0.0105 -
0.4178 300 0.0111 0.0109
0.4318 310 0.0136 -
0.4457 320 0.0123 -
0.4596 330 0.0113 -
0.4735 340 0.0125 -
0.4875 350 0.0082 -
0.5014 360 0.0102 -
0.5153 370 0.0081 -
0.5292 380 0.0115 -
0.5432 390 0.0107 -
0.5571 400 0.012 0.0106
0.5710 410 0.0094 -
0.5850 420 0.0099 -
0.5989 430 0.0105 -
0.6128 440 0.0101 -
0.6267 450 0.0099 -
0.6407 460 0.0106 -
0.6546 470 0.0099 -
0.6685 480 0.0108 -
0.6825 490 0.01 -
0.6964 500 0.0084 0.0102
0.7103 510 0.0092 -
0.7242 520 0.0084 -
0.7382 530 0.0077 -
0.7521 540 0.0096 -
0.7660 550 0.0099 -
0.7799 560 0.0103 -
0.7939 570 0.0082 -
0.8078 580 0.009 -
0.8217 590 0.0078 -
0.8357 600 0.0091 0.0104
0.8496 610 0.0088 -
0.8635 620 0.0103 -
0.8774 630 0.0109 -
0.8914 640 0.0072 -
0.9053 650 0.0084 -
0.9192 660 0.0099 -
0.9331 670 0.008 -
0.9471 680 0.0081 -
0.9610 690 0.0075 -
0.9749 700 0.0096 0.0103
0.9889 710 0.0089 -
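  • The bold row denotes the saved checkpoint, but the bold formatting is lost in this rendering. With load_best_model_at_end: True and the default evaluation-loss criterion, that is most likely step 500, which has the lowest validation loss (0.0102). A dash in the Validation Loss column means no evaluation was run at that step; evaluation ran every 100 steps.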
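The Epoch column is the optimizer step divided by the number of steps per epoch. Assume one device and 23,003 training triplets (the dataset_size tag). Then dataloader_drop_last: True with a batch size of 32 gives 718 full batches, which reproduces the logged values. The helper name below is hypothetical.

```python
# Assumption: one device, 23,003 training triplets, batch size 32, drop_last=True.
STEPS_PER_EPOCH = 23003 // 32  # 718 full batches; the last 27 triplets are dropped


def epoch_at(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch for a global optimizer step, rounded to 4 places like the log."""
    return round(step / steps_per_epoch, 4)


for step in (10, 100, 500, 710):
    print(step, epoch_at(step))
```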

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.6.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}