---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:309
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: Find the element that handles identifies the underlying asset when
    it is an exchange-traded fund.
  sentences:
  - '[exchangeTradedFund]: Identifies the underlying asset when it is an exchange-traded
    fund.'
  - '[mutualFund]: Identifies the class of unit issued by a fund.'
  - '[lcIssuanceFeePayment]: No description available'
- source_sentence: Find the element that handles specifies the return payments of
    a commodity return swap.
  sentences:
  - '[consentRefused]: No description available'
  - '[commodityReturnLeg]: Specifies the return payments of a commodity return swap.
    There can be one or two return legs. In simple return swaps there is a return
    leg and an interest (a.k.a. ''fee'') leg. In the case of a outperformance swap
    there are two return legs: the return performance of two commodity underlyers
    are swapped. In the case of a fully-funded return swap there is no financing component
    and, therefore, only a single return leg is specified.'
  - '[loanTrade]: No description available'
- source_sentence: The fpml tag for the parameters for defining the exercise period
    for a european style option together with any rules governing the notional amount
    of the underlying which can be exercised on any given exercise date and any associated
    exercise fees.
  sentences:
  - '[priceSourceDisruption]: If present indicates that the event is considered to
    have occurred if it is impossible to obtain information about the Spot Rate for
    a Valuation Date from the price source specified in the Settlement Rate Option
    that hass been agreed by the parties.'
  - '[europeanExercise]: The parameters for defining the exercise period for a European
    style option together with any rules governing the notional amount of the underlying
    which can be exercised on any given exercise date and any associated exercise
    fees.'
  - '[nonDeliverableSubstitute]: If present indicates that the obligation to pay the
    In-the-Money amount of foreign currency is replaced with an obligation to pay
    an equivalent amount in another currency.'
- source_sentence: The fpml tag for global element representing a repo.
  sentences:
  - '[facilityPrepaymentFeePayment]: No description available'
  - '[product]: An abstract element used as a place holder for the substituting product
    elements.'
  - '[repo]: Global element representing a Repo.'
- source_sentence: Can you give me the fpml tag for fxcurvevaluation?
  sentences:
  - '[fxCurveValuation]: No description available'
  - '[loanLegalActionStatement]: No description available'
  - '[loanAllocationSettlementDateAvailability]: No description available'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# thelocalhost/fpml-semantic-model

This is `fpml-semantic-model`, a fine-tuned version of `sentence-transformers/all-MiniLM-L6-v2` for generating text embeddings.

## Specialized Semantic Search Model for FpML XSD

This model is a specialized version of the base all-MiniLM-L6-v2 Sentence Transformer, fine-tuned on proprietary data derived from FpML XSD schema definitions and documentation. It is intended to return more relevant results for financial terminology than a general-purpose model when searching FpML data structures.

The `fpml-semantic-model` supports offline **semantic search** across the Financial products Markup Language (FpML) XSD schema definitions. Instead of relying on exact keyword matches, users can search for FpML elements in natural language, which helps developers and analysts quickly locate the correct XML structure for derivatives and financial transaction data.

## 💡 Model Overview

This model maps natural language queries about financial concepts to the matching FpML tags and definitions within the schema.

### Key Use Cases

- Schema Exploration: Quickly locate the exact FpML element (e.g., tradeId, floatingRateIndex) needed for a given financial scenario.
- Data Mapping: Improve accuracy when mapping regulatory or proprietary data fields to FpML standards.
- Validation: Use semantic similarity to suggest relevant documentation for validation failures.

## 🛠️ Usage with fpml-semantic-search Package

If you are using the official fpml-semantic-search Python package, this model is loaded automatically by the `SemanticSearchModel` class.

```python
from fpml_semantic_search import StructuralModel, SemanticSearchModel
from fpml_semantic_search.fpml_structural_model import MOCK_STRUCTURAL_DATA

# 1. Initialize the structural data (mock data for demonstration)
structural_model = StructuralModel(MOCK_STRUCTURAL_DATA)

# 2. Initialize the semantic search model
# NOTE: This class automatically loads 'thelocalhost/fpml-semantic-model'
# from the Hugging Face Hub (as defined in the source code).
search_model = SemanticSearchModel(structural_model.data)

# 3. Perform a natural language search
query = "What element defines the reference rate used for calculating interest payments in a swap?"

results = search_model.semantic_search(query, top_k=3)

print(f"\n--- Semantic Search Results for: '{query}' ---\n")
for result in results:
    print(f"[{result['Score']}] Tag: {result['Tag Name']} | Description: {result['Description']}")

# Example output (illustrative; first two results shown):
# [0.9125] Tag: floatingRateIndex | Description: The benchmark rate (e.g., 'USD-LIBOR-BBA', 'EUR-EONIA-OIS') used for the floating stream.
# [0.8850] Tag: rateTreatment | Description: Specifies how a mid-market rate is treated, such as 'mid', 'average', or 'interpolated'.
```


---

## Installation

This package requires Python 3.8 or later.

```bash
pip install fpml-semantic-model sentence-transformers numpy scipy
```

Note: `sentence-transformers` is required for embedding the query, and `numpy`/`scipy` are required for calculating similarity scores.

## 💡 How It Works

- Embeddings: The model uses pre-calculated embeddings for FpML element names and their documentation, generated using the all-MiniLM-L6-v2 Sentence Transformer model. These embeddings are stored locally in the package's default_embeddings.json file.
- Search: When a user provides a natural language query (e.g., "details about when the interest rate will be paid"), the model embeds the query and calculates the cosine similarity against all stored FpML element vectors.
- Results: It returns the elements with the highest semantic relevance, ranked by score.

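
The search and ranking steps above can be sketched with plain NumPy. This is a minimal illustration rather than the package's actual implementation: the function name `top_k_cosine`, the toy 3-dimensional vectors, and the element names are assumptions chosen for the example, whereas real embeddings from this model have 384 dimensions.

```python
import numpy as np


def top_k_cosine(query_vec, element_vecs, element_names, k=3):
    """Rank stored element vectors by cosine similarity to the query vector."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(element_vecs, dtype=float)
    # Normalise both sides so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    # Sort by descending score and keep the k best matches.
    order = np.argsort(-scores)[:k]
    return [(element_names[i], float(scores[i])) for i in order]


# Toy vectors standing in for real embeddings (hypothetical values).
names = ["paymentDates", "notionalAmount", "tradeId"]
vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
query = [1.0, 0.2, 0.0]

ranked = top_k_cosine(query, vectors, names, k=2)
print(ranked)
```

Normalising once up front means the stored matrix could be normalised ahead of time, leaving a single matrix-vector product per query.
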

## Direct Sentence-Transformer Loading

You can also load this model directly into any application using the `sentence-transformers` library:

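```python
from sentence_transformers import SentenceTransformer, util

MODEL_ID = 'thelocalhost/fpml-semantic-model'
model = SentenceTransformer(MODEL_ID)

# Example: embed a query and compare it to known FpML elements ('[tag]: description' strings)
query_embedding = model.encode("currency of the notional amount", normalize_embeddings=True)
candidates = ["[repo]: Global element representing a Repo.", "[mutualFund]: Identifies the class of unit issued by a fund."]
candidate_embeddings = model.encode(candidates, normalize_embeddings=True)
print(util.cos_sim(query_embedding, candidate_embeddings))  # 1 x 2 tensor of cosine scores
```
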
## Model Details

| Attribute | Value |
|---|---|
| Model ID | thelocalhost/fpml-semantic-model |
| Backbone Model | Fine-tuned all-MiniLM-L6-v2 (Sentence Transformers) |
| Training Dataset | FpML element names and corresponding XSD documentation pairs |
| Loss Function | Multiple Negatives Ranking Loss (MNR) |
| Embedding Dimension | 384 |
| Similarity Metric | Cosine Similarity |
| Performance | Optimized for FpML-specific semantic alignment. |
| Default Data Source | Elements and documentation extracted from core FpML XSD files (provided via default_embeddings.json). |
| Use Case | Semantic indexing and retrieval, FpML schema exploration. |

## Training Data Generation

The training dataset was generated using the following methodology:

1. Corpus: All element names and their full XSD documentation strings were extracted from the core FpML schema files.

2. Positive Pairs: A positive pair was defined as (Element Name, Documentation String).

   Example: (tradeId, A unique identifier assigned to the trade by one of the parties.)

3. Fine-Tuning: The model was trained with Multiple Negatives Ranking Loss, which pulls the two members of each positive pair together in the vector space while treating the other documentation strings in the same batch as negatives, thereby aligning the vector space toward FpML concepts.

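
Steps 1 and 2 can be sketched with the Python standard library. This is an illustrative approach under stated assumptions, not the pipeline that produced the training data: `extract_pairs` and the sample schema are hypothetical, and the "No description available" fallback mirrors the placeholder visible in this card's widget examples.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Tiny hand-written schema fragment for illustration; not copied from FpML.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="tradeId" type="xs:string">
    <xs:annotation>
      <xs:documentation>A unique identifier assigned to the trade
        by one of the parties.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="fxCurveValuation" type="xs:string"/>
</xs:schema>"""


def extract_pairs(xsd_text):
    """Return (element name, documentation) pairs from an XSD string."""
    root = ET.fromstring(xsd_text)
    pairs = []
    for element in root.iter(XS + "element"):
        name = element.get("name")
        if name is None:  # skip <xs:element ref="..."/> references
            continue
        doc = element.find(XS + "annotation/" + XS + "documentation")
        if doc is not None and doc.text:
            text = " ".join(doc.text.split())  # collapse line breaks and indentation
        else:
            text = "No description available"
        pairs.append((name, text))
    return pairs


pairs = extract_pairs(SAMPLE_XSD)
for name, doc in pairs:
    print(f"[{name}]: {doc}")
```

Collapsing whitespace matters because XSD documentation is usually wrapped and indented, which would otherwise leak into the training strings.
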
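
The objective in step 3 can be illustrated with a small NumPy version of Multiple Negatives Ranking Loss. This sketch assumes cosine similarity with a scale of 20, which are the sentence-transformers defaults; the card does not state the settings actually used, and `mnr_loss` is a hypothetical helper rather than the training code.

```python
import numpy as np


def mnr_loss(anchor_vecs, positive_vecs, scale=20.0):
    """Mean in-batch cross-entropy where row i's correct column is i."""
    a = np.asarray(anchor_vecs, dtype=float)
    p = np.asarray(positive_vecs, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    # scores[i, j] = scaled cosine similarity between anchor i and positive j.
    scores = scale * (a @ p.T)
    # Numerically stable log-softmax over each row.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The matching positive sits on the diagonal; other columns act as negatives.
    return float(-np.mean(np.diag(log_probs)))


# Toy 2-dimensional embeddings (hypothetical values).
aligned = mnr_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = mnr_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned, swapped)
```

The loss is near zero when each anchor is closest to its own positive and grows when another item in the batch scores higher, which is why no explicitly labelled negatives are needed.
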