How to use lufercho/my-finetuned-sentence-bert with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("lufercho/my-finetuned-sentence-bert")
sentences = [
"Auto-WEKA: Combined Selection and Hyperparameter Optimization of\n Classification Algorithms",
" It has been a long time, since data mining technologies have made their ways\nto the field of data management. Classification is one of the most important\ndata mining tasks for label prediction, categorization of objects into groups,\nadvertisement and data management. In this paper, we focus on the standard\nclassification problem which is predicting unknown labels in Euclidean space.\nMost efforts in Machine Learning communities are devoted to methods that use\nprobabilistic algorithms which are heavy on Calculus and Linear Algebra. Most\nof these techniques have scalability issues for big data, and are hardly\nparallelizable if they are to maintain their high accuracies in their standard\nform. Sampling is a new direction for improving scalability, using many small\nparallel classifiers. In this paper, rather than conventional sampling methods,\nwe focus on a discrete classification algorithm with O(n) expected running\ntime. Our approach performs a similar task as sampling methods. However, we use\ncolumn-wise sampling of data, rather than the row-wise sampling used in the\nliterature. In either case, our algorithm is completely deterministic. Our\nalgorithm, proposes a way of combining 2D convex hulls in order to achieve high\nclassification accuracy as well as scalability in the same time. First, we\nthoroughly describe and prove our O(n) algorithm for finding the convex hull of\na point set in 2D. Then, we show with experiments our classifier model built\nbased on this idea is very competitive compared with existing sophisticated\nclassification algorithms included in commercial statistical applications such\nas MATLAB.\n",
" Many different machine learning algorithms exist; taking into account each\nalgorithm's hyperparameters, there is a staggeringly large number of possible\nalternatives overall. We consider the problem of simultaneously selecting a\nlearning algorithm and setting its hyperparameters, going beyond previous work\nthat addresses these issues in isolation. We show that this problem can be\naddressed by a fully automated approach, leveraging recent innovations in\nBayesian optimization. Specifically, we consider a wide range of feature\nselection techniques (combining 3 search and 8 evaluator methods) and all\nclassification approaches implemented in WEKA, spanning 2 ensemble methods, 10\nmeta-methods, 27 base classifiers, and hyperparameter settings for each\nclassifier. On each of 21 popular datasets from the UCI repository, the KDD Cup\n09, variants of the MNIST dataset and CIFAR-10, we show classification\nperformance often much better than using standard selection/hyperparameter\noptimization methods. We hope that our approach will help non-expert users to\nmore effectively identify machine learning algorithms and hyperparameter\nsettings appropriate to their applications, and hence to achieve improved\nperformance.\n",
" Nonnegative matrix factorization (NMF) has become a ubiquitous tool for data\nanalysis. An important variant is the sparse NMF problem which arises when we\nexplicitly require the learnt features to be sparse. A natural measure of\nsparsity is the L$_0$ norm, however its optimization is NP-hard. Mixed norms,\nsuch as L$_1$/L$_2$ measure, have been shown to model sparsity robustly, based\non intuitive attributes that such measures need to satisfy. This is in contrast\nto computationally cheaper alternatives such as the plain L$_1$ norm. However,\npresent algorithms designed for optimizing the mixed norm L$_1$/L$_2$ are slow\nand other formulations for sparse NMF have been proposed such as those based on\nL$_1$ and L$_0$ norms. Our proposed algorithm allows us to solve the mixed norm\nsparsity constraints while not sacrificing computation time. We present\nexperimental evidence on real-world datasets that shows our new algorithm\nperforms an order of magnitude faster compared to the current state-of-the-art\nsolvers optimizing the mixed norm and is suitable for large-scale datasets.\n"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from lufercho/my-finetuned-bert-mlm. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
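In the Pooling module above, only `pooling_mode_mean_tokens` is enabled, so each sentence vector is the average of its token vectors, with padding positions left out. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the helper name `mean_pool` and the toy inputs are illustrative assumptions.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sentence, skipping padded positions.

    token_embeddings: list of sentences, each a list of token vectors.
    attention_mask:   list of sentences, each a list of 1 (real token) or 0 (padding).
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        kept = [vec for vec, keep in zip(vectors, mask) if keep]
        count = max(len(kept), 1)  # guard against an all-padding row
        dim = len(vectors[0])
        pooled.append([sum(vec[d] for vec in kept) / count for d in range(dim)])
    return pooled

# Toy batch: 2 sentences, 3 token slots, 2 dimensions (the real model uses 768).
tokens = [
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third slot is padding
    [[0.0, 6.0], [3.0, 0.0], [6.0, 0.0]],
]
mask = [
    [1, 1, 0],
    [1, 1, 1],
]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padded slot in the first sentence does not affect its pooled vector, which is why the attention mask is needed alongside the token embeddings.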
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("lufercho/my-finetuned-sentence-bert")
# Run inference
sentences = [
'Maximin affinity learning of image segmentation',
' Images can be segmented by first using a classifier to predict an affinity\ngraph that reflects the degree to which image pixels must be grouped together\nand then partitioning the graph to yield a segmentation. Machine learning has\nbeen applied to the affinity classifier to produce affinity graphs that are\ngood in the sense of minimizing edge misclassification rates. However, this\nerror measure is only indirectly related to the quality of segmentations\nproduced by ultimately partitioning the affinity graph. We present the first\nmachine learning algorithm for training a classifier to produce affinity graphs\nthat are good in the sense of producing segmentations that directly minimize\nthe Rand index, a well known segmentation performance measure. The Rand index\nmeasures segmentation performance by quantifying the classification of the\nconnectivity of image pixel pairs after segmentation. By using the simple graph\npartitioning algorithm of finding the connected components of the thresholded\naffinity graph, we are able to train an affinity classifier to directly\nminimize the Rand index of segmentations resulting from the graph partitioning.\nOur learning algorithm corresponds to the learning of maximin affinities\nbetween image pixel pairs, which are predictive of the pixel-pair connectivity.\n',
' Changes in the UK electricity market mean that domestic users will be\nrequired to modify their usage behaviour in order that supplies can be\nmaintained. Clustering allows usage profiles collected at the household level\nto be clustered into groups and assigned a stereotypical profile which can be\nused to target marketing campaigns. Fuzzy C Means clustering extends this by\nallowing each household to be a member of many groups and hence provides the\nopportunity to make personalised offers to the household dependent on their\ndegree of membership of each group. In addition, feedback can be provided on\nhow user\'s changing behaviour is moving them towards more "green" or cost\neffective stereotypical usage.\n',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
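`model.similarity` returns a square matrix of pairwise scores. Assuming the model uses cosine similarity (the library default when no other similarity function is configured), the computation can be sketched in plain Python as below. The vectors are toy stand-ins for real embeddings, and `cosine_similarity_matrix` is an illustrative helper, not a library function.

```python
import math

def cosine_similarity_matrix(vectors):
    """Return the matrix of pairwise cosine similarities for a list of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(vectors, norms)
        ]
        for u, norm_u in zip(vectors, norms)
    ]

# Toy 2-dimensional "embeddings" (the real model produces 768 dimensions).
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
sims = cosine_similarity_matrix(toy_embeddings)

# Rank rows 1..n-1 by their similarity to row 0, most similar first.
# In the usage example, row 0 would be the title and the rest the abstracts.
ranking = sorted(range(1, len(sims)), key=lambda j: sims[0][j], reverse=True)
print(ranking)
```

Reading one row of the matrix in sorted order like this is the basis of a simple semantic search: the query is one row and the candidates are the remaining rows.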
Columns: sentence_0, sentence_1, and sentence_2
| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| details | | | |
| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| Clustering with Transitive Distance and K-Means Duality | Recent spectral clustering methods are a propular and powerful technique for | We show that the log-likelihood of several probabilistic graphical models is |
| Clustering Dynamic Web Usage Data | Most classification methods are based on the assumption that data conforms to | Exponential family extensions of principal component analysis (EPCA) have |
| Trading USDCHF filtered by Gold dynamics via HMM coupling | We devise a USDCHF trading strategy using the dynamics of gold as a filter. | Most existing machine learning classifiers are highly vulnerable to |
TripletLoss with these parameters:
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
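With the Euclidean metric and a margin of 5, the triplet objective is zero only when the anchor is at least 5 units closer to the positive than to the negative. The sketch below shows the per-triplet formula, max(d(a, p) - d(a, n) + margin, 0), in plain Python. It assumes the columns sentence_0, sentence_1, and sentence_2 play the roles of anchor, positive, and negative in that order; the function name and toy vectors are illustrative.

```python
import math

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge-style triplet loss with Euclidean distance for a single triplet."""
    d_pos = math.dist(anchor, positive)  # distance anchor -> positive
    d_neg = math.dist(anchor, negative)  # distance anchor -> negative
    return max(d_pos - d_neg + margin, 0.0)

anchor = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the anchor
far_negative = [0.0, 12.0]   # distance 12: margin satisfied
near_negative = [0.0, 7.0]   # distance 7: margin violated by 3

loss_far = triplet_loss(anchor, positive, far_negative)
loss_near = triplet_loss(anchor, positive, near_negative)
print(loss_far, loss_near)
```

During training, the per-triplet values are averaged over the batch, so only triplets that violate the margin contribute to the gradient.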
Non-default hyperparameters:
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
num_train_epochs: 2
multi_dataset_batch_sampler: round_robin

All hyperparameters:
overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss |
|---|---|---|
| 1.5974 | 500 | 0.8647 |
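The single log row allows a rough estimate of the run length. The sketch below assumes training on one device, so that the effective batch size is the logged 16 with gradient_accumulation_steps of 1. The resulting figures are inferences from the log row, not values stated in this card.

```python
step, epoch_at_step = 500, 1.5974  # values from the training log row
batch_size, num_epochs = 16, 2     # from the training hyperparameters

# Step 500 was reached 1.5974 epochs in, so one epoch is about 500 / 1.5974 steps.
steps_per_epoch = round(step / epoch_at_step)
total_steps = steps_per_epoch * num_epochs

# With dataloader_drop_last False the final batch of an epoch may be partial,
# so steps * batch size is an upper bound on the number of training triplets.
max_samples = steps_per_epoch * batch_size

print(steps_per_epoch, total_steps, max_samples)
```

Under these assumptions the run would have comprised roughly 626 optimizer steps over at most about 5,000 triplets per epoch, which is consistent with only one row appearing in a log written every 500 steps.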
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Base model
lufercho/my-finetuned-bert-mlm