---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:369891
- loss:TripletLoss
base_model: jinaai/jina-embeddings-v3
widget:
- source_sentence: >-
Echoes in the Alley is evolving into this brooding masterpiece, and
heightening Jax's voice has me buzzing—let's iterate on a sample monologue
to make it sing with poetic rhythm!
sentences:
- >-
Sam, the lead researcher, strongly advocated in the last team meeting
for temporarily excluding the Hawaii lab's pH data from the primary
analysis until the September 15th deadline.
- >-
Maria has a strong, established working relationship with the in-house
data science team, who recently developed a proprietary lookalike
modeling tool that integrates directly with the existing ad platform.
- >-
In a previous collaboration, Taylor's roommate, cast as Jax, delivered a
standout improvised monologue during a poetry reading event that
captured the character's vulnerability without any scripted props,
earning praise from peers for its raw authenticity.
- source_sentence: >-
Can you model the critical path impact if we delay contractor onboarding
until September 1st?
sentences:
- >-
Eleanor volunteers 20 hours a week at the local animal shelter and
values community engagement much higher than maximizing every dollar of
tax savings.
- >-
The $10,000/week contractor specializes in backend database scaling, not
the UI/UX features Alex needs built before the October demo.
- >-
Alex's long-term goal is to secure Series A funding within 18 months,
which requires establishing a reputation for reliable, on-time delivery
of all milestones.
- source_sentence: I'm anxious that winter rains could delay test drives for used SUVs.
sentences:
- >-
Robert's daughter, an insurance claims adjuster, offered to take a week
off work in late November to help him thoroughly inspect and negotiate
the final purchase, as she has expertise in contracts.
- >-
Harold's long-term goal, shared with Evelyn, is to ensure their
grandchildren (Sarah's children) spend quality time with Evelyn every
summer to learn about local history, a tradition they deeply value.
- >-
Robert is actively reading reviews comparing the safety ratings of 2023
SUV models versus 2024 models regarding their performance in heavy
downpours.
- source_sentence: >-
With only six months until my deadline and grading piling up at school,
this plot hole is keeping me up at night; any ideas for weaving in the
partner's death more subtly in Book 2 without retconning?
sentences:
- >-
Liam secretly paid off Mia's outstanding student loan debt ($4,500)
three months ago as a surprise, not wanting her to worry about it
anymore.
- >-
Sofia is actively applying for a prestigious $10,000 regional arts grant
next month, which specifically funds community-focused, educational
digital media.
- >-
Maria's editor at the publishing house is a former detective novelist
who insists on psychological realism in character backstories, having
rejected an earlier submission from Maria for lacking emotional depth in
grief portrayal.
- source_sentence: >-
Preparing instructions for potential Brazilian yoga classes excites
me—could you curate a professional list of Portuguese phrases for guiding
poses and breathing exercises?
sentences:
- >-
The previous attempt at self-study failed because Liam found the
standard textbook pronunciation guide recordings to be grating and
overly formal, leading him to stop practicing after two weeks.
- >-
Rosa secretly purchased a high-end, lightweight portable folding chair
specifically designed for extended standing/sitting comfort at outdoor
events last month.
- >-
Chloe has a documented, severe anxiety disorder requiring her to
maintain a structured, predictable routine; sudden, high-stress
financial calculations or immediate high-stakes decisions trigger
significant health setbacks.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
SentenceTransformer based on jinaai/jina-embeddings-v3
This is a sentence-transformers model finetuned from jinaai/jina-embeddings-v3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: jinaai/jina-embeddings-v3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
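Since the model ends in a Normalize() module (see the full architecture below), its output embeddings are unit-length, so the cosine similarity between two embeddings $u, v \in \mathbb{R}^{1024}$ reduces to a plain dot product:

$$\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} = u \cdot v \quad \text{when } \lVert u \rVert = \lVert v \rVert = 1.$$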
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(transformer): Transformer(
(auto_model): PeftModelForFeatureExtraction(
(base_model): LoraModel(
(model): XLMRobertaLoRA(
(roberta): XLMRobertaModel(
(embeddings): XLMRobertaEmbeddings(
(word_embeddings): ParametrizedEmbedding(
250002, 1024, padding_idx=1
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(token_type_embeddings): ParametrizedEmbedding(
1, 1024
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
)
(emb_drop): Dropout(p=0.1, inplace=False)
(emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder): XLMRobertaEncoder(
(layers): ModuleList(
(0-23): 24 x Block(
(mixer): MHA(
(rotary_emb): RotaryEmbedding()
(Wqkv): ParametrizedLinearResidual(
in_features=1024, out_features=3072, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(inner_attn): SelfAttention(
(drop): Dropout(p=0.1, inplace=False)
)
(inner_cross_attn): CrossAttention(
(drop): Dropout(p=0.1, inplace=False)
)
(out_proj): lora.Linear(
(base_layer): ParametrizedLinear(
in_features=1024, out_features=1024, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=1024, out_features=32, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=32, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
)
(dropout1): Dropout(p=0.1, inplace=False)
(drop_path1): StochasticDepth(p=0.0, mode=row)
(norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): lora.Linear(
(base_layer): ParametrizedLinear(
in_features=1024, out_features=4096, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=1024, out_features=32, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=32, out_features=4096, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
(fc2): lora.Linear(
(base_layer): ParametrizedLinear(
in_features=4096, out_features=1024, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(lora_dropout): ModuleDict(
(default): Dropout(p=0.1, inplace=False)
)
(lora_A): ModuleDict(
(default): Linear(in_features=4096, out_features=32, bias=False)
)
(lora_B): ModuleDict(
(default): Linear(in_features=32, out_features=1024, bias=False)
)
(lora_embedding_A): ParameterDict()
(lora_embedding_B): ParameterDict()
(lora_magnitude_vector): ModuleDict()
)
)
(dropout2): Dropout(p=0.1, inplace=False)
(drop_path2): StochasticDepth(p=0.0, mode=row)
(norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
(pooler): XLMRobertaPooler(
(dense): ParametrizedLinear(
in_features=1024, out_features=1024, bias=True
(parametrizations): ModuleDict(
(weight): ParametrizationList(
(0): LoRAParametrization()
)
)
)
(activation): Tanh()
)
)
)
)
)
)
(pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(normalizer): Normalize()
)
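The dump above shows LoRA adapters of rank 32 injected into the attention and MLP projections: each lora_A maps the layer input down to 32 features and each lora_B maps those 32 features back to the layer's output width. As a reminder of what this parametrization computes (the standard LoRA update, not anything specific to this checkpoint), each adapted weight is

$$W' = W + \frac{\alpha}{r}\,B A, \qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\; B \in \mathbb{R}^{d_{\text{out}} \times r},\; r = 32,$$

where the base weight $W$ stays frozen and only $A$ and $B$ are trained; the scaling factor $\alpha$ is a training-time hyperparameter not visible in the module repr.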
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Mercity/memory-retrieval-jina-v3-lora")
# Run inference
sentences = [
'Preparing instructions for potential Brazilian yoga classes excites me—could you curate a professional list of Portuguese phrases for guiding poses and breathing exercises?',
'The previous attempt at self-study failed because Liam found the standard textbook pronunciation guide recordings to be grating and overly formal, leading him to stop practicing after two weeks.',
'Chloe has a documented, severe anxiety disorder requiring her to maintain a structured, predictable routine; sudden, high-stress financial calculations or immediate high-stakes decisions trigger significant health setbacks.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.8699, -0.1061],
# [ 0.8699, 1.0000, -0.1572],
# [-0.1061, -0.1572, 1.0000]])
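Note how the printed similarity matrix reflects the intended use: the yoga query scores 0.8699 against its related memory and negatively against the unrelated one. A typical downstream pattern is ranking a store of memories against a fresh query; here is a minimal sketch (sentences reused from the widget examples above, variable names illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mercity/memory-retrieval-jina-v3-lora")

query = (
    "Can you model the critical path impact if we delay contractor "
    "onboarding until September 1st?"
)
memories = [
    "The $10,000/week contractor specializes in backend database scaling, not "
    "the UI/UX features Alex needs built before the October demo.",
    "Eleanor volunteers 20 hours a week at the local animal shelter and values "
    "community engagement much higher than maximizing every dollar of tax savings.",
]

# Embed the query and the candidate memories (unit-normalized by the model)
query_emb = model.encode([query])
memory_embs = model.encode(memories)

# Cosine similarity scores, shape [1, len(memories)]; higher = more relevant
scores = model.similarity(query_emb, memory_embs)
best = int(scores.argmax())
print(memories[best])
```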
Training Details
Training Dataset
Unnamed Dataset
- Size: 369,891 training samples
- Columns: sentence_0, sentence_1, and sentence_2
- Approximate statistics based on the first 1000 samples:
  | | sentence_0 | sentence_1 | sentence_2 |
  |:---|:---|:---|:---|
  | type | string | string | string |
  | details | min: 12 tokens, mean: 36.09 tokens, max: 81 tokens | min: 20 tokens, mean: 40.88 tokens, max: 78 tokens | min: 15 tokens, mean: 36.83 tokens, max: 68 tokens |
- Samples:
  | sentence_0 | sentence_1 | sentence_2 |
  |:---|:---|:---|
  | To achieve sufficient relaxation by 11 PM after a demanding shift, suggest budget-conscious, non-stimulating pursuits that differ from audiobooks and suit my solo living situation. | Alex has been working on mastering the art of traditional ink drawing (Sumi-e) as a meditative hobby, which requires minimal light and focus. | Maria has set a personal milestone to donate 10% of her memoir's first-year royalties to a burnout recovery nonprofit, tying her publication success directly to the book's perceived authenticity and impact. |
  | I'm so pumped about this new grammar series—it's going to make such a difference for my subscribers who keep mixing up noun genders! Can you brainstorm ways to animate those common pitfalls like the -o ending myth? | The beta group overwhelmingly preferred short, character-driven skits over abstract quizzes, specifically mentioning that the last tutorial that relied heavily on on-screen text overlays resulted in lower engagement. | Alex previously boosted his geometry understanding on the SAT by reviewing sample test questions daily during short 30-minute sessions after school. |
  | Jamal pushes safe bets, yet deadline looms like a storm—verify this claim? | Maria received an internal promotion review last week, and exceeding expectations on this presentation is the single biggest factor determining her eligibility for the Senior Manager role opening in January. | Jamal is currently bogged down trying to reconcile conflicting Q3 sales data from three different regional offices, which he finds deeply frustrating. |
- Loss: TripletLoss with these parameters:
  { "distance_metric": "TripletDistanceMetric.COSINE", "triplet_margin": 0.5 }
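With TripletDistanceMetric.COSINE, the distance is 1 minus cosine similarity, so this objective pushes each anchor at least 0.5 closer (in cosine distance) to its positive than to its negative. A minimal PyTorch sketch of the computation (illustrative, not the library's internal implementation):

```python
import torch.nn.functional as F

def cosine_triplet_loss(anchor, positive, negative, margin=0.5):
    # Cosine distance = 1 - cosine similarity, per TripletDistanceMetric.COSINE
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    # Hinge: zero loss once d_neg exceeds d_pos by at least the margin
    return F.relu(d_pos - d_neg + margin).mean()
```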
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 1
- fp16: True
- multi_dataset_batch_sampler: round_robin
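A minimal training sketch that reproduces these non-default settings with the standard Sentence Transformers trainer (the LoRA configuration and dataset construction below are illustrative assumptions; the actual 369,891-triplet dataset is not published in this card):

```python
from datasets import Dataset
from peft import LoraConfig, TaskType
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import TripletDistanceMetric, TripletLoss

# jina-embeddings-v3 ships custom modeling code, hence trust_remote_code
model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Assumption: rank 32 is read off the architecture dump above; lora_alpha and
# the target modules are illustrative guesses, not confirmed by the card.
model.add_adapter(LoraConfig(task_type=TaskType.FEATURE_EXTRACTION, r=32, lora_alpha=32))

# Illustrative stand-in for the real triplet dataset; columns are
# (anchor, positive, negative) in this order.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Can you model the critical path impact of delaying contractor onboarding?"],
    "sentence_1": ["The contractor specializes in backend database scaling."],
    "sentence_2": ["Eleanor volunteers 20 hours a week at the local animal shelter."],
})

loss = TripletLoss(model, distance_metric=TripletDistanceMetric.COSINE, triplet_margin=0.5)

args = SentenceTransformerTrainingArguments(
    output_dir="memory-retrieval-jina-v3-lora",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    eval_strategy="steps",
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder; a held-out split was presumably used
    loss=loss,
)
trainer.train()
```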
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0433 | 500 | 0.2143 |
| 0.0865 | 1000 | 0.1182 |
Framework Versions
- Python: 3.12.3
- Sentence Transformers: 5.1.2
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu128
- Accelerate: 1.11.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
TripletLoss
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}