tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:369891
  - loss:TripletLoss
base_model: jinaai/jina-embeddings-v3
widget:
  - source_sentence: >-
      Echoes in the Alley is evolving into this brooding masterpiece, and
      heightening Jax's voice has me buzzing—let's iterate on a sample monologue
      to make it sing with poetic rhythm!
    sentences:
      - >-
        Sam, the lead researcher, strongly advocated in the last team meeting
        for temporarily excluding the Hawaii lab's pH data from the primary
        analysis until the September 15th deadline.
      - >-
        Maria has a strong, established working relationship with the in-house
        data science team, who recently developed a proprietary lookalike
        modeling tool that integrates directly with the existing ad platform.
      - >-
        In a previous collaboration, Taylor's roommate, cast as Jax, delivered a
        standout improvised monologue during a poetry reading event that
        captured the character's vulnerability without any scripted props,
        earning praise from peers for its raw authenticity.
  - source_sentence: >-
      Can you model the critical path impact if we delay contractor onboarding
      until September 1st?
    sentences:
      - >-
        Eleanor volunteers 20 hours a week at the local animal shelter and
        values community engagement much higher than maximizing every dollar of
        tax savings.
      - >-
        The $10,000/week contractor specializes in backend database scaling, not
        the UI/UX features Alex needs built before the October demo.
      - >-
        Alex's long-term goal is to secure Series A funding within 18 months,
        which requires establishing a reputation for reliable, on-time delivery
        of all milestones.
  - source_sentence: I'm anxious that winter rains could delay test drives for used SUVs.
    sentences:
      - >-
        Robert's daughter, an insurance claims adjuster, offered to take a week
        off work in late November to help him thoroughly inspect and negotiate
        the final purchase, as she has expertise in contracts.
      - >-
        Harold's long-term goal, shared with Evelyn, is to ensure their
        grandchildren (Sarah's children) spend quality time with Evelyn every
        summer to learn about local history, a tradition they deeply value.
      - >-
        Robert is actively reading reviews comparing the safety ratings of 2023
        SUV models versus 2024 models regarding their performance in heavy
        downpours.
  - source_sentence: >-
      With only six months until my deadline and grading piling up at school,
      this plot hole is keeping me up at night; any ideas for weaving in the
      partner's death more subtly in Book 2 without retconning?
    sentences:
      - >-
        Liam secretly paid off Mia's outstanding student loan debt ($4,500)
        three months ago as a surprise, not wanting her to worry about it
        anymore.
      - >-
        Sofia is actively applying for a prestigious $10,000 regional arts grant
        next month, which specifically funds community-focused, educational
        digital media.
      - >-
        Maria's editor at the publishing house is a former detective novelist
        who insists on psychological realism in character backstories, having
        rejected an earlier submission from Maria for lacking emotional depth in
        grief portrayal.
  - source_sentence: >-
      Preparing instructions for potential Brazilian yoga classes excites
      me—could you curate a professional list of Portuguese phrases for guiding
      poses and breathing exercises?
    sentences:
      - >-
        The previous attempt at self-study failed because Liam found the
        standard textbook pronunciation guide recordings to be grating and
        overly formal, leading him to stop practicing after two weeks.
      - >-
        Rosa secretly purchased a high-end, lightweight portable folding chair
        specifically designed for extended standing/sitting comfort at outdoor
        events last month.
      - >-
        Chloe has a documented, severe anxiety disorder requiring her to
        maintain a structured, predictable routine; sudden, high-stress
        financial calculations or immediate high-stakes decisions trigger
        significant health setbacks.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on jinaai/jina-embeddings-v3

This is a sentence-transformers model finetuned from jinaai/jina-embeddings-v3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: jinaai/jina-embeddings-v3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
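The similarity function above pairs naturally with the model's final Normalize() layer (shown in the architecture below): once embeddings are L2-normalized, cosine similarity is just a dot product. A minimal NumPy sketch, with random vectors standing in for real model embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 1024-dim vectors standing in for model embeddings.
rng = np.random.default_rng(0)
a = rng.standard_normal(1024)
b = rng.standard_normal(1024)

# After L2 normalization (what the model's Normalize() module does),
# cosine similarity reduces to a plain dot product.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
assert abs(cosine_similarity(a, b) - float(np.dot(a_n, b_n))) < 1e-9
```

This is why `model.similarity(embeddings, embeddings)` in the usage section returns exactly 1.0 on the diagonal: each normalized embedding dotted with itself is 1.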

Full Model Architecture

SentenceTransformer(
  (transformer): Transformer(
    (auto_model): PeftModelForFeatureExtraction(
      (base_model): LoraModel(
        (model): XLMRobertaLoRA(
          (roberta): XLMRobertaModel(
            (embeddings): XLMRobertaEmbeddings(
              (word_embeddings): ParametrizedEmbedding(
                250002, 1024, padding_idx=1
                (parametrizations): ModuleDict(
                  (weight): ParametrizationList(
                    (0): LoRAParametrization()
                  )
                )
              )
              (token_type_embeddings): ParametrizedEmbedding(
                1, 1024
                (parametrizations): ModuleDict(
                  (weight): ParametrizationList(
                    (0): LoRAParametrization()
                  )
                )
              )
            )
            (emb_drop): Dropout(p=0.1, inplace=False)
            (emb_ln): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (encoder): XLMRobertaEncoder(
              (layers): ModuleList(
                (0-23): 24 x Block(
                  (mixer): MHA(
                    (rotary_emb): RotaryEmbedding()
                    (Wqkv): ParametrizedLinearResidual(
                      in_features=1024, out_features=3072, bias=True
                      (parametrizations): ModuleDict(
                        (weight): ParametrizationList(
                          (0): LoRAParametrization()
                        )
                      )
                    )
                    (inner_attn): SelfAttention(
                      (drop): Dropout(p=0.1, inplace=False)
                    )
                    (inner_cross_attn): CrossAttention(
                      (drop): Dropout(p=0.1, inplace=False)
                    )
                    (out_proj): lora.Linear(
                      (base_layer): ParametrizedLinear(
                        in_features=1024, out_features=1024, bias=True
                        (parametrizations): ModuleDict(
                          (weight): ParametrizationList(
                            (0): LoRAParametrization()
                          )
                        )
                      )
                      (lora_dropout): ModuleDict(
                        (default): Dropout(p=0.1, inplace=False)
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1024, out_features=32, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=32, out_features=1024, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                  )
                  (dropout1): Dropout(p=0.1, inplace=False)
                  (drop_path1): StochasticDepth(p=0.0, mode=row)
                  (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                  (mlp): Mlp(
                    (fc1): lora.Linear(
                      (base_layer): ParametrizedLinear(
                        in_features=1024, out_features=4096, bias=True
                        (parametrizations): ModuleDict(
                          (weight): ParametrizationList(
                            (0): LoRAParametrization()
                          )
                        )
                      )
                      (lora_dropout): ModuleDict(
                        (default): Dropout(p=0.1, inplace=False)
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1024, out_features=32, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=32, out_features=4096, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                    (fc2): lora.Linear(
                      (base_layer): ParametrizedLinear(
                        in_features=4096, out_features=1024, bias=True
                        (parametrizations): ModuleDict(
                          (weight): ParametrizationList(
                            (0): LoRAParametrization()
                          )
                        )
                      )
                      (lora_dropout): ModuleDict(
                        (default): Dropout(p=0.1, inplace=False)
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=4096, out_features=32, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=32, out_features=1024, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                  )
                  (dropout2): Dropout(p=0.1, inplace=False)
                  (drop_path2): StochasticDepth(p=0.0, mode=row)
                  (norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
                )
              )
            )
            (pooler): XLMRobertaPooler(
              (dense): ParametrizedLinear(
                in_features=1024, out_features=1024, bias=True
                (parametrizations): ModuleDict(
                  (weight): ParametrizationList(
                    (0): LoRAParametrization()
                  )
                )
              )
              (activation): Tanh()
            )
          )
        )
      )
    )
  )
  (pooler): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (normalizer): Normalize()
)
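The Pooling module above is configured with pooling_mode_mean_tokens: True, meaning the sentence embedding is the mean of the token embeddings, with padding positions masked out. A minimal NumPy sketch of that operation (toy shapes, not the real model):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # Average token embeddings over the sequence dimension,
    # ignoring padding positions (mask == 0).
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid division by zero
    return summed / counts

# Toy batch: 2 sequences of 4 tokens, hidden size 1024.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 4, 1024))
mask = np.array([[1, 1, 1, 0],   # 3 real tokens, 1 pad
                 [1, 1, 0, 0]])  # 2 real tokens, 2 pads
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 1024)
```

In the actual model, this pooled vector is then passed through the Normalize() module before being returned.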

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Mercity/memory-retrieval-jina-v3-lora")
# Run inference
sentences = [
    'Preparing instructions for potential Brazilian yoga classes excites me—could you curate a professional list of Portuguese phrases for guiding poses and breathing exercises?',
    'The previous attempt at self-study failed because Liam found the standard textbook pronunciation guide recordings to be grating and overly formal, leading him to stop practicing after two weeks.',
    'Chloe has a documented, severe anxiety disorder requiring her to maintain a structured, predictable routine; sudden, high-stress financial calculations or immediate high-stakes decisions trigger significant health setbacks.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.8699, -0.1061],
#         [ 0.8699,  1.0000, -0.1572],
#         [-0.1061, -0.1572,  1.0000]])

Training Details

Training Dataset

Unnamed Dataset

  • Size: 369,891 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:

                  sentence_0   sentence_1   sentence_2
    type          string       string       string
    min tokens    12           20           15
    mean tokens   36.09        40.88        36.83
    max tokens    81           78           68
  • Samples:
    • Sample 1
      • sentence_0: To achieve sufficient relaxation by 11 PM after a demanding shift, suggest budget-conscious, non-stimulating pursuits that differ from audiobooks and suit my solo living situation.
      • sentence_1: Alex has been working on mastering the art of traditional ink drawing (Sumi-e) as a meditative hobby, which requires minimal light and focus.
      • sentence_2: Maria has set a personal milestone to donate 10% of her memoir's first-year royalties to a burnout recovery nonprofit, tying her publication success directly to the book's perceived authenticity and impact.
    • Sample 2
      • sentence_0: I'm so pumped about this new grammar series—it's going to make such a difference for my subscribers who keep mixing up noun genders! Can you brainstorm ways to animate those common pitfalls like the -o ending myth?
      • sentence_1: The beta group overwhelmingly preferred short, character-driven skits over abstract quizzes, specifically mentioning that the last tutorial that relied heavily on on-screen text overlays resulted in lower engagement.
      • sentence_2: Alex previously boosted his geometry understanding on the SAT by reviewing sample test questions daily during short 30-minute sessions after school.
    • Sample 3
      • sentence_0: Jamal pushes safe bets, yet deadline looms like a storm—verify this claim?
      • sentence_1: Maria received an internal promotion review last week, and exceeding expectations on this presentation is the single biggest factor determining her eligibility for the Senior Manager role opening in January.
      • sentence_2: Jamal is currently bogged down trying to reconcile conflicting Q3 sales data from three different regional offices, which he finds deeply frustrating.
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.5
    }
    
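The TripletLoss configuration above penalizes any triplet where the anchor is not closer (in cosine distance) to its positive than to its negative by at least the 0.5 margin. A NumPy sketch of the per-triplet term, with random vectors standing in for embeddings:

```python
import numpy as np

def cosine_distance(a, b):
    # TripletDistanceMetric.COSINE: 1 minus cosine similarity.
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Hinge over cosine distances: push the positive closer to the
    # anchor than the negative by at least `margin`, else incur loss.
    return max(0.0, cosine_distance(anchor, positive)
                    - cosine_distance(anchor, negative) + margin)

rng = np.random.default_rng(0)
anchor = rng.standard_normal(1024)
positive = anchor + 0.1 * rng.standard_normal(1024)  # near the anchor
negative = rng.standard_normal(1024)                 # unrelated vector

loss_ok = triplet_loss(anchor, positive, negative)   # satisfied: 0.0
loss_bad = triplet_loss(anchor, negative, positive)  # roles swapped: > 0
print(loss_ok, round(loss_bad, 3))
```

During training, a nonzero loss gradient pulls anchor and positive embeddings together while pushing the negative away.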

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 1
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
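Assuming the Sentence Transformers v3+ training API, the non-default settings above roughly correspond to the following configuration fragment. This is a sketch, not the actual training script; the output directory is a placeholder and the dataset/trainer wiring is omitted:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import TripletDistanceMetric, TripletLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Loss as reported in the Training Dataset section.
loss = TripletLoss(
    model,
    distance_metric=TripletDistanceMetric.COSINE,
    triplet_margin=0.5,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)
```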

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step   Training Loss
0.0433    500          0.2143
0.0865   1000          0.1182

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.11.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}