mini-gte / README.md

metadata

tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
base_model: distilbert/distilbert-base-uncased
model-index:
  - name: prdev/mini-gte
    results:
      - dataset:
          config: en
          name: MTEB AmazonCounterfactualClassification (en)
          revision: e8379541af4e31359cca9fbcf4b00f2671dba205
          split: test
          type: mteb/amazon_counterfactual
        metrics:
          - type: accuracy
            value: 74.8955
          - type: f1
            value: 68.8421
          - type: f1_weighted
            value: 77.1819
          - type: ap
            value: 37.7315
          - type: ap_weighted
            value: 37.7315
          - type: main_score
            value: 74.8955
        task:
          type: Classification
pipeline_tag: sentence-similarity
library_name: sentence-transformers

Mini-GTE

This is a DistilBERT-based model trained from GTE-base. It can be used as a faster query encoder for the GTE series or as a standalone model; the reported MTEB scores are for standalone use.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: distilbert/distilbert-base-uncased
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
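
The similarity function listed above can be written out by hand. The following is a minimal sketch in plain Python that uses short toy vectors in place of real 768-dimensional embeddings; the helper name `cosine_similarity` is illustrative and not part of the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for model embeddings.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]   # same direction as u, twice the length
w = [3.0, 0.0, -1.0]  # orthogonal to u (dot product is zero)

same_direction = cosine_similarity(u, v)  # close to 1.0
orthogonal = cosine_similarity(u, w)      # close to 0.0
```

Because the score depends only on direction, rescaling an embedding does not change how it compares with other embeddings.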

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("prdev/mini-gte")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
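
The similarity scores can be turned into a ranking. The sketch below assumes the scores for one query have already been converted to a plain list of floats, for example with `similarities[0].tolist()`. `top_k_matches` is a hypothetical helper, not a Sentence Transformers function, and the scores shown are made-up placeholders rather than model output.

```python
def top_k_matches(scores, sentences, k=2):
    """Return the k (sentence, score) pairs with the highest scores."""
    if len(scores) != len(sentences):
        raise ValueError("need exactly one score per sentence")
    ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


candidates = [
    "It's so sunny outside!",
    "He drove to the stadium.",
]
# Placeholder scores for illustration only, not real model output.
placeholder_scores = [0.8, 0.1]

best = top_k_matches(placeholder_scores, candidates, k=1)
```

Keeping the helper independent of the model means it works the same way whether the scores come from `model.similarity` or from any other cosine-similarity computation.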

Training Details

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.3.1
Transformers: 4.48.0.dev0
PyTorch: 2.1.0a0+32f93b1
Accelerate: 1.2.0
Datasets: 2.21.0
Tokenizers: 0.21.0
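
To compare a local environment against the versions listed above, something like the following may help. It is a sketch that relies only on the standard library's `importlib.metadata`; the distribution names in `LISTED_VERSIONS` are assumptions about how each library is published, and `installed_versions` is an illustrative helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README, keyed by assumed distribution name.
LISTED_VERSIONS = {
    "sentence-transformers": "3.3.1",
    "transformers": "4.48.0.dev0",
    "torch": "2.1.0a0+32f93b1",
    "accelerate": "1.2.0",
    "datasets": "2.21.0",
    "tokenizers": "0.21.0",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(LISTED_VERSIONS)
for name, listed in LISTED_VERSIONS.items():
    print(f"{name}: listed {listed}, installed {report[name] or 'not installed'}")
```

A mismatch does not necessarily mean the model will fail to load; the list records the versions in use when the model was trained.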
