Use with the PyLate library
from pylate import models

queries = [
    "Which planet is known as the Red Planet?",
    "What is the largest planet in our solar system?",
]

documents = [
    ["Mars is the Red Planet.", "Venus is Earth's twin."],
    ["Jupiter is the largest planet.", "Saturn has rings."],
]

model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

queries_emb = model.encode(queries, is_query=True)
docs_emb = model.encode(documents, is_query=False)

Model Overview

GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high ranking quality.

Training Configuration

  • Base Model: deepset/gbert-base
  • Training Dataset: samheym/ger-dpr-collection
  • Training Subset: 10% of the dataset's triples, randomly sampled
  • Vector Length: 128
  • Maximum Document Length: 256 Tokens
  • Batch Size: 50
  • Training Steps: 80,000
  • Gradient Accumulation: 1 step
  • Learning Rate: 5 × 10⁻⁶
  • Optimizer: AdamW
  • In-Batch Negatives: Included
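The late interaction mentioned above can be illustrated with a minimal NumPy sketch of ColBERT's MaxSim operator: each query token embedding is matched against its most similar document token embedding, and the per-token maxima are summed to give the document's score. This is an illustrative sketch of the scoring idea, not GerColBERT's internal code.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between one query and one document.

    query_emb: (num_query_tokens, dim) L2-normalized token embeddings
    doc_emb:   (num_doc_tokens, dim)   L2-normalized token embeddings
    """
    sim = query_emb @ doc_emb.T          # (q_tokens, d_tokens) cosine similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed

# Toy 2-dimensional example with unit-length token vectors
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.6, 0.8]])
print(maxsim_score(query, doc))  # 1.0 + 0.8 = 1.8
```

Because each query token is scored independently against the whole document, document embeddings can be indexed once and reused across queries, which is what enables the fast retrieval described below.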

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

# Step 2: Create a Voyager index and a retriever on top of it
index = indexes.Voyager(index_folder="pylate-index", index_name="index", override=True)
retriever = retrieve.ColBERT(index=index)

# Step 3: add document embeddings with index.add_documents(...), then pass
# query embeddings (encoded with is_query=True) to retriever.retrieve(...)