Update README.md

README.md CHANGED
@@ -14,22 +14,24 @@ base_model:
 - deepset/gbert-base
 ---
 
-#
-
-## Model Details
-
-### Model Description
-- **Model Type:** PyLate model
-- **Base model:** [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
-- **Document Length:** 180 tokens
-- **Query Length:** 32 tokens
-- **Output Dimensionality:** 128 tokens
-- **Similarity Function:** MaxSim
-- **Training Dataset:** samheym/ger-dpr-collection
-- **Language:** de
-<!-- - **License:** Unknown -->
+# Model Overview
+
+GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late interaction-based retrieval while maintaining high-quality ranking performance.
+
+Training Configuration
+
+- Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
+- Training Dataset: samheym/ger-dpr-collection
+- Dataset: 10% of randomly selected triples from the final dataset
+- Vector Length: 128
+- Maximum Document Length: 256 characters
+- Batch Size: 50
+- Training Steps: 80,000
+- Gradient Accumulation: 1 step
+- Learning Rate: 5 × 10⁻⁶
+- Optimizer: AdamW
+- In-Batch Negatives: Included
 
@@ -55,17 +57,7 @@ model = models.ColBERT(
 
 
 
-## Training Details
-
-### Framework Versions
-- Python: 3.12.3
-- Sentence Transformers: 3.4.1
-- PyLate: 1.1.4
-- Transformers: 4.48.2
-- PyTorch: 2.6.0+cu124
-- Accelerate: 1.4.0
-- Datasets: 2.21.0
-- Tokenizers: 0.21.0
 
 <!--
 ## Citation
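The removed model details list MaxSim as the similarity function, and the new overview keeps the same late-interaction design. As a quick illustration of what MaxSim computes, here is a minimal sketch in plain Python (toy 2-dimensional token vectors for readability; the actual model emits 128-dimensional vectors per token, per the Vector Length setting above — the `maxsim` helper name is ours, not part of PyLate):

```python
def maxsim(query_vecs, doc_vecs):
    """Late-interaction MaxSim score: for each query token embedding,
    take the maximum dot product over all document token embeddings,
    then sum those maxima across the query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy example: two query-token vectors against two document-token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.5, 0.5]]
print(maxsim(query, doc))  # 1.0 + 0.5 = 1.5
```

Because each query token matches its best document token independently, documents are scored without pooling token embeddings into a single vector, which is what allows the model to keep fine-grained ranking quality at retrieval time.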