---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---

# Model Overview

GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high ranking quality.

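To make "late interaction" concrete, the following is a minimal sketch of the MaxSim scoring rule used by ColBERT-style models: each query token is matched to its most similar document token, and the per-token maxima are summed. The function name `maxsim_score` and the 2-dimensional toy vectors are illustrative assumptions; the model itself produces 128-dimensional token embeddings, and PyLate's internal implementation may differ.

```python
def maxsim_score(query_tokens, document_tokens):
    """Sum, over query tokens, of the best dot product with any document token.

    Both arguments are lists of token embeddings (lists of floats) that are
    assumed to be L2-normalized, so each dot product is a cosine similarity.
    """
    score = 0.0
    for q in query_tokens:
        # Similarity of this query token to every document token.
        similarities = [
            sum(a * b for a, b in zip(q, d)) for d in document_tokens
        ]
        # Late interaction: keep only the best-matching document token.
        score += max(similarities)
    return score


# Toy example with unit-length 2-D vectors (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
document = [[1.0, 0.0], [0.6, 0.8]]

score = maxsim_score(query, document)
print(score)
```

Because the document tokens are encoded independently of the query, their embeddings can be precomputed and indexed, which is what makes this scoring scheme efficient at retrieval time.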
## Training Configuration

- Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- Training Dataset: samheym/ger-dpr-collection
- Training Subset: a random 10% sample of the triples in the final dataset
- Vector Length: 128
- Maximum Document Length: 256 tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: Included

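As a rough sanity check on the scale of training, the configuration above implies the number of triples processed during the run. This is a back-of-the-envelope sketch that assumes each batch element is one (query, positive, negative) triple; the variable names are illustrative and not part of PyLate.

```python
# Values taken from the training configuration listed above.
batch_size = 50
training_steps = 80_000
gradient_accumulation_steps = 1

# Triples contributing to each optimizer update.
effective_batch_size = batch_size * gradient_accumulation_steps

# Total triples processed over the run (assumes one triple per batch element).
triples_processed = effective_batch_size * training_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Triples processed: {triples_processed:,}")
```

With in-batch negatives enabled, each query is additionally contrasted against documents belonging to the other triples in the same batch, so the number of negatives seen per step is larger than this count suggests.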
## Usage

First install the PyLate library:

```bash
pip install -U pylate
```

### Retrieval

PyLate provides a streamlined interface for indexing and retrieving documents with ColBERT models. Its index is built on the Voyager HNSW index, which stores the document embeddings and enables fast retrieval.

```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```

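The remaining steps (building the index, encoding documents and queries, and retrieving the top matches) could look roughly like the sketch below. It follows the usage pattern documented for PyLate, but the function name `build_index_and_search`, the index folder and index name, and the batch size are placeholder assumptions, and argument names may differ between PyLate versions. The imports sit inside the function so that defining it does not load the library or download the model.

```python
def build_index_and_search(documents_ids, documents, queries, k=3):
    """Index `documents` with GerColBERT and return the top-k hits per query."""
    # Imported lazily so that defining this function does not load PyLate.
    from pylate import indexes, models, retrieve

    # Step 1: Load the ColBERT model.
    model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

    # Step 2: Create a Voyager HNSW index (folder and name are placeholders).
    index = indexes.Voyager(
        index_folder="pylate-index",
        index_name="index",
        override=True,  # overwrite an existing index with the same name
    )

    # Step 3: Encode the documents (is_query=False) and add them to the index.
    documents_embeddings = model.encode(
        documents,
        batch_size=32,
        is_query=False,
        show_progress_bar=False,
    )
    index.add_documents(
        documents_ids=documents_ids,
        documents_embeddings=documents_embeddings,
    )

    # Step 4: Encode the queries (is_query=True) and retrieve the top-k documents.
    queries_embeddings = model.encode(
        queries,
        batch_size=32,
        is_query=True,
        show_progress_bar=False,
    )
    retriever = retrieve.ColBERT(index=index)
    return retriever.retrieve(queries_embeddings=queries_embeddings, k=k)
```

Calling the function with a list of document IDs, the matching German document texts, and one or more queries should return one ranked result list per query. The first call downloads the model weights.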