---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---
# Model Overview
GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high ranking quality.
## Training Configuration
- Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- Training Dataset: [samheym/ger-dpr-collection](https://huggingface.co/datasets/samheym/ger-dpr-collection)
- Training Subset: 10% of the triples, randomly sampled from the final dataset
- Vector Length: 128
- Maximum Document Length: 256 tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: Included
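Late interaction means the model keeps one 128-dimensional vector per token (the vector length above) and scores a query–document pair with the MaxSim operator: each query token is matched against its best document token and the maxima are summed. A minimal NumPy sketch of this scoring, using random stand-in embeddings rather than actual model output:

```python
import numpy as np

rng = np.random.default_rng(0)

def maxsim_score(query_emb, doc_emb):
    """Sum over query tokens of the max cosine similarity to any document token."""
    # Normalize token embeddings so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                  # (num_query_tokens, num_doc_tokens)
    return sim.max(axis=1).sum()   # best-matching doc token per query token

query = rng.normal(size=(4, 128))   # 4 query tokens, 128-dim vectors
doc_a = rng.normal(size=(20, 128))  # 20 unrelated document tokens
doc_b = np.vstack([query, rng.normal(size=(16, 128))])  # contains the query tokens

# A document containing the query's token vectors scores higher:
assert maxsim_score(query, doc_b) > maxsim_score(query, doc_a)
```

Because each query token only needs the *maximum* similarity over document tokens, document embeddings can be pre-computed and indexed offline, which is what makes late interaction efficient at retrieval time.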
## Usage
First install the PyLate library:
```bash
pip install -U pylate
```
### Retrieval
PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```