---
language:
- de
license: mit
library_name: transformers
pipeline_tag: text2text-generation
tags:
- t5
- german
- wechsel
- cross-lingual
datasets:
- unpaywall-scientific
---

# DE-T5-Sci-Transfer-Init

WECHSEL-initialized checkpoint: the English EN-T5-Sci weights paired with the German tokenizer `GermanT5/t5-efficient-gc4-german-base-nl36`, with the embedding matrix re-initialized via WECHSEL (static embeddings plus a bilingual dictionary). **No additional German training** was performed after the transfer. The folder includes `transfer_metadata.pt` with alignment diagnostics.
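
The contents of `transfer_metadata.pt` are not documented here, so a quick inspection is the simplest way to see which diagnostics were recorded; a minimal sketch:

```python
import torch

# Load the bundled alignment diagnostics on CPU; the schema is whatever
# the transfer script stored, so list the keys first.
meta = torch.load("transfer_metadata.pt", map_location="cpu")
print(sorted(meta.keys()))
```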

## Model Details

- Embedding init: Orthogonal Procrustes map over fastText n-gram embeddings, plus temperature-weighted mixtures of the k nearest source subwords (see the sketch after this list)
- Special tokens: `<extra_id_0>`..`<extra_id_99>` aligned, sentinel behavior preserved
- Tokenizer: GermanT5 SentencePiece (files bundled here)
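
A minimal sketch of those two steps, assuming pre-loaded fastText subword matrices and bilingual-dictionary word pairs; this illustrates the WECHSEL recipe rather than reproducing the exact transfer script:

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal map W minimizing ||X @ W - Y||_F (classic SVD solution).

    X, Y: (n_pairs, d_static) static embeddings of dictionary-aligned
    English/German word pairs."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def init_target_embeddings(src_emb, src_static, tgt_static, k=10, temperature=0.1):
    """Initialize German subword embeddings as temperature-weighted
    mixtures of the k nearest English subwords in the aligned static space.

    src_emb:    (V_en, d_model) trained English model embeddings
    src_static: (V_en, d_static) English subword fastText vectors, already
                mapped into the shared space with the Procrustes matrix above
    tgt_static: (V_de, d_static) German subword fastText vectors
    """
    src_n = src_static / np.linalg.norm(src_static, axis=1, keepdims=True)
    tgt_n = tgt_static / np.linalg.norm(tgt_static, axis=1, keepdims=True)
    sim = tgt_n @ src_n.T                         # cosine similarity (V_de, V_en)

    tgt_emb = np.empty((tgt_static.shape[0], src_emb.shape[1]), src_emb.dtype)
    for i, row in enumerate(sim):
        nn = np.argpartition(row, -k)[-k:]        # k nearest English subwords
        w = np.exp(row[nn] / temperature)
        tgt_emb[i] = (w / w.sum()) @ src_emb[nn]  # softmax-weighted mixture
    return tgt_emb
```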

## Evaluation (Global-MMLU, zero-shot)

| Category | EN accuracy | DE accuracy |
| --- | --- | --- |
| Overall | 0.2434 | 0.2463 |
| Humanities | 0.2485 | 0.2559 |
| STEM | 0.2391 | 0.2445 |
| Social Sciences | 0.2317 | 0.2307 |
| Other | 0.2517 | 0.2491 |

German accuracy matches the English baseline immediately after the transfer, demonstrating cross-lingual transfer without a single German gradient step.
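
For context, a minimal sketch of how a zero-shot multiple-choice score can be computed with this checkpoint; the repo path is a placeholder and the actual evaluation harness may differ:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "path/to/DE-T5-Sci-Transfer-Init"  # placeholder path
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id).eval()

@torch.no_grad()
def answer_logprob(question: str, answer: str) -> float:
    enc = tok(question, return_tensors="pt", truncation=True, max_length=512)
    labels = tok(answer, return_tensors="pt").input_ids
    # out.loss is the mean per-token NLL of the answer tokens;
    # multiply by length to recover the summed log-probability.
    out = model(**enc, labels=labels)
    return -out.loss.item() * labels.shape[1]

def predict(question: str, options: list[str]) -> int:
    # Choose the option to which the model assigns the highest likelihood.
    scores = [answer_logprob(question, o) for o in options]
    return max(range(len(options)), key=scores.__getitem__)
```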

## Intended Use

A starting point for German continued pretraining or fine-tuning where English scientific knowledge should be retained but a German tokenizer is required.
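
Loading uses the standard transformers API, and because the sentinel tokens survived the transfer, the usual T5 span-corruption objective applies unchanged; a minimal sketch (repo path is a placeholder):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

path = "path/to/DE-T5-Sci-Transfer-Init"  # placeholder path
tok = AutoTokenizer.from_pretrained(path)
model = T5ForConditionalGeneration.from_pretrained(path)

# Span corruption on German text: masked spans are replaced by sentinel
# tokens in the input, and the target reconstructs them in order.
inputs = tok("Photosynthese wandelt <extra_id_0> in chemische Energie um.",
             return_tensors="pt")
labels = tok("<extra_id_0> Lichtenergie <extra_id_1>",
             return_tensors="pt").input_ids
loss = model(input_ids=inputs.input_ids, labels=labels).loss
loss.backward()  # ready for a continued-pretraining step
```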

## Limitations

- No German data exposure beyond the embedding alignment; run additional continued pretraining (see the next model) for best performance.
- Context length is still limited to 512 tokens.