---
language:
- de
library_name: transformers
pipeline_tag: feature-extraction
tags:
- BERT
- encoder
- embeddings
- TiME
- de
- size:s
license: apache-2.0
teacher_model: FacebookAI/xlm-roberta-large
datasets:
- uonlp/CulturaX
---
# TiME German (de, s)

Monolingual BERT-style encoder that outputs embeddings for German. Distilled from `FacebookAI/xlm-roberta-large`.
## Specs
- language: German (de)
- size: s
- architecture: BERT encoder
- layers: 6
- hidden size: 384
- intermediate size: 1536
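These dimensions correspond to a `transformers` `BertConfig` along the following lines. This is a sketch for illustration only; the actual `config.json` in the repo also fixes fields such as the vocabulary size and number of attention heads, which are not listed above:

```python
from transformers import BertConfig

# Hypothetical config mirroring the spec table above;
# fields not listed in the specs keep BertConfig defaults.
cfg = BertConfig(
    num_hidden_layers=6,     # layers
    hidden_size=384,         # hidden size (= embedding dimension)
    intermediate_size=1536,  # feed-forward inner dimension
)
print(cfg.hidden_size)
```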
## Usage (mean pooled embeddings)
```python
from transformers import AutoTokenizer, AutoModel
import torch

repo = "dschulmeist/TiME-de-s"
tok = AutoTokenizer.from_pretrained(repo)
mdl = AutoModel.from_pretrained(repo)

def mean_pool(last_hidden_state, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    return (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

inputs = tok(["example sentence"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = mdl(**inputs)
emb = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
print(emb.shape)  # torch.Size([1, 384])
```
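The pooled embeddings are typically compared with cosine similarity, e.g. for semantic search or deduplication. A minimal sketch, using random stand-in tensors with the model's hidden size (384) so it runs without downloading the model; in practice `a` and `b` would be outputs of `mean_pool` above:

```python
import torch
import torch.nn.functional as F

def cosine_sim(a, b):
    # Batched cosine similarity over the last (embedding) dimension.
    return F.cosine_similarity(a, b, dim=-1)

# Stand-in embeddings (batch of 2, hidden size 384).
a = torch.randn(2, 384)
sims = cosine_sim(a, a)  # each vector compared with itself -> ~1.0
print(sims)
```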