Danish Punctuation & Capitalization Restoration (ONNX)

An ONNX conversion of Alvenir/bert-punct-restoration-da for use with transformers.js and other ONNX-based runtimes.

Model Details

  • Architecture: BertForTokenClassification (based on Maltehb/danish-bert-botxo)
  • Parameters: ~110M
  • Task: Token classification with 15 labels encoding punctuation + capitalization
  • Language: Danish (da)

Labels

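Each label has the format <punctuation><case>. The first character is the punctuation mark (O for none) and the second is the case (O for lowercase, U for uppercase):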

| Label | Punctuation   | Capitalization |
|-------|---------------|----------------|
| OO    | None          | Lowercase      |
| OU    | None          | Uppercase      |
| .O    | Period        | Lowercase      |
| .U    | Period        | Uppercase      |
| ,O    | Comma         | Lowercase      |
| ,U    | Comma         | Uppercase      |
| ?O    | Question mark | Lowercase      |
| ?U    | Question mark | Uppercase      |
| !O    | Exclamation   | Lowercase      |
| !U    | Exclamation   | Uppercase      |
| :O    | Colon         | Lowercase      |
| :U    | Colon         | Uppercase      |
| ;O    | Semicolon     | Lowercase      |
| 'O    | Apostrophe    | Lowercase      |
| -O    | Hyphen        | Lowercase      |
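
To make the label format concrete, here is a minimal sketch of how a label could be applied to a word. The helper names `applyLabel` and `restoreText` are hypothetical and not part of this model or of transformers.js. The sketch assumes that the punctuation mark is appended directly after the word and that "U" capitalizes only the first letter. The labels in the usage example are written by hand, not predicted by the model.

```javascript
// Apply one label to one word.
// Assumptions: punctuation is appended after the word, and "U" capitalizes
// only the first letter. An "O" case flag leaves the word unchanged.
function applyLabel(word, label) {
  const punctuation = label[0]; // "O" means no punctuation
  const letterCase = label[1];  // "O" = lowercase, "U" = uppercase
  const cased = letterCase === "U"
    ? word.charAt(0).toUpperCase() + word.slice(1)
    : word;
  return punctuation === "O" ? cased : cased + punctuation;
}

// Apply one label per word and join the results with single spaces.
function restoreText(words, labels) {
  return words.map((word, i) => applyLabel(word, labels[i])).join(" ");
}

// Hand-written labels for illustration only.
console.log(restoreText(["hej", "og", "velkommen"], ["OU", "OO", ".O"]));
// prints "Hej og velkommen."
```

Joining with a space after every word is a simplification. Labels such as -O or 'O would probably need different spacing rules in a real post-processing step.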

Files

| File                      | Precision | Size    |
|---------------------------|-----------|---------|
| onnx/model.onnx           | FP32      | ~440 MB |
| onnx/model_fp16.onnx      | FP16      | ~220 MB |
| onnx/model_quantized.onnx | INT8      | ~111 MB |
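
The file sizes follow roughly from the parameter count multiplied by the bytes per weight. The sketch below is a back-of-the-envelope check, not a description of how the files were produced. It assumes ~110M parameters (from Model Details), 1 MB = 10^6 bytes, and that weights dominate the file size. The function name `estimateSizeMB` is hypothetical.

```javascript
// Bytes used to store one weight at each precision.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1 };

// Rough size estimate in MB (10^6 bytes), ignoring graph metadata
// and any tensors that are kept at higher precision.
function estimateSizeMB(paramCount, dtype) {
  return (paramCount * BYTES_PER_PARAM[dtype]) / 1e6;
}

console.log(estimateSizeMB(110e6, "fp32")); // 440
console.log(estimateSizeMB(110e6, "fp16")); // 220
console.log(estimateSizeMB(110e6, "q8"));   // 110
```

The INT8 estimate of 110 MB is slightly below the listed ~111 MB. A small gap like this is plausible because the estimate ignores everything except one byte per weight.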

Usage with transformers.js

```javascript
import { AutoTokenizer, BertForTokenClassification } from "@huggingface/transformers";

// Load the tokenizer and the INT8-quantized ONNX weights.
const tokenizer = await AutoTokenizer.from_pretrained("hlevring/bert-punct-restoration-da-onnx");
const model = await BertForTokenClassification.from_pretrained("hlevring/bert-punct-restoration-da-onnx", {
  dtype: "q8",
});

// The transformers.js tokenizer returns tensors by default.
const encoded = tokenizer("hej og velkommen til linket horoskop");
const output = await model(encoded);
// output.logits: [batch, seq_len, 15] - argmax over the last axis to get label IDs
```
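
The argmax step mentioned in the comment above can be sketched as a plain function over the flattened logits. This assumes the logits are available as a flat numeric array plus a `[batch, seq_len, num_labels]` shape, which is how transformers.js tensors expose `.data` and `.dims`, and that the ID-to-label mapping comes from the model config. The function name `decodeLabels` and the toy values below are hypothetical.

```javascript
// Pick the highest-scoring label for every token position.
// data:     flat array of logits in row-major order
// dims:     [batch, seqLen, numLabels]
// id2label: mapping from label ID to label string
function decodeLabels(data, dims, id2label) {
  const [batch, seqLen, numLabels] = dims;
  const result = [];
  for (let b = 0; b < batch; b++) {
    const row = [];
    for (let t = 0; t < seqLen; t++) {
      const offset = (b * seqLen + t) * numLabels;
      let best = 0;
      for (let k = 1; k < numLabels; k++) {
        if (data[offset + k] > data[offset + best]) best = k;
      }
      row.push(id2label[best]);
    }
    result.push(row);
  }
  return result;
}

// Toy input: 1 sequence, 2 tokens, 3 labels (not real model output).
const toyLogits = [0.1, 2.0, -1.0, 3.0, 0.0, 1.0];
const toyLabels = { 0: "OO", 1: "OU", 2: ".O" };
console.log(decodeLabels(toyLogits, [1, 2, 3], toyLabels)); // [ [ 'OU', 'OO' ] ]
```

With the model above, the call would look like `decodeLabels(output.logits.data, output.logits.dims, model.config.id2label)`. The result has one label per token, including special tokens and subword pieces, so it still has to be aligned to whole words before the text can be rebuilt.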

Available dtype options

  • "fp32" - Full precision (~440 MB)
  • "fp16" - Half precision (~220 MB)
  • "q8" - INT8 quantized (~111 MB)
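
Each dtype option corresponds to one of the files listed under Files. The lookup below is a small sketch of that pairing, inferred from the matching precisions and sizes in this card. The helper name `onnxFileFor` is hypothetical and is not a transformers.js function.

```javascript
// dtype option -> ONNX file, paired by precision as listed in this card.
const DTYPE_TO_FILE = {
  fp32: "onnx/model.onnx",
  fp16: "onnx/model_fp16.onnx",
  q8: "onnx/model_quantized.onnx",
};

// Return the file for a dtype, or throw on an unsupported value.
function onnxFileFor(dtype) {
  const file = DTYPE_TO_FILE[dtype];
  if (file === undefined) {
    throw new Error(`Unsupported dtype: ${dtype}`);
  }
  return file;
}

console.log(onnxFileFor("q8")); // onnx/model_quantized.onnx
```

A lookup like this is mainly useful when loading the file directly with another ONNX runtime, where the path has to be chosen by hand.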

Original Model

This is an ONNX export of Alvenir/bert-punct-restoration-da, which is part of the punctfix library.
