Danish Punctuation & Capitalization Restoration (ONNX)
ONNX conversion of Alvenir/bert-punct-restoration-da for use with transformers.js and other ONNX-compatible runtimes.
Model Details
- Architecture: BertForTokenClassification (based on Maltehb/danish-bert-botxo)
- Parameters: ~110M
- Task: Token classification with 15 labels, each encoding punctuation and capitalization
- Language: Danish (da)
Labels
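Each label has the format <punctuation><case>, where the first character is the punctuation mark (O for none) and the second is the casing (O for lowercase, U for uppercase):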
| Label | Punctuation | Capitalization |
|---|---|---|
| OO | None | lowercase |
| OU | None | Uppercase |
| .O | Period | lowercase |
| .U | Period | Uppercase |
| ,O | Comma | lowercase |
| ,U | Comma | Uppercase |
| ?O | Question mark | lowercase |
| ?U | Question mark | Uppercase |
| !O | Exclamation | lowercase |
| !U | Exclamation | Uppercase |
| :O | Colon | lowercase |
| :U | Colon | Uppercase |
| ;O | Semicolon | lowercase |
| 'O | Apostrophe | lowercase |
| -O | Hyphen | lowercase |
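As a rough illustration of how the label table can be turned back into text, here is a minimal sketch. The helper names `parseLabel`, `applyLabel`, and `restore` are hypothetical, and two things are assumed rather than stated in this card: that the punctuation mark is attached after the labelled word, and that "Uppercase" means capitalizing only the first letter. The labels in the example are written by hand, not produced by the model.

```javascript
// Split a two-character label such as ".U" into its parts.
// "O" in the first position means "no punctuation" (see the table above).
function parseLabel(label) {
  const [punct, caseFlag] = label;
  return {
    punctuation: punct === "O" ? "" : punct,
    capitalize: caseFlag === "U",
  };
}

// Apply one label to one word.
// Assumptions: "U" capitalizes only the first letter, and the
// punctuation mark is appended after the word.
function applyLabel(word, label) {
  const { punctuation, capitalize } = parseLabel(label);
  const cased = capitalize ? word.charAt(0).toUpperCase() + word.slice(1) : word;
  return cased + punctuation;
}

// Rebuild a sentence from parallel arrays of words and labels.
function restore(words, labels) {
  return words.map((word, i) => applyLabel(word, labels[i])).join(" ");
}

// Hand-written labels for illustration only.
console.log(restore(["hej", "og", "velkommen"], ["OU", "OO", ".O"]));
// "Hej og velkommen."
```

Joining with a single space is a simplification. Hyphen and apostrophe labels probably need the following word attached without a space, which this sketch does not handle.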
Files
| File | Precision | Size |
|---|---|---|
| onnx/model.onnx | FP32 | ~440 MB |
| onnx/model_fp16.onnx | FP16 | ~220 MB |
| onnx/model_quantized.onnx | INT8 | ~111 MB |
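The file sizes follow roughly from the parameter count multiplied by the bytes stored per weight. The sketch below is only a back-of-the-envelope check: `estimateSizeMB` is a hypothetical helper, the parameter count is the approximate ~110M figure from Model Details, and the estimate ignores graph metadata and any weights left unquantized, which is presumably why the INT8 file is listed slightly above the estimate.

```javascript
// Approximate parameter count, taken from the Model Details section.
const PARAMS = 110e6;

// Bytes used to store one weight at each precision.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1 };

// Estimate the weight storage in megabytes (1 MB = 1e6 bytes here).
function estimateSizeMB(dtype) {
  return Math.round((PARAMS * BYTES_PER_PARAM[dtype]) / 1e6);
}

console.log(estimateSizeMB("fp32")); // 440
console.log(estimateSizeMB("fp16")); // 220
console.log(estimateSizeMB("q8"));   // 110
```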
Usage with transformers.js
import { AutoTokenizer, BertForTokenClassification } from "@huggingface/transformers";

// Load the tokenizer and the INT8-quantized model from the Hub.
const tokenizer = await AutoTokenizer.from_pretrained("hlevring/bert-punct-restoration-da-onnx");
const model = await BertForTokenClassification.from_pretrained("hlevring/bert-punct-restoration-da-onnx", {
  dtype: "q8",
});

// The transformers.js tokenizer returns tensors by default, so the Python-style return_tensors option is not needed.
const encoded = tokenizer("hej og velkommen til linket horoskop");
const output = await model(encoded);
// output.logits has shape [batch, seq_len, 15]; take the argmax over the last axis to get label IDs.
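The argmax step mentioned in the final comment can be sketched as a plain loop over the flat logits buffer. `argmaxLabels` is a hypothetical helper, not part of transformers.js, and the input below is a tiny synthetic array rather than real model output.

```javascript
// Return the index of the highest-scoring label for every token.
// `data` is the flat logits buffer; `dims` is [batch, seqLen, numLabels].
function argmaxLabels(data, dims) {
  const [batch, seqLen, numLabels] = dims;
  const result = [];
  for (let b = 0; b < batch; b++) {
    const row = [];
    for (let t = 0; t < seqLen; t++) {
      // Start of this token's scores in the flat buffer.
      const offset = (b * seqLen + t) * numLabels;
      let best = 0;
      for (let k = 1; k < numLabels; k++) {
        if (data[offset + k] > data[offset + best]) best = k;
      }
      row.push(best);
    }
    result.push(row);
  }
  return result;
}

// Synthetic input (not real model output): 1 sentence, 2 tokens, 3 labels.
const demo = argmaxLabels(new Float32Array([0.1, 2.0, -1.0, 3.0, 0.5, 0.2]), [1, 2, 3]);
console.log(demo); // [[1, 0]]
```

With the model output above, this would be called as `argmaxLabels(output.logits.data, output.logits.dims)`. Mapping the resulting IDs to label strings via `model.config.id2label` assumes the exported config carries that mapping. The sequence also includes special tokens and WordPiece subword pieces, so predictions still have to be aligned back to whole words, which this sketch leaves out.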
Available dtype options
- "fp32" - Full precision (440 MB)
- "fp16" - Half precision (220 MB)
- "q8" - INT8 quantized (111 MB)
Original Model
This is an ONNX export of Alvenir/bert-punct-restoration-da, part of the punctfix library.