---
license: apache-2.0
language:
- en
- fr
- de
- es
- it
pipeline_tag: token-classification
library_name: onnx
tags:
- onnxruntime
- word-segmentation
- bilstm-crf
- text-processing
- domain-names
---
# DKSplit
Word segmentation model for concatenated text. It splits domain names, brand names, and phrases written without spaces into their component words.
**Current Version: 0.2.3**
## Model Description
- **Architecture:** BiLSTM-CRF (384 embedding, 768 hidden, 3 layers)
- **Format:** ONNX with INT8 quantization
- **Size:** ~9MB
- **Input:** Lowercase a-z, 0-9 (max 64 characters)
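As a token-classification model, DKSplit assigns a label to each input character, and the word boundaries follow from those labels. The sketch below shows how per-character labels could be turned back into words. It assumes a hypothetical binary scheme where `1` marks the first character of a new word and `0` marks a continuation; the model's actual label set is not documented in this card, and `labels_to_words` is an illustrative helper, not part of the `dksplit` package.

```python
def labels_to_words(text: str, labels: list[int]) -> list[str]:
    """Group characters into words using per-character boundary labels.

    Hypothetical scheme: 1 = character starts a new word,
    0 = character continues the current word.
    """
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    words: list[str] = []
    for ch, label in zip(text, labels):
        if label == 1 or not words:
            words.append(ch)  # start a new word (the first character always does)
        else:
            words[-1] += ch  # extend the current word
    return words


print(labels_to_words("chatgptlogin", [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]))
# ['chatgpt', 'login']
```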
## Usage
### Install
```bash
pip install dksplit
```
### Python
```python
import dksplit
dksplit.split("chatgptlogin")
# ['chatgpt', 'login']
dksplit.split_batch(["openaikey", "microsoftoffice"])
# [['openai', 'key'], ['microsoft', 'office']]
```
### Direct ONNX
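```python
import numpy as np
import onnxruntime as ort
# Load the INT8-quantized model from the current directory on CPU
session = ort.InferenceSession("dksplit-int8.onnx", providers=["CPUExecutionProvider"])
# Print the input/output names, shapes, and element types the graph expects
for node in session.get_inputs() + session.get_outputs():
    print(node.name, node.shape, node.type)
# See GitHub for full inference code
```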
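Because the CRF parameters ship separately in `dksplit.npz`, the tag sequence is presumably decoded outside the ONNX graph. The sketch below is standard linear-chain CRF Viterbi decoding in NumPy. It assumes the model yields an emission matrix of shape `(seq_len, num_tags)` per input and that the CRF provides a `(num_tags, num_tags)` transition matrix plus optional start/end score vectors; these array names, shapes, and the `viterbi_decode` helper are assumptions, so check the GitHub inference code for the actual keys stored in `dksplit.npz`.

```python
import numpy as np


def viterbi_decode(emissions, transitions, start=None, end=None):
    """Return the highest-scoring tag sequence for a single input.

    emissions:   (seq_len, num_tags) per-character scores (assumed model output)
    transitions: (num_tags, num_tags); transitions[i, j] scores moving tag i -> tag j
    start, end:  optional (num_tags,) scores for the first / last tag
    """
    seq_len, num_tags = emissions.shape
    if seq_len == 0:
        return []
    score = emissions[0].astype(np.float64)
    if start is not None:
        score = score + start
    backpointers = np.zeros((seq_len, num_tags), dtype=np.int64)

    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t-1, then tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)

    if end is not None:
        score = score + end

    # Walk the back-pointers from the best final tag to recover the path
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backpointers[t, path[-1]]))
    return path[::-1]


toy_emissions = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(viterbi_decode(toy_emissions, np.zeros((2, 2))))  # [0, 1, 0]
```

With all-zero transitions the result is just the per-row argmax; a strongly negative `transitions[0, 1]` (for example `-10`) would instead force the path `[0, 0, 0]`, which is how the CRF enforces consistent label sequences.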
## Files
- `dksplit-int8.onnx` - ONNX model (INT8 quantized)
- `dksplit.npz` - CRF parameters
## Limitations
- Input: a-z, 0-9 only
- Max length: 64 characters
- Non-Latin scripts: use Romanized form
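Raw domain names usually need cleanup before they fit these limits. A minimal sketch, assuming you want to keep only the part before the first dot, fold accented Latin letters to ASCII, lowercase, and drop everything outside a-z and 0-9 (such as hyphens). `prepare_input` is a hypothetical helper, not part of the `dksplit` package; splitting on the first dot is a simplification, and non-Latin scripts still need to be romanized beforehand.

```python
import re
import unicodedata

MAX_LEN = 64  # maximum input length stated in this model card


def prepare_input(domain: str) -> str:
    """Reduce a raw domain name to the a-z/0-9 form the model accepts."""
    label = domain.strip().split(".", 1)[0]
    # NFKD splits "é" into "e" plus a combining accent; the ASCII encode drops the accent
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9]", "", ascii_label.lower())
    if not cleaned:
        raise ValueError(f"no a-z/0-9 characters left in {domain!r}")
    if len(cleaned) > MAX_LEN:
        raise ValueError(f"{cleaned!r} exceeds {MAX_LEN} characters")
    return cleaned


print(prepare_input("ChatGPT-Login.com"))  # chatgptlogin
```

The cleaned string can then be passed to `dksplit.split`.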
## Links
- Website: [domainkits.com](https://domainkits.com), [ABTdomain.com](https://ABTdomain.com)
- GitHub: [github.com/ABTdomain/dksplit](https://github.com/ABTdomain/dksplit)
- PyPI: [pypi.org/project/dksplit](https://pypi.org/project/dksplit)
## License
[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) · Copyright 2026 ABTdomain
**Please attribute as:** DKsplit by [ABTdomain](https://abtdomain.com)