---
license: apache-2.0
language:
  - en
  - fr
  - de
  - es
  - it
pipeline_tag: token-classification
library_name: onnx
tags:
  - onnxruntime
  - word-segmentation
  - bilstm-crf
  - text-processing
  - domain-names
---

# DKSplit

Word segmentation model for concatenated text: splits domain names, brand names, and phrases into their component words.

**Current Version: 0.2.3**

## Model Description

- **Architecture:** BiLSTM-CRF (384-dim embeddings, hidden size 768, 3 layers)
- **Format:** ONNX with INT8 quantization
- **Size:** ~9 MB
- **Input:** Lowercase a-z, 0-9 (max 64 characters)
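Because the model only accepts lowercase a-z and 0-9 up to 64 characters, input has to be turned into a fixed-length array of ids before it reaches the ONNX graph. The sketch below shows one way to do that. The vocabulary order, the use of 0 as a padding id, and the `int64` dtype are all assumptions for illustration; the actual encoding DKSplit uses is in the GitHub inference code.

```python
import numpy as np

# Hypothetical vocabulary: DKSplit's real index assignment is not documented here.
# Id 0 is assumed to be reserved for padding.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 64  # documented maximum input length


def encode(text: str) -> np.ndarray:
    """Map a lowercase a-z/0-9 string to a 1 x MAX_LEN array of ids (0 = padding)."""
    if len(text) > MAX_LEN:
        raise ValueError(f"input longer than {MAX_LEN} characters")
    bad = [ch for ch in text if ch not in CHAR_TO_ID]
    if bad:
        raise ValueError(f"unsupported characters: {bad!r}")
    ids = np.zeros((1, MAX_LEN), dtype=np.int64)
    ids[0, : len(text)] = [CHAR_TO_ID[ch] for ch in text]
    return ids
```

Rejecting bad input up front (rather than silently mapping it to padding) makes violations of the documented limits visible instead of producing a quietly wrong split.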

## Usage

### Install
```bash
pip install dksplit
```

### Python
```python
import dksplit

dksplit.split("chatgptlogin")
# ['chatgpt', 'login']

dksplit.split_batch(["openaikey", "microsoftoffice"])
# [['openai', 'key'], ['microsoft', 'office']]
```

### Direct ONNX
```python
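import onnxruntime as ort
import numpy as np

# Load the INT8-quantized model on CPU
session = ort.InferenceSession("dksplit-int8.onnx", providers=["CPUExecutionProvider"])

# Print the graph's input/output signatures before wiring up inference
for node in session.get_inputs() + session.get_outputs():
    print(node.name, node.shape, node.type)
# See GitHub for the full inference code (input encoding and CRF decoding with dksplit.npz)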
```
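The ONNX session returns per-character scores; the final step is turning a per-character tag sequence into words. The sketch below assumes a hypothetical binary tag scheme in which tag `1` marks the first character of a word and `0` marks a continuation. DKSplit's actual label set may differ, so check the GitHub code before relying on this mapping.

```python
from typing import List, Sequence


def tags_to_words(text: str, tags: Sequence[int]) -> List[str]:
    """Split `text` into words, assuming tag 1 marks the first character of a word.

    The 0/1 tag scheme is a hypothetical example, not DKSplit's documented labels.
    Extra tags beyond len(text) (e.g. padding positions) are ignored by zip().
    """
    if len(tags) < len(text):
        raise ValueError("need at least one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(text, tags):
        # Start a new word on a boundary tag, but never emit an empty word.
        if tag == 1 and current:
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return words
```

For example, `tags_to_words("chatgptlogin", [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0])` splits at the eighth character.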

## Files

- `dksplit-int8.onnx` - ONNX model (INT8 quantized)
- `dksplit.npz` - CRF parameters
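Since the CRF parameters ship separately from the ONNX graph, the best tag path has to be decoded outside ONNX Runtime. The standard way to do this for a linear-chain CRF is Viterbi decoding, sketched below with NumPy. The array names stored in `dksplit.npz` are not documented here (`np.load("dksplit.npz").files` lists them), and whether the model has separate start/end scores is an assumption; pass zero vectors if it does not.

```python
import numpy as np


def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray,
                   start: np.ndarray, end: np.ndarray) -> list:
    """Return the highest-scoring tag sequence for one unpadded input.

    emissions:   (seq_len, num_tags) per-character scores, e.g. from the ONNX model
    transitions: (num_tags, num_tags), transitions[i, j] = score of moving tag i -> tag j
    start, end:  (num_tags,) scores for the first / last tag (assumed; zeros if absent)
    """
    seq_len, num_tags = emissions.shape
    score = start + emissions[0]  # best score of any path ending in each tag at step 0
    backpointers = np.zeros((seq_len, num_tags), dtype=np.int64)
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t-1, then moving to tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    score = score + end
    # Follow the backpointers from the best final tag to recover the full path.
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backpointers[t, best[-1]]))
    return best[::-1]
```

Slice the emissions to the real input length before decoding so padding positions do not influence the path.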

## Limitations

- Input: a-z, 0-9 only
- Max length: 64 characters
- Non-Latin scripts: use Romanized form
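Real domain names usually contain dots, hyphens, and uppercase letters, so they need cleaning before they fit these limits. The helper below is a hypothetical sketch, not part of the `dksplit` package: it strips a leading `www.`, keeps only the part before the first remaining dot (a simplification that mishandles subdomains like `blog.example.com`), and drops every character outside a-z/0-9. Non-Latin letters are dropped too, so romanize them first.

```python
import re

MAX_LEN = 64  # documented input limit


def normalize_for_dksplit(domain: str) -> str:
    """Reduce a domain name to an a-z/0-9 string (hypothetical helper).

    Simplification: the label is whatever precedes the first dot after removing
    "www.", so subdomains and suffixes such as .co.uk are not handled precisely.
    """
    label = domain.strip().lower()
    if label.startswith("www."):
        label = label[4:]
    label = label.split(".")[0]
    label = re.sub(r"[^a-z0-9]", "", label)  # drop hyphens and any other characters
    if len(label) > MAX_LEN:
        raise ValueError(f"label exceeds {MAX_LEN} characters")
    return label
```

The result can then be passed to `dksplit.split`, e.g. `dksplit.split(normalize_for_dksplit("ChatGPT-Login.com"))`.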

## Links

- Website: [domainkits.com](https://domainkits.com), [ABTdomain.com](https://ABTdomain.com)
- GitHub: [github.com/ABTdomain/dksplit](https://github.com/ABTdomain/dksplit)
- PyPI: [pypi.org/project/dksplit](https://pypi.org/project/dksplit)

## License

[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) · Copyright 2026 ABTdomain

**Please attribute as:** DKSplit by [ABTdomain](https://abtdomain.com)