---

language:
- de
library_name: tflite
tags:
- named-entity-recognition
- ner
- german
- tflite
- on-device
- mobile
- android
- ios
datasets:
- GermanEval/germeval_14
base_model: deepset/gelectra-large
pipeline_tag: token-classification
license: mit
---


# MobAnon NER Model

German Named Entity Recognition model for the [MobAnon](https://github.com/jurasoft/JURA-KI-Anonymer-Mobile) document anonymization app. Fine-tuned from [deepset/gelectra-large](https://huggingface.co/deepset/gelectra-large) on [GermEval14](https://huggingface.co/datasets/GermanEval/germeval_14) for on-device inference.

## Model Details

| Property | Value |
|----------|-------|
| Base model | deepset/gelectra-large |
| Training data | GermEval14 (German NER) |
| Format | TensorFlow Lite (float16 quantized) |
| Size | ~638 MB |
| Test F1 | ~87-89% |
| Max sequence length | 128 tokens |

## Entity Types

The model detects four semantic entity types using BIO tagging:

| Entity | Examples |
|--------|----------|
| **PERSON** | Max Mustermann, Dr. Schmidt |
| **ORGANIZATION** | Deutsche Bank, Bundesgerichtshof |
| **LOCATION** | Frankfurt, Deutschland, Berliner Str. |
| **MISC** | Events, dates, other named entities |

MobAnon supplements these with regex-based detection for structured entities (email, phone, IBAN, identifiers).
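
As a rough illustration of what such a regex layer can look like, the sketch below matches email addresses and German IBANs. The patterns and the `find_structured` helper are assumptions made for this example. They are not taken from the MobAnon source, and the app's real rules may differ.

```python
import re

# Hypothetical patterns for illustration only, not MobAnon's actual rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # German IBAN: "DE", 2 check digits, 18 digits, optionally grouped in fours.
    "IBAN": re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b"),
}


def find_structured(text):
    """Return (kind, start, end, matched_text) tuples sorted by start offset."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((kind, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda h: h[1])
```

The character offsets make it possible to merge these hits with the token-level spans produced by the NER model before redacting the text.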

## Usage

The MobAnon app downloads this model automatically on first use; no manual setup is required.

### Direct download

```bash
# Via huggingface-cli
huggingface-cli download PaulCamacho/mobanon-models deepseek.tflite

# Via URL
wget https://huggingface.co/PaulCamacho/mobanon-models/resolve/main/deepseek.tflite
```

### Input/Output Specification

| Tensor | Shape | Type | Description |
|--------|-------|------|-------------|
| `input_ids` | [1, 128] | int32 | Tokenized input IDs |
| `attention_mask` | [1, 128] | int32 | Attention mask |
| `logits` | [1, 128, 9] | float32 | Per-token logits for 9 BIO labels |
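
Because the input shapes are fixed, every sequence has to be truncated or padded to exactly 128 tokens before inference. The sketch below shows one way to build the two input tensors from a list of token IDs. The `build_inputs` helper, the example IDs, and the padding ID of 0 are assumptions; check the tokenizer vocabulary for the actual `[PAD]` ID.

```python
MAX_LEN = 128  # fixed sequence length from the table above
PAD_ID = 0     # assumption: [PAD] has ID 0, as in most BERT/ELECTRA vocabularies


def build_inputs(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate or pad token IDs to max_len and build the matching attention mask.

    Assumes token_ids already contains the special tokens. Plain truncation
    drops a trailing [SEP] on over-long inputs, so long documents are better
    split into chunks before calling this helper.
    """
    ids = list(token_ids)[:max_len]
    mask = [1] * len(ids)           # 1 = real token
    padding = max_len - len(ids)
    ids += [pad_id] * padding
    mask += [0] * padding           # 0 = padding, ignored by attention
    # Wrap in a batch dimension so both have shape [1, max_len].
    return [ids], [mask]


# Example with made-up token IDs.
input_ids, attention_mask = build_inputs([101, 7, 8, 102])
```

Both nested lists can then be converted to int32 arrays (for example with NumPy) and passed to a TensorFlow Lite interpreter as `input_ids` and `attention_mask`.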

### Labels

| Index | Label | Entity |
|-------|-------|--------|
| 0 | O | Outside |
| 1 | B-PER | Begin Person |
| 2 | I-PER | Inside Person |
| 3 | B-ORG | Begin Organization |
| 4 | I-ORG | Inside Organization |
| 5 | B-LOC | Begin Location |
| 6 | I-LOC | Inside Location |
| 7 | B-MISC | Begin Miscellaneous |
| 8 | I-MISC | Inside Miscellaneous |
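
To turn the `logits` tensor into entities, take the highest-scoring label index for each non-padding token and merge consecutive `B-`/`I-` labels of the same type. The following is a minimal sketch of that decoding step, not the app's implementation. The `decode` and `group_entities` helpers are hypothetical, and the code works on plain nested lists (one row of 9 logits per token).

```python
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def decode(logits, attention_mask):
    """Argmax each non-padding token's logits and map the index to its label."""
    labels = []
    for row, keep in zip(logits, attention_mask):
        if not keep:
            continue  # skip padding positions
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(LABELS[best])
    return labels


def group_entities(labels):
    """Merge BIO labels into (entity_type, start, end_exclusive) token spans."""
    spans = []
    current = None  # [entity_type, start_index] of the span being built
    for i, label in enumerate(labels):
        prefix, _, kind = label.partition("-")
        if prefix == "I" and current and current[0] == kind:
            continue  # still inside the open span
        if current:
            spans.append((current[0], current[1], i))
            current = None
        if prefix in ("B", "I"):  # a stray I- tag is treated as a span start
            current = [kind, i]
    if current:
        spans.append((current[0], current[1], len(labels)))
    return spans
```

Here `logits` would be the model output with the batch dimension removed (`logits[0]`), and `attention_mask` the mask row used for the same input. The spans are token indices, so mapping them back to character offsets still requires the tokenizer's offset information.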

## Training

```bash
cd base_model
python train_ner.py --epochs 3 --batch-size 16 --fp16
python export_to_onnx.py --static-shapes
python convert_to_tflite.py --quantize float16
```

See the [base_model README](https://github.com/jurasoft/JURA-KI-Anonymer-Mobile/tree/main/base_model) for the full training and conversion pipeline.

## License

MIT