---
language:
- de
library_name: tflite
tags:
- named-entity-recognition
- ner
- german
- tflite
- on-device
- mobile
- android
- ios
datasets:
- GermanEval/germeval_14
base_model: deepset/gelectra-large
pipeline_tag: token-classification
license: mit
---

# MobAnon NER Model

German Named Entity Recognition model for the [MobAnon](https://github.com/jurasoft/JURA-KI-Anonymer-Mobile) document anonymization app. It is fine-tuned from [deepset/gelectra-large](https://huggingface.co/deepset/gelectra-large) on [GermEval14](https://huggingface.co/datasets/GermanEval/germeval_14) and converted to TensorFlow Lite for on-device inference.

## Model Details

| Property | Value |
|----------|-------|
| Base model | deepset/gelectra-large |
| Training data | GermEval14 (German NER) |
| Format | TensorFlow Lite (float16 quantized) |
| Size | ~638 MB |
| Test F1 | ~87-89% |
| Max sequence length | 128 tokens |

## Entity Types

The model detects four entity types, encoded with BIO (Begin/Inside/Outside) tags:

| Entity | Examples |
|--------|----------|
| PERSON | Max Mustermann, Dr. Schmidt |
| ORGANIZATION | Deutsche Bank, Bundesgerichtshof |
| LOCATION | Frankfurt, Deutschland, Berliner Str. |
| MISC | Events, dates, other named entities |

MobAnon supplements these with regex-based detection of structured entities such as email addresses, phone numbers, IBANs, and other identifiers.
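Structured entities follow fixed formats, so a pattern plus an optional checksum is usually enough. The sketch below is illustrative only: `EMAIL_RE`, `IBAN_DE_RE`, `iban_checksum_ok` and `find_structured_entities` are hypothetical names, the patterns are assumptions rather than MobAnon's actual rules, and only email addresses and German IBANs are covered. IBAN candidates are validated with the standard mod-97 check (ISO 7064 MOD 97-10) to cut false positives.

```python
import re

# Hypothetical example patterns; MobAnon's actual rules may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# German IBAN: "DE", 2 check digits, 18-digit BBAN, optionally grouped in fours.
IBAN_DE_RE = re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b")


def iban_checksum_ok(iban: str) -> bool:
    """Mod-97 check: move the first 4 chars to the end, map letters to 10..35."""
    compact = iban.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_structured_entities(text: str):
    """Return (label, start, end, surface) tuples for emails and valid German IBANs."""
    hits = [("EMAIL", m.start(), m.end(), m.group()) for m in EMAIL_RE.finditer(text)]
    for m in IBAN_DE_RE.finditer(text):
        # Discard digit runs that look like an IBAN but fail the checksum.
        if iban_checksum_ok(m.group()):
            hits.append(("IBAN", m.start(), m.end(), m.group()))
    # Sort by start offset so hits read in document order.
    return sorted(hits, key=lambda hit: hit[1])
```

For `"Kontakt: max.mustermann@example.de, IBAN DE89 3704 0044 0532 0130 00."` this returns one `EMAIL` hit and one `IBAN` hit; the same IBAN with check digits `00` is rejected by the checksum.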

## Usage

The MobAnon app downloads this model automatically on first use; no manual setup is required.

### Direct download

```bash
# Via huggingface-cli
huggingface-cli download PaulCamacho/mobanon-models deepseek.tflite

# Via URL
wget https://huggingface.co/PaulCamacho/mobanon-models/resolve/main/deepseek.tflite
```

### Input/Output Specification

| Tensor | Shape | Type | Description |
|--------|-------|------|-------------|
| `input_ids` | [1, 128] | int32 | Tokenized input IDs |
| `attention_mask` | [1, 128] | int32 | Attention mask |
| `logits` | [1, 128, 9] | float32 | Per-token logits for 9 BIO labels |
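Because the tensors have static shapes, every input must be padded or truncated to exactly 128 tokens, and padding positions should be skipped when reading `logits`. The plain-Python sketch below shows both steps under stated assumptions: the token IDs come from the `deepset/gelectra-large` tokenizer with `[CLS]`/`[SEP]` already added, the pad ID is 0 (check `tokenizer.pad_token_id`), and `build_model_inputs`/`logits_to_label_ids` are hypothetical helper names. With `tf.lite.Interpreter`, the lists would be wrapped in `numpy.array(..., dtype=numpy.int32)` before `set_tensor`, and the tensor indices are best matched by name via `get_input_details()` rather than assumed to follow the table order, since converted tensor names may differ slightly.

```python
MAX_LEN = 128   # static sequence length from the tensor table
NUM_LABELS = 9  # size of the last logits dimension


def build_model_inputs(token_ids, max_len=MAX_LEN, pad_id=0):
    """Pad or truncate one encoded sentence to the static [1, max_len] shape.

    token_ids must already include the [CLS] ... [SEP] special tokens.
    pad_id=0 is an assumption; check tokenizer.pad_token_id for the real value.
    """
    if len(token_ids) > max_len:
        # Keep the first max_len - 1 tokens and re-attach the final [SEP].
        token_ids = token_ids[: max_len - 1] + token_ids[-1:]
    n = len(token_ids)
    input_ids = list(token_ids) + [pad_id] * (max_len - n)
    attention_mask = [1] * n + [0] * (max_len - n)
    return [input_ids], [attention_mask]  # leading batch dimension of 1


def logits_to_label_ids(logits, attention_mask):
    """Argmax over the NUM_LABELS scores at each non-padding position.

    logits: nested [1, max_len, NUM_LABELS] floats, e.g. an output array's .tolist().
    """
    label_ids = []
    for scores, keep in zip(logits[0], attention_mask[0]):
        if not keep:
            break  # padding is contiguous at the end, so stop at the first 0
        label_ids.append(max(range(len(scores)), key=scores.__getitem__))
    return label_ids
```

The returned IDs index directly into the label table that follows; positions 0 and `n - 1` belong to `[CLS]` and `[SEP]` and carry no entity.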

### Labels

| Index | Label | Entity |
|-------|-------|--------|
| 0 | O | Outside |
| 1 | B-PER | Begin Person |
| 2 | I-PER | Inside Person |
| 3 | B-ORG | Begin Organization |
| 4 | I-ORG | Inside Organization |
| 5 | B-LOC | Begin Location |
| 6 | I-LOC | Inside Location |
| 7 | B-MISC | Begin Miscellaneous |
| 8 | I-MISC | Inside Miscellaneous |
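Per-token label IDs become entity spans by grouping BIO tags: a `B-X` opens a span, following `I-X` tags of the same type extend it, and anything else closes it. In the sketch below, `ID2LABEL` mirrors the table above, while `bio_to_spans` is a hypothetical helper; treating a stray `I-X` (one without a preceding tag of the same type) as the start of a new span is a lenient decoding choice assumed here, not something this model card specifies. The `[CLS]`/`[SEP]` positions are assumed to be dropped before calling it.

```python
# Index-to-label mapping copied from the label table.
ID2LABEL = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
            "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def bio_to_spans(label_ids):
    """Group per-token label IDs into (entity_type, start, end) spans, end exclusive."""
    spans = []
    current = None  # [entity_type, start] of the span being built
    for i, idx in enumerate(label_ids):
        prefix, _, etype = ID2LABEL[idx].partition("-")  # "O" -> ("O", "", "")
        if prefix == "I" and current is not None and current[0] == etype:
            continue  # I-X continuing an open X span: just extend it
        if current is not None:
            # Any other tag closes the open span at this position.
            spans.append((current[0], current[1], i))
            current = None
        if prefix in ("B", "I"):
            # B-X, or a stray I-X (lenient decoding), starts a new span.
            current = [etype, i]
    if current is not None:
        spans.append((current[0], current[1], len(label_ids)))
    return spans
```

For the tokens of "Max Mustermann arbeitet bei Deutsche Bank in Frankfurt" with IDs `[1, 2, 0, 0, 3, 4, 0, 5]`, this yields `[("PER", 0, 2), ("ORG", 4, 6), ("LOC", 7, 8)]`. The token spans can then be mapped back to character offsets, for example with a Hugging Face fast tokenizer called with `return_offsets_mapping=True`, before the matching text is anonymized.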

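## Training

The model was trained and converted from the `base_model` directory of the MobAnon repository:

```bash
cd base_model
# Fine-tune on GermEval14 (3 epochs, batch size 16, mixed precision)
python train_ner.py --epochs 3 --batch-size 16 --fp16
# Export to ONNX with static input shapes ([1, 128])
python export_to_onnx.py --static-shapes
# Convert to TensorFlow Lite with float16 quantization
python convert_to_tflite.py --quantize float16
```

See the [base_model README](https://github.com/jurasoft/JURA-KI-Anonymer-Mobile/tree/main/base_model) for the full training and conversion pipeline.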

## License

MIT