---
datasets:
- samirmsallem/wiki_definitions_de_multitask
language:
- de
base_model:
- deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
- science
- ner
- def_extraction
- definitions
metrics:
- accuracy
model-index:
- name: checkpoints
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: samirmsallem/wiki_definitions_de_multitask
      type: samirmsallem/wiki_definitions_de_multitask
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9630331753554502
---

## Text classification model for definition recognition in German scientific texts

gbert-base-definition_classification is a German-language text classification model for the scientific domain, fine-tuned from [gbert-base](https://huggingface.co/deepset/gbert-base).
It was trained on a custom annotated dataset (`samirmsallem/wiki_definitions_de_multitask`) of around 10,000 training and 2,000 test examples, consisting of definitional and non-definitional sentences taken from German Wikipedia articles.
gbert-base was chosen as the base model because it achieved the best overall score in the [NER task](https://huggingface.co/samirmsallem/gbert-base-definitions_ner#model-performance-comparision-on-wiki_definitions_de_multitask) on the same dataset.

The model classifies each input sentence as either a definition or a non-definition sentence:

| Tag (class ID) | Label | Description |
| :----: | :----: | :----: |
| 0 | NON_DEF_SENTENCE | The text is a non-definitional sentence |
| 1 | DEF_SENTENCE | The text is a definitional sentence |
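
The class IDs in the table correspond to the model's two output logits. As a minimal sketch, assuming the model's `config.id2label` matches the table and that scores are computed with softmax (which is what the transformers `text-classification` pipeline applies to single-label models with more than one class), a predicted label and its score can be derived from raw logits as follows. `ID2LABEL`, `softmax`, `decode`, and the example logits are illustrative and not part of the model's code.

```python
import math
from typing import List, Sequence, Tuple

# Mapping taken from the table above; assumed to match the model's config.id2label
ID2LABEL = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sentence's raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float]) -> Tuple[str, float]:
    """Turn the two logits for one sentence into (label, score), like one pipeline result."""
    probs = softmax(logits)
    # Index of the most probable class (ties resolve to the lower class ID)
    class_id = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[class_id], probs[class_id]


# Hypothetical logits: class 1 (DEF_SENTENCE) scores higher, so it wins
print(decode([-3.1, 4.6]))
```

Under this assumption, the `score` returned by the pipeline is the softmax probability of the winning class.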

Training used a standard text classification objective. The model reaches an accuracy of approximately 96% on the test set.

Final metrics on the test set after 4 epochs of training:
- Accuracy: 0.9630331753554502
- Loss: 0.17300711572170258
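
To make the two reported numbers concrete, here is a minimal sketch of what they measure, assuming the usual transformers fine-tuning setup for two-class sequence classification: accuracy is the share of test sentences whose highest-scoring class matches the gold label, and the loss is the mean cross-entropy over the two classes. The helper names and toy logits below are hypothetical and do not reproduce the card's figures.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one sentence's raw scores."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def evaluate(all_logits: Sequence[Sequence[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, mean cross-entropy loss) over a labelled set."""
    if not labels or len(all_logits) != len(labels):
        raise ValueError("need exactly one logit vector per gold label")
    correct = 0
    total_loss = 0.0
    for logits, label in zip(all_logits, labels):
        # The predicted class is the argmax of the logits (same as argmax of softmax)
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        correct += int(predicted == label)
        # Cross-entropy for one example: negative log-probability of the gold class
        total_loss -= log_softmax(logits)[label]
    n = len(labels)
    return correct / n, total_loss / n


# Toy values (made up): labels 0 = NON_DEF_SENTENCE, 1 = DEF_SENTENCE
toy_logits = [[2.0, -1.0], [-0.5, 1.5], [1.0, 0.2]]
toy_labels = [0, 1, 1]
accuracy, loss = evaluate(toy_logits, toy_labels)
print(f"accuracy={accuracy:.3f} loss={loss:.3f}")
```

The third toy sentence is misclassified, which lowers the accuracy and also contributes most of the loss, since the gold class receives a low probability.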


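### Usage

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

# A list of sentences yields one {'label', 'score'} dict per sentence.
# Sentence 1 (definition): "Natural Language Processing is a method of artificial intelligence."
# Sentence 2 (non-definition): "Roses are red, violets are blue."
results = pipe([
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
])
print(results)

# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
```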
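
Because the model classifies single sentences, running text has to be split into sentences first. The following is a minimal sketch under stated assumptions: `split_sentences` is a hypothetical, deliberately naive regex splitter (it will wrongly split German abbreviations such as "z. B."), and `extract_definitions` is a hypothetical helper that accepts any callable returning the pipeline's `{'label', 'score'}` output, so the `pipe` object from the example above can be passed in directly.

```python
import re
from typing import Callable, Dict, List

# Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into non-empty, stripped sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def extract_definitions(
    text: str,
    classify: Callable[[List[str]], List[Dict]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the sentences of `text` labelled DEF_SENTENCE with score >= threshold."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    # One result dict per sentence, in input order
    results = classify(sentences)
    return [
        sentence
        for sentence, result in zip(sentences, results)
        if result["label"] == "DEF_SENTENCE" and result["score"] >= threshold
    ]
```

For example, `extract_definitions(text, pipe, threshold=0.9)` keeps only sentences labelled `DEF_SENTENCE` with a score of at least 0.9. The threshold is an illustrative choice, not a value recommended by the model's author.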