kaenova
/

headline_detector

Model card Files Files and versions

headline_detector / README.md

kaenova's picture

Update README.md

987d07d almost 3 years ago

|

history blame contribute delete

2.52 kB

	---
	language:
	- id
	---
	# headline_detector

	[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/kaenova/headline_detector_space)

	_Indonesian Headline Detection Model Repository_

	There's a [Python library](https://github.com/kaenova/headline_detector) that provides APIs for detecting headlines in textual data, especially on social media platforms such as Twitter. The library utilizes a model that has been developed and trained on a dataset of Twitter posts containing both headline and non-headline texts, with the assistance of journalism professionals to ensure the data quality.

	```sh
	$ pip install headline-detector
	```

	## Available scenario and the performance

	\| Model \| Scenario 1 \| Scenario 2 \| Scenario 3 \| Scenario 4 \| Scenario 5 \| Scenario 6 \|
	\| ------------ \| ---------- \| ---------- \| ---------- \| ---------- \| ---------- \| ---------- \|
	\| Fasttext \| 0.8766 \| 0.8714 \| 0.8793 \| 0.8714 \| 0.8714 \| 0.8661 \|
	\| CNN \| 0.9081 \| 0.9081 \| 0.8950 \| 0.8898 \| 0.8950 \| 0.8898 \|
	\| IndoBERTweet \| 0.9895 \| 0.9921 \| 0.9738 \| 0.9580 \| 0.9843 \| 0.9685 \|

	All meassured in accuracy

	### Model Throughput

	\| Model \| Throughput (± Text/seconds) \|
	\| ------------ \| --------------------------- \|
	\| IndoBERTweet \| ±1.3 \|
	\| CNN \| ±281.60 \|
	\| Fasttext \| ±2048.41 \|

	Tested on Intel i7-6700k and 32GB of RAM.

	## Usage

	Output either 0 (non-headline) and 1 (headline)

	```python
	from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

	detector = FasttextDetector.load_from_scenario(1)
	data = detector.predict_text(
	[
	"nama kamu siapa?",
	"Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
	]
	)
	print(data) # output: [0, 1]

	detector = CNNDetector.load_from_scenario(3)
	data = detector.predict_text(
	[
	"nama kamu siapa?",
	"Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
	]
	)
	print(data) # output: [0, 1]

	detector = IndoBERTweetDetector.load_from_scenario(5)
	data = detector.predict_text(
	[
	"nama kamu siapa?",
	"Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
	]
	)
	print(data) # output: [0, 1]

	# 0 is non-headline
	# 1 is headline
	```