---
language:
- id
---
# headline_detector
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/kaenova/headline_detector_space)
_Indonesian Headline Detection Model Repository_
A [Python library](https://github.com/kaenova/headline_detector) provides APIs for detecting headlines in text, especially posts from social media platforms such as Twitter. The underlying models were developed and trained on a dataset of Twitter posts containing both headline and non-headline texts, with the assistance of journalism professionals to ensure data quality.
```sh
$ pip install headline-detector
```
## Available scenarios and their performance
| Model | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
| ------------ | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Fasttext | 0.8766 | 0.8714 | 0.8793 | 0.8714 | 0.8714 | 0.8661 |
| CNN | 0.9081 | 0.9081 | 0.8950 | 0.8898 | 0.8950 | 0.8898 |
| IndoBERTweet | 0.9895 | 0.9921 | 0.9738 | 0.9580 | 0.9843 | 0.9685 |
All values are accuracy scores.
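As a reading aid, the table can be turned into a small lookup that reports the highest-scoring scenario for each model. This is only a sketch: the numbers are copied from the table, while `ACCURACY` and `best_scenario` are hypothetical names that are not part of the `headline_detector` library.

```python
# Accuracy values copied from the table; list index 0 corresponds to Scenario 1.
ACCURACY = {
    "Fasttext": [0.8766, 0.8714, 0.8793, 0.8714, 0.8714, 0.8661],
    "CNN": [0.9081, 0.9081, 0.8950, 0.8898, 0.8950, 0.8898],
    "IndoBERTweet": [0.9895, 0.9921, 0.9738, 0.9580, 0.9843, 0.9685],
}


def best_scenario(model: str) -> int:
    """Return the 1-based scenario number with the highest accuracy.

    Hypothetical helper for illustration. On a tie, the lowest
    scenario number is returned.
    """
    scores = ACCURACY[model]
    return scores.index(max(scores)) + 1


for name in ACCURACY:
    print(name, best_scenario(name))
```

Note that CNN scores the same in Scenarios 1 and 2, so the helper reports Scenario 1 there only because of its tie-breaking rule.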
### Model Throughput
| Model | Throughput (texts/second, approximate) |
| ------------ | -------------------------------------- |
| IndoBERTweet | ≈1.3 |
| CNN | ≈281.60 |
| Fasttext | ≈2048.41 |
Tested on an Intel i7-6700K with 32 GB of RAM.
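To get a feel for what these rates mean in practice, the sketch below estimates how long a batch of texts would take with each model. It assumes throughput stays constant at the figures in the table, which were measured on the hardware listed above and will differ on other machines. `THROUGHPUT` and `estimated_seconds` are hypothetical names used for illustration only.

```python
# Approximate throughput in texts per second, copied from the table.
THROUGHPUT = {
    "IndoBERTweet": 1.3,
    "CNN": 281.60,
    "Fasttext": 2048.41,
}


def estimated_seconds(model: str, n_texts: int) -> float:
    """Estimate processing time, assuming a constant throughput."""
    return n_texts / THROUGHPUT[model]


n = 10_000
for name in THROUGHPUT:
    print(f"{name}: about {estimated_seconds(name, n):.1f} s for {n} texts")
```

At these rates, 10,000 texts would take roughly two hours with IndoBERTweet but only seconds with Fasttext, which is the trade-off against the accuracy figures.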
## Usage
For each input text, the output is either 0 (non-headline) or 1 (headline).
```python
from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

# Fasttext model, scenario 1
detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# CNN model, scenario 3
detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# IndoBERTweet model, scenario 5
detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
```
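A common follow-up step is to separate the input texts by their predicted label. The sketch below assumes, as the example output above suggests, that `predict_text` returns one label per input text in the same order. `split_headlines` is a hypothetical helper, not part of the library, and the labels are hard-coded here so the snippet runs without downloading a model.

```python
from typing import List, Sequence, Tuple


def split_headlines(
    texts: Sequence[str], labels: Sequence[int]
) -> Tuple[List[str], List[str]]:
    """Split texts into (headlines, non_headlines) using 0/1 labels.

    Hypothetical helper: assumes labels[i] is the prediction for texts[i].
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    headlines = [text for text, label in zip(texts, labels) if label == 1]
    non_headlines = [text for text, label in zip(texts, labels) if label != 1]
    return headlines, non_headlines


texts = [
    "nama kamu siapa?",
    "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
]
# Labels taken from the example output above; in real use they would
# come from detector.predict_text(texts).
labels = [0, 1]

headlines, non_headlines = split_headlines(texts, labels)
print(headlines)
print(non_headlines)
```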