---
license: mit
language:
- fi
metrics:
- f1
- precision
- recall
- accuracy
base_model:
- google-bert/bert-base-uncased
- TurkuNLP/bert-base-finnish-cased-v1
pipeline_tag: text-classification
tags:
- classification
- news
---

# News Relevancy Classifiers

## FinBERT-ft-v3

### Model Description

- **Purpose**: This model was trained for a specific research task. It is not a commercial product and should not be used for for-profit purposes.
- **Architecture**: `bert-base-finnish-cased-v1`
- **Fine-tuning task**: Four-class Finnish news-headline relevancy classification
- **Dataset**: ~225 Finnish headlines (2024–2025) manually labeled into:
  - 0 – Not Relevant
  - 1 – Least Relevant
  - 2 – Highly Relevant
  - 3 – Most Relevant
- **HF Repo**: [`cloud0day3/finbert-ft-v3`](https://huggingface.co/cloud0day3/finbert-ft-v3) (latest v4 checkpoint, 6 June 2025)
- **Date Trained**: 2025-06-06

#### Model Inputs

- A raw Finnish headline (string), truncated/padded to 96 tokens.
- Tokenization handled by the bundled `vocab.txt` + `tokenizer_config.json` + `special_tokens_map.json`.
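
The input handling described above can be sketched as follows. This is a minimal sketch, assuming the bundled tokenizer files are loaded with `transformers.AutoTokenizer` from the repository named in this card. The helper names `load_tokenizer` and `encode_headline` are hypothetical, and the choice of `padding="max_length"` (rather than dynamic padding) is an assumption, since the card only states the 96-token limit.

```python
MAX_LENGTH = 96  # token limit stated in this model card


def load_tokenizer(repo_id="cloud0day3/finbert-ft-v3"):
    """Load the bundled tokenizer files (assumes `transformers` is installed)."""
    # Imported lazily so that encode_headline below has no hard dependency.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def encode_headline(tokenizer, headline, max_length=MAX_LENGTH):
    """Truncate or pad one raw headline to exactly `max_length` tokens."""
    return tokenizer(
        headline,
        truncation=True,       # cut headlines longer than max_length
        padding="max_length",  # pad shorter headlines up to max_length
        max_length=max_length,
        return_tensors="pt",   # PyTorch tensors; assumes torch is installed
    )
```

With a real tokenizer, `encode_headline(load_tokenizer(), headline)` would return an encoding whose `input_ids` tensor has shape `(1, 96)`.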

#### Model Outputs

- A single integer label (0–3), mapped to human-readable categories:

```python
# Maps the predicted class index to its human-readable category.
LABELS = {
    0: "Not Relevant",
    1: "Least Relevant",
    2: "Highly Relevant",
    3: "Most Relevant",
}
```
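
As a sketch of how the integer label might be derived: assuming the classification head emits four raw logits per headline (the usual arrangement for a sequence-classification model, though this card does not confirm it), the predicted label is the index of the largest logit. The helper `logits_to_label` and the example logits are hypothetical.

```python
import math

LABELS = {
    0: "Not Relevant",
    1: "Least Relevant",
    2: "Highly Relevant",
    3: "Most Relevant",
}


def logits_to_label(logits):
    """Return (label_id, label_name, confidence) for one headline's logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the index with the highest probability.
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return label_id, LABELS[label_id], probs[label_id]


# Made-up logits for illustration only.
label_id, label_name, confidence = logits_to_label([-1.2, 0.3, 2.5, 0.9])
print(label_id, label_name)
```

For these made-up logits the largest value sits at index 2, so the headline would be labeled "Highly Relevant". The softmax confidence is optional; downstream code that only needs the label can skip it.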

#### Intended Use

- **Primary**: Automatically assign a relevancy score to Finnish news headlines so that downstream pipelines (e.g., filtering, ranking) can operate without manual triage.

#### Examples of Use

- Pre-filtering a news aggregation feed.
- Prioritizing headlines for editorial review.
- Input to summarization/retrieval pipelines.
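
The uses listed above might look like this in a downstream pipeline. This is a minimal sketch, assuming a `classify` callable that maps a headline to a label from 0 to 3 (for instance, a wrapper around this model). The function `filter_and_rank`, the `min_label` threshold, and the stand-in classifier with placeholder headlines are all hypothetical and not part of the released model.

```python
from typing import Callable, Iterable, List, Tuple


def filter_and_rank(
    headlines: Iterable[str],
    classify: Callable[[str], int],
    min_label: int = 2,
) -> List[Tuple[int, str]]:
    """Keep headlines labeled at least `min_label`, most relevant first."""
    scored = [(classify(headline), headline) for headline in headlines]
    kept = [(label, headline) for label, headline in scored if label >= min_label]
    # sorted() is stable, so headlines with equal labels keep their feed order.
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Stand-in classifier with placeholder headlines, for illustration only.
fake_labels = {"headline A": 0, "headline B": 3, "headline C": 2, "headline D": 1}


def stand_in_classify(headline: str) -> int:
    return fake_labels[headline]


ranked = filter_and_rank(fake_labels, stand_in_classify)
print(ranked)  # [(3, 'headline B'), (2, 'headline C')]
```

Passing the classifier in as a callable keeps the filtering logic independent of how the model is loaded or served. With `min_label=2`, only "Highly Relevant" and "Most Relevant" headlines reach the next stage.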

#### Out-of-Scope Uses

- Any non-Finnish text (e.g., English, Swedish).
- Multi-sentence inputs or full articles (this model is tuned on single-sentence headlines).
- Tasks other than relevancy (e.g., sentiment analysis, topic modeling).
- High-risk decision making without human oversight (e.g., emergency alerts).