cloud0day3 committed on
Commit d5b2039 · verified · 1 Parent(s): 7f1776d

Create README.md

Files changed (1)
  1. README.md +62 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - google-bert/bert-base-uncased
+ pipeline_tag: text-classification
+ ---
+ # News Relevancy Classifiers
+
+ ## bert-ft-v2
+
+ ![BERTft Badge](https://img.shields.io/badge/Model-BERT--ft--v2-blue)
+
+ ### Model Description
+ - **Purpose**: This model was trained for a specific research task. It is not a commercial product and should not be used for for-profit purposes.
+ - **Architecture**: `bert-base-uncased`
+ - **Fine-tuning task**: Four-class relevancy classification of English healthcare and AI news headlines
+ - **Dataset**: ~254 English headlines (2024–2025) manually labeled into:
+   - 0 — Not Relevant
+   - 1 — Least Relevant
+   - 2 — Highly Relevant
+   - 3 — Most Relevant
+ - **HF Repo**: [`cloud0day3/bert-ft-v2`](https://huggingface.co/cloud0day3/bert-ft-v2) (latest v3 checkpoint, 6 June 2025)
+ - **Date Trained**: 2025-06-06
+
+ #### Model Inputs
+
+ - A raw English headline (string), truncated or padded to 96 tokens.
+ - Tokenization is handled by the bundled `vocab.txt`, `tokenizer_config.json`, and `special_tokens_map.json`.
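
The input handling described above can be sketched as follows. This is a minimal sketch, not the author's preprocessing code: it assumes the bundled tokenizer files load through `AutoTokenizer` from the `transformers` library and that PyTorch is installed. The helper name `encode_headline` is hypothetical.

```python
MODEL_ID = "cloud0day3/bert-ft-v2"  # repo named in this model card
MAX_LENGTH = 96                     # token limit stated in the model card


def encode_headline(headline: str, model_id: str = MODEL_ID):
    """Tokenize one headline, truncating or padding it to MAX_LENGTH tokens.

    Hypothetical helper: assumes the repo's tokenizer files are loadable
    with AutoTokenizer.
    """
    # Imported lazily so this module can be loaded without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer(
        headline,
        padding="max_length",  # pad short headlines up to MAX_LENGTH
        truncation=True,       # cut long headlines down to MAX_LENGTH
        max_length=MAX_LENGTH,
        return_tensors="pt",   # PyTorch tensors; assumes torch is installed
    )
```

With these settings, the returned `input_ids` tensor should hold exactly 96 token ids per headline, regardless of the headline's length.
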
+
+ #### Model Outputs
+
+ - A single integer label (0–3), mapped to human-readable categories:
+ ```python
+ # Maps the model's integer class id to its relevancy category.
+ LABELS = {
+     0: "Not Relevant",
+     1: "Least Relevant",
+     2: "Highly Relevant",
+     3: "Most Relevant"
+ }
+ ```
+
+ #### Intended Use
+ - **Primary**: Automatically assign a relevancy score to English healthcare and AI news headlines so that downstream pipelines (e.g., filtering, ranking) can operate without manual triage.
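
A rough end-to-end sketch of scoring a single headline is shown here. It assumes the checkpoint loads through `AutoModelForSequenceClassification` and emits four logits, one per class; neither is confirmed by this card. The function names `label_from_logits` and `classify_headline` are hypothetical.

```python
LABELS = {
    0: "Not Relevant",
    1: "Least Relevant",
    2: "Highly Relevant",
    3: "Most Relevant",
}


def label_from_logits(logits):
    """Return (class_id, label) for the highest-scoring logit."""
    class_id = max(range(len(logits)), key=lambda i: logits[i])
    return class_id, LABELS[class_id]


def classify_headline(headline, model_id="cloud0day3/bert-ft-v2"):
    """Hypothetical helper: score one headline with the fine-tuned model."""
    # Imported lazily so label_from_logits works without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(
        headline,
        padding="max_length",
        truncation=True,
        max_length=96,  # token limit stated in the model card
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```

Keeping the argmax step in its own function makes the label mapping easy to check without downloading the model.
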
+
+ #### Examples of use:
+
+ - Pre-filtering a news aggregation feed to capture healthcare and AI news.
+
+ - Prioritizing headlines for editorial review.
+
+ - Input to summarization/retrieval pipelines.
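
The pre-filtering and prioritization uses listed above could look roughly like this. The sketch is independent of how predictions are produced: `predict` is any callable that returns an integer label from 0 to 3 for a headline. The function name `prefilter` and the default threshold of 2 are illustrative assumptions, not part of the model.

```python
from typing import Callable, Iterable, List, Tuple


def prefilter(
    headlines: Iterable[str],
    predict: Callable[[str], int],
    min_label: int = 2,  # assumed threshold: keep "Highly" and "Most Relevant"
) -> List[Tuple[int, str]]:
    """Keep headlines labeled at least min_label, most relevant first."""
    scored = [(predict(h), h) for h in headlines]
    kept = [(label, h) for label, h in scored if label >= min_label]
    # Python's sort is stable, so headlines with equal labels keep feed order.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept
```

Lowering `min_label` widens the feed, and raising it to 3 keeps only the top category. Which threshold suits an editorial queue is a judgment call for the pipeline owner.
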
+
+ #### Out-of-Scope Uses
+ - Any non-English text.
+
+ - Multi-sentence inputs or full articles (the model is tuned on single-sentence headlines).
+
+ - Tasks other than healthcare and AI headline relevancy (e.g., sentiment analysis, topic modeling).
+
+ - High-risk decision making without human oversight (e.g., emergency alerts).