george121212afasf committed on
Commit 2f2a4a6 · verified · 1 Parent(s): 929adb8

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +93 -0
README.md ADDED
@@ -0,0 +1,93 @@
+ ---
+ language: id
+ tags:
+ - indonesian
+ - ner
+ - named-entity-recognition
+ - sports
+ - football
+ - indobert
+ license: mit
+ ---
+
+ # SportExtract NER Model
+
+ ## Model Description
+
+ This is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, specifically for football/soccer content.
+
+ **Base Model:** IndoBERT (indobenchmark/indobert-base-p1)
+
+ **Model Type:** Multi-label token classification
+
+ ## Entities Detected
+
+ The model can detect the following entities in Indonesian sports articles:
+
+ - **ATLET** - Athletes/Players
+ - **TIM** - Teams
+ - **ORGANISASI** - Organizations
+ - **KEWARGANEGARAAN** - Nationality
+ - **POSISI** - Player positions
+ - **UMUR** - Age
+ - **AKSI** - Actions in matches
+ - **PENGHARGAAN** - Awards/achievements
+ - **STATISTIK** - Statistics
+ - **SKOR** - Match scores
+ - **TANGGAL** - Dates
+ - **STADION** - Stadiums
+ - **KEJUARAAN** - Tournaments/competitions
+ - **ALASAN_PERISTIWA** - Event reasons/context
+
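Because the model type is given as multi-label token classification, a single token may carry more than one of the entity types above. The following is a minimal sketch of how per-token scores could be turned into label sets. It assumes one independent sigmoid score per label and a threshold of 0.5; neither the label order nor the threshold is documented in this README, and `decode_multilabel` is a hypothetical helper, not part of the released model.

```python
import math

# Entity types listed in this README. The order is an assumption:
# the checkpoint may index its labels differently.
LABELS = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR",
    "AKSI", "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION",
    "KEJUARAAN", "ALASAN_PERISTIWA",
]


def sigmoid(x):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(token_logits, threshold=0.5):
    """Return, for each token, every label whose probability reaches the threshold."""
    decoded = []
    for logits in token_logits:
        decoded.append([
            label for label, score in zip(LABELS, logits)
            if sigmoid(score) >= threshold
        ])
    return decoded


# Hypothetical logits: one row per token, one column per label.
example_logits = [
    [3.0] + [-4.0] * 13,                       # ATLET only
    [-4.0, 2.0] + [-4.0] * 10 + [1.5, -4.0],   # TIM and KEJUARAAN together
]
print(decode_multilabel(example_logits))
```

Unlike a softmax over labels, this scheme lets the second token keep both `TIM` and `KEJUARAAN`, which is the practical difference between multi-label and single-label tagging.
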
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+ from huggingface_hub import hf_hub_download
+
+ # Download the checkpoint file from the Hub
+ model_path = hf_hub_download(
+     repo_id="george121212afasf/model",
+     filename="best_model.pt"
+ )
+
+ # Load the checkpoint on CPU
+ checkpoint = torch.load(model_path, map_location='cpu')
+
+ # Load the tokenizer of the base model
+ tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
+
+ # Your model class and inference code here
+ ```
+
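The README does not say what `best_model.pt` contains, so it can help to look at the loaded object before writing a model class around it. The sketch below is a hypothetical helper, `summarize_checkpoint`, that reports the top-level layout of a checkpoint dictionary. The key names in the example (`model_state_dict`, `epoch`) are illustrative assumptions, not the actual contents of the file.

```python
def summarize_checkpoint(checkpoint, max_items=5):
    """Describe the top-level layout of a loaded checkpoint object."""
    if not isinstance(checkpoint, dict):
        return {"type": type(checkpoint).__name__}
    summary = {}
    for key, value in checkpoint.items():
        if isinstance(value, dict):
            # Nested dicts are often state dicts: report size and a few names.
            summary[key] = {
                "entries": len(value),
                "first_keys": list(value)[:max_items],
            }
        else:
            summary[key] = type(value).__name__
    return summary


# Hypothetical checkpoint layout, used only to illustrate the helper.
fake_checkpoint = {
    "model_state_dict": {"bert.embeddings.weight": None, "classifier.weight": None},
    "epoch": 3,
}
print(summarize_checkpoint(fake_checkpoint))
```

Calling `summarize_checkpoint(checkpoint)` on the object returned by `torch.load` would show whether the file holds a bare state dict or a larger training checkpoint, which determines how the weights need to be loaded.
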
+ ## Training Data
+
+ Trained on annotated Indonesian sports news articles from various sources.
+
+ ## Model Size
+
+ - Parameters: ~125M (IndoBERT base)
+ - File size: ~1420 MB
+
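As a rough sanity check on the two figures above, the snippet below estimates how much space ~125M parameters would take if stored as 32-bit floats (4 bytes each). The storage precision is an assumption, since the README does not state it.

```python
def fp32_size_mb(num_parameters):
    """Size in MB (10^6 bytes) of the given number of 32-bit float parameters."""
    return num_parameters * 4 / 1_000_000


weights_mb = fp32_size_mb(125_000_000)   # parameter count from the README
ratio = 1420 / weights_mb                # file size from the README

print(f"{weights_mb:.0f} MB of fp32 weights; the file is {ratio:.2f}x that size")
```

Under that assumption the weights alone would be about 500 MB, so the ~1420 MB file probably stores more than the encoder weights. What the extra data is (for example optimizer state or additional layers) is not documented here.
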
+ ## Intended Use
+
+ This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:
+ - Sports journalism analysis
+ - Automated content tagging
+ - Information extraction from sports news
+ - 5W1H (Who, What, When, Where, Why, How) analysis
+
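For the 5W1H use case, extracted entities can be grouped by the question they help answer. The following is a minimal sketch under an assumed mapping from entity types to slots. The mapping, the `group_by_5w1h` helper, and the sample entities are all illustrative and do not come from the model or its training data.

```python
from collections import defaultdict

# Assumed mapping from entity types to 5W1H slots (not defined by the model).
SLOT_BY_LABEL = {
    "ATLET": "who", "TIM": "who", "ORGANISASI": "who",
    "AKSI": "what", "SKOR": "what", "KEJUARAAN": "what",
    "TANGGAL": "when",
    "STADION": "where",
    "ALASAN_PERISTIWA": "why",
}


def group_by_5w1h(entities):
    """Group (text, label) pairs into 5W1H slots, skipping unmapped labels."""
    slots = defaultdict(list)
    for text, label in entities:
        slot = SLOT_BY_LABEL.get(label)
        if slot is not None:
            slots[slot].append(text)
    return dict(slots)


# Hypothetical extraction result with placeholder entity texts.
entities = [
    ("Pemain A", "ATLET"),
    ("Tim B", "TIM"),
    ("12 Mei", "TANGGAL"),
    ("Stadion C", "STADION"),
    ("21 tahun", "UMUR"),   # no slot in the assumed mapping, so it is skipped
]
print(group_by_5w1h(entities))
```

Entity types without an obvious slot, such as `UMUR` or `STATISTIK`, are left out of the mapping here. A real pipeline would need to decide where they belong.
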
+ ## Limitations
+
+ - Optimized for Indonesian-language sports content
+ - Best performance on football/soccer articles
+ - May not generalize well to other sports domains
+
+ ## License
+
+ MIT License
+
+ ## Contact
+
+ For questions or feedback, please open an issue in the repository.