---
language: id
tags:
- indonesian
- ner
- named-entity-recognition
- sports
- football
- indobert
---

# SportExtract NER Model

## Model Description

This is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, specifically for football/soccer content.

**Base Model:** IndoBERT (indobenchmark/indobert-base-p1)

**Model Type:** Multi-label token classification

## Entities Detected

The model can detect the following entities in Indonesian sports articles:

- **ATLET** - Athletes/Players
- **TIM** - Teams
- **ORGANISASI** - Organizations
- **KEWARGANEGARAAN** - Nationality
- **POSISI** - Player positions
- **UMUR** - Age
- **AKSI** - Actions in matches
- **PENGHARGAAN** - Awards/achievements
- **STATISTIK** - Statistics
- **SKOR** - Match scores
- **TANGGAL** - Dates
- **STADION** - Stadiums
- **KEJUARAAN** - Tournaments/competitions
- **ALASAN_PERISTIWA** - Event reasons/context
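
For downstream code it can help to have the label set in machine-readable form. The sketch below lists the 14 entity types above and builds index mappings. The ordering and the names `ENTITY_LABELS`, `LABEL2ID`, and `ID2LABEL` are assumptions for illustration; the checkpoint's actual label order is not documented here and may differ.

```python
# Entity types from the list above. The ORDER is an assumption; check the
# checkpoint or training config for the real label-to-index mapping.
ENTITY_LABELS = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR", "AKSI",
    "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION", "KEJUARAAN",
    "ALASAN_PERISTIWA",
]

# Hypothetical lookup tables between label names and output indices.
LABEL2ID = {label: i for i, label in enumerate(ENTITY_LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}
```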

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# Download the fine-tuned checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="george121212afasf/model",
    filename="best_model.pt",
)

# Load the checkpoint on CPU.
# Note: on PyTorch >= 2.6, torch.load defaults to weights_only=True. If the
# checkpoint holds other pickled objects, pass weights_only=False, and only
# if you trust the source of the file.
checkpoint = torch.load(model_path, map_location="cpu")

# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

# Your model class and inference code here
```
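
The snippet above stops before inference, and this card does not document the model's output format. As a minimal sketch of the post-processing step, assume the model yields one score per entity type for each word (for example a sigmoid over logits, which is what "multi-label" suggests). The function name `decode_entities`, the 0.5 threshold, and the sample scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def decode_entities(
    words: List[str],
    scores: List[Dict[str, float]],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Merge consecutive words whose score for a label reaches the threshold."""
    labels = sorted({label for s in scores for label in s})
    entities = []
    for label in labels:
        current = []  # words of the span currently being built for this label
        for word, s in zip(words, scores):
            if s.get(label, 0.0) >= threshold:
                current.append(word)
            elif current:
                entities.append((label, " ".join(current)))
                current = []
        if current:
            entities.append((label, " ".join(current)))
    return entities


# Hypothetical per-word scores, NOT real model output.
words = ["Budi", "Santoso", "membela", "Timnas", "Indonesia"]
scores = [
    {"ATLET": 0.97},
    {"ATLET": 0.95},
    {},
    {"TIM": 0.91},
    {"TIM": 0.88, "KEWARGANEGARAAN": 0.62},
]
entities = decode_entities(words, scores)
```

Because labels are decoded independently, a single word can belong to more than one entity, as "Indonesia" does in this made-up example.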

## Training Data

Trained on annotated Indonesian sports news articles from various sources.

## Model Size

- Parameters: ~125M (IndoBERT base)
- File size: ~1420 MB
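
A back-of-the-envelope check relates the two figures above. The sketch assumes weights are stored as float32 (4 bytes each). It also tries one possible explanation for the larger file, not confirmed by this card: that the checkpoint stores two extra Adam optimizer tensors per weight in addition to the weights.

```python
PARAMS = 125_000_000   # approximate parameter count from the list above
BYTES_PER_PARAM = 4    # assumption: float32 storage

# Size of the weights alone, in decimal megabytes.
weights_mb = PARAMS * BYTES_PER_PARAM / 1e6

# Assumption: weights plus two Adam moment tensors of the same shape.
with_adam_mb = 3 * weights_mb
```

Under these assumptions the weights alone need about 500 MB and the three-copy estimate comes to about 1500 MB, the same order as the reported file size. If the file does hold extra state, inspect the loaded object to find the model weights rather than assuming a key name.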

## Intended Use

This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:
- Sports journalism analysis
- Automated content tagging
- Information extraction from sports news
- 5W1H (Who, What, When, Where, Why, How) analysis
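
For the 5W1H use case, extracted entities can be grouped into question slots. The mapping below is a hypothetical assignment of entity types to slots, not something defined by the model; `LABEL_TO_5W1H`, `group_by_5w1h`, and the sample entities are illustrative names and data.

```python
from collections import defaultdict

# Hypothetical mapping from entity types to 5W1H slots (an assumption).
LABEL_TO_5W1H = {
    "ATLET": "who", "TIM": "who", "ORGANISASI": "who",
    "AKSI": "what", "SKOR": "what", "PENGHARGAAN": "what", "KEJUARAAN": "what",
    "TANGGAL": "when",
    "STADION": "where",
    "ALASAN_PERISTIWA": "why",
    "STATISTIK": "how",
}


def group_by_5w1h(entities):
    """Group (label, text) pairs into 5W1H slots; unmapped labels are skipped."""
    grouped = defaultdict(list)
    for label, text in entities:
        slot = LABEL_TO_5W1H.get(label)
        if slot is not None:
            grouped[slot].append(text)
    return dict(grouped)


# Made-up entities for illustration, NOT real model output.
example = [
    ("ATLET", "Budi Santoso"),
    ("TANGGAL", "12 Mei 2024"),
    ("UMUR", "23 tahun"),  # UMUR has no slot in this mapping, so it is skipped
]
summary = group_by_5w1h(example)
```

Descriptive types such as UMUR, POSISI, and KEWARGANEGARAAN are left unmapped here; whether to attach them to the "who" slot is a design choice for the downstream application.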

## Limitations

- Optimized for Indonesian-language sports content
- Best performance on football, basketball, and badminton articles
- May not generalize well to other sports domains

## Contact

For questions or feedback, please open an issue in the repository.