---
language: id
tags:
- indonesian
- ner
- named-entity-recognition
- sports
- football
- indobert
---

# SportExtract NER Model

## Model Description

This is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, specifically for football/soccer content.

**Base Model:** IndoBERT (indobenchmark/indobert-base-p1)
**Model Type:** Multi-label token classification

## Entities Detected

The model can detect the following entities in Indonesian sports articles:

- **ATLET** - Athletes/Players
- **TIM** - Teams
- **ORGANISASI** - Organizations
- **KEWARGANEGARAAN** - Nationality
- **POSISI** - Player positions
- **UMUR** - Age
- **AKSI** - Actions in matches
- **PENGHARGAAN** - Awards/achievements
- **STATISTIK** - Statistics
- **SKOR** - Match scores
- **TANGGAL** - Dates
- **STADION** - Stadiums
- **KEJUARAAN** - Tournaments/competitions
- **ALASAN_PERISTIWA** - Event reasons/context

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="george121212afasf/model",
    filename="best_model.pt"
)

# Load checkpoint
checkpoint = torch.load(model_path, map_location='cpu')

# Get tokenizer
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

# Your model class and inference code here
```

## Training Data

Trained on annotated Indonesian sports news articles from various sources.

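
## Example: Decoding Multi-label Predictions

Because the model type is multi-label token classification, a single token may carry several entity labels at once. The sketch below shows one common way such outputs could be decoded: apply a sigmoid to each per-label logit and keep every label whose probability reaches a threshold. This is an illustration only, not the repository's inference code. The label index order, the `0.5` threshold, the helper names `decode_multilabel` and `make_row`, and the demo logits are all assumptions; the real values must come from the training code or the checkpoint.

```python
import math

# Entity labels as listed in this model card.
# ASSUMPTION: the index order here is illustrative; the real order is
# defined by the training code or stored in the checkpoint.
LABELS = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR",
    "AKSI", "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION",
    "KEJUARAAN", "ALASAN_PERISTIWA",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(tokens, logits, threshold=0.5):
    """Pair each token with every label whose probability >= threshold.

    `logits` holds one row per token and one score per label, so a token
    can end up with zero, one, or several labels.
    """
    if len(tokens) != len(logits):
        raise ValueError("tokens and logits must have the same length")
    decoded = []
    for token, row in zip(tokens, logits):
        if len(row) != len(LABELS):
            raise ValueError("each logit row needs one score per label")
        labels = [
            LABELS[i] for i, score in enumerate(row)
            if sigmoid(score) >= threshold
        ]
        decoded.append((token, labels))
    return decoded


def make_row(active):
    """Demo helper: high logit for active labels, low logit otherwise."""
    return [4.0 if label in active else -4.0 for label in LABELS]


# Made-up tokens and logits, for illustration only (not model output).
tokens = ["Messi", "mencetak", "gol"]
logits = [
    make_row({"ATLET"}),
    make_row({"AKSI"}),
    make_row({"AKSI", "STATISTIK"}),
]

result = decode_multilabel(tokens, logits)
for token, labels in result:
    print(token, labels)
```

In practice the logits would come from the loaded checkpoint's forward pass over the tokenizer output, and subword pieces would need to be merged back into words before the labels are reported.
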
## Model Size

- Parameters: ~125M (IndoBERT base)
- File size: ~1420 MB

## Intended Use

This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:

- Sports journalism analysis
- Automated content tagging
- Information extraction from sports news
- 5W1H (Who, What, When, Where, Why, How) analysis

## Limitations

- Optimized for Indonesian language sports content
- Best performance on football, basketball, and badminton articles
- May not generalize well to other sports domains

## Contact

For questions or feedback, please open an issue in the repository.