metadata
language: id
tags:
- indonesian
- ner
- named-entity-recognition
- sports
- football
- indobert
SportExtract NER Model
Model Description
This is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, specifically for football/soccer content.
Base Model: IndoBERT (indobenchmark/indobert-base-p1)
Model Type: Multi-label token classification
Entities Detected
The model can detect the following entities in Indonesian sports articles:
- ATLET - Athletes/Players
- TIM - Teams
- ORGANISASI - Organizations
- KEWARGANEGARAAN - Nationality
- POSISI - Player positions
- UMUR - Age
- AKSI - Actions in matches
- PENGHARGAAN - Awards/achievements
- STATISTIK - Statistics
- SKOR - Match scores
- TANGGAL - Dates
- STADION - Stadiums
- KEJUARAAN - Tournaments/competitions
- ALASAN_PERISTIWA - Event reasons/context
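Because the model is described as multi-label token classification, a single token may carry more than one of the entity types above. The sketch below shows one way such per-token scores could be turned into labels by thresholding. The index order of `ENTITY_TYPES`, the `decode_multilabel` helper, the 0.5 threshold, and the shape of `scores` are all assumptions made for illustration; the checkpoint's real label order and output format are not documented here.

```python
# Entity types copied from the list above. The index order is an ASSUMPTION;
# the real order depends on how the checkpoint was trained.
ENTITY_TYPES = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR",
    "AKSI", "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION",
    "KEJUARAAN", "ALASAN_PERISTIWA",
]


def decode_multilabel(tokens, scores, threshold=0.5):
    """Return (token, labels) pairs for tokens with at least one score >= threshold.

    scores[i][j] is a hypothetical probability that tokens[i] belongs to
    ENTITY_TYPES[j], e.g. the result of a sigmoid over the model's logits.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    tagged = []
    for token, row in zip(tokens, scores):
        if len(row) != len(ENTITY_TYPES):
            raise ValueError("each score row needs one value per entity type")
        # Multi-label: keep every type that clears the threshold, not just the best one.
        labels = [ENTITY_TYPES[j] for j, p in enumerate(row) if p >= threshold]
        if labels:
            tagged.append((token, labels))
    return tagged


# Made-up scores for a three-token example sentence.
tokens = ["Messi", "mencetak", "gol"]
scores = [[0.0] * len(ENTITY_TYPES) for _ in tokens]
scores[0][ENTITY_TYPES.index("ATLET")] = 0.9
scores[1][ENTITY_TYPES.index("AKSI")] = 0.7
print(decode_multilabel(tokens, scores))
```

A lower threshold trades precision for recall, so it would normally be tuned on a held-out set rather than fixed at 0.5.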
Usage
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
# Download model
model_path = hf_hub_download(
repo_id="george121212afasf/model",
filename="best_model.pt"
)
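# Load checkpoint on CPU. The file is a pickled PyTorch object, so only load it from a source you trust.
# Note: PyTorch 2.6+ defaults to weights_only=True; pass weights_only=False if the checkpoint stores more than tensors.
checkpoint = torch.load(model_path, map_location='cpu')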
# Get tokenizer
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
# Your model class and inference code here
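The structure of `best_model.pt` is not documented in this card, so a reasonable first step before writing a model class may be to look at what the loaded object contains. The helper below is a minimal sketch in plain Python; `summarize_checkpoint` is a hypothetical name, and the keys in the stand-in dictionary (`model_state_dict`, `epoch`) are assumptions, not the checkpoint's confirmed layout.

```python
def summarize_checkpoint(checkpoint, max_keys=10):
    """Return (key, type name) pairs for the top level of a loaded checkpoint."""
    if not isinstance(checkpoint, dict):
        # Some checkpoints are a bare state dict or a whole pickled model object.
        return [("<root>", type(checkpoint).__name__)]
    summary = []
    for key in list(checkpoint)[:max_keys]:
        summary.append((str(key), type(checkpoint[key]).__name__))
    return summary


# Stand-in for the object returned by torch.load; the key names are hypothetical.
example_checkpoint = {"model_state_dict": {}, "epoch": 3}
for key, type_name in summarize_checkpoint(example_checkpoint):
    print(f"{key}: {type_name}")
```

Passing the real `checkpoint` object from the snippet above in place of `example_checkpoint` would show whether it holds a plain state dict or a dictionary with extra training state.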
Training Data
Trained on annotated Indonesian sports news articles from various sources.
Model Size
- Parameters: ~125M (IndoBERT base)
- File size: ~1420 MB
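As a rough sanity check on these two figures, the sketch below estimates how much space the weights alone would take if stored as 32-bit floats. The float32 assumption and the helper name `float32_size_mb` are illustrative; the card does not state the storage precision or what else the file contains.

```python
BYTES_PER_FLOAT32 = 4


def float32_size_mb(num_parameters):
    """Approximate size in MB (10**6 bytes) of parameters stored as float32."""
    return num_parameters * BYTES_PER_FLOAT32 / 1_000_000


weights_mb = float32_size_mb(125_000_000)  # parameter count from the list above
ratio = 1420 / weights_mb                  # stated file size relative to weights only
print(f"weights only: ~{weights_mb:.0f} MB, file is ~{ratio:.2f}x that")
```

Under these assumptions the file is close to three times the size of the weights, which might mean the checkpoint also stores training state such as optimizer buffers. That is a guess, not something the card confirms.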
Intended Use
This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:
- Sports journalism analysis
- Automated content tagging
- Information extraction from sports news
- 5W1H (Who, What, When, Where, Why, How) analysis
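For the 5W1H use case, extracted entities need to be grouped by the question they answer. The mapping below is one possible grouping, offered only as an illustration; it is not a mapping defined by the model or its authors, and `FIVE_W_ONE_H` and `group_by_question` are hypothetical names.

```python
# ASSUMED mapping from 5W1H questions to the entity types listed in this card.
FIVE_W_ONE_H = {
    "who": ["ATLET", "TIM", "ORGANISASI"],
    "what": ["AKSI", "SKOR", "KEJUARAAN", "PENGHARGAAN"],
    "when": ["TANGGAL"],
    "where": ["STADION"],
    "why": ["ALASAN_PERISTIWA"],
    "how": ["STATISTIK"],
}


def group_by_question(entities):
    """Group (text, entity_type) pairs under the 5W1H question they map to.

    Entity types without a mapping (for example UMUR) are skipped.
    """
    type_to_question = {
        entity_type: question
        for question, types in FIVE_W_ONE_H.items()
        for entity_type in types
    }
    grouped = {question: [] for question in FIVE_W_ONE_H}
    for text, entity_type in entities:
        question = type_to_question.get(entity_type)
        if question is not None:
            grouped[question].append(text)
    return grouped


# Made-up entities standing in for model output.
print(group_by_question([("Persija", "TIM"), ("2-1", "SKOR"), ("25", "UMUR")]))
```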
Limitations
- Optimized for Indonesian-language sports content
- Best performance on football, basketball, and badminton articles
- May not generalize well to other sports domains
Contact
For questions or feedback, please open an issue in the repository.