---
language: id
tags:
- indonesian
- ner
- named-entity-recognition
- sports
- football
- indobert
---
# SportExtract NER Model
## Model Description
This is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, specifically for football/soccer content.
**Base Model:** IndoBERT (indobenchmark/indobert-base-p1)
**Model Type:** Multi-label token classification
## Entities Detected
The model detects the following entity types in Indonesian sports articles:
- **ATLET** - Athletes/Players
- **TIM** - Teams
- **ORGANISASI** - Organizations
- **KEWARGANEGARAAN** - Nationality
- **POSISI** - Player positions
- **UMUR** - Age
- **AKSI** - Actions in matches
- **PENGHARGAAN** - Awards/achievements
- **STATISTIK** - Statistics
- **SKOR** - Match scores
- **TANGGAL** - Dates
- **STADION** - Stadiums
- **KEJUARAAN** - Tournaments/competitions
- **ALASAN_PERISTIWA** - Event reasons/context
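The label set above can be turned into the index mappings that a token classification head typically needs. This is a minimal sketch: the label order, and the absence of a BIO prefix or an `O` class, are assumptions, since this README does not document how the checkpoint indexes its labels.

```python
# Entity types listed in this README. The ORDER is an assumption; check the
# training code or the checkpoint for the real index of each label.
ENTITY_LABELS = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR", "AKSI",
    "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION", "KEJUARAAN",
    "ALASAN_PERISTIWA",
]

# Index mappings in the style of transformers configs (id2label / label2id)
id2label = dict(enumerate(ENTITY_LABELS))
label2id = {label: idx for idx, label in id2label.items()}

print(len(ENTITY_LABELS))  # 14
print(label2id["TIM"])     # 1 under the assumed order
```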
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
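# Download the checkpoint file from the Hub
model_path = hf_hub_download(
    repo_id="george121212afasf/model",
    filename="best_model.pt",
)
# Load the raw PyTorch checkpoint on CPU.
# Note: PyTorch 2.6+ defaults to weights_only=True; if the file stores more than
# tensors, pass weights_only=False (only for checkpoints you trust).
checkpoint = torch.load(model_path, map_location="cpu")
# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")
# Define your model class, restore its weights from `checkpoint`, then run inference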
```
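Because the model is described as multi-label token classification, one plausible post-processing step is to apply a sigmoid to each token's per-label score and keep every label above a threshold. The sketch below works on plain Python lists (for example, from `logits.tolist()`). The function name `decode_multilabel`, the 0.5 threshold, the three-label subset, and the scores are illustrative assumptions, not the author's inference code.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def decode_multilabel(tokens, logits, labels, threshold=0.5):
    """Return (token, [labels]) pairs for tokens with at least one label.

    tokens: list of token strings
    logits: one list of raw scores per token, one score per label
    labels: label names, in the same order as the score columns
    """
    results = []
    for token, row in zip(tokens, logits):
        # Each label is scored independently, so a token may receive
        # zero, one, or several labels.
        active = [name for name, score in zip(labels, row)
                  if sigmoid(score) >= threshold]
        if active:
            results.append((token, active))
    return results

# Hypothetical scores for illustration only (not real model output)
labels = ["ATLET", "TIM", "POSISI"]
tokens = ["Budi", "bermain", "untuk", "Persija"]
logits = [
    [3.2, -1.0, -2.5],
    [-4.0, -3.1, -2.2],
    [-5.0, -4.2, -3.3],
    [-2.0, 4.1, -3.0],
]
print(decode_multilabel(tokens, logits, labels))
# [('Budi', ['ATLET']), ('Persija', ['TIM'])]
```

If the checkpoint was instead trained with a single-label softmax head, an argmax over each row would replace the threshold step.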
## Training Data
Trained on annotated Indonesian sports news articles from various sources.
## Model Size
- Parameters: ~125M (IndoBERT base)
- File size: ~1420 MB
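As a rough sanity check on the figures above, the parameter count can be converted to an expected file size. This is a back-of-the-envelope sketch assuming 32-bit floats. The idea that the checkpoint also stores optimizer state is a guess based on the arithmetic, not something this README confirms.

```python
PARAMS = 125_000_000     # approximate parameter count stated above
BYTES_PER_FLOAT32 = 4    # assumption: weights stored as float32

weights_mb = PARAMS * BYTES_PER_FLOAT32 / 1e6
print(f"weights only: ~{weights_mb:.0f} MB")  # ~500 MB

# Assumption: an Adam-style optimizer keeps two extra float32 values per
# parameter, so weights plus optimizer state is roughly three times larger.
with_optimizer_mb = 3 * weights_mb
print(f"weights + optimizer state: ~{with_optimizer_mb:.0f} MB")  # ~1500 MB
```

Inspecting the keys of the loaded checkpoint is the reliable way to see what the file actually contains.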
## Intended Use
This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:
- Sports journalism analysis
- Automated content tagging
- Information extraction from sports news
- 5W1H (Who, What, When, Where, Why, How) analysis
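For the 5W1H use case, extracted entities can be grouped by question word. The mapping below is a hypothetical illustration (this README does not define one), `group_by_5w1h` is a made-up helper name, and "how" is left unmapped because no listed entity type clearly corresponds to it.

```python
from collections import defaultdict

# Hypothetical mapping from entity type to 5W1H question; adjust as needed.
ENTITY_TO_5W1H = {
    "ATLET": "who", "TIM": "who", "ORGANISASI": "who",
    "AKSI": "what", "KEJUARAAN": "what",
    "TANGGAL": "when",
    "STADION": "where",
    "ALASAN_PERISTIWA": "why",
}

def group_by_5w1h(entities):
    """Group (text, entity_type) pairs by question word; skip unmapped types."""
    grouped = defaultdict(list)
    for text, entity_type in entities:
        question = ENTITY_TO_5W1H.get(entity_type)
        if question is not None:
            grouped[question].append(text)
    return dict(grouped)

# Made-up extraction result for illustration only
entities = [("Persija", "TIM"), ("GBK", "STADION"), ("21", "UMUR")]
print(group_by_5w1h(entities))
# {'who': ['Persija'], 'where': ['GBK']}
```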
## Limitations
- Optimized for Indonesian-language sports content
- Best performance on football, basketball, and badminton articles
- May not generalize well to other sports domains
## Contact
For questions or feedback, please open an issue in the repository.