metadata

language: id
tags:
  - indonesian
  - ner
  - named-entity-recognition
  - sports
  - football
  - indobert

SportExtract NER Model

Model Description

SportExtract is a Named Entity Recognition (NER) model fine-tuned on Indonesian sports news articles, with a focus on football (soccer) content.

Base Model: IndoBERT (indobenchmark/indobert-base-p1)

Model Type: Multi-label token classification

Entities Detected

The model can detect the following entities in Indonesian sports articles:

ATLET - Athletes/Players
TIM - Teams
ORGANISASI - Organizations
KEWARGANEGARAAN - Nationality
POSISI - Player positions
UMUR - Age
AKSI - Actions in matches
PENGHARGAAN - Awards/achievements
STATISTIK - Statistics
SKOR - Match scores
TANGGAL - Dates
STADION - Stadiums
KEJUARAAN - Tournaments/competitions
ALASAN_PERISTIWA - Event reasons/context
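
Because the model type is multi-label token classification, a single token can carry more than one of the labels above. The following is a minimal sketch of how per-token logits might be turned into labels. It assumes one logit per label, a sigmoid activation, and a 0.5 threshold. The label order, the threshold, and the helper name `decode_token` are illustrative assumptions, not details confirmed by this repository.

```python
import math

# Label names come from the list above. Their ORDER here is an assumption
# and must match the order used when the model was trained.
LABELS = [
    "ATLET", "TIM", "ORGANISASI", "KEWARGANEGARAAN", "POSISI", "UMUR",
    "AKSI", "PENGHARGAAN", "STATISTIK", "SKOR", "TANGGAL", "STADION",
    "KEJUARAAN", "ALASAN_PERISTIWA",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_token(logits, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold.

    Hypothetical helper: the 0.5 threshold is an assumed default.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up logits for one token: two labels score high, the rest low.
logits = [-5.0] * len(LABELS)
logits[LABELS.index("ATLET")] = 3.0
logits[LABELS.index("KEWARGANEGARAAN")] = 1.2
print(decode_token(logits))  # ['ATLET', 'KEWARGANEGARAAN']
```

Unlike a softmax over labels, the per-label sigmoid lets zero, one, or several labels fire for the same token.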

Usage

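import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# Download the checkpoint file from the Hub (cached locally after the first call)
model_path = hf_hub_download(
    repo_id="george121212afasf/model",
    filename="best_model.pt"
)

# Load the checkpoint onto the CPU.
# Recent PyTorch versions default to weights_only=True; if the file holds more
# than tensors, you may need weights_only=False. Only do that for files you trust.
checkpoint = torch.load(model_path, map_location="cpu")

# Load the tokenizer of the base model
tokenizer = AutoTokenizer.from_pretrained("indobenchmark/indobert-base-p1")

# Define your model class, load the weights from `checkpoint`, and run inference here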
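
The layout of `best_model.pt` is not documented here, so it may help to inspect the loaded object before writing a model class. The sketch below is one possible way to do that. The key names in `CANDIDATE_KEYS` are common PyTorch conventions, not names confirmed for this repository, and `find_state_dict` and `summarize` are hypothetical helpers.

```python
# Key names below are common PyTorch conventions, NOT confirmed for this repository.
CANDIDATE_KEYS = ("model_state_dict", "state_dict", "model")


def find_state_dict(checkpoint, candidate_keys=CANDIDATE_KEYS):
    """Return the nested weights mapping if one is found, else the checkpoint itself."""
    if isinstance(checkpoint, dict):
        for key in candidate_keys:
            value = checkpoint.get(key)
            if isinstance(value, dict):
                return value
    return checkpoint


def summarize(checkpoint, limit=5):
    """List the first few top-level keys to help identify the layout."""
    if not isinstance(checkpoint, dict):
        return [type(checkpoint).__name__]
    return [str(key) for key in list(checkpoint)[:limit]]


# Stand-in for a loaded checkpoint; the real file may look different.
example = {"epoch": 3, "model_state_dict": {"classifier.weight": [0.1]}}
print(summarize(example))        # ['epoch', 'model_state_dict']
print(find_state_dict(example))  # {'classifier.weight': [0.1]}
```

With the real file, you would call `find_state_dict(checkpoint)` and pass the result to `load_state_dict` on your own model class, provided the parameter names match.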

Training Data

The model was trained on annotated Indonesian sports news articles collected from various sources.

Model Size

Parameters: ~125M (IndoBERT base)
File size: ~1420 MB
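
As a rough sanity check on these two figures, the arithmetic below estimates the size of the weights alone. It assumes 32-bit floats, which is not stated in this README. The result suggests the file holds more than the bare weights. One possible but unconfirmed explanation is saved optimizer state, since Adam-style optimizers keep two extra values per parameter.

```python
PARAMS = 125_000_000   # "~125M" from the figures above
BYTES_PER_FP32 = 4     # assumption: weights stored as 32-bit floats
REPORTED_MB = 1420     # "~1420 MB" from the figures above

weights_mb = PARAMS * BYTES_PER_FP32 / 1_000_000
ratio = REPORTED_MB / weights_mb

print(f"fp32 weights alone: ~{weights_mb:.0f} MB")   # ~500 MB
print(f"reported file is ~{ratio:.1f}x that size")   # ~2.8x
```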

Intended Use

This model is designed for extracting sports-related entities from Indonesian news articles, particularly for:

Sports journalism analysis
Automated content tagging
Information extraction from sports news
5W1H (Who, What, When, Where, Why, How) analysis
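
For the 5W1H use case, extracted entities can be grouped into question slots. The mapping below is a hypothetical illustration, not part of the model: which label belongs to which slot is a design choice, and the example entities are made up.

```python
from collections import defaultdict

# Hypothetical mapping from entity labels to 5W1H slots (an assumption).
LABEL_TO_5W1H = {
    "ATLET": "who",
    "TIM": "who",
    "ORGANISASI": "who",
    "AKSI": "what",
    "SKOR": "what",
    "PENGHARGAAN": "what",
    "KEJUARAAN": "what",
    "TANGGAL": "when",
    "STADION": "where",
    "ALASAN_PERISTIWA": "why",
    "STATISTIK": "how",
}


def group_by_5w1h(entities):
    """Group (text, label) pairs by slot; labels without a mapping are skipped."""
    grouped = defaultdict(list)
    for text, label in entities:
        slot = LABEL_TO_5W1H.get(label)
        if slot is not None:
            grouped[slot].append(text)
    return dict(grouped)


# Made-up entities, as they might come out of the model.
entities = [
    ("Persib Bandung", "TIM"),
    ("mencetak gol", "AKSI"),
    ("Stadion Gelora Bung Karno", "STADION"),
    ("Sabtu", "TANGGAL"),
    ("23 tahun", "UMUR"),  # no slot in this mapping, so it is skipped
]
print(group_by_5w1h(entities))
```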

Limitations

Optimized for Indonesian-language sports content
Best performance on football, basketball, and badminton articles
May not generalize well to other sports domains

Contact

For questions or feedback, please open an issue in the repository.