---
license: mit
language:
  - id
pipeline_tag: token-classification
tags:
  - token-classification
  - indonesian
  - bert
  - ner
  - named-entity-recognition
  - transformers
datasets:
  - custom
widget:
  - text: "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inference: true
---

# BERT Base Indonesian Named Entity Recognition

This is a BERT-based model fine-tuned for Named Entity Recognition (NER) in Indonesian.
It identifies and classifies named entities such as persons, organizations, and locations in Indonesian text.

---

## Model Details

- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
- **Language**: Indonesian (id)
- **Task**: Token Classification / Named Entity Recognition
- **Base Model**: [`cahya/bert-base-indonesian-1.5G`](https://huggingface.co/cahya/bert-base-indonesian-1.5G)
- **License**: MIT

### Base Model Reference

The base model, **BERT Base Indonesian (uncased)**, was pre-trained with a masked language modeling (MLM) objective on:

- ~522MB of Indonesian Wikipedia
- ~1GB of Indonesian newspaper text

It uses a 32,000-token WordPiece vocabulary.

Full details are available on its [model card](https://huggingface.co/cahya/bert-base-indonesian-1.5G).
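Because the base model was trained with an MLM objective, it can be sanity-checked directly with a fill-mask pipeline. The sketch below loads the *base* model (not this fine-tuned NER model) and is only an illustration of the standard `transformers` API; the example sentence is arbitrary:

```python
from transformers import pipeline

# Fill-mask pipeline over the base model's MLM head.
unmasker = pipeline("fill-mask", model="cahya/bert-base-indonesian-1.5G")

# Print the model's top predictions for the [MASK] token.
for pred in unmasker("Ibu saya sedang bekerja di [MASK]."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```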

---

## Intended Use

This fine-tuned model is intended for:

- Named Entity Recognition in Indonesian text
- Information extraction from Indonesian documents
- Text analysis and processing applications

---

## How to Use

### Using with Transformers

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "nahiar/BERT-NER"  # replace with your Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=2)

# Take the first (and only) sequence in the batch so tokens and labels align.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```
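Alternatively, the same checkpoint can be loaded through the high-level `pipeline` API, which handles tokenization and merges sub-word pieces into whole entity spans. This is a minimal sketch using the repo ID from the example above (replace it with your own if it differs):

```python
from transformers import pipeline

# Token-classification pipeline; aggregation_strategy="simple" groups
# sub-word tokens into complete entity spans with an averaged score.
ner = pipeline(
    "token-classification",
    model="nahiar/BERT-NER",  # replace with your Hugging Face repo ID
    aggregation_strategy="simple",
)

text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
for entity in ner(text):
    print(f"{entity['entity_group']}\t{entity['score']:.3f}\t{entity['word']}")
```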