	---
	license: mit
	language:
	- id
	pipeline_tag: token-classification
	tags:
	- token-classification
	- indonesian
	- bert
	- ner
	- named-entity-recognition
	- transformers
	datasets:
	- custom
	widget:
	- text: "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
	inference: true
	---

	# BERT Base Indonesian Named Entity Recognition

	This is a BERT model fine-tuned for Named Entity Recognition (NER) in Indonesian.
	It identifies and classifies named entities such as persons, organizations, locations, and other entity types in Indonesian text.

	---

	## Model Details

	- Model Type: BERT (Bidirectional Encoder Representations from Transformers)
	- Language: Indonesian (id)
	- Task: Token Classification / Named Entity Recognition
	- Base Model: [`cahya/bert-base-indonesian-1.5G`](https://huggingface.co/cahya/bert-base-indonesian-1.5G)
	- License: MIT

	### Base Model Reference

	The base model, BERT Base Indonesian (uncased), was pre-trained with a masked language modeling (MLM) objective and a 32,000-token WordPiece vocabulary on:
	- ~522 MB of Indonesian Wikipedia
	- ~1 GB of Indonesian newspaper text

	Full details are available on its [model card](https://huggingface.co/cahya/bert-base-indonesian-1.5G).
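
To make the WordPiece vocabulary mentioned above more concrete, here is a minimal sketch of greedy longest-match-first subword splitting. It is an illustration only, not the tokenizer shipped with the base model: the function name `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical, and the real vocabulary has about 32,000 entries.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single lowercased word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it appears in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"ber", "##kunjung", "jakarta", "presiden"}

print(wordpiece_tokenize("berkunjung", toy_vocab))  # ['ber', '##kunjung']
print(wordpiece_tokenize("jakarta", toy_vocab))     # ['jakarta']
print(wordpiece_tokenize("xyz", toy_vocab))         # ['[UNK]']
```

Because the base model is uncased, input text would be lowercased before this kind of split is applied.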

	---

	## Intended Use

	This fine-tuned model is intended for:

	- Named Entity Recognition in Indonesian text
	- Information extraction from Indonesian documents
	- Text analysis and processing applications
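
For information extraction, per-word tags usually need to be grouped into entity spans. The sketch below shows one way to do that, assuming the model emits BIO-style tags such as `B-PER` and `I-PER`. That label scheme is an assumption, since this card does not list the label set; check `model.config.id2label` for the actual names. The helper `group_entities` and the example tags are hypothetical.

```python
def group_entities(words, tags):
    """Collect (entity_type, text) spans from word-level BIO tags."""
    entities = []
    current_type, current_words = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if current_words:
            entities.append((current_type, " ".join(current_words)))

    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            # Continuation of the entity that is currently open.
            current_words.append(word)
        elif tag.startswith(("B-", "I-")):
            # A new entity starts (a stray "I-" tag is treated like "B-").
            flush()
            current_type, current_words = tag[2:], [word]
        else:
            # "O" or any unrecognized tag closes the open entity.
            flush()
            current_type, current_words = None, []
    flush()
    return entities


# Hypothetical tags, written by hand for illustration (not model output).
words = ["Presiden", "Joko", "Widodo", "berkunjung", "ke", "Jakarta"]
tags = ["O", "B-PER", "I-PER", "O", "O", "B-LOC"]

print(group_entities(words, tags))
# [('PER', 'Joko Widodo'), ('LOC', 'Jakarta')]
```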

	---

	## How to Use

	### Using with Transformers

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch

	model_name = "nahiar/BERT-NER" # replace with your Hugging Face repo ID
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForTokenClassification.from_pretrained(model_name)

	text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
	inputs = tokenizer(text, return_tensors="pt")

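	# Run inference without tracking gradients
	with torch.no_grad():
	    outputs = model(**inputs)

	# Highest-scoring label ID for every token; shape (batch, seq_len)
	predictions = torch.argmax(outputs.logits, dim=2)

	# Flat token list for the single input sentence (includes [CLS] and [SEP])
	tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
	labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]

	print("Tokens:", tokens)
	print("Labels:", labels)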
	```
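
The snippet above prints one label per WordPiece token, including special tokens and `##` continuation pieces. A minimal sketch of turning that into word-level output follows. It assumes the common convention of keeping the label of a word's first subword. The helper `merge_subwords` and the example tokens and labels are hypothetical, written by hand rather than taken from the model.

```python
def merge_subwords(tokens, labels, special_tokens=("[CLS]", "[SEP]", "[PAD]")):
    """Rebuild words from WordPiece tokens, keeping each word's first label."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token in special_tokens:
            # Special tokens carry no entity information.
            continue
        if token.startswith("##") and words:
            # Continuation piece: glue it onto the previous word.
            words[-1] += token[2:]
        else:
            words.append(token)
            word_labels.append(label)
    return words, word_labels


# Hypothetical tokens and labels, for illustration only.
tokens = ["[CLS]", "joko", "wi", "##dodo", "ke", "jakarta", "[SEP]"]
labels = ["O", "B-PER", "I-PER", "I-PER", "O", "B-LOC", "O"]

words, word_labels = merge_subwords(tokens, labels)
print(words)        # ['joko', 'widodo', 'ke', 'jakarta']
print(word_labels)  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

The Transformers `pipeline("token-classification", ...)` helper offers an `aggregation_strategy` argument that performs similar grouping without custom code.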