---
license: mit
language:
- id
pipeline_tag: token-classification
tags:
- token-classification
- indonesian
- bert
- ner
- named-entity-recognition
- transformers
datasets:
- custom
widget:
- text: "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inference: true
---
# BERT Base Indonesian Named Entity Recognition
This is a BERT-based model fine-tuned for Named Entity Recognition (NER) in Indonesian. The model identifies and classifies named entities such as persons, organizations, locations, and other relevant entities in Indonesian text.
## Model Details
- **Model Type**: BERT (Bidirectional Encoder Representations from Transformers)
- **Language**: Indonesian (id)
- **Task**: Token Classification / Named Entity Recognition
- **Base Model**: BERT Base
- **License**: MIT
## Intended Use
This model is intended for:
- Named Entity Recognition in Indonesian text
- Information extraction from Indonesian documents
- Text analysis and processing applications
## How to Use
### Using with Transformers
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "path/to/bert-base-indonesian-NER"  # or the Hugging Face model ID if uploaded
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Prepare input text
text = "Presiden Joko Widodo berkunjung ke Jakarta untuk bertemu dengan Gubernur Anies Baswedan."
inputs = tokenizer(text, return_tensors="pt")

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)

# Decode tokens and labels (note: these are subword tokens and include
# the special [CLS] and [SEP] tokens)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[label_id] for label_id in predictions[0].tolist()]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```
### Using with Pipeline
```python
from transformers import pipeline

# Load the NER pipeline; aggregation_strategy="simple" groups subword
# pieces back into whole-word entities
ner_pipeline = pipeline(
    "ner",
    model="path/to/bert-base-indonesian-NER",
    aggregation_strategy="simple",
)

# Process text
text = "PT Bank Central Asia Tbk memiliki kantor pusat di Jakarta."
results = ner_pipeline(text)

for result in results:
    print(f"Entity: {result['word']}, Label: {result['entity_group']}, Confidence: {result['score']:.4f}")
```
## Label Mapping
The model uses the following entity labels:
- `B-PER`: Beginning of Person name
- `I-PER`: Inside of Person name
- `B-ORG`: Beginning of Organization name
- `I-ORG`: Inside of Organization name
- `B-LOC`: Beginning of Location name
- `I-LOC`: Inside of Location name
- `B-MISC`: Beginning of Miscellaneous entity
- `I-MISC`: Inside of Miscellaneous entity
- `O`: Outside (not an entity)
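The BIO labels above can be merged into entity spans with a few lines of plain Python. This is a minimal sketch: the tokens and labels below are illustrative, and real model output uses subword tokens that first need to be joined back into words.

```python
def merge_bio(tokens, labels):
    """Group (token, BIO-label) pairs into (entity_text, entity_type) spans."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag starts a new entity, closing any open one
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the current entity
            current_tokens.append(token)
        else:
            # O (or a mismatched I- tag) closes the current entity
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["Presiden", "Joko", "Widodo", "berkunjung", "ke", "Jakarta", "."]
labels = ["O", "B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(merge_bio(tokens, labels))
# → [('Joko Widodo', 'PER'), ('Jakarta', 'LOC')]
```

The `aggregation_strategy` option of the Transformers pipeline performs this grouping (plus subword merging) automatically.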
## Training Data
The model was trained on Indonesian text datasets containing annotated named entities. The training data includes:
- News articles
- Wikipedia pages
- Social media posts
- Government documents
## Performance
The model achieves the following performance metrics on the test set:
- Precision: 0.XX
- Recall: 0.XX
- F1-Score: 0.XX
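NER metrics of this kind are conventionally computed at the entity level, over whole spans rather than individual tokens (the `seqeval` library is the usual tool). A minimal pure-Python sketch of the same micro-averaged computation, using short hypothetical gold and predicted label sequences, is:

```python
def bio_spans(labels):
    """Extract a set of (start, end, type) entity spans from a BIO sequence."""
    spans, start, etype = set(), None, None
    for i, lab in enumerate(labels):
        if lab.startswith("B-"):
            if start is not None:
                spans.add((start, i, etype))
            start, etype = i, lab[2:]
        elif lab.startswith("I-") and start is not None and etype == lab[2:]:
            continue  # entity keeps going
        else:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.add((start, len(labels), etype))
    return spans

def micro_prf(gold_seqs, pred_seqs):
    """Micro-averaged entity-level precision, recall, and F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)   # spans predicted exactly right
        fp += len(p - g)   # spurious predictions
        fn += len(g - p)   # missed gold spans
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(micro_prf(gold, pred))  # one span correct, one missed → P=1.0, R=0.5
```

A span counts as correct only if both its boundaries and its type match the gold annotation, which is why entity-level scores are typically lower than token-level accuracy.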
## Limitations
- The model may not perform well on informal or slang-heavy Indonesian text
- Performance may vary across different domains
- The model is trained on data up to a certain date and may not recognize newer entities
## Ethical Considerations
- This model should not be used for surveillance or tracking individuals without consent
- Always consider privacy implications when processing personal data
- The model's predictions should be validated by human experts for critical applications
## Citation
If you use this model in your research or applications, please cite:
```text
@misc{bert-indonesian-ner,
title={BERT Base Indonesian Named Entity Recognition},
author={Your Name},
year={2024},
publisher={Hugging Face},
url={https://huggingface.co/model-id}
}
```
## Contact
For questions or issues, please contact [your contact information].