---
library_name: transformers
base_model: HooshvareLab/bert-base-parsbert-uncased
tags:
- ner
- persian
- fine-tuned
- transformers
model-index:
- name: bert-finetuned-ner
  results: []
---

# bert-finetuned-ner

This model is a **fine-tuned Persian BERT model** based on [HooshvareLab/bert-base-parsbert-uncased](https://huggingface.co/HooshvareLab/bert-base-parsbert-uncased) for **Named Entity Recognition (NER)**. It is trained to identify entities such as persons, organizations, locations, and products in Persian text.

## Model Description

`bert-finetuned-ner` performs token-level classification in Persian. It uses **ParsBERT**, a BERT variant pretrained on a large Persian corpus, as the base model and is fine-tuned on the `Amir13/wnut2017-persian` dataset. The model predicts an entity label for each token of the input text, supporting tasks such as text analysis, information extraction, and question-answering pipelines.

## Intended Uses & Limitations

### Intended Uses

- Named Entity Recognition (NER) in Persian text.
- Information extraction for Persian-language NLP pipelines.
- Academic research or industrial projects requiring entity tagging.

### Limitations

- Performance depends heavily on the coverage and quality of the training data; entities not represented in the dataset may not be recognized.
- The model may misclassify rare or out-of-vocabulary words.
- For critical applications, manual verification of predictions is recommended.
- The model was trained on formal text; performance on dialects or colloquial Persian may vary.

## Training and Evaluation Data

- **Dataset:** `Amir13/wnut2017-persian`
- Annotated entity types: persons, organizations, locations, creative works, and products.
- Tokenization uses the ParsBERT tokenizer (`HooshvareLab/bert-base-parsbert-uncased`).
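Fine-tuning BERT for token classification requires aligning word-level NER labels with the tokenizer's sub-word pieces. The exact alignment strategy used for this model is not documented here; the sketch below shows one common approach, assuming BIO-style integer labels and the `word_ids` mapping produced by a fast tokenizer (special tokens and continuation sub-words get `-100` so the loss function ignores them):

```python
def align_labels_with_tokens(word_ids, word_labels):
    """Map word-level NER labels onto sub-word tokens.

    word_ids:    for each sub-word token, the index of the word it came
                 from, or None for special tokens ([CLS], [SEP]).
    word_labels: one integer label per original word.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)                   # special token: ignored by the loss
        elif word_id != previous_word:
            aligned.append(word_labels[word_id])   # first sub-word keeps the word's label
        else:
            aligned.append(-100)                   # continuation sub-word: ignored
        previous_word = word_id
    return aligned

# Example: [CLS] + word 0 + word 1 split into two pieces + [SEP]
print(align_labels_with_tokens([None, 0, 1, 1, None], [3, 5]))
# → [-100, 3, 5, -100, -100]
```

An alternative choice is to propagate the word's label (or its `I-` variant) to every sub-word instead of masking continuations; either works as long as training and evaluation use the same convention.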
## Training Procedure

### Training Hyperparameters

- Learning rate: 2e-5
- Train batch size: 8
- Evaluation batch size: 8
- Optimizer: `AdamW` (betas=(0.9, 0.999), epsilon=1e-8)
- Learning rate scheduler: linear
- Number of epochs: 10
- Seed: 42

### Training Environment

- Transformers 4.56.1
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0

### Training Results

- The model reached a training loss of ~0.13, averaged over all epochs.
- Accuracy, F1, precision, and recall should be computed on a held-out test set for a full evaluation.

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_path = "path_to_saved_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)

# Aggregate sub-word predictions into whole-entity spans.
ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "سلام. در تهران زندگی میکنم."  # "Hello. I live in Tehran."
results = ner_pipeline(text)
print(results)
```