Persian Sentence Completion Classifier

A BERT-based classifier that determines whether a Persian sentence is Complete or Incomplete.
It is designed for ASR post-processing pipelines, e.g. on text produced by a speech-to-text system.
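One way such a classifier might be used after speech-to-text is to merge consecutive ASR segments until the accumulated text is judged Complete. The sketch below is an illustration, not part of the model: `merge_until_complete` and `toy_classify` are hypothetical names, and `toy_classify` is a stand-in that only checks for a trailing period so the example runs without downloading the model. Any function that maps a list of strings to a list of "Complete"/"Incomplete" labels could be passed in its place.

```python
from typing import Callable, List


def merge_until_complete(
    segments: List[str],
    classify: Callable[[List[str]], List[str]],
) -> List[str]:
    """Join consecutive ASR segments until the classifier reports 'Complete'."""
    sentences: List[str] = []
    buffer = ""
    for segment in segments:
        # Append the new segment to whatever text is still pending.
        buffer = f"{buffer} {segment}".strip()
        if classify([buffer])[0] == "Complete":
            sentences.append(buffer)
            buffer = ""
    # Keep any trailing fragment rather than silently dropping it.
    if buffer:
        sentences.append(buffer)
    return sentences


def toy_classify(texts: List[str]) -> List[str]:
    # Stand-in for the real model (assumption): a trailing "." means complete.
    return ["Complete" if t.endswith(".") else "Incomplete" for t in texts]


print(merge_until_complete(["امروز هوا", "خیلی عالی است.", "فردا"], toy_classify))
```

Re-classifying the whole buffer on each new segment keeps the logic simple. For long streams, a cap on buffer length may be needed so that a run of Incomplete predictions does not grow without bound.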

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "MohammadJRanjbar/persian-sentence-completion"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

label_map = {0: "Incomplete", 1: "Complete"}

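def classify_sentences(sentences, batch_size=16):
    """Return "Complete" or "Incomplete" for each sentence in the list."""
    results = []
    # Process the input in batches to keep memory use bounded.
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i+batch_size]
        # Pad to the longest sentence in the batch; truncate at 128 tokens.
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=128)
        # Inference only, so gradient tracking is disabled.
        with torch.no_grad():
            outputs = model(**inputs)
        # The predicted class is the index of the largest logit.
        preds = torch.argmax(outputs.logits, dim=-1)
        results.extend(label_map[p] for p in preds.tolist())
    return results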

# Example
texts = ["امروز هوا خیلی عالی", "امروز هوا خیلی عالی است."]
print(classify_sentences(texts))
# → ['Incomplete', 'Complete']
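The example above returns only the argmax label. If a confidence score is useful, the logits can be turned into probabilities with a softmax. The following is a minimal sketch under stated assumptions: it takes logits as plain nested lists (for instance from `outputs.logits.tolist()`), reuses the 0 = Incomplete, 1 = Complete mapping from `label_map`, and introduces a hypothetical `threshold` parameter that is not part of the model card.

```python
import math
from typing import List, Tuple


def softmax(row: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def complete_with_threshold(
    logits: List[List[float]],
    threshold: float = 0.5,  # hypothetical knob, not from the model card
) -> List[Tuple[str, float]]:
    """Return (label, P(Complete)) for each row of two-class logits."""
    results = []
    for row in logits:
        # Index 1 is the "Complete" class, as in label_map.
        p_complete = softmax(row)[1]
        label = "Complete" if p_complete >= threshold else "Incomplete"
        results.append((label, p_complete))
    return results


# Made-up logits for illustration; real values would come from the model.
print(complete_with_threshold([[2.0, -2.0], [-1.0, 3.0]]))
```

With the default threshold of 0.5 this agrees with the argmax decision except for exact ties. Raising the threshold makes the pipeline more conservative about declaring a sentence finished.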

Citation

If you use this model, please cite the following works:

@misc{kalahroodi2026persianpunclargescaledatasetbertbased,
    title = {PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration},
    author = {Mohammad Javad Ranjbar Kalahroodi and Heshaam Faili and Azadeh Shakery},
    year = {2026},
    eprint = {2603.05314},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL},
    url = {https://arxiv.org/abs/2603.05314},
}

@misc{kalahroodi2025parsvoicelargescalemultispeakerpersian,
    title = {ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis},
    author = {Mohammad Javad Ranjbar Kalahroodi and Heshaam Faili and Azadeh Shakery},
    year = {2025},
    eprint = {2510.10774},
    archivePrefix = {arXiv},
    primaryClass = {cs.SD},
    url = {https://arxiv.org/abs/2510.10774},
}

Model details

Base model: HooshvareLab/bert-base-parsbert-uncased (this model is a fine-tune of it)
Model size: 0.2B params
Tensor type: F32
Weights format: Safetensors

Papers

PersianPunc: A Large-Scale Dataset and BERT-Based Approach for Persian Punctuation Restoration (arXiv 2603.05314, published Mar 5)
ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis (arXiv 2510.10774, published Oct 12, 2025)