Arabic Punctuation Restoration Model (BiLSTM)

This is a Bidirectional LSTM (BiLSTM) model designed to restore punctuation marks in raw Arabic text. It takes unpunctuated Arabic text as input and inserts the appropriate punctuation marks.

Model Details

  • Architecture: BiLSTM (2 Layers, Hidden Dim 256)
  • Embeddings: AraVec (Twitter-CBOW 300d)
  • Vocabulary Size: ~50k words
  • Input: Raw Arabic text (with or without diacritics)
  • Output: Text with restored punctuation marks
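The details above can be sketched as a token-level tagger in PyTorch. This is a minimal illustration, not the repository's actual model class: the name `BiLSTMTagger`, the padding index, the exact vocabulary size of 50,000, and the reading of "Hidden Dim 256" as the size per direction are all assumptions. The embedding layer here is randomly initialized; the real model would load AraVec vectors into it.

```python
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):  # hypothetical name, not taken from the repository
    """Predicts one punctuation label per input word."""

    def __init__(self, vocab_size=50_000, embed_dim=300, hidden_dim=256,
                 num_layers=2, num_labels=7, pad_idx=0):
        super().__init__()
        # 300-d embeddings, matching the AraVec dimension (random init in this sketch)
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        # 2-layer bidirectional LSTM; hidden_dim is assumed to be per direction
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(embedded)       # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(outputs)        # (batch, seq_len, num_labels)


model = BiLSTMTagger()
batch = torch.randint(1, 50_000, (2, 12))      # 2 sentences of 12 word IDs each
logits = model(batch)
print(logits.shape)
```

Each word position receives seven logits, one per punctuation class, so taking the argmax over the last dimension yields one label ID per word.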

Supported Punctuation Marks

The model predicts the following punctuation marks:

ID  Mark    Name
0   (None)  No Punctuation
1   ؟       Arabic Question Mark
2   ،       Arabic Comma
3   :       Colon
4   ؛       Arabic Semicolon
5   !       Exclamation Mark
6   .       Period / Full Stop
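To illustrate how these label IDs turn into punctuated text, here is a small sketch in plain Python. It assumes that each label describes the mark placed immediately after the corresponding word; the names `ID_TO_MARK` and `apply_labels` are hypothetical and may not match what inference.py does internally.

```python
# Label IDs mapped to the marks listed in the table above
ID_TO_MARK = {0: "", 1: "؟", 2: "،", 3: ":", 4: "؛", 5: "!", 6: "."}


def apply_labels(words, label_ids):
    """Attach the predicted mark (if any) to the end of each word."""
    if len(words) != len(label_ids):
        raise ValueError("expected exactly one label per word")
    return " ".join(word + ID_TO_MARK[label]
                    for word, label in zip(words, label_ids))


words = "هل تساءلت يوما عن معنى الحياة".split()
labels = [0, 0, 0, 0, 0, 1]  # hand-written labels: question mark after the last word
print(apply_labels(words, labels))
```

The labels in this example are written by hand for illustration; in practice they would come from the model's per-word predictions.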

How to Use

Because this is a custom PyTorch model, loading it requires both the model class definition and the vocabulary.

Method 1: Using the Inference Script (Recommended)

Download inference.py from this repository. It provides a PunctuationRestorer class that handles model loading and prediction.

from huggingface_hub import hf_hub_download
import importlib.util

# 1. Download the script
script_path = hf_hub_download(repo_id="malkhuzanie/arabic-punctuation-checkpoints", filename="inference.py")

# 2. Load the script
spec = importlib.util.spec_from_file_location("inference", script_path)
inference = importlib.util.module_from_spec(spec)
spec.loader.exec_module(inference)

# 3. Initialize and Predict
model = inference.PunctuationRestorer()

text = "هل تساءلت يوما عن معنى الحياة ما هي الأسئلة التي تشغل بالك"
print(model.predict(text))
# Output: هل تساءلت يوما عن معنى الحياة؟ ما هي الأسئلة التي تشغل بالك؟