Arabic Punctuation Restoration Model (BiLSTM)

This is a bidirectional LSTM (BiLSTM) model that restores punctuation in raw Arabic text. It takes unpunctuated Arabic text as input and returns the same text with the appropriate punctuation marks inserted.

Model Details

Architecture: BiLSTM (2 Layers, Hidden Dim 256)
Embeddings: AraVec (Twitter-CBOW 300d)
Vocabulary Size: ~50k words
Input: Raw Arabic text (with or without diacritics)
Output: Text with restored punctuation marks
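
For orientation, the architecture listed above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the code shipped in this repository: the class name BiLSTMPunctuator, the padding index, and the single linear classification head are assumptions, and the embedding layer here is randomly initialised rather than loaded from AraVec. Only the layer count, hidden size, embedding width, approximate vocabulary size, and the seven output classes come from this card.

```python
import torch
import torch.nn as nn


class BiLSTMPunctuator(nn.Module):
    """Hypothetical per-token tagger; names and layout are assumptions."""

    def __init__(self, vocab_size=50_000, embed_dim=300, hidden_dim=256,
                 num_layers=2, num_classes=7, pad_idx=0):
        super().__init__()
        # 300-d word embeddings (AraVec vectors would be copied in here).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        # 2-layer bidirectional LSTM with 256 hidden units per direction.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # Forward and backward states are concatenated, hence 2 * hidden_dim.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, 300)
        outputs, _ = self.lstm(embedded)       # (batch, seq_len, 512)
        return self.classifier(outputs)        # (batch, seq_len, 7)


# Small vocabulary for a quick shape check.
model = BiLSTMPunctuator(vocab_size=1_000)
token_ids = torch.randint(1, 1_000, (2, 12))   # 2 sentences of 12 tokens each
logits = model(token_ids)
print(logits.shape)  # torch.Size([2, 12, 7])
```

Each token receives one score per punctuation class; taking the argmax over the last dimension yields the label ID to place after that token.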

Supported Punctuation Marks

The model predicts the following punctuation marks:

ID	Mark	Name
0	(None)	No Punctuation
1	?	Question Mark (؟)
2	،	Arabic Comma
3	:	Colon
4	؛	Arabic Semicolon
5	!	Exclamation Mark
6	.	Period / Full Stop
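
To make the label scheme concrete, the sketch below turns a sequence of predicted IDs back into punctuated text. The helper apply_punctuation and the example labels are hypothetical, and whitespace tokenisation is an assumption; ID 1 is assumed to be rendered as the Arabic question mark (؟), as in the example output on this card.

```python
# Label IDs from the table above; ID 1 is assumed to render as the Arabic "؟".
ID_TO_MARK = {0: "", 1: "؟", 2: "،", 3: ":", 4: "؛", 5: "!", 6: "."}


def apply_punctuation(words, label_ids):
    """Append the mark for each predicted label directly after its word."""
    if len(words) != len(label_ids):
        raise ValueError("expected exactly one label per word")
    return " ".join(word + ID_TO_MARK[label] for word, label in zip(words, label_ids))


words = "هل تساءلت يوما عن معنى الحياة".split()
labels = [0, 0, 0, 0, 0, 1]  # hypothetical prediction: question mark after the last word
print(apply_punctuation(words, labels))
# هل تساءلت يوما عن معنى الحياة؟
```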

How to Use

This is a custom PyTorch model, so both the model definition and the vocabulary must be loaded before running inference.

Method 1: Using the Inference Script (Recommended)

The simplest route is to download inference.py from this repository and import it as a module.

from huggingface_hub import hf_hub_download
import importlib.util

# 1. Download the script
script_path = hf_hub_download(repo_id="malkhuzanie/arabic-punctuation-checkpoints", filename="inference.py")

# 2. Load the script
spec = importlib.util.spec_from_file_location("inference", script_path)
inference = importlib.util.module_from_spec(spec)
spec.loader.exec_module(inference)

# 3. Initialize and Predict
model = inference.PunctuationRestorer()

text = "هل تساءلت يوما عن معنى الحياة ما هي الأسئلة التي تشغل بالك"
print(model.predict(text))
# Output: هل تساءلت يوماً عن معنى الحياة؟ ما هي الأسئلة التي تشغل بالك؟
