Instructions to use QomSSLab/Anonymizer-xlm-roberta with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use QomSSLab/Anonymizer-xlm-roberta with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="QomSSLab/Anonymizer-xlm-roberta")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("QomSSLab/Anonymizer-xlm-roberta")
model = AutoModelForTokenClassification.from_pretrained("QomSSLab/Anonymizer-xlm-roberta")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use QomSSLab/Anonymizer-xlm-roberta with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "QomSSLab/Anonymizer-xlm-roberta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QomSSLab/Anonymizer-xlm-roberta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
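The same OpenAI-compatible endpoint can be called from Python instead of curl. A minimal sketch using only the standard library; it assumes the vLLM server from the step above is already running on `localhost:8000`:

```python
import json
import urllib.request

def build_completion_request(prompt: str) -> dict:
    """Build the same JSON payload as the curl example above."""
    return {
        "model": "QomSSLab/Anonymizer-xlm-roberta",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }

def call_server(prompt: str, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to the /v1/completions endpoint and parse the reply."""
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

`build_completion_request` and `call_server` are illustrative names, not part of vLLM; any OpenAI-compatible client (e.g. the `openai` package pointed at the same base URL) works equally well.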
- SGLang
How to use QomSSLab/Anonymizer-xlm-roberta with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "QomSSLab/Anonymizer-xlm-roberta" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QomSSLab/Anonymizer-xlm-roberta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "QomSSLab/Anonymizer-xlm-roberta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QomSSLab/Anonymizer-xlm-roberta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use QomSSLab/Anonymizer-xlm-roberta with Docker Model Runner:
```shell
docker model run hf.co/QomSSLab/Anonymizer-xlm-roberta
```
QomSSLab/Anonymizer-4b
QomSSLab/Anonymizer-4b is a fine-tuned Gemma 3 4B model designed to anonymize Persian legal texts by masking or replacing all personally identifiable information (PII). It is trained on the QomSSLab/Anonymized_Cases dataset.
💡 Use Cases
- Data privacy for legal document processing.
- Preprocessing step for building publicly shareable Persian legal corpora.
- Protecting PII in judicial NLP pipelines.
🧠 Model Details
- Base Model: Gemma 3 4B
- Language: Persian (Farsi)
- Training Data: Synthetic and real anonymized Persian legal cases.
- Task: Text-to-text generation (anonymization)
📦 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned token-classification model and its tokenizer.
model = AutoModelForTokenClassification.from_pretrained(
    "QomSSLab/Anonymizer-xlm-roberta", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("QomSSLab/Anonymizer-xlm-roberta")

# Aggregate sub-word predictions into whole-entity spans.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

text = "پروندهای درباره ازدواج بین هانیه و عبدالرحیم با اطلاعات هویتی متعدد"
entities = ner(text)
for ent in entities:
    print(
        f"Entity: {ent['word']} ({ent['start']}-{ent['end']}), "
        f"Type: {ent['entity_group']}, Score: {ent['score']:.4f}"
    )
```
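The spans returned by the pipeline only identify PII; to actually anonymize the text, each span still has to be replaced. A minimal redaction sketch, assuming entity dicts shaped like the `ner` pipeline output above (`start`, `end`, `entity_group`); the `[TYPE]` placeholder format is an assumption, not something the model prescribes:

```python
def redact(text: str, entities: list) -> str:
    """Replace each detected entity span with a [TYPE] placeholder.

    Spans are applied right-to-left so earlier character offsets
    stay valid after each replacement.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text
```

For example, `redact("Case of Ali", [{"start": 8, "end": 11, "entity_group": "PER"}])` returns `"Case of [PER]"`; chaining `redact(text, ner(text))` yields the anonymized document.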
📊 Evaluation
The model was evaluated qualitatively on a diverse collection of Persian legal documents. It effectively identifies and anonymizes a range of personally identifiable information (PII), including:
- Full names
- National IDs
- Addresses
- Dates of birth
- Case numbers
- Geographic locations
The model is particularly well-suited for preprocessing court cases for research, public data release, or downstream tasks like summarization and classification while preserving privacy.
Limitations
- May occasionally miss rare or out-of-distribution PII formats.
- Not guaranteed to anonymize very short or extremely noisy texts.
- Trained primarily on formal legal language; performance may degrade on informal Persian.
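Because rare or out-of-distribution PII formats may slip through, a cheap deterministic post-pass over the model's output can catch residual numeric identifiers. A sketch; the 10-digit pattern (the usual length of an Iranian national ID) and the `[ID]` placeholder are assumptions to adapt to your corpus:

```python
import re

# Mask any standalone 10-digit run that survived the model's pass.
NATIONAL_ID = re.compile(r"\b\d{10}\b")

def mask_residual_ids(text: str) -> str:
    """Deterministic fallback: replace leftover 10-digit numbers with [ID]."""
    return NATIONAL_ID.sub("[ID]", text)
```

Such regex fallbacks complement, rather than replace, the model: they guarantee coverage of a known format regardless of context, at the cost of occasionally masking non-PII numbers of the same shape.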
📁 Dataset
This model was fine-tuned on the QomSSLab/Anonymized_Cases dataset, which includes manually and synthetically anonymized court documents and legal filings in Persian. The dataset contains a mix of real and simulated entities, helping the model generalize across varied legal formats and writing styles.