Instructions for using QomSSLab/Anonymizer-4b with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use QomSSLab/Anonymizer-4b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QomSSLab/Anonymizer-4b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("QomSSLab/Anonymizer-4b")
model = AutoModelForCausalLM.from_pretrained("QomSSLab/Anonymizer-4b")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use QomSSLab/Anonymizer-4b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "QomSSLab/Anonymizer-4b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QomSSLab/Anonymizer-4b",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/QomSSLab/Anonymizer-4b
```
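The served endpoint speaks the OpenAI-compatible chat completions protocol, so any HTTP client works, not just curl. A minimal sketch using only the Python standard library, assuming the vLLM server above is running on its default port 8000 (the helper name `build_chat_request` is just for illustration):

```python
import json
import urllib.request

def build_chat_request(model, user_content, base_url="http://localhost:8000"):
    """Build an OpenAI-compatible chat completion request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("QomSSLab/Anonymizer-4b", "What is the capital of France?")

# With the server running, send the request and read the assistant reply:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
print(req.full_url)
```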
- SGLang
How to use QomSSLab/Anonymizer-4b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "QomSSLab/Anonymizer-4b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QomSSLab/Anonymizer-4b",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "QomSSLab/Anonymizer-4b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QomSSLab/Anonymizer-4b",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use QomSSLab/Anonymizer-4b with Docker Model Runner:
```shell
docker model run hf.co/QomSSLab/Anonymizer-4b
```
QomSSLab/Anonymizer-4b
QomSSLab/Anonymizer-4b is a fine-tuned Gemma 3 4B model designed to anonymize Persian legal texts by masking or replacing all personally identifiable information (PII). It is trained on the QomSSLab/Anonymized_Cases dataset.
💡 Use Cases
- Data privacy for legal document processing.
- Preprocessing step for building publicly shareable Persian legal corpora.
- Protecting PII in judicial NLP pipelines.
🧠 Model Details
- Base Model: Gemma 3 4B
- Language: Persian (Farsi)
- Training Data: Synthetic and real anonymized Persian legal cases.
- Task: Text-to-text generation (anonymization)
📦 Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("QomSSLab/Anonymizer-4b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("QomSSLab/Anonymizer-4b")
tokenizer.add_eos_token = False

messages = [
    {"role": "system", "content": "You are a data privacy expert. Your task is to anonymize the following case text by removing or replacing all personally identifiable information (PII)."},
    # Example input (Persian): "A case about a marriage between Hanieh and Abdolrahim, with numerous identity details..."
    {"role": "user", "content": "پروندهای درباره ازدواج بین هانیه و عبدالرحیم با اطلاعات هویتی متعدد..."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, add_special_tokens=False)
# Move inputs to wherever device_map="auto" placed the model, rather than hard-coding "cuda".
inputs = tokenizer([prompt], return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    temperature=0.1,
    top_p=0.95,
    top_k=64,
    disable_compile=True,
)

# Decode only the newly generated tokens, skipping the echoed prompt.
anonymized_text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(anonymized_text)
```
📊 Evaluation
The model was evaluated qualitatively on a diverse collection of Persian legal documents. It effectively identifies and anonymizes a range of personally identifiable information (PII), including:
- Full names
- National IDs
- Addresses
- Dates of birth
- Case numbers
- Geographic locations
The model is particularly well-suited for preprocessing court cases for research, public data release, or downstream tasks like summarization and classification while preserving privacy.
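Because the limitations below note that rare PII formats can slip through, it is prudent to screen the model's output before public release. A hypothetical post-processing check, assuming Iranian national IDs appear as bare 10-digit runs and that a list of known PII strings from the source document is available (both the function name and the heuristics are illustrative, not part of the model):

```python
import re

def find_leftover_pii(anonymized_text, known_pii=()):
    """Flag suspected PII that survived anonymization.

    Looks for isolated 10-digit runs (the shape of an Iranian national ID)
    and for any known PII strings carried over from the source document.
    """
    findings = []
    findings += [("national_id?", m) for m in re.findall(r"(?<!\d)\d{10}(?!\d)", anonymized_text)]
    findings += [("known_pii", s) for s in known_pii if s in anonymized_text]
    return findings

# A national ID left in the output is flagged; a clean text yields nothing.
sample = "خواهان با کد ملی 0012345678 در پرونده حاضر شد."
print(find_leftover_pii(sample, known_pii=["هانیه"]))
```

Documents with any findings can be routed back through the model or to a human reviewer.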
⚠️ Limitations
- May occasionally miss rare or out-of-distribution PII formats.
- Not guaranteed to anonymize very short or extremely noisy texts.
- Trained primarily on formal legal language; performance may degrade on informal Persian.
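Given these limitations, very long case files are best split into manageable pieces and anonymized piece by piece. A rough sketch of paragraph-boundary chunking; `anonymize_fn` is a placeholder for a call into the model (e.g. the `generate()` example above), and `max_chars` is a crude character budget, not a token-accurate limit:

```python
def anonymize_in_chunks(text, anonymize_fn, max_chars=4000):
    """Split a long document on paragraph boundaries and anonymize each chunk.

    Greedily packs consecutive paragraphs into chunks of at most
    `max_chars` characters, then applies `anonymize_fn` to each chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return "\n\n".join(anonymize_fn(c) for c in chunks)

# Toy run with an identity "anonymizer" just to show the plumbing:
doc = "\n\n".join(f"paragraph {i}" for i in range(5))
print(anonymize_in_chunks(doc, lambda c: c, max_chars=30))
```

Note that entity mentions spanning a chunk boundary may be handled inconsistently across chunks, so boundary placement deserves care.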
📁 Dataset
This model was fine-tuned on the QomSSLab/Anonymized_Cases dataset, which includes manually and synthetically anonymized court documents and legal filings in Persian. The dataset contains a mix of real and simulated entities, helping the model generalize across varied legal formats and writing styles.
```shell
# Gated model: log in with a HF token that has gated access permission
hf auth login
```