---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
tags:
- privacy
- content-moderation
- classifier
- electra
datasets:
- custom
metrics:
- accuracy
model-index:
- name: privacy-classifier-electra
  results:
  - task:
      type: text-classification
      name: Privacy Classification
    metrics:
    - type: accuracy
      value: 0.9968
      name: Validation Accuracy
widget:
- text: "My social security number is 123-45-6789"
  example_title: "Sensitive (SSN)"
- text: "The weather is nice today"
  example_title: "Safe"
- text: "My password is hunter2"
  example_title: "Sensitive (Password)"
- text: "I like pizza"
  example_title: "Safe"
---

# Privacy Classifier (ELECTRA)

A fine-tuned ELECTRA model for detecting sensitive or private information in text.

## Model Description

This model classifies text as either **safe** or **sensitive**, helping identify content that may contain private information such as:

- Social security numbers
- Passwords and credentials
- Financial account numbers
- Personal health information
- Home addresses
- Phone numbers

### Base Model

- **Architecture**: [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator)
- **Parameters**: ~110M
- **Task**: Binary text classification

## Training Details

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Validation Accuracy | **99.68%** |
| Training Hardware | NVIDIA RTX 5090 (32GB) |
| Framework | PyTorch + Transformers |

### Labels

- `safe` (0): Content does not contain sensitive information
- `sensitive` (1): Content may contain private or sensitive information

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="jonmabe/privacy-classifier-electra")

# Examples
result = classifier("My SSN is 123-45-6789")
# [{'label': 'sensitive', 'score': 0.99...}]

result = classifier("The meeting is at 3pm")
# [{'label': 'safe', 'score': 0.99...}]
```
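The pipeline's score can also drive a confidence threshold. The helper below is an illustrative sketch, not part of this model's API: it fails closed by treating a low-confidence "safe" verdict as sensitive, which is usually the right default for a privacy filter. The function name and threshold value are assumptions for the example.

```python
def is_sensitive(result, threshold=0.8):
    """Fail-closed check over pipeline-style output.

    `result` is the list of dicts returned by the transformers pipeline,
    e.g. classifier(text). A "safe" verdict below `threshold` is still
    treated as sensitive.
    """
    top = result[0]
    if top["label"] == "sensitive":
        return True
    return top["score"] < threshold

print(is_sensitive([{"label": "safe", "score": 0.99}]))       # False
print(is_sensitive([{"label": "safe", "score": 0.55}]))       # True
print(is_sensitive([{"label": "sensitive", "score": 0.97}]))  # True
```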

### Direct Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("jonmabe/privacy-classifier-electra")
model = AutoModelForSequenceClassification.from_pretrained("jonmabe/privacy-classifier-electra")

text = "My credit card number is 4111-1111-1111-1111"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

prediction = torch.argmax(outputs.logits, dim=-1)
label = "sensitive" if prediction.item() == 1 else "safe"
print(f"Classification: {label}")
```

## Intended Use

- **Primary Use**: Pre-screening text before logging, storage, or transmission
- **Use Cases**:
  - Filtering sensitive content from logs
  - Flagging potential PII in user-generated content
  - Privacy-aware content moderation
  - Data loss prevention (DLP) systems
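As a sketch of the log-filtering use case, the wrapper below redacts lines a classifier flags as sensitive. The function names and the toy stand-in classifier are invented for illustration; in practice you would inject the `transformers` pipeline shown under Usage.

```python
def filter_log_lines(lines, classify, placeholder="[REDACTED]"):
    """Replace lines the classifier flags as sensitive before storage.

    `classify` is any callable returning pipeline-style output:
    a list of {"label": ..., "score": ...} dicts.
    """
    filtered = []
    for line in lines:
        verdict = classify(line)[0]
        filtered.append(placeholder if verdict["label"] == "sensitive" else line)
    return filtered

# Toy stand-in classifier for demonstration only; in practice use
# pipeline("text-classification", model="jonmabe/privacy-classifier-electra").
def toy_classify(text):
    label = "sensitive" if "password" in text.lower() else "safe"
    return [{"label": label, "score": 1.0}]

print(filter_log_lines(["user logged in", "password is hunter2"], toy_classify))
# ['user logged in', '[REDACTED]']
```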

## Limitations

- Trained primarily on English text
- May not catch all forms of sensitive information
- Should be used as one layer in a defense-in-depth approach
- Not a substitute for proper data handling policies
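To illustrate the defense-in-depth point: a cheap regex pass for well-known PII shapes can run alongside the model, and the text is flagged if either layer fires. The patterns and names below are illustrative only, not an exhaustive or production-ready PII detector.

```python
import re

# Illustrative patterns for a regex pre-filter layer (not exhaustive).
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN shape
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),  # card-number-like digit runs
]

def flag_sensitive(text, model_says_sensitive):
    """Flag text if either the regex layer or the model layer fires."""
    regex_hit = any(p.search(text) for p in PII_PATTERNS)
    return regex_hit or model_says_sensitive

print(flag_sensitive("SSN: 123-45-6789", model_says_sensitive=False))  # True
```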

## Training Data

Custom dataset combining:

- Synthetic examples of sensitive patterns (SSN, passwords, etc.)
- Safe text samples from various domains
- Balanced classes for robust classification
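As a hypothetical sketch of how synthetic sensitive examples like those above might be generated (this is not the actual training-data pipeline; the templates and helpers are invented for illustration):

```python
import random

def synth_ssn(rng):
    # Generate an SSN-shaped string; values are random, not real SSNs.
    return f"{rng.randint(100, 899):03d}-{rng.randint(10, 99):02d}-{rng.randint(1000, 9999):04d}"

def make_examples(n, rng=None):
    """Produce (text, label) pairs from a few illustrative templates."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    templates = [
        "My social security number is {ssn}",
        "SSN on file: {ssn}",
    ]
    return [(rng.choice(templates).format(ssn=synth_ssn(rng)), "sensitive")
            for _ in range(n)]

for text, label in make_examples(2):
    print(label, "->", text)
```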

## Citation

```bibtex
@misc{privacy-classifier-electra,
  author    = {jonmabe},
  title     = {Privacy Classifier based on ELECTRA},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jonmabe/privacy-classifier-electra}
}
```