---
license: apache-2.0
---

# Model Card: Redact-V1 PII Detection Model

This model detects personally identifiable information (PII) in text so that it can be redacted. It uses a deep learning architecture implemented in TensorFlow and fine-tuned on a curated dataset.

## Overview

The **Redact-V1** model is built for PII detection, with applications in data redaction and privacy preservation. It was trained and evaluated on the [Redact-V1 dataset](https://huggingface.co/datasets/darkmatter2222/redact-v1).

## Model Details

- **Model File:** [final_model.h5](final_model.h5)
- **Labels:** [labels.json](labels.json)

The training performance indicators (loss, accuracy, precision, and recall) are recorded in the training performance file. Visualizations of model performance, including confusion matrices and training history, are available in the [images](images/) folder.

![Highlighted Sample](https://huggingface.co/darkmatter2222/redact-v1/blob/main/images/highlighted_sample_4.png)

## Supported Classes

The model supports the following PII classes:

- **People Name**
- **Card Number**
- **Account Number**
- **Social Security Number**
- **Government ID Number**
- **Date of Birth**
- **Password**
- **Tax ID Number**
- **Phone Number**
- **Residential Address**
- **Email Address**
- **IP Number**
- **Passport**
- **Driver License**

## Usage

The sample code below loads the model and runs it on one sentence in a Python environment. It requires the `tensorflow` and `tensorflow_hub` packages and expects `final_model.h5` and `labels.json` in the working directory.

```python
import json

import tensorflow as tf
import tensorflow_hub as hub

# Paths to the model and labels.
MODEL_PATH = "final_model.h5"
LABELS_PATH = "labels.json"


def load_labels(labels_file):
    """Read the class labels from a JSON file."""
    with open(labels_file, "r", encoding="utf-8") as f:
        return json.load(f)


def main():
    print("Loading model from:", MODEL_PATH)
    # The saved model contains a TensorFlow Hub layer, so KerasLayer must be
    # registered as a custom object for deserialization to succeed.
    model = tf.keras.models.load_model(
        MODEL_PATH, custom_objects={"KerasLayer": hub.KerasLayer}
    )
    print("Model loaded successfully.")

    labels = load_labels(LABELS_PATH)
    print("Loaded labels:", labels)

    # Sample sentence for testing.
    sample_sentence = (
        "John Doe's account number 1234567890 was flagged for review "
        "due to unusual activity."
    )
    print("Sample sentence:", sample_sentence)

    # Run prediction on a batch of one sentence, passed as a string tensor.
    predictions = model.predict(tf.constant([sample_sentence]))

    # Print one score per label for the first (and only) input.
    print("Predictions:")
    for label, prob in zip(labels, predictions[0]):
        print(f"{label}: {prob:.2f}")


if __name__ == "__main__":
    main()
```

## Training Data & Source Code

- **Training Data:** The model was trained on the [Redact-V1 dataset](https://huggingface.co/datasets/darkmatter2222/redact-v1).
- **Source Code:** The training pipeline and preprocessing code can be reviewed in the [NLU-Redact-PII repository](https://github.com/darkmatter2222/NLU-Redact-PII).

## License

This project is licensed under the Apache-2.0 license.
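
## Example: Filtering Predictions by Threshold

The usage example prints one score per label for the input sentence. A common next step is to keep only the classes whose score clears a cutoff. The sketch below shows one way to do that in plain Python. It assumes that `labels.json` holds a list of label names in the same order as the model's output scores, and the helper name `detected_classes`, the 0.5 threshold, and the example scores are all hypothetical choices made for illustration rather than values taken from the model or its training code.

```python
from typing import List, Sequence, Tuple


def detected_classes(
    labels: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.5,  # Assumed cutoff; tune it against your own validation data.
) -> List[Tuple[str, float]]:
    """Return (label, score) pairs whose score meets the threshold, highest first."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    hits = [
        (label, float(score))
        for label, score in zip(labels, scores)
        if score >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for illustration only; in practice they would come
# from model.predict(...)[0] as in the usage example.
example_labels = ["People Name", "Account Number", "Email Address"]
example_scores = [0.91, 0.78, 0.03]
print(detected_classes(example_labels, example_scores))
```

With real model output, the call would look like `detected_classes(labels, predictions[0])`. A lower threshold flags more classes at the cost of more false positives, so the right value depends on how costly a missed detection is for your use case.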