---
language: en
license: apache-2.0
library_name: transformers
tags:
- bert
- text-classification
- privacy-policy
- gdpr
- torchscript
datasets:
- MAPP-116
metrics:
- f1
model-index:
- name: PARENT BERT
  results:
  - task:
      type: text-classification
    dataset:
      name: MAPP-116
      type: text
    metrics:
    - name: f1
      type: score
      value: 0.80 # replace with your actual F1 score
---

# PARENT BERT Models for Privacy Policy Analysis

This repository contains **TorchScript versions of 15 fine-tuned BERT models** used in the PARENT project to analyse mobile app privacy policies. These models identify **what data is collected, why it is collected, and how it is processed**, helping assess GDPR compliance. They are part of a hybrid framework designed for non-technical users, particularly parents concerned about children’s privacy.

---

## Model Purpose

- Segment privacy policies to detect:
  - Data collection types (e.g., contact info, location)
  - Purpose of data collection
  - How data is processed
- Support GDPR compliance evaluation
- Detect potential third-party sharing (in combination with a logistic regression model)

---

## References

- **MAPP Dataset:** Arora, S., Hosseini, H., Utz, C., Bannihatti Kumar, V., Dhellemmes, T., Ravichander, A., Story, P., Mangat, J., Chen, R., Degeling, M., Norton, T.B., Hupperich, T., Wilson, S., & Sadeh, N.M. (2022). *A tale of two regulatory regimes: Creation and analysis of a bilingual privacy policy corpus*. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2022). [PDF link](https://aclanthology.org/2022.lrec-1.585.pdf) [Accessed 12 July 2025].

---

## Usage

```python
import torch
from transformers import BertTokenizerFast
from huggingface_hub import hf_hub_download

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
REPO_ID = "Bnaad/PARENT_bert"

# Load tokenizer
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Load one TorchScript model from Hugging Face
label_name = "Information Type_Contact information"
safe_label = label_name.replace(" ", "_").replace("/", "_")
filename = f"torchscript_{safe_label}.pt"
model_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
model = torch.jit.load(model_path, map_location=device)
model.to(device)
model.eval()

# Example inference
sample_text = """For any questions about your account or our services, please contact
our customer support team by emailing support@example.com, calling +1-800-555-1234,
or visiting our office at 123 Main Street, Springfield, IL, 62701 during business hours"""

inputs = tokenizer(
    sample_text,
    return_tensors="pt",
    truncation=True,
    padding="max_length",
    max_length=512
).to(device)

with torch.no_grad():
    outputs = model(inputs["input_ids"], inputs["attention_mask"])

print("Logits:", outputs)
prob = torch.sigmoid(outputs.squeeze())
print(prob)
```
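
The usage example loads one model and prints a sigmoid probability. As a minimal sketch of the surrounding steps, the helpers below reproduce the filename convention from that example and turn a single logit into a yes/no decision. The function names `label_to_filename` and `logit_to_decision` are hypothetical. The `0.5` threshold and the single-logit output are assumptions inferred from the `torch.sigmoid(outputs.squeeze())` call, not documented properties of these models.

```python
import math
from typing import Tuple


def label_to_filename(label_name: str) -> str:
    """Build a TorchScript filename from a label name.

    Mirrors the convention in the usage example: spaces and slashes
    become underscores, wrapped in a "torchscript_" prefix and ".pt" suffix.
    """
    safe_label = label_name.replace(" ", "_").replace("/", "_")
    return f"torchscript_{safe_label}.pt"


def logit_to_decision(logit: float, threshold: float = 0.5) -> Tuple[float, bool]:
    """Convert one raw logit into (probability, decision).

    The threshold is an assumed default, not a value from the model card;
    it may need tuning per label on held-out data.
    """
    # Numerically stable sigmoid: avoid overflow for large negative logits.
    if logit >= 0:
        prob = 1.0 / (1.0 + math.exp(-logit))
    else:
        exp_logit = math.exp(logit)
        prob = exp_logit / (1.0 + exp_logit)
    return prob, prob >= threshold


if __name__ == "__main__":
    # The only label name confirmed by the model card's usage example.
    print(label_to_filename("Information Type_Contact information"))
    print(logit_to_decision(2.0))
```

With a loaded model, the logit could be read as `outputs.squeeze().item()` before being passed to `logit_to_decision`, assuming the model returns one logit per input.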