---
license: apache-2.0
datasets:
- agentlans/text-quality-v2
language:
- en
base_model:
- microsoft/deberta-v3-base
pipeline_tag: text-classification
---
# DeBERTa v3 for Text Quality Assessment

## Model Details

- **Model Architecture:** DeBERTa v3 (xsmall and base variants)
- **Task:** Text quality assessment (regression)
- **Training Data:** Text Quality Meta-Analysis Dataset at [agentlans/text-quality-v2](https://huggingface.co/datasets/agentlans/text-quality-v2)
- **Output:** Single continuous value representing text quality

## Intended Use

These models are designed to assess the quality of English text, where "quality" refers to legible sentences that are not spam and contain useful information. They can be used for:

- Content moderation
- Spam detection
- Information quality assessment
- Text filtering

## Usage

The models accept text input and return a single continuous value representing the assessed quality. Higher values indicate higher perceived quality. The following snippet loads the base model and scores a batch of example texts.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "agentlans/deberta-v3-base-quality-v2"

# Load the tokenizer and model, then use the GPU if available, otherwise the CPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

def quality(text):
    """Processes the text using the model and returns its logits.
    Here, each logit is interpreted as the combined quality score for one text.
    Returns a list with one score per input text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
    with torch.no_grad():
        # Drop only the last (size-1) dimension so the batch dimension is kept,
        # even when a single text is passed
        logits = model(**inputs).logits.squeeze(-1).cpu()
    return logits.tolist()

# Example usage
text = [x.strip() for x in """
Congratulations! You've won a $1,000 gift card! Click here to claim your prize now!!!
Page 1 2 3 4 5 Next Last>>
Urgent: Your account has been compromised! Click this link to verify your identity and secure your account immediately!!!
Today marks a significant milestone in our journey towards sustainability! 🌍✨ We’re excited to announce our partnership with local organizations to plant 10,000 trees in our community this fall. Join us in making a positive impact on our environment!
In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.
The mitochondria is the powerhouse of the cell.
Exclusive discount on Super MitoMax Energy Boost! Recharge your mitochondria today!
Everyone is talking about this new diet that guarantees weight loss without exercise!
Discover five tips for improving your productivity while working from home.
""".strip().split("\n")]

result = quality(text)
for x, s in zip(text, result):
    print(f"Text: {x}\nQuality: {round(s, 2)}\n")
```

Example output for the `base` size model:
```
Text: Congratulations! You've won a $1,000 gift card! Click here to claim your prize now!!!
Quality: -1.25

Text: Page 1 2 3 4 5 Next Last>>
Quality: -1.54

Text: Urgent: Your account has been compromised! Click this link to verify your identity and secure your account immediately!!!
Quality: -2.01

Text: Today marks a significant milestone in our journey towards sustainability! 🌍✨ We’re excited to announce our partnership with local organizations to plant 10,000 trees in our community this fall. Join us in making a positive impact on our environment!
Quality: -1.72

Text: In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.
Quality: 0.45

Text: The mitochondria is the powerhouse of the cell.
Quality: 1.32

Text: Exclusive discount on Super MitoMax Energy Boost! Recharge your mitochondria today!
Quality: -1.16

Text: Everyone is talking about this new diet that guarantees weight loss without exercise!
Quality: -0.27

Text: Discover five tips for improving your productivity while working from home.
Quality: -0.42
```

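For the text filtering use case, scores like those in the example output can be compared against a cutoff. The following is a minimal sketch and not part of the original usage code: `filter_by_quality` is a hypothetical helper, and the threshold of `0.0` is an illustrative assumption that would need tuning on your own data. The sample scores are copied from the example output for the base model.

```python
def filter_by_quality(texts, scores, threshold=0.0):
    """Keep only the texts whose quality score is at least `threshold`.

    `threshold=0.0` is an illustrative assumption, not a recommended value.
    """
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    return [t for t, s in zip(texts, scores) if s >= threshold]

# Three texts and their scores, copied from the example output for the base model
texts = [
    "Page 1 2 3 4 5 Next Last>>",
    "The mitochondria is the powerhouse of the cell.",
    "Discover five tips for improving your productivity while working from home.",
]
scores = [-1.54, 1.32, -0.42]

kept = filter_by_quality(texts, scores)
print(kept)  # ['The mitochondria is the powerhouse of the cell.']
```

In practice, `scores` would come from a scoring function such as `quality` in the usage snippet, and texts that score close to the cutoff are reasonable candidates for human review rather than automatic removal.
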
## Performance Metrics

Root mean squared error (RMSE) on the 20% held-out evaluation set (lower is better):
- xsmall: 0.7668
- base: 0.7096

The base model achieves a lower RMSE than the xsmall variant.

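RMSE is the square root of the mean squared difference between predicted and reference scores, so it is expressed in the same units as the quality score. The sketch below shows how the metric could be computed on a labelled sample of your own data. `rmse` is a hypothetical helper and the numbers are toy values chosen for illustration; this is not the evaluation script behind the figures reported for the models.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        raise ValueError("at least one value is required")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values for illustration only (not taken from the evaluation set)
predicted = [0.5, -1.0, 1.5]
reference = [1.0, -1.0, 0.5]

print(round(rmse(predicted, reference), 4))  # 0.6455
```
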
## Limitations and Biases

- The models are trained on a specific dataset and may not generalize well to all types of text or domains.
- "Quality" is a subjective concept, and the models' assessments may not align with all human judgments.
- The models may exhibit biases present in the training data.
  - For example, there is a bias against self-help, promotional, and public relations material.
- They do not assess factual correctness or grammatical accuracy.

## Ethical Considerations

- These models should not be used as the sole determinant for content moderation or censorship.
- Care should be taken to avoid reinforcing existing biases in content selection or promotion.
- The models' outputs should be interpreted as suggestions rather than definitive judgments.

## Caveats and Recommendations

- Use these models in conjunction with other tools and human oversight for content moderation.
- Regularly evaluate the models' performance on your specific use case and data.
- Be aware that the models may not perform equally well across all text types or domains.
- Consider fine-tuning the models on domain-specific data for improved performance in specialized applications.