import re
import string


def preprocess_text(text: str) -> str:
    """
    Lightweight preprocessing:
    - lowercase
    - URL normalization
    - collapse runs of three or more repeated characters
    - join spaced-out single characters
    - strip punctuation
    - collapse whitespace
    """
    if not text:
        return ""
    # Lowercase
    text = text.lower()
    # URL normalization (the brackets are stripped with the punctuation below, leaving "URL")
    text = re.sub(r'https?://\S+|www\.\S+', ' [URL] ', text)
    # Collapse any character repeated three or more times to a single one (e.g., "sooooo" -> "so")
    text = re.sub(r'(.)\1{2,}', r'\1', text)
    # Join spaced-out characters (e.g., "f r e e" -> "free"; the text is already lowercase here)
    # Only if they are single characters separated by spaces, three or more in a row
    text = re.sub(r'\b(\w\s){2,}\w\b', lambda m: m.group().replace(' ', ''), text)
    # Strip punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Remove extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    return text
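A quick way to see what each step does is to run the function on a few messages. The sketch below repeats the function body so that it runs on its own; the sample messages and the `samples` name are made up for illustration and are not taken from the model's training data.

```python
import re
import string


def preprocess_text(text: str) -> str:
    # Same steps as the function above, repeated so this snippet is standalone.
    if not text:
        return ""
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', ' [URL] ', text)
    text = re.sub(r'(.)\1{2,}', r'\1', text)
    text = re.sub(r'\b(\w\s){2,}\w\b', lambda m: m.group().replace(' ', ''), text)
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r'\s+', ' ', text).strip()
    return text


# Hypothetical sample messages, chosen to exercise each preprocessing step.
samples = [
    "WIN a FREEEEE prize!!! Visit http://spam.example/win NOW",
    "F R E E money",
    "sooooo good",
    "www.example.com",
    "",
]

results = [preprocess_text(s) for s in samples]
for original, cleaned in zip(samples, results):
    print(repr(original), "->", repr(cleaned))
```

Two behaviors are worth noting when reading the output. A run of three or more identical characters collapses to a single character, so "freeeee" becomes "fre" rather than "free". The `[URL]` placeholder loses its brackets in the punctuation step and survives as the token `URL`, the only uppercase token in the cleaned text.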