| --- |
| language: |
| - en |
| base_model: |
| - microsoft/deberta-v3-base |
| pipeline_tag: text-classification |
| license: mit |
| --- |
| Binary classification model for ad detection in QA systems. |
|
|
| ## Sample usage |
|
|
| ```python |
| import torch |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
| classifier_model_path = "teknology/ad-classifier-v0.4" |
| tokenizer = AutoTokenizer.from_pretrained(classifier_model_path) |
| model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path) |
| model.eval()  # inference mode: disables dropout |
|
| # Use the GPU when one is available |
| device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
| model.to(device) |
|
|
| def classify(passages): |
|     """Return one predicted class index per input passage.""" |
|     inputs = tokenizer( |
|         passages, padding=True, truncation=True, max_length=512, return_tensors="pt" |
|     ) |
|     inputs = {k: v.to(device) for k, v in inputs.items()} |
|     with torch.no_grad(): |
|         outputs = model(**inputs) |
|     logits = outputs.logits |
|     # Highest-scoring class per passage (label names, if set, are in model.config.id2label) |
|     predictions = torch.argmax(logits, dim=-1) |
|     return predictions.cpu().tolist() |
|
|
| preds = classify(["sample_text_1", "sample_text_2"]) |
| ``` |
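
The sample above tokenizes and scores all passages in a single forward pass, which may use a lot of memory for long lists. A minimal sketch of chunked inference follows; `classify_in_batches`, `classify_fn`, and `batch_size` are hypothetical names chosen for illustration and are not part of this repository.

```python
from typing import Callable, List, Sequence


def classify_in_batches(
    passages: Sequence[str],
    classify_fn: Callable[[List[str]], List[int]],
    batch_size: int = 32,
) -> List[int]:
    """Run classify_fn over passages in fixed-size chunks and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    predictions: List[int] = []
    # Step through the input in slices of at most batch_size passages
    for start in range(0, len(passages), batch_size):
        batch = list(passages[start:start + batch_size])
        predictions.extend(classify_fn(batch))
    return predictions
```

With the `classify` function from the sample usage, this could be called as `classify_in_batches(passages, classify, batch_size=16)`. A suitable batch size depends on the available memory and passage length.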
|
|
|
|
| ## Version |
|
|
| Version history, with links to previous versions: |
| - v0.0: https://huggingface.co/jmvcoelho/ad-classifier-v0.0 |
| Trained with the official data from Webis Generated Native Ads 2024. |
| - v0.1: https://huggingface.co/jmvcoelho/ad-classifier-v0.1 |
| Trained with the v0.0 data plus new synthetic data. |
| - v0.2: https://huggingface.co/jmvcoelho/ad-classifier-v0.2 |
| Similar to v0.1, but includes more diversity in ad placement strategies through prompting. |
| - v0.3: https://huggingface.co/teknology/ad-classifier-v0.3 |
| Continued from v0.2, with an added synthetic dataset generated from Wikipedia articles. |
| - **v0.4**: Same training data composition as v0.3, but trained with curriculum learning on the mixed data. |
|
|