---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- advertising
---

# Tiny Bert Domain Advertising Classifier

## Overview

AdTargetingBERTClassifier is a small-scale BERT-based classifier for ad-targeting classification. The model predicts multi-class labels associated with web domains, as provided in the DAC693K dataset.

## Model Architecture

The classifier is built on the BERT (Bidirectional Encoder Representations from Transformers) architecture. It takes domain text as input and outputs one logit per class, enabling multi-class classification for ad targeting.

## Model Training

The model is trained on the AdTargetingDataset with a supervised objective: it is fine-tuned to minimize the categorical cross-entropy loss over the ad-targeting classes associated with each domain.
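As an illustration of what that objective computes, here is a minimal, framework-free sketch of categorical cross-entropy for a single example; the logits and class index below are made up for illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Categorical cross-entropy: negative log-probability of the true class."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

# Hypothetical 3-class example: the model is most confident in class 2,
# which is also the true class, so the loss is small.
loss = cross_entropy([0.5, 1.0, 3.0], true_class=2)
print(loss)
```

During training this quantity is averaged over a batch and minimized by gradient descent; the `transformers` training utilities compute it automatically when labels are passed to the model.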

## Usage

### Loading the Model

To use the trained classifier in your Python environment, load it with the following code:
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the pre-trained model and tokenizer
model = BertForSequenceClassification.from_pretrained("ansi-code/bert-domain-advertising-classifier")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Example inference
text = "google.com"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
```

## Prediction

To make predictions with the loaded model, convert the logits to probabilities and take the class with the highest probability:
```python
probabilities = torch.nn.functional.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()
```
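The class index by itself is not very readable; in practice you would map it back to a label name (in `transformers`, the mapping ships with the model config as `model.config.id2label`). A minimal sketch, using a hypothetical three-class label set and made-up logits:

```python
# Hypothetical index-to-label mapping for illustration only; the real one
# comes from model.config.id2label.
id2label = {0: "search", 1: "social", 2: "ecommerce"}

# Softmax is monotonic, so the class with the largest logit is also the
# class with the largest probability; the softmax step is only needed
# when calibrated scores are wanted.
logits = [0.5, 1.0, 3.0]
predicted_class = max(range(len(logits)), key=lambda i: logits[i])
print(id2label[predicted_class])  # -> ecommerce
```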

## Model Evaluation

The model's performance can be assessed using standard evaluation metrics such as accuracy, precision, recall, and F1-score on a held-out validation set or through cross-validation.
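For a small validation set these metrics are simple enough to compute by hand. A minimal, dependency-free sketch with toy labels (any metrics library, e.g. scikit-learn, would do the same thing):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, num_classes):
    """Macro F1: per-class one-vs-rest F1, averaged over classes."""
    f1s = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / num_classes

# Toy validation labels for illustration.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 5 of 6 correct
```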

## License

This model is released under the Apache 2.0 License.

## Citation

If you use this model in your work, please cite it using the following BibTeX entry:
```bibtex
@misc{silvi_2023_bert-domain-advertising-classifier,
  title  = {bert-domain-advertising-classifier},
  author = {Andrea Silvi},
  year   = {2023},
}
```

## Acknowledgements

We would like to thank the developers of the Hugging Face Transformers library for providing the BERT model implementation.