Update README.md
README.md
CHANGED
|
@@ -13,15 +13,15 @@ model-index:
|
|
| 13 |
# ModernBERT Engagement Content Regression
|
| 14 |
### What is this?
|
| 15 |
|
| 16 |
-
This
|
| 17 |
|
| 18 |
-
We will
|
| 19 |
|
| 20 |
-
This type of task
|
| 21 |
> Half my advertising is wasted; the trouble is, I don't know which half
|
| 22 |
> -John Wanamaker
|
| 23 |
|
| 24 |
-
|
| 25 |
|
| 26 |
Links for project:
|
| 27 |
- Model - [ModernBERT-Engagement-Content-Regression](https://huggingface.co/Forecast-ing/modernBERT-content-regression)
|
|
@@ -36,13 +36,11 @@ This work is indebted to the work of many community members and blog posts.
|
|
| 36 |
|
| 37 |
|
| 38 |
### Our dataset
|
| 39 |
-
We will be using a dataset of 548 emails where we have the text of the email
|
| 40 |
-
|
| 41 |
-
We look forward in the improvements of ModernBERT to fine-tune models specifically for each potential users email dataset. The variability of email data, as well as the small size of the dataset pose an interesting regression challenge.
|
| 42 |
|
|
|
|
| 43 |
### Benchmarking
|
| 44 |
-
We will start by using the Catboost library as a simple benchmark for text regression. For both the benchmark and the ModernBert run, we are using 'rmse' as the metric.
|
| 45 |
-
We recieve the following results:
|
| 46 |
| Metric | Value |
|
| 47 |
|--------|------------------|
|
| 48 |
| MSE | 2.552100633998035 |
|
|
@@ -79,11 +77,8 @@ After running hyperparameter tuning for ModernBERT, we get the following results
|
|
| 79 |
| SMAPE | 56.61447048187256 |
|
| 80 |
|
| 81 |
We see improvements in all metrics except for SMAPE. We believe that ModernBERT would scale even better with a larger dataset, as roughly 500 examples is very few for fine-tuning, and we are thus happy with the performance in this evaluation.
|
| 82 |
-
|
| 83 |
### Who are we?
|
| 84 |
-
At [Forecast.ing](https://forecast.ing) we are building a platform to help users create more enriching content by automatically researching trends and generating campaign ideas with AgenticAI.
|
| 85 |
-
We generate the content, and then create fine-tuned scores of how likely we think that content will succeed.
|
| 86 |
|
| 87 |
## Conclusion
|
| 88 |
-
We see that ModernBERT is a powerful model for text regression. We believe that with a larger dataset, we would see even better results. We are excited to see the future of ModernBERT and how it will be used for text regression.
|
| 89 |
-
If interested, I can be contacted at robin@forecast.ing
|
|
|
|
| 13 |
# ModernBERT Engagement Content Regression
|
| 14 |
### What is this?
|
| 15 |
|
| 16 |
+
This project explores using ModernBERT for text regression: predicting engagement metrics for text content. In this case, we predict the click-through rate (CTR) of email text content.
|
| 17 |
|
| 18 |
+
We will explore hyperparameter tuning for ModernBERT and how to use the model for regression. We will also compare the results to a benchmark model.
|
| 19 |
|
| 20 |
+
This type of task is hard, as the well-known quote reminds us:
|
| 21 |
> Half my advertising is wasted; the trouble is, I don't know which half
|
| 22 |
> -John Wanamaker
|
| 23 |
|
| 24 |
+
In this experiment, we exclude other relevant factors, such as the time the email is sent, the day of the week, the recipient, etc.
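As a rough illustration of what "using ModernBERT for regression" can look like with the Hugging Face `transformers` library, the sketch below loads a sequence-classification head with a single output. The checkpoint name `answerdotai/ModernBERT-base` and the helper `load_regression_model` are assumptions for illustration; this README does not state which base checkpoint or loading code was used.

```python
# Assumed base checkpoint; the README does not say which one was fine-tuned.
BASE_CHECKPOINT = "answerdotai/ModernBERT-base"

# One output unit plus problem_type="regression" makes the
# sequence-classification head train with a mean squared error loss.
REGRESSION_KWARGS = {"num_labels": 1, "problem_type": "regression"}


def load_regression_model(checkpoint: str = BASE_CHECKPOINT):
    """Hypothetical helper: load a tokenizer and a one-output regression model.

    Calling this downloads the checkpoint and needs a transformers release
    that includes ModernBERT support.
    """
    # Imported lazily so the sketch can be imported without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, **REGRESSION_KWARGS
    )
    return tokenizer, model
```

With this setup the model emits one float per email, which can be trained directly against the CTR label.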
|
| 25 |
|
| 26 |
Links for project:
|
| 27 |
- Model - [ModernBERT-Engagement-Content-Regression](https://huggingface.co/Forecast-ing/modernBERT-content-regression)
|
|
|
|
| 36 |
|
| 37 |
|
| 38 |
### Our dataset
|
| 39 |
+
We will be using a dataset of 548 emails, each with the email text and the CTR label we are trying to predict.
|
|
|
|
|
|
|
| 40 |
|
| 41 |
+
We look forward to ModernBERT's improvements, which should allow us to fine-tune models for each potential user's email dataset. The variability of email data and the small size of the dataset pose interesting regression challenges.
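With only 548 examples, how the data is split matters for how noisy the evaluation is. The sketch below shows one simple way to make a reproducible train/evaluation split. The 80/20 ratio, the seed, and the helper name `split_indices` are assumptions for illustration; the split actually used in this project is not stated here.

```python
import random


def split_indices(n_examples, eval_fraction=0.2, seed=42):
    """Return (train_indices, eval_indices) from a seeded shuffle."""
    indices = list(range(n_examples))
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(indices)
    n_eval = round(n_examples * eval_fraction)
    return indices[n_eval:], indices[:n_eval]


# 548 is the dataset size mentioned above; 0.2 is an assumed hold-out fraction.
train_idx, eval_idx = split_indices(548)
print(len(train_idx), len(eval_idx))  # prints "438 110"
```

An evaluation set of about 110 emails is small, so metric differences between models should be read with some caution.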
|
| 42 |
### Benchmarking
|
| 43 |
+
We will start by using the CatBoost library as a simple benchmark for text regression. For both the benchmark and the ModernBERT run, we use RMSE (`'rmse'`) as the metric. We obtain the following results:
|
|
|
|
| 44 |
| Metric | Value |
|
| 45 |
|--------|------------------|
|
| 46 |
| MSE | 2.552100633998035 |
|
|
|
|
| 77 |
| SMAPE | 56.61447048187256 |
|
| 78 |
|
| 79 |
We see improvements in all metrics except for SMAPE. We believe that ModernBERT would scale even better with a larger dataset, as roughly 500 examples is very few for fine-tuning, and we are thus happy with the performance in this evaluation.
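For readers who want to reproduce this kind of comparison, here is a minimal sketch of how MSE, RMSE, and SMAPE can be computed from labels and predictions. It assumes one common SMAPE definition (mean of `2|p - t| / (|t| + |p|)`, scaled to percent); the exact formula and the helper name `regression_metrics` are illustrative, not taken from the evaluation code behind the tables above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, and SMAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / n

    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        # Treat the 0/0 case as zero error to avoid division by zero.
        smape_terms.append(0.0 if denom == 0 else 2 * abs(p - t) / denom)
    smape = 100 * sum(smape_terms) / n

    return {"MSE": mse, "RMSE": math.sqrt(mse), "SMAPE": smape}


# Toy CTR values, made up purely for illustration.
metrics = regression_metrics([2.0, 4.0], [3.0, 4.0])
print(metrics["MSE"], metrics["SMAPE"])
```

Because SMAPE is a relative error, emails with a CTR close to zero can contribute large terms even when the absolute error is small, which may be part of why it behaves differently from the squared-error metrics.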
|
|
|
|
| 80 |
### Who are we?
|
| 81 |
+
At [Forecast.ing](https://forecast.ing) we are building a platform to help users create more enriching content by automatically researching trends and generating campaign ideas with AgenticAI. We generate the content, and then create fine-tuned scores of how likely we think that content will succeed.
|
|
|
|
| 82 |
|
| 83 |
## Conclusion
|
| 84 |
+
We see that ModernBERT is a powerful model for text regression. We believe that with a larger dataset, we would see even better results. We are excited to see the future of ModernBERT and how it will be used for text regression. If interested, I can be contacted at robin@forecast.ing
|
|
|