lear-lab/modernbert-content

Text Classification

Generated from Trainer

text-embeddings-inference


wesleymorris committed on Feb 21, 2025

Commit 48c3ac3 · verified · 1 Parent(s): c906102

Update README.md

Files changed (1)

README.md +30 -2


@@ -21,11 +21,39 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data

 ## Model description
+This is a ModernBERT model with a regression head, designed to predict the Content score of a summary.
+Before the fine-tuning step, the model was pretrained on a very large synthetic dataset.
+The input should be the summary, then the tokenizer's separator token (`tokenizer.sep_token`), then the source, concatenated in that order.
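+```
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# num_labels=1 configures a single-output regression head
+model = AutoModelForSequenceClassification.from_pretrained("wesleymorris/modernbert-content", num_labels=1)
+tokenizer = AutoTokenizer.from_pretrained("wesleymorris/modernbert-content")
+
+def get_score(summary: str, source: str) -> float:
+    # Input format: summary + separator token + source
+    text = summary + tokenizer.sep_token + source
+    inputs = tokenizer(text, return_tensors="pt")
+    # Inference only, so skip gradient tracking
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    return float(logits[0])
+```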
+### Corpus
+The model was trained on a corpus of 4,233 summaries of 101 sources compiled by Botarleanu et al. (2022).
+The summaries were graded by expert raters on six criteria: Details, Main Point, Cohesion, Paraphrasing, Objective Language, and Language Beyond the Text.
+A principal component analysis was used to reduce the dimensionality of the outcome variables to two.
+The Content component comprises Details, Main Point, Paraphrasing, and Cohesion.
+### Contact
+This model was developed by LEAR Lab at Vanderbilt University. For questions or comments about this model, please contact wesley.g.morris@vanderbilt.edu.
 ## Intended uses & limitations
+This model can be used to predict the Content score that a human rater would assign to a summary.
+The scores are normalized so that 0 is the mean of the training data and 1 is one standard deviation above the mean.
 ## Training and evaluation data
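
The README text added in this commit says scores are normalized so that 0 is the training mean and 1 is one standard deviation above it. The sketch below illustrates that kind of standardization and one way to read a standardized score. It is an illustration only: `standardize`, `approx_percentile`, and the toy values are hypothetical, the card does not say whether a population or sample standard deviation was used, and the percentile reading assumes scores are roughly normally distributed, which the card does not claim.

```python
import math
import statistics


def standardize(raw_scores):
    """Rescale scores so their mean is 0 and their standard deviation is 1."""
    mean = statistics.fmean(raw_scores)
    # Population SD is an assumption; the model card does not specify.
    sd = statistics.pstdev(raw_scores)
    return [(s - mean) / sd for s in raw_scores]


def approx_percentile(z):
    """Fraction of a standard normal distribution below z.

    Assumes normally distributed scores, which is not stated in the card.
    """
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Made-up values for illustration; not rater data or model output.
toy_scores = [2.0, 3.0, 4.0, 5.0, 6.0]
z_scores = standardize(toy_scores)

for raw, z in zip(toy_scores, z_scores):
    print(f"raw={raw:.1f}  z={z:+.3f}  approx. percentile={approx_percentile(z):.2f}")
```

Under these assumptions, a predicted score near 0 reads as a typical summary from the training corpus, and a negative score reads as below-average content.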