sa-ma/tos-simplifier

sa-ma committed on Aug 1, 2025

Commit 74c147e · verified · 1 parent: de5f988

update readme

Files changed (1)

README.md +54 -1

@@ -7,4 +7,57 @@ metrics:
 - rouge
 base_model:
 - allenai/led-base-16384
----

+---
+# ToS Simplifier
+`sa-ma/tos-simplifier` is a fine-tuned **Longformer Encoder–Decoder (LED)** model that turns dense, jargon-filled Terms of Service (ToS) documents into clear, plain-English summaries. The underlying LED architecture accepts inputs of up to 16 384 tokens in one pass, which makes it well suited to very long contracts.
+## Model details
+| | |
+| --- | --- |
+| **Base model** | `allenai/led-base-16384` |
+| **Parameters** | ~162 M |
+| **Context window** | 16 384 tokens (encoder) / 1 024 (decoder) |
+| **Language** | English |
+| **License** | MIT |
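The 16 384-token encoder limit in the table above means that longer documents have to be truncated or split before summarisation. The following is a minimal sketch of the splitting step, assuming the text has already been tokenised into a list of integer ids. `split_into_windows` and its `overlap` parameter are hypothetical names used for illustration; they are not part of this repository or of `transformers`.

```python
from typing import List

ENCODER_WINDOW = 16_384  # encoder context length from the table above


def split_into_windows(
    token_ids: List[int],
    window: int = ENCODER_WINDOW,
    overlap: int = 0,
) -> List[List[int]]:
    """Split token ids into consecutive windows of at most `window` tokens.

    The last `overlap` tokens of each window are repeated at the start of
    the next one, so text near a boundary is seen with some context.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    windows: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return windows
```

Each window could then be summarised separately and the partial summaries joined; whether that preserves enough cross-section context for a given contract is something to check by hand.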
+## Training
+The model was fine-tuned on an internal corpus of publicly available ToS and their human-written “plain language” summaries (≈ 1.2 k document–summary pairs).
+Key hyper-parameters:
+* Optimiser: AdamW (β₁ = 0.9, β₂ = 0.98)
+* Learning rate: 3 × 10⁻⁵ with linear warm-up
+* Batch size: 16 effective (8 × 2 GPUs, gradient accumulation = 2)
+* Early stopping on validation ROUGE-L
+Full settings are stored in `training_args.bin`.
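To make the "linear warm-up" setting concrete, here is a minimal sketch of such a schedule in plain Python. The model card gives neither the number of warm-up steps nor what happens after warm-up, so both are assumptions here: `warmup_steps` is left to the caller, and the rate is simply held at its peak afterwards. `warmup_lr` is a hypothetical helper, not the scheduler used in training.

```python
PEAK_LR = 3e-5  # peak learning rate from the list above


def warmup_lr(step: int, warmup_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate at 0-based optimiser step `step` under linear warm-up.

    Ramps linearly from 0 to `peak_lr` over `warmup_steps` steps, then holds
    `peak_lr` (an assumption; the model card does not say whether the rate
    decays after warm-up).
    """
    if warmup_steps <= 0:
        return peak_lr  # no warm-up requested
    return peak_lr * min(1.0, step / warmup_steps)
```

With a hypothetical `warmup_steps=100`, the rate is zero at step 0, half the peak at step 50, and equal to the peak from step 100 onwards.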
+## Intended use
+| ✔ What it’s for | ✖ What it’s **not** for |
+| --- | --- |
+| Summarising ToS, privacy policies, EULAs | Non-English input |
+| General long-form abstractive summarisation | Producing legally binding advice |
+| Making legal texts more accessible | Summarising sensitive or proprietary data without review |
+## Quick start
+```python
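+from transformers import pipeline
+# Load the fine-tuned model and its tokenizer from the Hub.
+model_id = "sa-ma/tos-simplifier"
+summariser = pipeline(
+    "summarization",
+    model=model_id,
+    tokenizer=model_id,
+    device_map="auto",       # needs `accelerate`; drop or change if running on CPU
+    max_length=256,          # upper bound on summary length, in tokens
+    min_length=30,           # lower bound on summary length, in tokens
+)
+# Read the document to summarise.
+with open("tos.txt", encoding="utf-8") as f:
+    long_doc = f.read()
+# truncation=True clips inputs that exceed the tokenizer's maximum length.
+summary = summariser(long_doc, truncation=True)[0]["summary_text"]
+print(summary)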