sa-ma committed on
Commit 74c147e · verified · 1 Parent(s): de5f988

update readme

Files changed (1)
  1. README.md +54 -1
README.md CHANGED
@@ -7,4 +7,57 @@ metrics:
  - rouge
  base_model:
  - allenai/led-base-16384
- ---
+ ---
+
+ # ToS Simplifier
+
+ `sa-ma/tos-simplifier` is a fine-tuned **Longformer Encoder–Decoder (LED)** model that turns dense, jargon-filled Terms of Service (ToS) documents into clear, plain-English summaries. The underlying LED architecture processes inputs of up to 16 384 tokens in a single pass, which makes it well suited to very long contracts.
+
+ ## Model details
+
+ | | |
+ | --- | --- |
+ | **Base model** | `allenai/led-base-16384` |
+ | **Parameters** | ~162 M |
+ | **Context window** | 16 384 tokens (encoder) / 1 024 tokens (decoder) |
+ | **Language** | English |
+ | **License** | MIT |
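A minimal sketch of how LED encoder inputs are commonly prepared, to make the context-window row concrete: token ids are truncated to the 16 384-token encoder limit and the first token is given global attention, following the convention used in the Hugging Face LED summarisation examples. `prepare_led_inputs` and `clamp_summary_length` are hypothetical helpers, not code from this repository, and a plain list of integers stands in for real tokenizer output so the sketch runs without downloading the model.

```python
from typing import Dict, List

ENCODER_MAX_TOKENS = 16_384  # encoder context window from the table
DECODER_MAX_TOKENS = 1_024   # decoder context window from the table


def prepare_led_inputs(token_ids: List[int], max_len: int = ENCODER_MAX_TOKENS) -> Dict[str, List[int]]:
    """Hypothetical helper: truncate ids to the encoder limit and build LED's two masks."""
    ids = token_ids[:max_len]                   # drop tokens past the encoder limit
    attention_mask = [1] * len(ids)             # all kept tokens are real (no padding in this sketch)
    global_attention_mask = [0] * len(ids)      # 0 = local sliding-window attention
    if ids:
        global_attention_mask[0] = 1            # 1 = global attention, on the first (<s>) token
    return {
        "input_ids": ids,
        "attention_mask": attention_mask,
        "global_attention_mask": global_attention_mask,
    }


def clamp_summary_length(requested: int) -> int:
    """Keep a requested generation length within the decoder's position limit."""
    return max(1, min(requested, DECODER_MAX_TOKENS))


if __name__ == "__main__":
    fake_ids = list(range(20_000))              # stand-in for a 20 000-token contract
    batch = prepare_led_inputs(fake_ids)
    print(len(batch["input_ids"]), sum(batch["global_attention_mask"]))  # 16384 1
    print(clamp_summary_length(4096))                                    # 1024
```

With Hugging Face Transformers, the three lists would be converted to tensors and passed to `LEDForConditionalGeneration.generate`, for example `model.generate(input_ids=..., attention_mask=..., global_attention_mask=..., max_length=256)`.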
+
+ ## Training
+
+ The model was fine-tuned on an internal corpus of publicly available ToS and their human-written “plain language” summaries (≈ 1.2 k document–summary pairs).
+ Key hyper-parameters:
+
+ * Optimiser: AdamW (β₁ = 0.9, β₂ = 0.98)
+ * Learning rate: 3 × 10⁻⁵ with linear warm-up
+ * Batch size: 16 effective (8 × 2 GPUs, gradient accumulation = 2)
+ * Early stopping on validation ROUGE-L
+
+ Full settings are stored in `training_args.bin`.
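The batch, learning-rate and early-stopping settings above can be sketched in plain Python. Effective batch size is per-device batch × number of GPUs × accumulation steps, so the stated 16 on 2 GPUs with accumulation 2 implies 4 sequences per GPU; the sketch assumes "8 × 2 GPUs" means 8 sequences per micro-step spread over both GPUs. It also assumes a hypothetical 100 warm-up steps followed by linear decay to zero (the shape of the `linear` scheduler in Hugging Face Transformers) and a hypothetical patience of 3 evaluations. None of these assumed values are read from `training_args.bin`.

```python
from dataclasses import dataclass

PEAK_LR = 3e-5  # peak learning rate stated in the card


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Number of sequences that contribute to one optimiser update."""
    return per_device * num_devices * grad_accum


def linear_warmup_lr(step: int, warmup_steps: int, total_steps: int, peak: float = PEAK_LR) -> float:
    """Linear warm-up to `peak`, then (assumed) linear decay to zero at `total_steps`."""
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak * remaining / max(1, total_steps - warmup_steps)


@dataclass
class EarlyStopper:
    """Signal a stop once validation ROUGE-L has not improved for `patience` evaluations."""
    patience: int = 3           # hypothetical; the card does not state a patience
    best: float = float("-inf")
    bad_evals: int = 0

    def should_stop(self, rouge_l: float) -> bool:
        if rouge_l > self.best:
            self.best, self.bad_evals = rouge_l, 0  # new best score: reset the counter
        else:
            self.bad_evals += 1                     # no improvement this evaluation
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    print(effective_batch_size(per_device=4, num_devices=2, grad_accum=2))  # 16
    print(linear_warmup_lr(50, warmup_steps=100, total_steps=1_000))       # halfway through warm-up
    stopper = EarlyStopper()
    for score in [0.31, 0.33, 0.32, 0.33, 0.32]:  # made-up scores for illustration
        if stopper.should_stop(score):
            print("stopping; best ROUGE-L", stopper.best)
            break
```

In an actual `Seq2SeqTrainer` run, comparable early stopping is usually configured with `EarlyStoppingCallback(early_stopping_patience=...)` together with `metric_for_best_model` pointing at the ROUGE-L key and `load_best_model_at_end=True`.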
+
+ ## Intended use
+
+ | ✔ What it’s for | ✖ What it’s **not** for |
+ | --- | --- |
+ | Summarising ToS, privacy policies, EULAs | Non-English input |
+ | General long-form abstractive summarisation | Producing legally binding advice |
+ | Making legal texts more accessible | Summarising sensitive or proprietary data without review |
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ model_id = "sa-ma/tos-simplifier"
+ summariser = pipeline(
+     "summarization",
+     model=model_id,
+     tokenizer=model_id,
+     device_map="auto",  # requires `accelerate`; remove it to run on the default device (CPU)
+     max_length=256,     # upper bound on generated summary length (tokens)
+     min_length=30,      # lower bound on generated summary length (tokens)
+ )
+
+ with open("tos.txt", encoding="utf-8") as f:
+     long_doc = f.read()
+
+ # truncation=True cuts inputs longer than the tokenizer's maximum length instead of raising an error
+ summary = summariser(long_doc, truncation=True)[0]["summary_text"]
+ print(summary)