CAMeL-Lab/text-editing-coda

Token Classification


balhafni committed on Jun 4, 2025

Commit 63d0c56 · verified · 1 Parent(s): df3a50b

Update README.md

Files changed (1): README.md +35 -0

@@ -18,6 +18,41 @@ The fine-tuning code and associated resources are publicly available on our GitH
 ## Citation
 ```bibtex
 @inter{alhafni-habash-2025-enhancing,

+## Intended uses
+To use the `CAMeL-Lab/text-editing-coda` model, clone our text editing [GitHub repository](https://github.com/CAMeL-Lab/text-editing) and follow its installation requirements.
+This is the `SWEET` model we used to report results on the MADAR CODA dev and test sets in our [paper](https://arxiv.org/abs/2503.00985).
+## How to use
+Clone our text editing [GitHub repository](https://github.com/CAMeL-Lab/text-editing) and follow its installation requirements:
+```python
+from transformers import BertTokenizer, BertForTokenClassification
+import torch
+import torch.nn.functional as F
+from gec.tag import rewrite  # from the cloned text-editing repository
+
+tokenizer = BertTokenizer.from_pretrained('CAMeL-Lab/text-editing-coda')
+model = BertForTokenClassification.from_pretrained('CAMeL-Lab/text-editing-coda')
+edits_map = model.config.id2label  # label id -> edit tag
+
+# The input is passed as a list of whitespace-separated words
+text = 'أنا بعطيك رقم تلفونو و عنوانو'.split()
+tokenized_text = tokenizer(text, return_tensors="pt", is_split_into_words=True)
+
+with torch.no_grad():
+    logits = model(**tokenized_text).logits
+    preds = F.softmax(logits.squeeze(), dim=-1)
+    preds = torch.argmax(preds, dim=-1).cpu().numpy()
+    # Drop the [CLS] and [SEP] positions, then map label ids to edit tags
+    edits = [edits_map[p] for p in preds[1:-1]]
+    assert len(edits) == len(tokenized_text['input_ids'][0][1:-1])
+
+# Apply the predicted edits to the subwords to produce the corrected sentence
+subwords = tokenizer.convert_ids_to_tokens(tokenized_text['input_ids'][0][1:-1])
+output_sent = rewrite(subwords=[subwords], edits=[edits])[0][0]
+print(output_sent) # انا باعطيك رقم تلفونه وعنوانه
+```
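
The snippet above slices the predictions with `[1:-1]` because the BERT tokenizer wraps the input in `[CLS]` and `[SEP]`, so the first and last positions carry no subword edit. The following is a minimal sketch of that decoding step in plain Python, with no model download required. The helper names (`argmax`, `decode_edits`), the label strings (`KEEP`, `EDIT_A`, `EDIT_B`), and the scores are hypothetical placeholders for illustration; they are not the model's actual edit inventory or outputs.

```python
def argmax(row):
    """Return the index of the largest score in a list of floats."""
    return max(range(len(row)), key=row.__getitem__)


def decode_edits(logits, id2label):
    """Map per-position scores to edit labels, skipping [CLS] and [SEP].

    `logits` holds one row of scores per input position, including the
    two special-token positions at the start and end.
    """
    pred_ids = [argmax(row) for row in logits]
    return [id2label[i] for i in pred_ids[1:-1]]


# Hypothetical label map and scores, for illustration only
toy_id2label = {0: "KEEP", 1: "EDIT_A", 2: "EDIT_B"}
toy_logits = [
    [9.0, 0.1, 0.2],  # [CLS] position (discarded)
    [2.0, 0.5, 0.1],  # subword 1
    [0.3, 1.7, 0.2],  # subword 2
    [0.1, 0.2, 3.1],  # subword 3
    [8.0, 0.0, 0.0],  # [SEP] position (discarded)
]

toy_edits = decode_edits(toy_logits, toy_id2label)
print(toy_edits)  # ['KEEP', 'EDIT_A', 'EDIT_B']
```

The result has exactly one label per subword, which is the invariant the `assert` in the snippet above checks before the edits are handed to `rewrite`.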
 ## Citation
 ```bibtex
 @inter{alhafni-habash-2025-enhancing,