---
pipeline_tag: text-generation
language: en
library_name: transformers
tags:
- t5
- grammar-correction
- text-generation
---

# T5-REF-CORRUPT-EN: Automatic Error Correction of Academic Referencing According to the Institutional Guidelines of the Center for Translation Studies (CTS) at the University of Vienna

**Objective:** This model corrects errors in academic referencing. For example:  

*Input (wrong sentence)*: According to Smith **&** Peterson **2016 56**, the translation reveals patterns that suggest underlying semantic shifts

*Output (clean sentence)*: According to Smith **and** Peterson **(2016: 56)**, the translation reveals patterns that suggest underlying semantic shifts.

**Model Details:**  

- **Model name:** T5-REF-CORRUPT-EN  
- **Base model:** T5-base  
- **Language:** English  
- **Training data:** Sentences generated synthetically with LLMs, together with real student sentences that were synthetically corrupted.

**Intended use:** Correcting errors in academic references according to CTS guidelines.

## Example

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "elizaveta-dev/T5-REF-CORRUPT-EN"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "According to Smith & Peterson 2016 56, the translation reveals patterns that suggest underlying semantic shifts."
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Use-Cases

The model automatically corrects several types of referencing errors, including:

### 1. Incorrect Citation Type (Parenthetical vs. Narrative)

*Example of mistake:* (Lopez 2018; Chen 2012) found that cultural context strongly influences translation strategies.

*Example of correction:* Lopez (2018) and Chen (2012) found that cultural context strongly influences translation strategies.


*Example of mistake:* This topic has been widely researched Baker (2006).

*Example of correction:* This topic has been widely researched (Baker 2006).

---

### 2. Incorrect Citation for Two Authors

*Example of mistake:* The concept of functional equivalence was analyzed by Baker & Green (2007).

*Example of correction:* The concept of functional equivalence was analyzed by Baker and Green (2007).


*Example of mistake:* Previous research (Müller, Schmidt 2001) highlights challenges in literary translation.

*Example of correction:* Previous research (Müller & Schmidt 2001) highlights challenges in literary translation.
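To make the rule itself concrete: narrative two-author citations take "and", while parenthetical ones take "&". The toy regex sketch below illustrates just this one pattern; it is not part of the model, which handles far more context than a pattern match can.

```python
import re

def fix_two_author_style(text: str) -> str:
    """Toy illustration of the CTS two-author rule (not the model's method):
    narrative citations use 'and'; parenthetical citations use '&'."""
    # Narrative: "Baker & Green (2007)" -> "Baker and Green (2007)"
    text = re.sub(r"(\b[A-ZÀ-Ž][\w-]+) & ([A-ZÀ-Ž][\w-]+ \(\d{4})",
                  r"\1 and \2", text)
    # Parenthetical: "(Müller, Schmidt 2001)" -> "(Müller & Schmidt 2001)"
    text = re.sub(r"\((\b[A-ZÀ-Ž][\w-]+), ([A-ZÀ-Ž][\w-]+ \d{4})\)",
                  r"(\1 & \2)", text)
    return text
```

A rule-based approach like this breaks down quickly (multiple authors, page numbers, unusual names), which is why the task is framed as sequence-to-sequence generation instead.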

---

### 3. Incorrect Placement of Citations

*Example of mistake:* According to Williams, translation theory continues to evolve (2011: 77).

*Example of correction:* According to Williams (2011: 77), translation theory continues to evolve.

---

### 4. Redundant Entities

*Example of mistake:* As Lee (2009) explains, equivalence is central in translation (Lee 2009).

*Example of correction:* As Lee (2009) explains, equivalence is central in translation.
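The redundancy rule can likewise be sketched in a few lines: if an author-year pair already appears narratively, a repeated parenthetical citation of the same work is dropped. This is only an illustrative heuristic with a hypothetical helper name, not how the model operates internally.

```python
import re

def drop_redundant_citation(text: str) -> str:
    """Toy sketch of the redundancy rule (not the model's method):
    remove a parenthetical '(Lee 2009)' when the same author and year
    already appear narratively as 'Lee (2009)'."""
    # Find narrative citations like "Lee (2009)"
    for author, year in re.findall(r"\b([A-ZÀ-Ž][\w-]+) \((\d{4})\)", text):
        # Drop any matching parenthetical repeat like " (Lee 2009)"
        # (names are used verbatim here; a robust version would re.escape them)
        text = re.sub(rf" ?\({author} {year}\)", "", text)
    return text
```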