algorembrant committed on
Commit
2b97944
·
verified ·
1 Parent(s): 0abf3a6

Upload 5 files

Files changed (5)
  1. .gitignore +1 -0
  2. README.md +116 -0
  3. generate_permutations_colab.py +221 -0
  4. generate_typos.py +157 -0
  5. words.txt +0 -0
.gitignore ADDED
@@ -0,0 +1 @@
misspellings_permutations.txt
README.md ADDED
@@ -0,0 +1,116 @@
# Misspelling Generator

A misspelling generator for the 466k English words in `words.txt`, taken from the [english-words](https://github.com/dwyl/english-words) dataset. For demonstration, we permute only words of up to 7 letters, which produces:

```
Words processed : 125,414
Lines written   : 173,110,626
Output file     : misspellings_permutations.txt
File size       : 2.53 GB
```

Depending on your storage, you can raise or lower this limit by configuring `MAX_WORD_LEN` in the script.

### Option 1 — use `generate_typos.py` (Run Locally)

Generates **realistic typo variants** using 4 strategies:

| Strategy | Example (`hello`) | Variants |
|---|---|---|
| Adjacent swap | `hlelo`, `helol` | n−1 per word |
| Char deletion | `hllo`, `helo`, `hell` | n per word |
| Char duplication | `hhello`, `heello` | n per word |
| Keyboard proximity | `gello`, `jello`, `hwllo` | varies |

- Processes only **pure-alpha words** with length ≥ 3
- Produces roughly **10–50 typos per word** → ~5M–20M lines total
- Output: `data/misspellings.txt` in `misspelling=correction` format

**To run:**
```
python generate_typos.py
```
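As a minimal sketch, two of the strategies above (adjacent swaps and deletions) look like this. This is an illustrative snippet, not the full script:

```python
def adjacent_swaps(word):
    """Swap each pair of neighboring characters, skipping no-op swaps."""
    out = []
    for i in range(len(word) - 1):
        chars = list(word)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        typo = ''.join(chars)
        if typo != word:  # swapping identical letters changes nothing
            out.append(typo)
    return out


def deletions(word):
    """Drop one character at a time."""
    return [word[:i] + word[i + 1:] for i in range(len(word))]


print(adjacent_swaps("hello"))  # ['ehllo', 'hlelo', 'helol']
print(deletions("hello"))       # ['ello', 'hllo', 'helo', 'helo', 'hell']
```

The full script additionally applies duplication and keyboard-proximity substitutions, and deduplicates all variants with a `set`.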

---

### Option 2 — use `generate_permutations_colab.py` (Google Colab)

Generates **ALL letter permutations** of each word. Key config at the top of the file:

```python
MAX_WORD_LEN = 7  # ← CRITICAL control knob
```
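The permutation step itself is small; a minimal sketch of what the script does per word (`itertools.permutations` plus a `set` to deduplicate repeated letters):

```python
from itertools import permutations

def unique_permutations(word):
    """All distinct orderings of a word's letters, minus the word itself."""
    perms = set(''.join(p) for p in permutations(word.lower()))
    perms.discard(word.lower())  # drop the correctly spelled form
    return perms

print(len(unique_permutations("cat")))    # 5  -> 3! - 1
print(len(unique_permutations("hello")))  # 59 -> 5!/2! - 1 (repeated 'l')
```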

---

## Google Colab Education

### What Is Google Colab?

Google Colab gives you a **free Linux VM** with Python pre-installed. You get:

| Resource | Free Tier | Colab Pro ($12/mo) |
|---|---|---|
| **Disk** | ~78 GB (temporary) | ~225 GB (temporary) |
| **RAM** | ~12 GB | ~25–50 GB |
| **GPU** | T4 (limited) | A100/V100 |
| **Runtime limit** | ~12 hours, then VM resets | ~24 hours |
| **Google Drive** | 15 GB (persistent) | 15 GB (same) |

> [!IMPORTANT]
> Colab disk is **ephemeral** — when the runtime disconnects, all files on the VM are deleted. Only Google Drive persists.

### Step-by-Step: Running Option 2 on Colab

**Step 1 — Open Colab**
Go to [colab.research.google.com](https://colab.research.google.com) → **New Notebook**

**Step 2 — Upload `words.txt`**
```python
# Cell 1
from google.colab import files
uploaded = files.upload()  # select words.txt from your PC
```

**Step 3 — (Optional) Mount Google Drive for persistent storage**
```python
# Cell 2
from google.colab import drive
drive.mount('/content/drive')

# Then change OUTPUT_PATH in the script to:
# '/content/drive/MyDrive/misspellings_permutations.txt'
```

**Step 4 — Paste & run the script**
Copy the entire contents of `generate_permutations_colab.py` into a new cell. Adjust `MAX_WORD_LEN` as needed, then run.

**Step 5 — Download the result**
```python
# If saved to VM disk:
files.download('misspellings_permutations.txt')

# If saved to Google Drive: just access it from drive.google.com
```
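Once you have the file, each line is a `misspelling=correction` pair; a minimal sketch of loading it into a lookup dict (the sample lines here are illustrative):

```python
# A tiny in-memory sample standing in for the generated file.
sample = """\
# Auto-generated FULL PERMUTATION misspellings
helol=hello
hlelo=hello
"""

corrections = {}
for line in sample.splitlines():
    line = line.strip()
    if not line or line.startswith('#'):  # skip comments and blank lines
        continue
    typo, _, correct = line.partition('=')
    corrections[typo] = correct

print(corrections['helol'])  # hello
```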

### Scale Reference

> [!CAUTION]
> Full permutations grow at **n! (factorial)** rate. Here's what to expect:

| `MAX_WORD_LEN` | Max perms/word | Est. total output |
|---|---|---|
| 5 | 120 | ~200 MB |
| 6 | 720 | ~1–2 GB |
| **7** | **5,040** | **~5–15 GB** ← recommended start |
| 8 | 40,320 | ~50–150 GB |
| 9 | 362,880 | ~500 GB – 1 TB |
| 10 | 3,628,800 | ~5–50 TB ← not feasible |
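The per-word counts above are plain factorials; the script's size estimator also divides by repeated-letter factorials, which you can check with a short sketch:

```python
import math
from collections import Counter

def unique_perm_count(word):
    """n! divided by the factorial of each repeated letter's count."""
    total = math.factorial(len(word))
    for c in Counter(word.lower()).values():
        total //= math.factorial(c)
    return total

print(math.factorial(7))            # 5040, the worst case at MAX_WORD_LEN = 7
print(unique_perm_count("letter"))  # 180 = 6! / (2! * 2!), repeated 't' and 'e'
```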

> [!TIP]
> **Start with `MAX_WORD_LEN = 6` or `7`**, check the output size, then decide if you want to go higher. The script has a built-in safety check that aborts if the estimated size exceeds 70 GB.

### Pro Tips for Colab

- **Keep the browser tab open** — Colab disconnects if idle too long
- **Use `Ctrl+Shift+I` → Console** and paste `setInterval(function(){document.querySelector("colab-connect-button").click()}, 60000)` to prevent idle disconnects
- **For very large outputs**, write directly to Google Drive so you don't lose data on disconnect
- **CPU-only is fine** for this script — permutation generation is CPU-bound, not GPU
generate_permutations_colab.py ADDED
@@ -0,0 +1,221 @@
"""
=============================================================================
FULL PERMUTATION MISSPELLINGS GENERATOR (Google Colab Edition)
=============================================================================

Purpose:
    Generate ALL possible letter permutations of each word from words.txt
    and write them as misspelling=correction pairs.

WARNING — READ BEFORE RUNNING
    This is computationally EXTREME. A single 10-letter word has 3,628,800
    permutations. A 12-letter word has 479,001,600. For 466k words, the full
    output could be PETABYTES. You WILL need to limit word length.

=============================================================================
HOW TO USE ON GOOGLE COLAB
=============================================================================

1. Open Google Colab → https://colab.research.google.com
2. Create a new notebook (Python 3)

3. Upload your words.txt:
   ─────────────────────────────────────
   # CELL 1: Upload words.txt
   from google.colab import files
   uploaded = files.upload()   # click "Choose Files" → select words.txt
   ─────────────────────────────────────

4. Copy-paste this ENTIRE script into a new cell and run it.

5. Download the result:
   ─────────────────────────────────────
   # CELL 3: Download the output
   files.download('misspellings_permutations.txt')
   ─────────────────────────────────────

=============================================================================
OR: Use Google Drive for large files
=============================================================================

   # Mount Google Drive (you get 15 GB free)
   from google.colab import drive
   drive.mount('/content/drive')

   # Then set OUTPUT_PATH below to:
   OUTPUT_PATH = '/content/drive/MyDrive/misspellings_permutations.txt'

=============================================================================
CONFIGURATION — Adjust these before running!
=============================================================================
"""

import os
import sys
import time
import math
from itertools import permutations

# ── CONFIGURATION ───────────────────────────────────────────────────────────

WORDS_PATH = 'words.txt'                        # path to your words.txt
OUTPUT_PATH = 'misspellings_permutations.txt'   # output file path

MIN_WORD_LEN = 3   # skip words shorter than this
MAX_WORD_LEN = 7   # CRITICAL: max word length to permute
                   #   7  → max 5,040 perms/word (manageable)
                   #   8  → max 40,320 perms/word (large)
                   #   9  → max 362,880 perms/word (very large)
                   #   10 → max 3,628,800 perms/word (EXTREME)
                   # Increase at your own risk!

ONLY_ALPHA = True  # only process pure-alphabetical words
BATCH_LOG = 5000   # print progress every N words

# ── ESTIMATION TABLE ────────────────────────────────────────────────────────
# Here's roughly how big the output gets at each MAX_WORD_LEN setting,
# assuming ~200k qualifying words at each length bracket:
#
#   MAX_WORD_LEN │ Perms per word (worst) │ Rough output size
#   ─────────────┼────────────────────────┼──────────────────
#        5       │ 120                    │ ~200 MB
#        6       │ 720                    │ ~1-2 GB
#        7       │ 5,040                  │ ~5-15 GB
#        8       │ 40,320                 │ ~50-150 GB
#        9       │ 362,880                │ ~500 GB - 1 TB
#       10       │ 3,628,800              │ ~5-50 TB  ← won't fit anywhere
#
# Google Colab free tier gives you:
#   • ~78 GB disk on the VM (temporary, lost on disconnect)
#   • 15 GB Google Drive (persistent)
#   • Colab Pro: 225 GB disk, longer runtimes
#
# RECOMMENDATION: Start with MAX_WORD_LEN = 6 or 7, see the size,
# then increase if you have space.
# ────────────────────────────────────────────────────────────────────────────


def estimate_output(words):
    """Estimate total permutations and file size before generating."""
    total_perms = 0
    for w in words:
        n = len(w)
        # Account for duplicate letters: n! / (c1! * c2! * ...)
        freq = {}
        for ch in w.lower():
            freq[ch] = freq.get(ch, 0) + 1
        unique_perms = math.factorial(n)
        for count in freq.values():
            unique_perms //= math.factorial(count)
        total_perms += unique_perms - 1  # subtract the original word

    # Estimate ~15 bytes per line (avg) → "typo=word\n"
    avg_bytes_per_line = 15
    est_bytes = total_perms * avg_bytes_per_line
    est_gb = est_bytes / (1024 ** 3)

    return total_perms, est_gb


def generate_unique_permutations(word):
    """
    Generate all unique permutations of a word's letters,
    excluding the original word itself.

    Uses set() to deduplicate (handles repeated letters efficiently).
    """
    lower = word.lower()
    perms = set(''.join(p) for p in permutations(lower))
    perms.discard(lower)  # remove the correctly-spelled word
    return perms


def is_pure_alpha(word):
    return word.isalpha()


def main():
    if not os.path.exists(WORDS_PATH):
        print(f"ERROR: '{WORDS_PATH}' not found!")
        print("Make sure you uploaded words.txt or set WORDS_PATH correctly.")
        sys.exit(1)

    # ── Read words ──────────────────────────────────────────────
    print(f"Reading words from: {WORDS_PATH}")
    with open(WORDS_PATH, 'r', encoding='utf-8', errors='replace') as f:
        raw_words = [line.strip() for line in f if line.strip()]

    print(f"Total raw entries: {len(raw_words):,}")

    # Filter
    words = []
    for w in raw_words:
        if ONLY_ALPHA and not is_pure_alpha(w):
            continue
        if len(w) < MIN_WORD_LEN or len(w) > MAX_WORD_LEN:
            continue
        words.append(w)

    print(f"Filtered to {len(words):,} words (alpha-only, len {MIN_WORD_LEN}-{MAX_WORD_LEN})")

    if len(words) == 0:
        print("No words matched the filter. Adjust MIN/MAX_WORD_LEN.")
        sys.exit(1)

    # ── Estimate ────────────────────────────────────────────────
    print("\nEstimating output size (this may take a moment)...")
    total_perms, est_gb = estimate_output(words)
    print(f"  Estimated permutations : {total_perms:,}")
    print(f"  Estimated file size    : {est_gb:.2f} GB")

    # Safety check
    if est_gb > 70:
        print(f"\n  WARNING: Estimated output ({est_gb:.1f} GB) exceeds Colab disk (~78 GB).")
        print("  Reduce MAX_WORD_LEN or the script will crash when disk fills up.")
        print("  Aborting. Set MAX_WORD_LEN lower and re-run.")
        sys.exit(1)

    print(f"\nProceeding with generation → {OUTPUT_PATH}")
    print("=" * 60)

    # ── Generate ────────────────────────────────────────────────
    start = time.time()
    total_written = 0

    with open(OUTPUT_PATH, 'w', encoding='utf-8') as out:
        out.write("# Auto-generated FULL PERMUTATION misspellings\n")
        out.write(f"# Config: word length {MIN_WORD_LEN}-{MAX_WORD_LEN}\n")
        out.write("# Format: misspelling=correction\n\n")

        for idx, word in enumerate(words):
            perms = generate_unique_permutations(word)

            for typo in sorted(perms):
                out.write(f"{typo}={word}\n")
                total_written += 1

            # Progress
            if (idx + 1) % BATCH_LOG == 0:
                elapsed = time.time() - start
                pct = (idx + 1) / len(words) * 100
                rate = (idx + 1) / elapsed if elapsed > 0 else 0
                cur_size = os.path.getsize(OUTPUT_PATH) / (1024 ** 3)
                print(f"  [{pct:5.1f}%] {idx+1:>7,}/{len(words):,} words |"
                      f" {total_written:>12,} lines | {cur_size:.2f} GB |"
                      f" {rate:.0f} words/sec")

    elapsed = time.time() - start
    final_size = os.path.getsize(OUTPUT_PATH) / (1024 ** 3)

    print()
    print("=" * 60)
    print(f"  DONE in {elapsed:.1f}s ({elapsed/60:.1f} min)")
    print(f"  Words processed : {len(words):,}")
    print(f"  Lines written   : {total_written:,}")
    print(f"  Output file     : {OUTPUT_PATH}")
    print(f"  File size       : {final_size:.2f} GB")
    print("=" * 60)


if __name__ == '__main__':
    main()
generate_typos.py ADDED
@@ -0,0 +1,157 @@
"""
Generate realistic typo-based misspellings from words.txt → misspellings.txt

Typo strategies:
1. Adjacent letter swaps        ("hello" → "hlelo", "helol")
2. Single character deletion    ("hello" → "hllo", "helo")
3. Single character duplication ("hello" → "hhello", "heello")
4. Nearby keyboard key sub      ("hello" → "gello", "jello")

Output format: misspelling=correction (one per line)
"""

import sys
import os
import time

# QWERTY keyboard proximity map
KEYBOARD_NEIGHBORS = {
    'q': 'wa', 'w': 'qeas', 'e': 'wrds', 'r': 'etfs', 't': 'rygs',
    'y': 'tuhs', 'u': 'yijs', 'i': 'uoks', 'o': 'ipls', 'p': 'o',
    'a': 'qwsz', 's': 'awedxz', 'd': 'serfcx', 'f': 'drtgvc',
    'g': 'ftyhbv', 'h': 'gyujnb', 'j': 'huikmn', 'k': 'jiolm',
    'l': 'kop', 'z': 'asx', 'x': 'zsdc', 'c': 'xdfv', 'v': 'cfgb',
    'b': 'vghn', 'n': 'bhjm', 'm': 'njk',
}


def generate_adjacent_swaps(word):
    """Swap each pair of adjacent characters."""
    typos = []
    for i in range(len(word) - 1):
        chars = list(word)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        typo = ''.join(chars)
        if typo != word:
            typos.append(typo)
    return typos


def generate_deletions(word):
    """Delete one character at a time."""
    typos = []
    for i in range(len(word)):
        typo = word[:i] + word[i + 1:]
        if len(typo) >= 2:  # keep at least 2 chars
            typos.append(typo)
    return typos


def generate_duplications(word):
    """Duplicate one character at a time."""
    typos = []
    for i in range(len(word)):
        typo = word[:i] + word[i] + word[i:]
        if typo != word:
            typos.append(typo)
    return typos


def generate_nearby_key_subs(word):
    """Replace one character with a nearby keyboard key."""
    typos = []
    lower = word.lower()
    for i in range(len(word)):
        ch = lower[i]
        if ch in KEYBOARD_NEIGHBORS:
            for neighbor in KEYBOARD_NEIGHBORS[ch]:
                typo = lower[:i] + neighbor + lower[i + 1:]
                if typo != lower:
                    typos.append(typo)
    return typos


def generate_all_typos(word):
    """Generate all realistic typo variants for a word."""
    typos = set()
    typos.update(generate_adjacent_swaps(word))
    typos.update(generate_deletions(word))
    typos.update(generate_duplications(word))
    typos.update(generate_nearby_key_subs(word))
    typos.discard(word)  # never map a word to itself
    typos.discard(word.lower())
    return typos


def is_pure_alpha(word):
    """Only process words that are purely alphabetical (a-z)."""
    return word.isalpha()


def main():
    base_dir = os.path.dirname(os.path.abspath(__file__))
    words_path = os.path.join(base_dir, 'data', 'words.txt')
    output_path = os.path.join(base_dir, 'data', 'misspellings.txt')

    if not os.path.exists(words_path):
        print(f"ERROR: {words_path} not found.")
        sys.exit(1)

    # ── Read words ──────────────────────────────────────────────
    print(f"Reading words from: {words_path}")
    with open(words_path, 'r', encoding='utf-8', errors='replace') as f:
        raw_words = [line.strip() for line in f if line.strip()]

    print(f"Total raw entries: {len(raw_words):,}")

    # Filter to pure-alpha words with length >= 3
    words = [w for w in raw_words if is_pure_alpha(w) and len(w) >= 3]
    print(f"Filtered to {len(words):,} alphabetical words (len >= 3)")

    # ── Generate typos ──────────────────────────────────────────
    start = time.time()
    total_typos = 0
    batch_size = 10_000

    print(f"Generating typos → {output_path}")
    print("This may take a few minutes for 466k words...")

    with open(output_path, 'w', encoding='utf-8', newline='\n') as out:
        out.write("# Auto-generated misspellings database\n")
        out.write("# Format: misspelling=correction\n")
        out.write("# Generated by generate_typos.py\n")
        out.write("#\n")
        out.write("# Strategies: adjacent swaps, deletions, duplications, keyboard proximity\n")
        out.write("\n")

        for idx, word in enumerate(words):
            correction = word  # original is the correct form
            typos = generate_all_typos(word.lower())

            for typo in sorted(typos):
                out.write(f"{typo}={correction}\n")
                total_typos += 1

            # Progress reporting
            if (idx + 1) % batch_size == 0:
                elapsed = time.time() - start
                pct = (idx + 1) / len(words) * 100
                rate = (idx + 1) / elapsed if elapsed > 0 else 0
                print(f"  [{pct:5.1f}%] {idx + 1:>7,} / {len(words):,} words |"
                      f" {total_typos:>10,} typos | {rate:.0f} words/sec")

    elapsed = time.time() - start
    file_size_mb = os.path.getsize(output_path) / (1024 * 1024)

    print()
    print("=" * 60)
    print(f"  Done in {elapsed:.1f}s")
    print(f"  Words processed : {len(words):,}")
    print(f"  Typos generated : {total_typos:,}")
    print(f"  Output file     : {output_path}")
    print(f"  File size       : {file_size_mb:.1f} MB")
    print("=" * 60)


if __name__ == '__main__':
    main()
words.txt ADDED
The diff for this file is too large to render. See raw diff