Commit 924ef59 by srimeenakshiks · verified · 1 Parent(s): 23a3109

Update README.md

  1. README.md +263 -9
README.md CHANGED
@@ -1,13 +1,267 @@
1
  ---
2
  license: mit
3
  datasets:
4
- - sentence-transformers/xsum
5
- - ccdv/cnn_dailymail
6
- - arxiv-community/arxiv_dataset
7
- - ccdv/arxiv-summarization
8
  language:
9
- - en
10
- metrics:
11
- - bleu
12
- - rouge
13
- ---
1
  ---
2
+ library_name: transformers
3
+ tags:
4
+ - english-to-braille
5
+ - braille translation
6
+ - accessibility
7
+ - educational content
8
+ - text summarization
9
  license: mit
10
  datasets:
11
+ - ccdv/arxiv-summarization
12
+ - xsum
13
+ - cnn_dailymail
14
  language:
15
+ - en
16
+ base_model:
17
+ - facebook/bart-large-cnn
18
+ ---
19
+
20
+ ## Model Details
21
+
22
+ ### Model Description
23
+
24
+ <!-- Provide a longer summary of what this model is. -->
25
+
26
+ The English-to-Braille Translator combines advanced natural language processing with a custom conversion algorithm. In the first stage, the model uses a pre-trained and fine-tuned version of the Facebook BART model (facebook/bart-large-cnn) to create abstractive summaries of educational materials drawn from datasets such as ccdv/arxiv-summarization, xsum, and cnn_dailymail.
27
+
28
+ In the second stage, the generated summary is converted into Braille. Instead of a neural translation approach, the system employs a handcrafted dictionary-based mapping mechanism. This mapping converts each English character (and, where applicable, certain contractions and abbreviations) into its corresponding Braille Unicode representation. Multiple versions are supported (including a baseline, an advanced context-aware variant, and our custom implementation) and are evaluated using metrics such as character accuracy, word-level precision/recall, and edit distance.
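As a rough illustration of the dictionary-based stage, the sketch below builds a letter-to-Braille table from standard six-dot numbers and the Unicode Braille Patterns block (U+2800 plus one bit per dot). The names `LETTER_DOTS`, `dots_to_cell`, and `text_to_braille` are assumptions made for this example and are not taken from the model's actual code, which may use a different or larger dictionary.

```python
# Dot numbers (1-6) for each letter in standard six-dot braille.
LETTER_DOTS = {
    "a": "1", "b": "12", "c": "14", "d": "145", "e": "15",
    "f": "124", "g": "1245", "h": "125", "i": "24", "j": "245",
    "k": "13", "l": "123", "m": "134", "n": "1345", "o": "135",
    "p": "1234", "q": "12345", "r": "1235", "s": "234", "t": "2345",
    "u": "136", "v": "1236", "w": "2456", "x": "1346", "y": "13456",
    "z": "1356",
}


def dots_to_cell(dots: str) -> str:
    """Return the Unicode braille pattern for a string of dot numbers."""
    # In the Unicode block, dot n sets bit (n - 1) above the base U+2800.
    offset = sum(1 << (int(d) - 1) for d in dots)
    return chr(0x2800 + offset)


# Illustrative mapping only; the real dictionary is assumed to be richer.
ENGLISH_TO_BRAILLE = {ch: dots_to_cell(d) for ch, d in LETTER_DOTS.items()}
ENGLISH_TO_BRAILLE[" "] = chr(0x2800)  # blank braille cell for spaces


def text_to_braille(text: str) -> str:
    """Map each character to braille; unmapped characters pass through unchanged."""
    return "".join(ENGLISH_TO_BRAILLE.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    print(text_to_braille("braille"))
```

Lowercasing before lookup mirrors the getting-started snippet in this card; a fuller implementation would likely need capital and number indicators as well.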
29
+
30
+
31
+ - **Developed by:** Srimeenakshi K S
32
+ - **Model type:** English-to-Braille Translation and Summarization
33
+ - **Language(s) (NLP):** English
34
+ - **License:** MIT License
35
+ - **Finetuned from model:** facebook/bart-large-cnn
36
+
37
+
38
+ ## Uses
39
+
40
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
41
+
42
+ ### Direct Use
43
+
44
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
45
+ This model can be used as a standalone tool for converting English texts into Braille. Simply input your educational document, and the model will (1) generate a concise summary and (2) translate the summary into Braille characters using the mapping dictionary.
46
+
47
+
48
+ ### Downstream Use
49
+
50
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
51
+ The model is suited to accessibility pipelines, for instance as a backend service for e-learning platforms, libraries, or digital accessibility applications that provide visually impaired users with Braille-compatible summaries of long educational documents.
52
+
53
+
54
+ ### Out-of-Scope Use
55
+
56
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
57
+ This model is specifically designed for educational content and might not perform well on texts that require nuanced or domain-specific translations beyond the scope of its dictionary. Its dictionary-based conversion does not account for context beyond basic character and common-contraction mappings; therefore, it should not be deployed for highly technical documents without additional validation.
58
+
59
+
60
+ ## Bias, Risks, and Limitations
61
+
62
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
63
+ While the summarization component is built on a well-established BART model, the Braille conversion relies on a fixed dictionary. This mapping approach may struggle with ambiguous punctuation, special formatting, or non-standard abbreviations. Users should be aware that:
64
+ - The summarization output might occasionally omit vital context.
65
+ - The dictionary mapping, while effective for most cases, is inherently limited and could misrepresent characters where multiple mappings exist.
66
+ - Evaluation metrics indicate strong performance overall, but edge cases (especially with highly technical jargon) may require manual review.
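One way to surface the "multiple mappings" risk is to check a dictionary for keys that share the same value before inverting it. The sketch below is a hypothetical helper (`find_collisions` is not part of the model's code). Its sample data uses the fact that, in braille, the digit 1 and the letter a share the same cell and are distinguished only by a preceding number indicator.

```python
from collections import defaultdict


def find_collisions(mapping: dict) -> dict:
    """Group keys by value and return only the values reached by several keys."""
    groups = defaultdict(list)
    for key, value in mapping.items():
        groups[value].append(key)
    return {value: keys for value, keys in groups.items() if len(keys) > 1}


if __name__ == "__main__":
    # Tiny illustrative sample: "a" and "1" both map to the dot-1 cell.
    sample = {"a": "\u2801", "1": "\u2801", "b": "\u2803"}
    for cell, keys in find_collisions(sample).items():
        print(f"{cell!r} is shared by {keys}")
```

Any collision reported here means a plain `{v: k for k, v in mapping.items()}` inversion would silently keep only the last key.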
67
+
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Deploy the model in contexts where the educational content adheres to a standard vocabulary and formatting. For critical applications, supplement automated outputs with human verification, particularly where accuracy in Braille representation is imperative.
74
+
75
+
76
+ ## How to Get Started with the Model
77
+
78
+ Use the code below to get started with the model.
79
+
80
+ ```python
+ from transformers import pipeline
+
+ # Hypothetical module that holds your Braille dictionary; replace with your own.
+ from your_custom_braille_module import braille_to_text_map
+
+ # Step 1: Summarize the English text with BART.
+ # The base checkpoint is shown here; substitute your fine-tuned checkpoint if you have one.
+ summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
+ english_summary = summarizer(
+     "Your long educational text goes here.", max_length=200, truncation=True
+ )[0]["summary_text"]
+
+ # Step 2: Convert the summary to Braille using the custom dictionary mapping.
+ def text_to_braille(text, mapping):
+     # Invert the Braille-to-text mapping to obtain a text-to-Braille mapping.
+     # Note: for a complete solution, handle duplicate values and contractions appropriately.
+     inverted = {v: k for k, v in mapping.items()}
+     # Characters without a mapping are passed through unchanged.
+     return "".join(inverted.get(char, char) for char in text.lower())
+
+ mapping = braille_to_text_map()
+ braille_summary = text_to_braille(english_summary, mapping)
+ print("English Summary:", english_summary)
+ print("Braille Summary:", braille_summary)
+ ```
102
+
103
+ ## Training Details
104
+
105
+ ### Training Data
106
+
107
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
108
+
109
+ The summarization component of this model was fine-tuned on a mix of educational and general summarization datasets:
110
+
111
+ - [ccdv/arxiv-summarization](https://huggingface.co/datasets/ccdv/arxiv-summarization)
112
+ - [xsum](https://huggingface.co/datasets/EdinburghNLP/xsum)
113
+ - [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail)
114
+
115
+ The Braille translation itself does not involve training but instead relies on a manually curated mapping between English characters (and common contractions) and Braille Unicode characters.
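To show how contractions can sit alongside a character mapping, here is a minimal sketch that tries a whole-word contraction first and falls back to letter-by-letter conversion. The contraction cells for "and", "for", "of", "the", and "with" are the standard contracted-braille signs. The table names, the tiny letter subset, and the lookup order are assumptions for illustration and may differ from the curated dictionary described here.

```python
def cell(dots: str) -> str:
    """Unicode braille pattern for a string of dot numbers (dot n -> bit n - 1)."""
    return chr(0x2800 + sum(1 << (int(d) - 1) for d in dots))


# Whole-word contractions (illustrative subset).
CONTRACTIONS = {
    "and": cell("12346"),
    "for": cell("123456"),
    "of": cell("12356"),
    "the": cell("2346"),
    "with": cell("23456"),
}

# Small letter subset, just enough for the demo below.
LETTERS = {"a": cell("1"), "c": cell("14"), "t": cell("2345")}


def translate_word(word: str) -> str:
    """Prefer a whole-word contraction; otherwise map character by character."""
    lower = word.lower()
    if lower in CONTRACTIONS:
        return CONTRACTIONS[lower]
    return "".join(LETTERS.get(ch, ch) for ch in lower)


def translate(text: str) -> str:
    """Translate space-separated words, keeping plain spaces between them."""
    return " ".join(translate_word(w) for w in text.split(" "))


if __name__ == "__main__":
    print(translate("the cat"))
```

Because the lookup is on whole tokens, a word with attached punctuation (such as "the,") would fall through to the character path; stripping punctuation first is one possible refinement.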
116
+
117
+
118
+ ### Training Procedure
119
+
120
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
121
+
122
+
123
+ #### Preprocessing
124
+
125
+ - **Text Summarization:** Standard preprocessing steps such as tokenization, truncation, and padding were employed to prepare texts for BART.
126
+ - **Braille Conversion:** The mapping was manually constructed using expert knowledge of Braille representations, with further entries added for common contractions.
127
+
128
+ #### Training Hyperparameters (for the summarization model)
129
+
130
+ - **Epochs:** 3
131
+
132
+ - **Batch size:** 4
133
+
134
+ - **Learning rate:** 5e-5
135
+
136
+ - **Precision:** fp16 mixed precision
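The hyperparameters above could be expressed as a configuration along the following lines. This is a sketch under assumptions: the key names follow Hugging Face `Seq2SeqTrainingArguments`, the output directory is a placeholder, and the stated batch size is treated as per-device. The actual training script is not shown in this card.

```python
# Values come from the list above; everything else is an assumption.
HYPERPARAMS = {
    "output_dir": "bart-braille-summarizer",  # hypothetical output path
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,  # assumes the batch size is per device
    "learning_rate": 5e-5,
    "fp16": True,  # mixed precision; needs a CUDA-capable GPU
}


def build_training_args(params: dict = HYPERPARAMS):
    """Build Seq2SeqTrainingArguments lazily so the dict is usable without transformers."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**params)


if __name__ == "__main__":
    for name, value in HYPERPARAMS.items():
        print(f"{name} = {value}")
```

Calling `build_training_args()` requires `transformers` to be installed and, with `fp16` enabled, a GPU.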
137
+
138
+ ## Evaluation
139
+
140
+ <!-- This section describes the evaluation protocols and provides the results. -->
141
+
142
+ ### Testing Data, Factors & Metrics
143
+
144
+ #### Testing Data
145
+
146
+ <!-- This should link to a Dataset Card if possible. -->
147
+
148
+ The summarization quality was evaluated on validation splits from xsum and cnn_dailymail, while the Braille conversion was compared against baseline conversions on a set of educational excerpts.
149
+
150
+
151
+ #### Factors
152
+
153
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
154
+
155
+ - Character-level accuracy
156
+
157
+ - Word-level precision, recall, and F1 scores
158
+
159
+ - Edit distance
160
+
161
+ - Overall text similarity
162
+
163
+ #### Metrics
164
+
165
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
166
+
167
+ Evaluation of the Braille translation is based on:
168
+
169
+ - Character Accuracy
170
+
171
+ - Word Precision, Recall, and F1 Score
172
+
173
+ - Edit Distance (Levenshtein distance)
174
+
175
+ - Text Similarity
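For reference, the metrics listed above can be computed roughly as follows. The exact definitions used in the evaluation are not given in this card, so these are common formulations chosen for illustration: position-wise character accuracy, bag-of-words precision/recall/F1, a dynamic-programming Levenshtein distance, and `difflib.SequenceMatcher` as a stand-in for text similarity.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_accuracy(pred: str, ref: str) -> float:
    """Share of aligned positions that match, normalized by the longer string."""
    if not pred and not ref:
        return 1.0
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / max(len(pred), len(ref))


def word_prf(pred: str, ref: str) -> tuple:
    """Bag-of-words precision, recall, and F1 over whitespace-separated tokens."""
    p_counts, r_counts = Counter(pred.split()), Counter(ref.split())
    overlap = sum((p_counts & r_counts).values())
    precision = overlap / max(sum(p_counts.values()), 1)
    recall = overlap / max(sum(r_counts.values()), 1)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] from difflib (an assumed choice of measure)."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(char_accuracy("abcd", "abcf"))
```

The same functions work on Braille strings, since Unicode Braille patterns are ordinary characters.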
176
+
177
+
178
+ ### Results
179
+
180
+ In evaluations:
181
+
182
+ - Our custom Braille model showed high character accuracy (above 90%) on average.
183
+
184
+ - Word-level F1 scores and edit distances indicate that the advanced mapping variant performs comparably to context-aware corrections (improving simulated accuracy by approximately 10% in controlled tests).
185
+
186
+
187
+ #### Summary
188
+
189
+ The combined pipeline delivers robust summarization and effective Braille translation for standard educational texts. However, performance may vary on content with unconventional formatting or specialized vocabulary.
190
+
191
+
192
+
193
+ ## Model Examination
194
+
195
+ <!-- Relevant interpretability work for the model goes here -->
196
+
197
+ The evaluation includes detailed comparisons of three Braille conversion methods:
198
+
199
+ - **Our Custom Braille Model:** Uses full mapping with contractions.
200
+
201
+ - **Baseline Braille Translator:** Uses a simplified mapping.
202
+
203
+ - **Advanced Braille Translator:** Incorporates context-aware simulation for slight correction improvements.
204
+
205
+ Further interpretability work can analyze how minor changes in the mapping affect overall accuracy and readability, especially for borderline cases in character conversion.
206
+
207
+
208
+ ## Environmental Impact
209
+
210
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
211
+
212
+ - **Hardware Type:** NVIDIA GeForce RTX 4050
213
+ - **Hours used:** 3 hours for fine-tuning
214
+
215
+ ## Technical Specifications
216
+
217
+ ### Model Architecture and Objective
218
+
219
+ - **Architecture:** Sequence-to-sequence transformer (BART) for summarization, followed by a custom rule-based English-to-Braille mapping.
220
+
221
+ - **Objective:** Generate accessible Braille summaries from long-form educational texts.
222
+
223
+
224
+ ### Compute Infrastructure
225
+
226
+ #### Hardware
227
+
228
+ - **GPU:** NVIDIA GeForce RTX 4050
229
+ - **RAM:** 16GB
230
+
231
+ #### Software
232
+
233
+ - **Framework:** PyTorch
234
+ - **Library Version:** Hugging Face Transformers version 4.44.2
235
+ - **Additional Libraries:** nltk, datasets, rouge, wandb, and scikit-learn for evaluation
236
+
237
+ ## Citation
238
+
239
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
240
+
241
+ **BibTeX:**
242
+
243
+ @misc{srimeenakshiks2025eng2braille,
244
+ title={English-to-Braille Translator for Educational Content},
245
+ author={Srimeenakshi K S},
246
+ year={2025},
247
+ publisher={Hugging Face}
248
+ }
249
+
250
+
251
+
252
+ **APA:**
253
+
254
+ Srimeenakshi K S. (2025). English-to-Braille Translator for Educational Content. Hugging Face.
255
+ ## Glossary
256
+
257
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
258
+
259
+ - **Abstractive Summarization:** The process of generating a concise summary that captures the essence of an input document using natural language generation techniques.
260
+
261
+ - **Braille Translation:** The conversion of written text into Braille, typically represented using Unicode Braille patterns.
262
+
263
+ - **Levenshtein Distance:** A metric for measuring the difference between two strings by counting the number of single-character edits required to change one string into the other.
264
+
265
+ ## Model Card Authors
266
+
267
+ - **Author:** Srimeenakshi K S