srimeenakshiks
/

Using_Abstractive_NLP_Models_for_Auto-Generating_Braille_Summaries

@@ -1,13 +1,267 @@
 ---
 license: mit
 datasets:
-- sentence-transformers/xsum
-- ccdv/cnn_dailymail
-- arxiv-community/arxiv_dataset
-- ccdv/arxiv-summarization
 language:
-- en
-metrics:
-- bleu
-- rouge
----

 ---
+library_name: transformers
+tags:
+  - english-to-braille
+  - braille translation
+  - accessibility
+  - educational content
+  - text summarization
 license: mit
 datasets:
+  - ccdv/arxiv-summarization
+  - xsum
+  - cnn_dailymail
 language:
+  - en
+base_model:
+  - facebook/bart-large-cnn
+---
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+The English-to-Braille Translator combines advanced natural language processing with a custom conversion algorithm. In the first stage, the model uses a pre-trained and fine-tuned version of the Facebook BART model (facebook/bart-large-cnn) to create abstractive summaries of educational materials drawn from datasets such as ccdv/arxiv-summarization, xsum, and cnn_dailymail.
+In the second stage, the generated summary is converted into Braille. Instead of a neural translation approach, the system employs a handcrafted dictionary-based mapping mechanism. This mapping converts each English character—and, where applicable, certain contractions and abbreviations—into their corresponding Braille Unicode representations. Multiple versions are supported (including a baseline, an advanced context-aware variant, and our custom implementation) and are evaluated using metrics such as character accuracy, word-level precision/recall, and edit distance.
+- **Developed by:** Srimeenakshi K S
+- **Model type:** English-to-Braille Translation and Summarization
+- **Language(s) (NLP):** English
+- **License:** MIT License
+- **Finetuned from model:** facebook/bart-large-cnn
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+This model can be used as a standalone tool for converting English texts into Braille. Simply input your educational document, and the model will (1) generate a concise summary and (2) translate the summary into Braille characters using the mapping dictionary.
+### Downstream Use
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+The model is ideal for incorporation in accessibility pipelines – for instance, as a backend service for e-learning platforms, libraries, or digital accessibility applications that aim to provide visually impaired users with Braille-compatible summaries of long educational documents.
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+This model is specifically designed for educational content and might not perform well on texts that require nuanced or domain-specific translations beyond the scope of its dictionary. Its dictionary-based conversion approach does not account for context beyond a basic character and common contraction mapping; therefore, it should not be deployed for highly technical documents without additional validation.
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+While the summarization component is built on a well-established BART model, the Braille conversion relies on a fixed dictionary. This mapping approach may struggle with ambiguous punctuation, special formatting, or non-standard abbreviations. Users should be aware that:
+ - The summarization output might occasionally omit vital context.
+ - The dictionary mapping, while effective for most cases, is inherently limited and could misrepresent characters where multiple mappings exist.
+ - Evaluation metrics indicate strong performance overall, but edge cases (especially with highly technical jargon) may require manual review.
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Deploy the model in contexts where the educational content adheres to a standard vocabulary and formatting. For critical applications, supplement automated outputs with human verification, particularly where accuracy in Braille representation is imperative.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+```
+from transformers import pipeline
+# Step 1: Summarize the English text using the fine-tuned BART model
+summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
+english_summary = summarizer("Your long educational text goes here.", max_length=200, truncation=True)[0]['summary_text']
+# Step 2: Convert the summary to Braille using the custom dictionary mapping
+from your_custom_braille_module import braille_to_text_map, braille_to_text  # Ensure you import your conversion functions
+# (For an English-to-Braille conversion, you might invert the mapping)
+def text_to_braille(text, mapping):
+    # Invert the mapping (note: for a complete solution, handle duplicate values and contractions appropriately)
+    inverted = {v: k for k, v in mapping.items()}
+    braille = ''.join(inverted.get(char, char) for char in text.lower())
+    return braille
+mapping = braille_to_text_map()
+braille_summary = text_to_braille(english_summary, mapping)
+print("English Summary:", english_summary)
+print("Braille Summary:", braille_summary)
+```
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+The summarization component of this model was fine-tuned on a mix of educational and general summarization datasets:
+ - [ccdv/arxiv-summarization](https://huggingface.co/datasets/ccdv/arxiv-summarization)
+ - [xsum](https://huggingface.co/datasets/EdinburghNLP/xsum)
+ - [cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail)
+The Braille translation itself does not involve training but instead relies on a manually curated mapping between English characters (and common contractions) and Braille Unicode characters.
+## Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing
+ - **Text Summarization:** Standard preprocessing steps such as tokenization, truncation, and padding were employed to prepare texts for BART.
+ - **Braille Conversion:** The mapping was manually constructed using expert knowledge of Braille representations, with additional additions for common contractions.
+#### Training Hyperparameters (for the summarization model)
+- **Epochs:** 3
+- **Batch size:** 4
+- **Learning rate:** 5e-5
+- **Precision:** fp16 mixed precision
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+The summarization quality was evaluated on validation splits from xsum and cnn_dailymail, while the Braille conversion was compared against baseline conversions on a set of educational excerpts.
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+ - Character-level accuracy
+ - Word-level precision, recall, and F1 scores
+ - Edit distance
+ - Overall text similarity
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+Evaluation of the Braille translation is based on:
+ - Character Accuracy
+ - Word Precision, Recall, and F1 Score
+ - Edit Distance (Levenshtein distance)
+ - Text Similarity
+### Results
+In evaluations:
+ - Our custom Braille model showed high character accuracy (above 90%) on average.
+ - Word-level F1 scores and edit distances indicate that the advanced mapping variant performs comparably to context-aware corrections (improving simulated accuracy by approximately 10% in controlled tests).
+#### Summary
+The combined pipeline delivers robust summarization and effective Braille translation for standard educational texts. However, performance may vary on content with unconventional formatting or specialized vocabulary.
+## Model Examination
+<!-- Relevant interpretability work for the model goes here -->
+The evaluation includes detailed comparisons of three Braille conversion methods:
+ - **Our Custom Braille Model:** Uses full mapping with contractions.
+ - **Baseline Braille Translator:** Uses a simplified mapping.
+ - **Advanced Braille Translator:** Incorporates context-aware simulation for slight correction improvements.
+Further interpretability work can analyze how minor changes in the mapping affect overall accuracy and readability, especially for borderline cases in character conversion.
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+- **Hardware Type:** NVIDIA GeForce RTX 4050
+- **Hours used:** 3 hours for fine-tuning
+## Technical Specifications
+### Model Architecture and Objective
+ - **Architecture:** Sequence-to-sequence transformer (BART) for summarization, followed by a custom rule-based English-to-Braille mapping.
+ - **Objective:** Generate accessible Braille summaries from long-form educational texts.
+### Compute Infrastructure
+#### Hardware
+- **GPU:** NVIDIA GeForce RTX 4050
+- **RAM:** 16GB
+#### Software
+- **Framework:** PyTorch
+- **Library Version**: Hugging Face Transformers version 4.44.2
+- **Additional Libraries:** nltk, datasets, rouge, wandb, and scikit-learn for evaluation
+## Citation
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+@model{srimeenakshiks2025eng2braille,
+  title={English-to-Braille Translator for Educational Content},
+  author={Srimeenakshi K S},
+  year={2025},
+  publisher={Hugging Face}
+}
+**APA:**
+Srimeenakshi K S. (2025). English-to-Braille Translator for Educational Content. Hugging Face.
+## Glossary
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+ - **Abstractive Summarization:** The process of generating a concise summary that captures the essence of an input document using natural language generation techniques.
+ - **Braille Translation:** The conversion of written text into Braille, typically represented using Unicode Braille patterns.
+ - **Levenshtein Distance:** A metric for measuring the difference between two strings by counting the number of single-character edits required to change one string into the other.
+## Model Card Authors
+- **Author:** Srimeenakshi K S