# Dyslexic Writing-Pattern Classifier (Sinhala)
This module implements an interpretable, rule-based dyslexic writing-pattern classifier for Sinhala text.
Unlike traditional machine-learning classifiers, this component focuses on pattern inference and explainability, rather than predictive accuracy.
It is designed to analyze how dyslexic writing manifests, not merely whether dyslexia is present.
## Purpose
- Identify dominant dyslexic writing patterns in Sinhala text
- Provide explainable, linguistically grounded analysis
- Support educational and research-oriented dyslexia-aware systems
This module is executed only after an essay has been identified as dyslexic by the Binary Dyslexia Detector.
## Core Design Principle
Dyslexia is expressed through consistent patterns of surface-level writing errors, not isolated mistakes.
Therefore, this classifier infers patterns using rule-based dominance of error signals, rather than supervised learning.
## Writing Patterns Identified
The system currently identifies the following dyslexic writing patterns:
- Orthographic Instability: Frequent character omissions, additions, or diacritic loss
- Phonetic Confusion: Character substitutions reflecting phonetic similarity
- Mixed Dyslexic Pattern: Co-occurrence of multiple dominant error types
- No Dominant Pattern: Absence of consistent dyslexic error behavior
- Word Boundary Confusion (when applicable): Spacing and word segmentation errors
These patterns are derived from dyslexia-related literature and adapted for Sinhala writing.
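
As a rough illustration of how rule-based dominance of error signals could map to the patterns above, here is a minimal sketch. The function name `infer_pattern`, the grouping of signals into pattern families, and the `margin` threshold are all assumptions made for illustration; the module's actual rules may differ.

```python
from typing import Dict

# Hypothetical grouping of error signals into pattern families (assumption).
ORTHOGRAPHIC = ("addition", "omission", "diacritic_loss")
PHONETIC = ("substitution",)
SPACING = ("spacing",)


def infer_pattern(features: Dict[str, int], margin: float = 0.5) -> str:
    """Map per-sentence error counts to a writing pattern by simple dominance.

    `margin` is an assumed threshold: one family must account for more than
    this share of all errors to be considered dominant.
    """
    scores = {
        "Orthographic Instability": sum(features.get(k, 0) for k in ORTHOGRAPHIC),
        "Phonetic Confusion": sum(features.get(k, 0) for k in PHONETIC),
        "Word Boundary Confusion": sum(features.get(k, 0) for k in SPACING),
    }
    total = sum(scores.values())
    if total == 0:
        # No error signal at all in this sentence.
        return "No Dominant Pattern"
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score / total > margin:
        return best
    # Several families contribute without one clearly dominating.
    return "Mixed Dyslexic Pattern"
```

For example, a sentence with three omissions and one substitution would be labeled Orthographic Instability under these assumed rules, while one omission and one substitution would fall into the mixed pattern.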
## Processing Pipeline
### 1. Sentence-Level Analysis
For each sentence:
- Clean and dyslexic versions are compared
- Surface error features are extracted:
  - Character addition
  - Character omission
  - Character substitution
  - Diacritic loss
  - Spacing issues
- A rule-based inference engine assigns a sentence-level writing pattern
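
The surface feature extraction step could be sketched with a character-level diff, as below. This is not the notebooks' implementation: the function name `extract_surface_features`, the feature keys, and the choice to treat Unicode combining marks (categories starting with `M`, which include Sinhala vowel signs) as diacritics are assumptions.

```python
import difflib
import unicodedata
from typing import Dict


def _is_mark(ch: str) -> bool:
    # Sinhala vowel signs and al-lakuna are Unicode combining marks (Mn/Mc).
    return unicodedata.category(ch).startswith("M")


def extract_surface_features(clean: str, dyslexic: str) -> Dict[str, int]:
    """Count surface-level differences between a clean and a dyslexic sentence."""
    feats = {"addition": 0, "omission": 0, "substitution": 0,
             "diacritic_loss": 0, "spacing": 0}
    matcher = difflib.SequenceMatcher(None, clean, dyslexic, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        removed, added = clean[i1:i2], dyslexic[j1:j2]
        if tag == "replace":
            # Pair characters position by position as substitutions; any
            # leftover characters are treated as omissions or additions.
            paired = min(len(removed), len(added))
            feats["substitution"] += paired
            removed, added = removed[paired:], added[paired:]
        for ch in removed:
            if ch.isspace():
                feats["spacing"] += 1
            elif _is_mark(ch):
                feats["diacritic_loss"] += 1
            else:
                feats["omission"] += 1
        for ch in added:
            if ch.isspace():
                feats["spacing"] += 1
            else:
                feats["addition"] += 1
    return feats
```

A character-level diff is only one possible design; it keeps every counted error traceable to a specific position in the sentence, which suits the explainability goal, but it does not model phonetic similarity between substituted characters.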
### 2. Essay-Level Aggregation
Because the dataset does not provide explicit essay boundaries:
- Essays are approximated using fixed-size sentence windows (pseudo-essays)
- Sentence-level patterns are aggregated per essay
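
A minimal sketch of the pseudo-essay approximation follows. The window size is not stated here, so `window_size=10` is a placeholder, and the use of non-overlapping windows is likewise an assumption.

```python
from typing import List, Sequence


def make_pseudo_essays(sentences: Sequence[str], window_size: int = 10) -> List[List[str]]:
    """Split a flat sentence sequence into consecutive, non-overlapping windows.

    The last window may be shorter when the sentence count is not a multiple
    of `window_size` (an assumed, illustrative default).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        list(sentences[start:start + window_size])
        for start in range(0, len(sentences), window_size)
    ]
```

The same windowing can be applied to the list of sentence-level pattern labels so that each pseudo-essay carries its own label sequence.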
### 3. Dominant Pattern Classification
For each essay:
- The most frequent pattern is selected as the dominant pattern
- A confidence score is computed as:
$$
\text{Confidence} = \frac{\text{Number of sentences supporting dominant pattern}}{\text{Total number of sentences in essay}}
$$
- Dominance strength is categorized as:
  - Strong Dominance
  - Moderate Dominance
  - Weak / Mixed
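
The essay-level step can be sketched as a frequency count over sentence-level labels. The cut-offs for the strength labels are not given in this document; the `strong=0.6` and `moderate=0.4` values below are assumptions, and `classify_essay` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List


def classify_essay(sentence_patterns: List[str],
                   strong: float = 0.6,
                   moderate: float = 0.4) -> Dict[str, object]:
    """Pick the most frequent sentence-level pattern and score its dominance."""
    if not sentence_patterns:
        raise ValueError("an essay needs at least one sentence")
    counts = Counter(sentence_patterns)
    # On ties, most_common keeps the pattern that was encountered first.
    dominant, support = counts.most_common(1)[0]
    confidence = support / len(sentence_patterns)
    if confidence >= strong:          # assumed threshold
        strength = "Strong Dominance"
    elif confidence >= moderate:      # assumed threshold
        strength = "Moderate Dominance"
    else:
        strength = "Weak / Mixed"
    return {
        "dominant_pattern": dominant,
        "confidence": confidence,
        "dominance_strength": strength,
        "pattern_breakdown": dict(counts),  # kept for explainability
    }
```

With these assumed thresholds, an essay in which three of five sentences show Orthographic Instability yields a confidence of 0.6 and the label Strong Dominance.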
## Outputs
For each essay, the classifier produces:
- Dominant dyslexic writing pattern
- Pattern dominance confidence
- Dominance strength label
- Sentence-level pattern breakdown (for explainability)
### Example Output
{
"dominant_pattern": "Orthographic Instability",
"confidence": 0.6,
"dominance_strength": "Strong Dominance"
}
---
## Evaluation Strategy
This component does not use supervised evaluation metrics such as accuracy or F1-score.
Reason:
- Essay-level pattern labels are inferred, not manually annotated
- Reporting accuracy against labels produced by the same rules would be circular and amount to label leakage
Instead, evaluation is performed using:
- Pattern distribution analysis
- Confidence distribution statistics
- Qualitative case studies with sentence-level evidence
This approach aligns with best practices in dyslexia-related linguistic analysis.
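
The distribution-based evaluation could be summarized along these lines. This is a sketch only: `summarize_profiles` is a hypothetical helper, and it assumes each essay profile is a dict with `dominant_pattern` and `confidence` keys, as in the example output.

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, List


def summarize_profiles(profiles: List[Dict[str, object]]) -> Dict[str, object]:
    """Summarize pattern shares and confidence statistics across essays."""
    if not profiles:
        raise ValueError("no essay profiles to summarize")
    n = len(profiles)
    pattern_counts = Counter(p["dominant_pattern"] for p in profiles)
    confidences = [float(p["confidence"]) for p in profiles]
    return {
        # Share of essays assigned to each dominant pattern.
        "pattern_share": {k: v / n for k, v in pattern_counts.items()},
        "confidence_mean": mean(confidences),
        "confidence_median": median(confidences),
        "confidence_min": min(confidences),
        "confidence_max": max(confidences),
    }
```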
## Notebooks
notebooks/
├── 01_surface_feature_extraction_and_pattern_inference_v3.ipynb
└── 02_essay_level_dyslexic_pattern_profiling.ipynb
These notebooks document the full development and validation process.
## Limitations
- Essay boundaries are approximated using fixed-size sentence windows
- The system does not perform clinical diagnosis
- Pattern definitions may evolve with expert validation
## Role in the Overall System
(Binary Dyslexia Detector)
↓
Dyslexic Essay
↓
Writing-Pattern Classifier
↓
Pattern Profile + Confidence
## Disclaimer
This module is intended for research and educational purposes only and should not be used for clinical diagnosis.
Generated CSV artifacts are intentionally excluded from version control and can be reproduced by executing the notebooks or pipeline.