For Russian Gutenberg/other sources:
Strip editorial notes, footnotes, chapter headers, and license boilerplate.
Detect and remove OCR artifacts (e.g., words hyphenated across line breaks, broken or mis-recognized Cyrillic characters). This matters more for Russian because OCR noise in these sources can be substantial.
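A rough sketch of what such a cleaning pass might look like in Python, using only the standard library. Everything here is an assumption for illustration: the function names are hypothetical, the `*** START OF ... ***` / `*** END OF ... ***` markers are the usual Project Gutenberg delimiters but other sources will need different patterns, the chapter-header and footnote regexes only cover simple cases like `Глава I` and `[1]`, and the Latin-to-Cyrillic lookalike table is not exhaustive.

```python
import re

# Usual Project Gutenberg delimiters (assumed; other sources differ).
START = re.compile(r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*")
END = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*")

# Simple patterns for chapter headers ("Глава I", "ГЛАВА 12.") and "[1]"-style footnote marks.
CHAPTER = re.compile(r"^\s*(?:ГЛАВА|Глава)\s+[IVXLC\d]+\.?\s*$", re.M)
FOOTNOTE_MARK = re.compile(r"\[\d+\]")

CYRILLIC = re.compile(r"[А-Яа-яЁё]")

# Latin letters that OCR can confuse with look-alike Cyrillic ones (illustrative, not complete).
LATIN_TO_CYRILLIC = str.maketrans(
    "aceopxyABCEHKMOPTX",
    "асеорхуАВСЕНКМОРТХ",
)


def strip_license(text: str) -> str:
    """Keep only the text between the START and END markers, if they are present."""
    start = START.search(text)
    if start:
        text = text[start.end():]
    end = END.search(text)
    if end:
        text = text[:end.start()]
    return text


def join_hyphenated_line_breaks(text: str) -> str:
    """Rejoin words split as 'слу-\\nчай'. Caveat: also merges real hyphenated
    compounds (e.g. 'кто-нибудь') if they happen to break at the hyphen."""
    return re.sub(r"([а-яё])-\s*\n\s*([а-яё])", r"\1\2", text)


def fix_mixed_script(text: str) -> str:
    """Inside words that already contain Cyrillic, replace Latin look-alikes."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        if CYRILLIC.search(word):
            return word.translate(LATIN_TO_CYRILLIC)
        return word  # purely Latin words are left alone
    return re.sub(r"[^\W\d_]+", repl, text)


def clean_russian_text(raw: str) -> str:
    text = strip_license(raw)
    text = CHAPTER.sub("", text)
    text = FOOTNOTE_MARK.sub("", text)
    text = join_hyphenated_line_breaks(text)
    text = fix_mixed_script(text)
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()
```

Editorial notes and footnote bodies are not handled here, since their format varies a lot by edition; those would likely need source-specific rules. It is probably worth spot-checking the output on a few books before running this over a whole corpus.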