---
title: Multilingual Punctuation Capitalization Correction
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# Multilingual Punctuation & Capitalization Correction

This Space provides an interactive interface for restoring punctuation, fixing capitalization, and detecting sentence boundaries in text across **47 languages**.

## Features

- **Multi-language support**: Works with 47 languages, including English, French, Spanish, German, Italian, Portuguese, Russian, Turkish, Chinese, Japanese, and Arabic
- **Three correction modes**:
  - **Conservative**: Minimal changes, preserves original flow
  - **With Sentence Boundaries**: Splits text into clear sentences
  - **Balanced**: Smart chunking for longer texts
- **Interactive UI**: Compare different correction styles and select the best one
- **Copy functionality**: Easy clipboard access for each version
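
This README does not say how the "Balanced" mode chunks longer texts. As a rough illustration only, one simple approach is to split the input into fixed-size word windows and run each window through the model separately. The helper name `chunk_words` and the `max_words` default below are hypothetical and are not taken from this Space's `app.py`.

```python
from typing import List


def chunk_words(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: an assumption about what "chunking" could mean,
    not the Space's actual implementation.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()  # split on any run of whitespace
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Fixed windows can cut a sentence in half, so a real implementation might overlap windows or re-join the chunks after inference.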

## Model

This application uses the [1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase](https://huggingface.co/1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase) model, an XLM-RoBERTa model fine-tuned for:

- Punctuation restoration
- True-casing (capitalization)
- Sentence boundary detection
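
For readers who want to call the model outside the Space, the following is a minimal sketch. It assumes the `punctuators` package is installed and that `PunctCapSegModelONNX.from_pretrained` and `infer(texts=..., apply_sbd=...)` behave as described on the model card; check the card before relying on it. The wrappers `load_model` and `restore` are hypothetical names introduced here for illustration.

```python
from typing import Any, List

MODEL_ID = "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"


def load_model() -> Any:
    """Load the ONNX model (assumes `pip install punctuators`)."""
    # Imported lazily so this module can be imported without the package.
    from punctuators.models import PunctCapSegModelONNX

    return PunctCapSegModelONNX.from_pretrained(MODEL_ID)


def restore(model: Any, texts: List[str], split_sentences: bool = True) -> List[Any]:
    """Run punctuation and true-casing on a batch of texts.

    With split_sentences=True the model is asked to apply sentence
    boundary detection as well (assumption based on the model card).
    """
    return model.infer(texts=texts, apply_sbd=split_sentences)
```

A call might look like `restore(load_model(), ["hello there how are you doing today"])`. The first call to `load_model` downloads the model files.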

## Usage

1. Enter text without proper punctuation or capitalization
2. Click "Add Punctuation & Capitalization"
3. Review the three different correction styles
4. Select and copy the version that best fits your needs

## Examples

Try these example inputs:

- English: "hello there how are you doing today i hope everything is going well"
- French: "bonjour comment allez vous aujourdhui jespere que tout va bien"
- Spanish: "hola como estas espero que todo este bien contigo y tu familia"

## Technical Details

- **Base Model**: XLM-RoBERTa
- **Languages Supported**: 47
- **Tasks**: Punctuation restoration, capitalization, sentence boundary detection
- **Framework**: Gradio interface with ONNX Runtime for efficient inference

## Limitations

- The model was trained primarily on news data
- It may not perform optimally on conversational or informal text
- Performance may vary across languages, depending on how well each is represented in the training data