---
title: Multilingual Punctuation Capitalization Correction
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Multilingual Punctuation & Capitalization Correction
This Space provides an interactive interface for restoring punctuation, fixing capitalization, and detecting sentence boundaries in text across **47 languages**.
## Features
- **Multi-language support**: Works with 47 languages, including English, French, Spanish, German, Italian, Portuguese, Russian, Turkish, Chinese, Japanese, and Arabic
- **Three correction modes**:
  - **Conservative**: Minimal changes, preserves original flow
  - **With Sentence Boundaries**: Splits text into clear sentences
  - **Balanced**: Smart chunking for longer texts
- **Interactive UI**: Compare different correction styles and select the best one
- **Copy functionality**: Easy clipboard access for each version
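
As a rough illustration of how the three correction modes could differ, here is a minimal sketch. It is not taken from `app.py`: the `restore` callable, the `chunk_words` helper, the mode keys, and the 100-word default are all assumptions made for the example. `restore` stands in for any function that takes raw text and returns a list of corrected sentences.

```python
from typing import Callable, Dict, List

# Hypothetical signature: takes one unpunctuated string, returns its sentences.
Restorer = Callable[[str], List[str]]


def chunk_words(text: str, max_words: int = 100) -> List[str]:
    """Split text into whitespace-delimited chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def correct_all_modes(text: str, restore: Restorer, max_words: int = 100) -> Dict[str, str]:
    """Return one corrected string per mode (keys are illustrative names)."""
    sentences = restore(text)
    # Balanced (assumed behavior): restore each chunk on its own, then stitch
    # the resulting sentences back together.
    chunked = [s for chunk in chunk_words(text, max_words) for s in restore(chunk)]
    return {
        "conservative": " ".join(sentences),          # one flowing paragraph
        "sentence_boundaries": "\n".join(sentences),  # one sentence per line
        "balanced": " ".join(chunked),
    }
```

Passing the restorer in as a parameter keeps the mode logic independent of the model, so it can be exercised with a stub before wiring in real inference.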
## Model
This application uses the [1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase](https://huggingface.co/1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase) model, which is an XLM-RoBERTa model fine-tuned for:
- Punctuation restoration
- True-casing (capitalization)
- Sentence boundary detection
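
For readers who want to call the model outside the Space, the sketch below shows one possible way to do it. It assumes the third-party `punctuators` package and its `PunctCapSegModelONNX` class, with `from_pretrained` and `infer(texts=..., apply_sbd=...)` as described on the model card; check those names against the version you install. The `load_model` and `restore_sentences` wrappers are hypothetical helpers, not part of this Space.

```python
from typing import List, Protocol


class SupportsInfer(Protocol):
    """Anything exposing the infer() call this sketch relies on (assumed interface)."""

    def infer(self, texts: List[str], apply_sbd: bool = True) -> List[List[str]]: ...


def load_model() -> SupportsInfer:
    """Load the ONNX model; downloads weights on first use."""
    # Imported lazily so this module still imports when `punctuators` is absent.
    from punctuators.models import PunctCapSegModelONNX

    return PunctCapSegModelONNX.from_pretrained(
        "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"
    )


def restore_sentences(model: SupportsInfer, texts: List[str]) -> List[List[str]]:
    """Return, for each input text, its list of punctuated, true-cased sentences."""
    # apply_sbd=True asks for sentence boundary detection, so each input
    # yields a list of sentences rather than a single string.
    return model.infer(texts=texts, apply_sbd=True)
```

A typical call would be `restore_sentences(load_model(), ["hello there how are you doing today"])`; the exact output depends on the model and is not reproduced here.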
## Usage
1. Enter text without proper punctuation or capitalization
2. Click "Add Punctuation & Capitalization"
3. Review the three different correction styles
4. Select and copy the version that best fits your needs
## Examples
Try these example inputs:
- English: "hello there how are you doing today i hope everything is going well"
- French: "bonjour comment allez vous aujourdhui jespere que tout va bien"
- Spanish: "hola como estas espero que todo este bien contigo y tu familia"
## Technical Details
- **Base Model**: XLM-RoBERTa
- **Languages Supported**: 47
- **Tasks**: Punctuation restoration, capitalization, sentence boundary detection
- **Framework**: Gradio interface, with ONNX Runtime for efficient inference
## Limitations
- The model was trained primarily on news data
- It may perform worse on conversational or informal text
- Performance may vary across languages, depending on how the training data was distributed