---
title: Multilingual Punctuation Capitalization Correction
emoji: 🌍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---
# 🌍 Multilingual Punctuation & Capitalization Correction
This Space provides an interactive interface for restoring punctuation, fixing capitalization, and detecting sentence boundaries in text across **47 languages**.
## Features
- **Multi-language support**: Works with 47 languages including English, French, Spanish, German, Italian, Portuguese, Russian, Turkish, Chinese, Japanese, Arabic, and more
- **Three correction modes**:
  - **Conservative**: Minimal changes, preserves original flow
  - **With Sentence Boundaries**: Splits text into clear sentences
  - ⚖️ **Balanced**: Smart chunking for longer texts
- **Interactive UI**: Compare different correction styles and select the best one
- **Copy functionality**: Easy clipboard access for each version
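
The "smart chunking" behind the Balanced mode is not specified in this README. As a rough illustration only, the sketch below splits a long input into fixed-size word windows that could each be passed to the model and rejoined afterwards. The function name `chunk_words` and the default of 100 words are hypothetical and are not taken from `app.py`.

```python
from typing import List


def chunk_words(text: str, max_words: int = 100) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: the Space's real chunking logic may differ.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    # str.split() with no arguments collapses any run of whitespace.
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

A fixed window can cut a sentence in half at a chunk edge, so a real implementation would probably overlap neighbouring chunks or prefer to split at predicted sentence boundaries.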
## Model
This application uses the [1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase](https://huggingface.co/1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase) model, which is an XLM-RoBERTa model fine-tuned for:
- Punctuation restoration
- True-casing (capitalization)
- Sentence boundary detection
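
For readers who want to call the model outside this Space, here is a minimal sketch. It assumes the `punctuators` Python package (`pip install punctuators`) and the `from_pretrained` / `infer` usage shown on the model card; whether `app.py` loads the model this way is not stated here, and the helper names `load_model` and `restore` are hypothetical.

```python
from typing import Any, List


def load_model() -> Any:
    """Download and load the ONNX model (needs network access on first use)."""
    # Imported lazily so that restore() can be used without the package,
    # for example with a stub model in tests.
    from punctuators.models import PunctCapSegModelONNX

    return PunctCapSegModelONNX.from_pretrained(
        "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"
    )


def restore(texts: List[str], model: Any, split_sentences: bool = True) -> list:
    """Run punctuation and true-casing on a batch of texts.

    Assumption: with sentence boundary detection enabled, the model returns
    one list of sentences per input; with it disabled, one string per input.
    """
    return model.infer(texts=texts, apply_sbd=split_sentences)
```

Typical use would be `model = load_model()` followed by `restore(["hello there how are you doing today"], model)`. Check the signatures against the `punctuators` version you have installed before relying on them.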
## Usage
1. Enter text without proper punctuation or capitalization
2. Click "Add Punctuation & Capitalization"
3. Review the three different correction styles
4. Select and copy the version that best fits your needs
## Examples
Try these example inputs:
- English: "hello there how are you doing today i hope everything is going well"
- French: "bonjour comment allez vous aujourdhui jespere que tout va bien"
- Spanish: "hola como estas espero que todo este bien contigo y tu familia"
## Technical Details
- **Base Model**: XLM-RoBERTa
- **Languages Supported**: 47
- **Tasks**: Punctuation restoration, capitalization, sentence boundary detection
- **Framework**: Gradio interface, with ONNX Runtime for efficient inference
## Limitations
- The model was trained primarily on news data
- It may not perform as well on conversational or informal text
- Accuracy may vary by language, depending on how well each language is represented in the training data