---
title: Multilingual Punctuation Capitalization Correction
emoji: 🌍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🌍 Multilingual Punctuation & Capitalization Correction

This Space provides an interactive interface for restoring punctuation, fixing capitalization, and detecting sentence boundaries in text across **47 languages**.

## Features

- **Multi-language support**: Works with 47 languages including English, French, Spanish, German, Italian, Portuguese, Russian, Turkish, Chinese, Japanese, Arabic, and more
- **Three correction modes**:
  - 📝 **Conservative**: Minimal changes, preserves original flow
  - 📖 **With Sentence Boundaries**: Splits text into clear sentences
  - ⚖️ **Balanced**: Splits longer texts into chunks before correcting them
- **Interactive UI**: Compare different correction styles and select the best one
- **Copy functionality**: Easy clipboard access for each version
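
The three modes can be pictured as different ways of assembling the same model output. The sketch below is one possible reading, not the Space's actual `app.py`: `restore`, `chunk_words`, `correct_all_modes`, and the 40-word chunk size are all hypothetical names and values chosen for illustration. `restore` stands in for any function that maps input texts to lists of corrected sentences.

```python
from typing import Callable, Dict, List

# Hypothetical type: takes raw texts and returns, for each text, a list of
# corrected sentences. This is an assumption, not the Space's real interface.
Restorer = Callable[[List[str]], List[List[str]]]


def chunk_words(text: str, max_words: int = 40) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def correct_all_modes(text: str, restore: Restorer, max_words: int = 40) -> Dict[str, str]:
    """Build one candidate correction per mode (illustrative logic only)."""
    # One pass over the whole text, returned as a list of sentences.
    sentences = restore([text])[0]
    # One pass per chunk, for the assumed "balanced" behaviour on long inputs.
    chunked = restore(chunk_words(text, max_words))
    return {
        # Conservative: keep everything on one line, in the original flow.
        "conservative": " ".join(sentences),
        # With sentence boundaries: one sentence per line.
        "with_sentence_boundaries": "\n".join(sentences),
        # Balanced: correct each chunk separately, then rejoin the results.
        "balanced": " ".join(" ".join(chunk) for chunk in chunked),
    }
```

Passing the model in as a plain callable keeps the mode logic independent of how the model is loaded, so it can be exercised with a stub.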

## Model

This application uses the [1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase](https://huggingface.co/1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase) model, which is an XLM-RoBERTa model fine-tuned for:
- Punctuation restoration
- True-casing (capitalization)
- Sentence boundary detection
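
A minimal sketch of calling the model outside the Space, assuming the `punctuators` package (`pip install punctuators`) and its `PunctCapSegModelONNX` class as described on the model card. The `restore` helper and its `model` parameter are hypothetical, and the identifier passed to `from_pretrained` should be checked against the model card before use.

```python
from typing import Any, List, Optional

# Assumption: the model card accepts this identifier in from_pretrained().
MODEL_ID = "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"


def restore(texts: List[str], model: Optional[Any] = None) -> List[List[str]]:
    """Return, for each input text, a list of punctuated, true-cased sentences."""
    if model is None:
        # Imported lazily so this module loads without the optional dependency.
        # Assumption: `punctuators` is installed and exposes this class.
        from punctuators.models import PunctCapSegModelONNX

        model = PunctCapSegModelONNX.from_pretrained(MODEL_ID)
    # apply_sbd=True asks the model to also split at sentence boundaries.
    return model.infer(texts=texts, apply_sbd=True)
```

Loading the model downloads its weights on first use, so a long-running app would typically create the model once and pass it in through the `model` argument rather than reloading it per request.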

## Usage

1. Enter text without proper punctuation or capitalization
2. Click "Add Punctuation & Capitalization"
3. Review the three different correction styles
4. Select and copy the version that best fits your needs

## Examples

Try these example inputs:
- English: "hello there how are you doing today i hope everything is going well"
- French: "bonjour comment allez vous aujourdhui jespere que tout va bien"
- Spanish: "hola como estas espero que todo este bien contigo y tu familia"

## Technical Details

- **Base Model**: XLM-RoBERTa
- **Languages Supported**: 47
- **Tasks**: Punctuation restoration, capitalization, sentence boundary detection
- **Framework**: Gradio interface with ONNX Runtime for efficient inference

## Limitations

- The model was trained primarily on news data
- It may perform worse on conversational or informal text
- Accuracy varies by language, depending on how well each language is represented in the training data