title: Multilingual Punctuation Capitalization Correction
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
Multilingual Punctuation & Capitalization Correction
This Space provides an interactive interface for restoring punctuation, fixing capitalization, and detecting sentence boundaries in text across 47 languages.
Features
- Multi-language support: Works with 47 languages including English, French, Spanish, German, Italian, Portuguese, Russian, Turkish, Chinese, Japanese, Arabic, and more
- Three correction modes:
  - Conservative: Minimal changes, preserves original flow
  - With Sentence Boundaries: Splits text into clear sentences
  - Balanced: Smart chunking for longer texts
- Interactive UI: Compare different correction styles and select the best one
- Copy functionality: Each version can be copied to the clipboard
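The Balanced mode is described only as "smart chunking for longer texts". As a rough illustration of what chunking means here, the sketch below splits the input into fixed-size word windows, corrects each window with a caller-supplied function, and rejoins the results. The helper names `chunk_words` and `correct_in_chunks` and the 50-word default are hypothetical; this README does not describe the Space's actual chunking logic.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: it only illustrates bounding the size of each
    piece sent to the model, not the Space's real chunking strategy.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def correct_in_chunks(
    text: str, correct_fn: Callable[[str], str], max_words: int = 50
) -> str:
    """Apply correct_fn (e.g. a model call) to each chunk and rejoin with spaces."""
    return " ".join(correct_fn(chunk) for chunk in chunk_words(text, max_words))
```

For example, `chunk_words("a b c d e", max_words=2)` returns `["a b", "c d", "e"]`. Counting words is a simplification: the model's input limit is measured in subword tokens, so a real implementation would more likely count tokens, and it might overlap neighbouring chunks so that punctuation near a boundary has context on both sides.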
Model
This application uses the 1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase model, which is an XLM-RoBERTa model fine-tuned for:
- Punctuation restoration
- True-casing (capitalization)
- Sentence boundary detection
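The three tasks can be pictured as per-word decisions that are then combined into text. The toy decoder below assumes a hypothetical label format (one punctuation mark after each word, a capitalize flag, and a sentence-end flag). The actual model works on subword tokens with its own label scheme, so this only illustrates how punctuation, true-casing, and sentence boundaries fit together in the output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordPrediction:
    """Hypothetical per-word labels (not the model's real output format)."""
    word: str
    punct_after: str = ""        # punctuation to append, e.g. "", ",", ".", "?"
    capitalize: bool = False     # toy true-casing: uppercase the first letter only
    sentence_end: bool = False   # a sentence boundary follows this word


def apply_predictions(preds: List[WordPrediction]) -> List[str]:
    """Rebuild punctuated, true-cased sentences from per-word labels."""
    sentences: List[str] = []
    current: List[str] = []
    for p in preds:
        word = p.word[:1].upper() + p.word[1:] if p.capitalize else p.word
        current.append(word + p.punct_after)
        if p.sentence_end:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words with no predicted boundary
        sentences.append(" ".join(current))
    return sentences


preds = [
    WordPrediction("hello", capitalize=True),
    WordPrediction("there", punct_after=".", sentence_end=True),
    WordPrediction("how", capitalize=True),
    WordPrediction("are"),
    WordPrediction("you", punct_after="?", sentence_end=True),
]
print(apply_predictions(preds))  # ['Hello there.', 'How are you?']
```

Joining the returned sentences with spaces gives continuous text, while printing one per line resembles a sentence-boundary view.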
Usage
- Enter text without proper punctuation or capitalization
- Click "Add Punctuation & Capitalization"
- Review the three different correction styles
- Select and copy the version that best fits your needs
Examples
Try these example inputs:
- English: "hello there how are you doing today i hope everything is going well"
- French: "bonjour comment allez vous aujourdhui jespere que tout va bien"
- Spanish: "hola como estas espero que todo este bien contigo y tu familia"
Technical Details
- Base Model: XLM-RoBERTa
- Languages Supported: 47
- Tasks: Punctuation restoration, capitalization, sentence boundary detection
- Framework: Gradio interface with ONNX Runtime for efficient inference
Limitations
- The model was primarily trained on news data
- It may not perform as well on conversational or informal text
- Performance varies by language, depending on how much training data each language had