---
title: Multilingual Punctuation Capitalization Correction
emoji: 🌍
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# 🌍 Multilingual Punctuation & Capitalization Correction

This Space provides an interactive interface for restoring punctuation, fixing capitalization, and detecting sentence boundaries in text across **47 languages**.

## Features

- **Multi-language support**: Works with 47 languages including English, French, Spanish, German, Italian, Portuguese, Russian, Turkish, Chinese, Japanese, Arabic, and more
- **Three correction modes**:
  - πŸ“ **Conservative**: Minimal changes, preserves original flow
  - πŸ“– **With Sentence Boundaries**: Splits text into clear sentences
  - βš–οΈ **Balanced**: Smart chunking for longer texts
- **Interactive UI**: Compare different correction styles and select the best one
- **Copy functionality**: Easy clipboard access for each version
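This README does not say how the three modes are built. The sketch below shows one plausible approach. It assumes that the model returns a list of sentences for each input when sentence boundary detection is on, and it derives each style by post-processing that list. The helper names `render_conservative`, `render_sentences`, and `chunk_words`, as well as the 40-word chunk size, are hypothetical and are not taken from `app.py`.

```python
from typing import List

# Hypothetical helpers: the real app.py may build its three modes differently.


def render_conservative(sentences: List[str]) -> str:
    """Conservative style: keep the corrected text as one flowing paragraph."""
    return " ".join(s.strip() for s in sentences if s.strip())


def render_sentences(sentences: List[str]) -> str:
    """Sentence-boundary style: one detected sentence per line."""
    return "\n".join(s.strip() for s in sentences if s.strip())


def chunk_words(text: str, max_words: int = 40) -> List[str]:
    """Balanced style (assumed): split long input into chunks of at most
    max_words words so that each chunk can be corrected independently."""
    if max_words < 1:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    # Illustrative sentences only, not actual model output.
    sentences = [
        "Hello there, how are you doing today?",
        "I hope everything is going well.",
    ]
    print(render_conservative(sentences))
    print(render_sentences(sentences))
    print(chunk_words("one two three four five", max_words=2))
```

Chunking on word count keeps every model call short. The trade-off is that a chunk boundary can fall mid-sentence, so a real implementation might prefer to split on detected sentence ends.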

## Model

This application uses the [1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase](https://huggingface.co/1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase) model, which is an XLM-RoBERTa model fine-tuned for:
- Punctuation restoration
- True-casing (capitalization)
- Sentence boundary detection
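A minimal sketch of calling the model from Python follows. It assumes the `punctuators` package (`pip install punctuators`), whose `PunctCapSegModelONNX.from_pretrained(...)` loader and `infer(texts=..., apply_sbd=True)` call are what the model card uses for these ONNX weights. With `apply_sbd=True`, that call returns a list of sentences for each input. The `correct` wrapper and the `PunctuationModel` protocol are hypothetical. The wrapper accepts any object that has the same `infer` method, so you can exercise it without downloading the model.

```python
from typing import List, Protocol

MODEL_ID = "1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase"


class PunctuationModel(Protocol):
    """Hypothetical protocol: anything exposing the same infer() call."""

    def infer(self, texts: List[str], apply_sbd: bool = True) -> List[List[str]]:
        ...


def load_model() -> PunctuationModel:
    # Imported lazily: requires `pip install punctuators` and downloads
    # the ONNX weights from the Hugging Face Hub on first use.
    from punctuators.models import PunctCapSegModelONNX

    return PunctCapSegModelONNX.from_pretrained(MODEL_ID)


def correct(model: PunctuationModel, texts: List[str]) -> List[List[str]]:
    """Hypothetical wrapper: drop blank inputs, then restore punctuation,
    true-case, and split each remaining text into sentences in one batch."""
    cleaned = [t.strip() for t in texts if t.strip()]
    if not cleaned:
        return []
    return model.infer(texts=cleaned, apply_sbd=True)
```

For example, `correct(load_model(), ["hello there how are you doing today i hope everything is going well"])` returns one list of sentences for the input. The exact punctuation and casing depend on the model.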

## Usage

1. Enter text without proper punctuation or capitalization
2. Click "Add Punctuation & Capitalization"
3. Review the three different correction styles
4. Select and copy the version that best fits your needs

## Examples

Try these example inputs:
- English: "hello there how are you doing today i hope everything is going well"
- French: "bonjour comment allez vous aujourdhui jespere que tout va bien"
- Spanish: "hola como estas espero que todo este bien contigo y tu familia"

## Technical Details

- **Base Model**: XLM-RoBERTa
- **Languages Supported**: 47
- **Tasks**: Punctuation restoration, capitalization, sentence boundary detection
- **Framework**: Gradio interface with ONNX Runtime for efficient inference

## Limitations

- The model was trained primarily on news data
- May not perform optimally on conversational or informal text
- Performance may vary by language, depending on how well each language is represented in the training data