---
license: apache-2.0
tags:
- paddleocr
- ocr
- vision-language-model
- ernie-kit
- historical-document-processing
- handwriting-recognition
- gothic-script
- paleography
base_model: PaddlePaddle/PaddleOCR-VL-0.9B
language:
- es
pipeline_tag: image-to-text
---

# Chronos-VL: The 1545 Resurrection Engine

> **Baidu ERNIE AI Developer Challenge Submission**

**Chronos-VL** is a specialized fine-tune of **PaddleOCR-VL-0.9B**, engineered to decipher Early Modern Spanish Gothic script (c. 1545). It was trained on the **RODRIGO Corpus** with Baidu's **ERNIEKit** on an NVIDIA A100 GPU, with the aim of making nearly 500-year-old archival manuscripts machine-readable.

Standard OCR models struggle with these manuscripts because of complex calligraphy, ligatures, and ink degradation. On our 100-sample benchmark, Chronos-VL reaches a median character error rate of 1.64%, against 19.82% for the baseline.

## Performance Benchmark

We conducted a side-by-side evaluation on 100 unseen historical samples using a custom A/B testing framework.

| Metric | Baseline (Standard PaddleOCR) | Chronos-VL (Ours) | Improvement |
| :--- | :--- | :--- | :--- |
| **Median Character Error Rate (CER)** | 19.82% | **1.64%** | **12x lower** |
| **Excellent Predictions (<5% Error)** | 1% | **77%** | **77x increase (+76 points)** |
| **Word Error Rate (WER)** | 74.44% | **17.35%** | **4.3x lower** |
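
The table does not say how the metrics were computed. A minimal sketch, assuming CER and WER are Levenshtein edit distance divided by reference length (over characters and whitespace-separated words respectively), could look like the following. The function names and the choice to average WER are assumptions for illustration, not the project's actual evaluation code.

```python
from statistics import median


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits divided by reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def summarize(pairs):
    """Aggregate (reference, prediction) pairs into benchmark-style figures."""
    cers = [cer(r, h) for r, h in pairs]
    wers = [wer(r, h) for r, h in pairs]
    return {
        "median_cer": median(cers),
        "share_under_5pct": sum(c < 0.05 for c in cers) / len(cers),
        "mean_wer": sum(wers) / len(wers),  # assumption: WER is averaged
    }


# Made-up example pairs, not taken from the benchmark set.
pairs = [
    ("dixo estonces", "dixo estonces"),
    ("el rey don alfonso", "el rey don alfonzo"),
]
print(summarize(pairs))
```

Note how a single wrong character costs a whole word under WER, which is one reason WER stays much higher than CER in the table.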

## Interactive Demo (Colab)

Don't just take our word for it. Run the **Chronos System** yourself.
Our interactive Gradio app allows you to:

1. **Compare** Baseline vs. Chronos-VL side-by-side.
2. **Visualize** the "X-Ray" overlay (Visual Restoration).
3. **Translate** the archaic text to Modern Spanish and English.

[Open in Colab](https://colab.research.google.com/drive/12ccCTTvJc9G6AfyvGPg0pG528bCXelaK?usp=sharing)

## Usage (Python)

To use this model in your own code, you need `paddleocr` and `huggingface_hub`.

```python
from huggingface_hub import snapshot_download
from paddleocr import PaddleOCR

# 1. Download the fine-tuned weights
local_dir = snapshot_download(repo_id="Deepesh-001/rodrigo-ocr-model")

# 2. Initialize the engine
# use_angle_cls=True handles rotated manuscript lines.
# Note: these arguments follow the PaddleOCR 2.x API; newer releases may name them differently.
ocr = PaddleOCR(
    rec_model_dir=local_dir,
    use_angle_cls=True,
    use_gpu=True,
)

# 3. Run inference on a 1545 manuscript
image_path = "rodrigo_sample.png"
result = ocr.ocr(image_path, cls=True)

# result[0] is None when no text is detected, so fall back to an empty list
for line in result[0] or []:
    text, confidence = line[1]
    print(f"Detected: {text} | Confidence: {confidence:.2f}")
```

## The Chronos Pipeline (System Design)

This model is the core perception layer of the broader **Chronos System**:

1. **Visual Perception (AI):** Chronos-VL extracts raw Gothic text (e.g., *"dixo estonces"*).
2. **Semantic Normalization (Logic):** A post-processing engine normalizes Archaic Castilian spelling to Modern Spanish (e.g., *"dijo entonces"*).
3. **Global Access (Translation):** Automated translation to English, making Spanish heritage accessible to non-Spanish speakers.
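
The normalization step (stage 2) can be pictured as a word-level lookup. The sketch below is only an illustration under that assumption: the `WORD_RULES` table and the `normalize` function are hypothetical, seeded with the single example pair given above, and the real post-processing engine may work quite differently.

```python
import re

# Hypothetical rule table: archaic spelling -> modern spelling.
# Only the pair quoted in this card is included; a real system needs far more.
WORD_RULES = {
    "dixo": "dijo",
    "estonces": "entonces",
}


def normalize(text, rules=WORD_RULES):
    """Replace known archaic word forms, leaving unknown words untouched."""

    def swap(match):
        word = match.group(0)
        modern = rules.get(word.lower())
        if modern is None:
            return word  # no rule: keep the transcription as-is
        # Preserve an initial capital from the source word.
        return modern.capitalize() if word[0].isupper() else modern

    return re.sub(r"\w+", swap, text)


print(normalize("Dixo estonces el rey"))
```

A lookup table keeps the step transparent and easy to audit, at the cost of missing any form that is not listed.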

## Dataset Info
Trained on the **RODRIGO Corpus** (Spanish State Archives).
- **Era:** 1545
- **Script:** Gothic Cursive
- **Size:** 9,000 text lines (80/20 Split)
- **Format:** Page-XML converted to ERNIEKit JSONL
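
As a rough illustration of the Page-XML to JSONL conversion and the 80/20 split, the sketch below pulls line-level transcriptions out of a PAGE XML document and serializes them as one JSON object per line. The JSONL field names (`image`, `line_id`, `text`), the helper names, and the seeded random split are all assumptions; the schema that ERNIEKit actually expects is not described in this card and should be checked against its documentation.

```python
import json
import random
import xml.etree.ElementTree as ET

# Tiny inline PAGE XML sample so the sketch runs without any files.
SAMPLE_PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.png">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,10 200,10 200,40 10,40"/>
        <TextEquiv><Unicode>dixo estonces</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local(tag):
    """Strip the XML namespace so the code works across PAGE schema versions."""
    return tag.rsplit("}", 1)[-1]


def page_xml_to_records(xml_text):
    """Extract one record per TextLine that has a line-level transcription."""
    records, image = [], None
    for elem in ET.fromstring(xml_text).iter():
        name = local(elem.tag)
        if name == "Page":
            image = elem.get("imageFilename")
        elif name == "TextLine":
            # Only direct TextEquiv children, so word-level text is ignored.
            text = next(
                (u.text for te in elem if local(te.tag) == "TextEquiv"
                 for u in te if local(u.tag) == "Unicode"),
                None,
            )
            if text:
                records.append(
                    {"image": image, "line_id": elem.get("id"), "text": text.strip()}
                )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def split_80_20(records, seed=0):
    """Shuffle deterministically and split 80/20 (assumed split strategy)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


records = page_xml_to_records(SAMPLE_PAGE_XML)
print(to_jsonl(records))
```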

## Links
- **Code Repository:** https://github.com/deepeshahlawat/Chronos-VL
- **Project Video:** https://www.youtube.com/watch?v=PaK24VT_3Jk

*Built with ❤️ using PaddlePaddle and ERNIEKit.*