# Florence-2-base Fine-tuned for One Piece OCR 🏴‍☠️
This model is a fine-tuned version of microsoft/Florence-2-base specialized in OCR for manga bubbles, specifically trained on One Piece French scans. It has been optimized for high accuracy on stylized text and integrated with ONNX for seamless browser-based execution.
## 🚀 Key Features
- Specialized OCR: Trained to handle manga typography, speech bubbles, and complex backgrounds.
- Transformers.js Ready: Includes specialized ONNX weights optimized for WebGPU and WASM.
- High Recall: Fine-tuned for 7 epochs with a focus on capturing every word in dense action scenes.
## 📊 Evaluation Results
The model was evaluated against the base Florence-2-base model on a test set of 150 One Piece manga panels. This version uses Full Fine-Tuning (FFT).
| Metric | Base Model | Fine-Tuned (FFT) | Improvement |
|---|---|---|---|
| CER (Character Error Rate) | 78.77% | 3.13% | 75.64 pts lower |
| WER (Word Error Rate) | 99.57% | 22.34% | 77.23 pts lower |

Lower is better for both metrics.
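For reference, CER and WER are edit-distance rates: the Levenshtein distance between the prediction and the ground truth, at the character or word level, divided by the reference length. A minimal, self-contained sketch (not the exact evaluation script used here):

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution (or match)
            )
            prev = cur
    return dp[n]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character edits / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

A CER of 3.13% thus means roughly 3 character-level edits per 100 reference characters, averaged over the 150-panel test set.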
**Why the upgrade to FFT?** While LoRA was a good starting point, full fine-tuning allowed the model to adapt its vision encoder to One Piece typography. The result is a significantly more robust model with near-perfect accuracy on speech bubbles.
## 🛠️ Usage (Python)
```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

model_id = "Remidesbois/florence2-onepiece-ocr"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load a manga panel and run the OCR task prompt
image = Image.open("manga_panel.png").convert("RGB")
prompt = "<OCR>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
## 🌐 Web Integration (Transformers.js)
The model is compatible with transformers.js (v3+). It includes custom ONNX exports of the vision encoder and decoder.
```javascript
import { Florence2ForConditionalGeneration, AutoProcessor, RawImage } from '@huggingface/transformers';

const model = await Florence2ForConditionalGeneration.from_pretrained('Remidesbois/florence2-onepiece-ocr', {
  dtype: 'fp32',
  device: 'webgpu', // use 'wasm' on browsers without WebGPU support
});
const processor = await AutoProcessor.from_pretrained('Remidesbois/florence2-onepiece-ocr');

// Use task '<OCR>' for best results
const image = await RawImage.fromURL('manga_panel.png');
const inputs = await processor(image, '<OCR>');
const generated_ids = await model.generate({ ...inputs, max_new_tokens: 1024 });
const text = processor.batch_decode(generated_ids, { skip_special_tokens: true })[0];
```
## 📝 Training Details
- Dataset: ~1000 manually annotated One Piece French scan bubbles.
- Hardware: Trained on NVIDIA RTX GPU.
- Optimization: full fine-tuning (an earlier version used rank-8 LoRA adapters merged into the base model).
- Learning Rate: 5e-5
- Optimizer: AdamW
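The optimizer setup above can be sketched in PyTorch. The `nn.Linear` module is a hypothetical stand-in; in the real run, the optimizer wraps Florence-2's parameters:

```python
import torch
import torch.nn as nn

# Stand-in module; in the actual training run this is the Florence-2 model
model = nn.Linear(16, 16)

# AdamW at the learning rate listed above
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative training step on random data
x = torch.randn(4, 16)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```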
Created with ❤️ for the One Piece community.