# Florence-2-base Fine-tuned for One Piece OCR 🏴‍☠️
This model is a fine-tuned version of microsoft/Florence-2-base specialized in OCR for manga bubbles, specifically trained on One Piece French scans. It has been optimized for high accuracy on stylized text and integrated with ONNX for seamless browser-based execution.
## 🚀 Key Features
- Specialized OCR: Trained to handle manga typography, speech bubbles, and complex backgrounds.
- Transformers.js Ready: Includes specialized ONNX weights optimized for WebGPU and WASM.
- High Recall: Fine-tuned for 7 epochs with a focus on capturing every word in dense action scenes.
## 📊 Evaluation Results
The model was evaluated against the base Florence-2-base model on a test set of 150 One Piece manga panels. This version uses Full Fine-Tuning (FFT).
| Metric | Base Model | Fine-Tuned (FFT) | Improvement |
|---|---|---|---|
| CER (Character Error Rate) | 78.77% | 3.13% | 75.64 pts lower |
| WER (Word Error Rate) | 99.57% | 22.34% | 77.23 pts lower |

Lower is better for both metrics.
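For reference, CER and WER are edit-distance rates: the Levenshtein distance between the prediction and the ground truth, at the character or word level, divided by the reference length. A minimal, self-contained sketch (not the exact evaluation script used here):

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution (or match)
            )
            prev = cur
    return dp[n]

def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character edits / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word edits / reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

A CER of 3.13% thus means roughly 3 character-level edits per 100 reference characters, averaged over the 150-panel test set.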
**Why the upgrade to FFT?** While LoRA was a good starting point, full fine-tuning allowed the model to adapt its vision encoder to One Piece typography. The result is a significantly more robust model with near-perfect accuracy on speech bubbles.
## 🛠️ Usage (Python)
```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

model_id = "Remidesbois/florence2-onepiece-ocr"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load a manga panel and run the OCR task prompt
image = Image.open("manga_panel.png").convert("RGB")
prompt = "<OCR>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
## 🌐 Web Integration (Transformers.js)
The model is compatible with transformers.js (v3+). It includes custom ONNX exports of the vision encoder and decoder.
```javascript
import { Florence2ForConditionalGeneration, AutoProcessor, RawImage } from '@huggingface/transformers';

const model = await Florence2ForConditionalGeneration.from_pretrained('Remidesbois/florence2-onepiece-ocr', {
  dtype: 'fp32',
  device: 'webgpu', // use 'wasm' on browsers without WebGPU support
});
const processor = await AutoProcessor.from_pretrained('Remidesbois/florence2-onepiece-ocr');

// Use task '<OCR>' for best results
const image = await RawImage.fromURL('manga_panel.png');
const inputs = await processor(image, '<OCR>');
const generated_ids = await model.generate({ ...inputs, max_new_tokens: 1024 });
const text = processor.batch_decode(generated_ids, { skip_special_tokens: true })[0];
```
## 📝 Training Details
- Dataset: ~1000 manually annotated One Piece French scan bubbles.
- Hardware: Trained on NVIDIA RTX GPU.
- Optimization: full fine-tuning (an earlier version used rank-8 LoRA adapters merged into the base model).
- Learning Rate: 5e-5
- Optimizer: AdamW
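The optimizer setup above can be sketched in PyTorch. The `nn.Linear` module is a hypothetical stand-in; in the real run, the optimizer wraps Florence-2's parameters:

```python
import torch
import torch.nn as nn

# Stand-in module; in the actual training run this is the Florence-2 model
model = nn.Linear(16, 16)

# AdamW at the learning rate listed above
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative training step on random data
x = torch.randn(4, 16)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```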
Created with ❤️ for the One Piece community.