Florence-2-base Fine-tuned for One Piece OCR 🏴‍☠️

This model is a fine-tuned version of microsoft/Florence-2-base specialized in OCR for manga speech bubbles, trained on French One Piece scans. It has been optimized for high accuracy on stylized text and exported to ONNX for seamless browser-based execution.

🚀 Key Features

  • Specialized OCR: Trained to handle manga typography, speech bubbles, and complex backgrounds.
  • Transformers.js Ready: Includes specialized ONNX weights optimized for WebGPU and WASM.
  • High Recall: Fine-tuned for 7 epochs with a focus on capturing every word in dense action scenes.

📊 Evaluation Results

The model was evaluated against the base Florence-2-base model on a test set of 150 One Piece manga panels. This version uses Full Fine-Tuning (FFT).

| Metric | Base Model | Fine-Tuned (FFT) | Improvement |
|--------|-----------|------------------|-------------|
| CER (Character Error Rate) | 78.77% | 3.13% | +75.64 pts |
| WER (Word Error Rate) | 99.57% | 22.34% | +77.23 pts |
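Both metrics are normalized edit distances. A minimal sketch of how they can be computed with a plain Levenshtein distance (illustrative only, not the evaluation script used for the numbers above; the sample strings are made up):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character Error Rate: character edits / reference length."""
    return levenshtein(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word Error Rate: word-level edits / reference word count."""
    return levenshtein(ref.split(), hyp.split()) / len(ref.split())

print(cer("LUFFY", "LUFY"))                 # 1 deletion over 5 chars -> 0.2
print(wer("GOMU GOMU NO", "GOMU GOMU NQ"))  # 1 wrong word out of 3
```

In practice a library such as `jiwer` is commonly used for these metrics; the sketch above just shows what the percentages measure.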

Why the upgrade to FFT?

While LoRA was a good start, Full Fine-Tuning allowed the model to adapt its vision encoder specifically to the One Piece typography. This resulted in a significantly more robust model with near-perfect accuracy on speech bubbles.

🛠️ Usage (Python)

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image

model_id = "Remidesbois/florence2-onepiece-ocr"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load a manga panel and run the OCR task prompt
image = Image.open("manga_panel.png").convert("RGB")
prompt = "<OCR>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)

generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```

🌐 Web Integration (Transformers.js)

The model is compatible with transformers.js (v3+). It includes custom ONNX exports of the vision encoder and decoder.

```javascript
import { Florence2ForConditionalGeneration, AutoProcessor, RawImage } from '@huggingface/transformers';

const model = await Florence2ForConditionalGeneration.from_pretrained('Remidesbois/florence2-onepiece-ocr', {
    dtype: 'fp32',
    device: 'webgpu',
});
const processor = await AutoProcessor.from_pretrained('Remidesbois/florence2-onepiece-ocr');

// Use task '<OCR>' for best results
const image = await RawImage.fromURL('manga_panel.png');
const inputs = await processor(image, '<OCR>');

const generated_ids = await model.generate({
    ...inputs,
    max_new_tokens: 1024,
});
const generated_text = processor.batch_decode(generated_ids, { skip_special_tokens: true })[0];
console.log(generated_text);
```

📝 Training Details

  • Dataset: ~1000 manually annotated One Piece French scan bubbles.
  • Hardware: Trained on an NVIDIA RTX GPU.
  • Optimization: An earlier version used rank-8 LoRA adapters merged into the base model; the current release uses full fine-tuning (FFT).
  • Learning Rate: 5e-5
  • Optimizer: AdamW
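The optimizer setup above amounts to a couple of lines of PyTorch. A minimal sketch (the `nn.Linear` stand-in module and the single training step are illustrative assumptions, not the actual training script):

```python
import torch

# Stand-in module; in the real run this would be the Florence-2 model.
model = torch.nn.Linear(4, 2)

# AdamW at the learning rate listed above (5e-5).
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One illustrative step: forward pass, loss, backprop, update.
loss = model(torch.randn(1, 4)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```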

Created with ❤️ for the One Piece community.
