CogniOCR LoRA

This repository contains a LoRA adapter fine-tuned for OCR from the base model unsloth/Qwen3.5-0.8B.

The adapter was trained with Unsloth and TRL on Vietnamese handwriting OCR data. It expects an image prompt and returns the visible text, preserving line breaks when possible.

Load With Unsloth

from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    model_name="tungedng2710/cogniocr",
    max_seq_length=2048,
    load_in_4bit=False,
    load_in_16bit=True,
)

Training

  • Base model: unsloth/Qwen3.5-0.8B
  • Adapter type: LoRA
  • LoRA rank: 16
  • LoRA alpha: 16
  • LoRA dropout: 0
  • Fine-tuned layers: vision layers, language layers, attention modules, and MLP modules

Framework Versions

  • PEFT 0.18.1
  • TRL 0.24.0
  • Transformers 5.7.0
  • PyTorch 2.9.0
  • Datasets 4.3.0
  • Tokenizers 0.22.2

Example Usage

from pathlib import Path

from predict import (
    OCR_PROMPT,
    build_inputs,
    decode_new_tokens,
    load_image,
)


SCRIPT_DIR = Path(__file__).resolve().parent
MODEL_ID = "tungedng2710/cogniocr"
IMAGE_PATH = SCRIPT_DIR / "test1.png"
MAX_SEQ_LENGTH = 2048
MAX_NEW_TOKENS = 512


def main():
    try:
        import torch
        from unsloth import FastVisionModel, is_bfloat16_supported
    except ImportError as exc:
        raise SystemExit(
            "Missing dependency: unsloth/torch. Install the VLM dependencies first:\n"
            "  pip install --upgrade --force-reinstall --no-cache-dir unsloth "
            "unsloth_zoo\n"
            "  pip install -U torch pillow torchvision"
        ) from exc

    image = load_image(IMAGE_PATH)
    dtype = torch.bfloat16 if is_bfloat16_supported() else torch.float16

    print(f"Loading model: {MODEL_ID}")
    model, tokenizer = FastVisionModel.from_pretrained(
        model_name=MODEL_ID,
        max_seq_length=MAX_SEQ_LENGTH,
        dtype=dtype,
        load_in_4bit=False,
        load_in_16bit=True,
    )

    FastVisionModel.for_inference(model)

    device = next(model.parameters()).device
    inputs = build_inputs(tokenizer, image, OCR_PROMPT, device)
    prompt_token_count = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=MAX_NEW_TOKENS,
            use_cache=True,
            do_sample=False,
        )

    text = decode_new_tokens(tokenizer, outputs, prompt_token_count)
    print("\nOCR result:")
    print(text)


if __name__ == "__main__":
    main()
Downloads last month
35
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for tungedng2710/cogniocr

Adapter
(21)
this model