metadata
license: mit
base_model: microsoft/Florence-2-large-ft
tags:
  - image-to-text
  - generated_from_trainer
model-index:
  - name: Florence-2-large-FormClassification-ft
    results: []

Florence-2-large-FormClassification-ft

This model is a fine-tuned version of microsoft/Florence-2-large-ft on the Musa07/Florence_ft dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2107
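The card does not say exactly what this loss is. If it is the mean per-token cross-entropy in nats, which is the usual objective for a Transformers text decoder, then it can be read as a per-token perplexity. The sketch below makes that conversion under that assumption.

```python
import math

# Assumption: 0.2107 is the mean per-token cross-entropy (natural log) on the eval set
eval_loss = 0.2107
perplexity = math.exp(eval_loss)
print(f"per-token perplexity: {perplexity:.3f}")  # prints 1.235
```

Under that reading, the model is on average about as uncertain as a choice among roughly 1.23 equally likely tokens at each step. The card does not report task-level metrics such as classification accuracy or box quality.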

Inference code

from PIL import Image
import torch
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "Musa07/Florence-2-large-FormClassification-ft"

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).to(device).eval()
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)


def run_example(task_prompt, image, max_new_tokens=128):
    """Run one task prompt on a PIL image and return the post-processed result."""
    inputs = processor(text=task_prompt, images=image, return_tensors="pt")
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"].to(device),
            pixel_values=inputs["pixel_values"].to(device),
            max_new_tokens=max_new_tokens,
            early_stopping=False,
            do_sample=False,
            num_beams=3,
        )
    # Keep special tokens: post-processing needs the location tokens to rebuild boxes
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height),
    )


def plot_bbox(image, data):
    """Draw each box in data['bboxes'] ([x1, y1, x2, y2] in pixels) with its label."""
    fig, ax = plt.subplots()
    ax.imshow(image)

    for bbox, label in zip(data['bboxes'], data['labels']):
        x1, y1, x2, y2 = bbox
        # Rectangle takes the top-left corner plus width and height
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        # Annotate the box with its label
        ax.text(x1, y1, label, color='white', fontsize=8, bbox=dict(facecolor='red', alpha=0.5))

    # Hide the axis ticks and labels, then show the figure
    ax.axis('off')
    plt.show()


# NOTE: the task prompt token was lost from this copy of the card and appears as "".
# plot_bbox expects a result with 'bboxes' and 'labels', which Florence-2's object
# detection task "<OD>" returns; using "<OD>" here is an assumption, not confirmed by the card.
TASK_PROMPT = ""

image = Image.open('1.jpeg').convert('RGB')
parsed_answer = run_example(TASK_PROMPT, image=image)
print(parsed_answer)
plot_bbox(image, parsed_answer[TASK_PROMPT])
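`plot_bbox` assumes that each entry of `data['bboxes']` is `[x1, y1, x2, y2]` in pixel coordinates of the input image. The code above passes `image_size` to `post_process_generation` so that the boxes come back in that space.

You may want the detections in a format other tools accept. The sketch below shows one way to do that with a small, model-independent helper. The function `to_records` is hypothetical and is not part of Florence-2 or this repository. It clips each box to the image and converts it to `(x, y, width, height)`, which is the same conversion `plot_bbox` performs for `patches.Rectangle`. The labels `form_a` and `form_b` are placeholders.

```python
from typing import Dict, List


def to_records(data: Dict[str, list], width: int, height: int) -> List[dict]:
    """Turn a {'bboxes': [...], 'labels': [...]} result into clipped xywh records.

    Hypothetical helper: assumes boxes are [x1, y1, x2, y2] in pixel coordinates.
    """
    records = []
    for (x1, y1, x2, y2), label in zip(data["bboxes"], data["labels"]):
        # Clip every coordinate to the image bounds
        x1 = float(max(0, min(x1, width)))
        x2 = float(max(0, min(x2, width)))
        y1 = float(max(0, min(y1, height)))
        y2 = float(max(0, min(y2, height)))
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue  # drop boxes that are empty after clipping
        records.append({"label": label, "x": x1, "y": y1, "width": w, "height": h})
    return records


# Placeholder result in the shape plot_bbox expects
example = {"bboxes": [[10.0, 20.0, 110.0, 70.0], [-5.0, 0.0, 30.0, 40.0]],
           "labels": ["form_a", "form_b"]}
print(to_records(example, width=100, height=100))
```

With a real image, call `to_records(parsed_answer[TASK_PROMPT], image.width, image.height)`.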

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
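With `lr_scheduler_type: linear`, the Transformers scheduler decays the learning rate linearly from its peak value to 0 over the run. The card does not list any warmup steps. The sketch below assumes there were none (the `Trainer` default) and uses the 230 optimizer steps recorded in the training results. `linear_lr` is a hypothetical helper that only reproduces the shape of the schedule.

```python
def linear_lr(step: int, peak_lr: float = 1e-6, total_steps: int = 230,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps under linear warmup then decay.

    warmup_steps=0 is an assumption because the card does not list it.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start of each epoch (23 optimizer steps per epoch)
for epoch in range(10):
    step = epoch * 23
    print(f"epoch {epoch + 1}: step {step:3d}, lr = {linear_lr(step):.2e}")
```

With a peak of only 1e-6, the rate is already halved by step 115 (epoch 5). This is one plausible reason the validation loss changes so little across the run.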

Training results

Training Loss   Epoch   Step   Validation Loss
0.0188          1.0     23     0.2151
0.0127          2.0     46     0.2113
0.0078          3.0     69     0.2061
0.0047          4.0     92     0.2102
0.0042          5.0     115    0.2078
0.003           6.0     138    0.2108
0.0022          7.0     161    0.2110
0.0029          8.0     184    0.2117
0.0019          9.0     207    0.2114
0.0023          10.0    230    0.2107
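Two things can be read off this table.

- **Training set size.** `train_batch_size` is 24 and each epoch takes 23 steps. If training used a single device, no gradient accumulation and kept the final partial batch (none of which the card states), the training split holds between 529 and 552 examples.
- **Best checkpoint.** Validation loss is lowest after epoch 3 (0.2061). The reported 0.2107 comes from the final epoch.

The sketch below derives both from the logged values:

```python
import math

# (epoch, step, training loss, validation loss) copied from the table above
history = [
    (1, 23, 0.0188, 0.2151),
    (2, 46, 0.0127, 0.2113),
    (3, 69, 0.0078, 0.2061),
    (4, 92, 0.0047, 0.2102),
    (5, 115, 0.0042, 0.2078),
    (6, 138, 0.0030, 0.2108),
    (7, 161, 0.0022, 0.2110),
    (8, 184, 0.0029, 0.2117),
    (9, 207, 0.0019, 0.2114),
    (10, 230, 0.0023, 0.2107),
]
batch_size = 24
steps_per_epoch = history[0][1]

# Assumes one device, no gradient accumulation and the last partial batch kept,
# so that steps_per_epoch == ceil(n_examples / batch_size)
min_examples = (steps_per_epoch - 1) * batch_size + 1
max_examples = steps_per_epoch * batch_size
assert all(math.ceil(n / batch_size) == steps_per_epoch for n in (min_examples, max_examples))

best_epoch, _, _, best_val = min(history, key=lambda row: row[3])
final_val = history[-1][3]
print(f"training examples: {min_examples}-{max_examples}")
print(f"best validation loss {best_val} at epoch {best_epoch}; final {final_val}")
```

In the Transformers `Trainer`, setting `load_best_model_at_end=True` together with `metric_for_best_model="eval_loss"` is the usual way to keep the lowest-validation-loss checkpoint instead of the last one. The card does not say whether this run used it.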

Framework versions

  • Transformers 4.44.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
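These are the versions used for training. Transformers 4.44.0.dev0 is a development (source) build, so that exact version is generally not available as a PyPI release.

The sketch below uses only the standard library (`importlib.metadata`) to print what is installed next to the card's versions. `compare_versions` is a hypothetical helper, and PyTorch is looked up under its distribution name `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by their PyPI distribution names
CARD_VERSIONS = {
    "transformers": "4.44.0.dev0",
    "torch": "2.3.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (version on the card, installed version or None)."""
    report = {}
    for package, card_version in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (card_version, installed)
    return report


for package, (card_version, installed) in compare_versions(CARD_VERSIONS).items():
    if installed is None:
        status = "missing"
    elif installed == card_version:
        status = "match"
    else:
        status = "differs"
    print(f"{package:12s} card={card_version:14s} installed={installed} ({status})")
```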