Musa07/Florence-2-large-FormClassification-ft

	---
	license: mit
	base_model: microsoft/Florence-2-large-ft
	tags:
	- image-to-text
	- generated_from_trainer
	model-index:
	- name: Florence-2-large-FormClassification-ft
	  results: []
	---


	# Florence-2-large-FormClassification-ft

	This model is a fine-tuned version of [microsoft/Florence-2-large-ft](https://huggingface.co/microsoft/Florence-2-large-ft) on the Musa07/Florence_ft dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2107

	### Inference code

	# Inference example: run the fine-tuned model on one image and plot its detections
	import torch
	import matplotlib.patches as patches
	import matplotlib.pyplot as plt
	from PIL import Image
	from transformers import AutoModelForCausalLM, AutoProcessor

	model_id = "Musa07/Florence-2-large-FormClassification-ft"
	# Use the GPU when one is available, otherwise fall back to the CPU
	device = "cuda" if torch.cuda.is_available() else "cpu"

	model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device).eval()
	processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

	def run_example(task_prompt, image, max_new_tokens=128):
	    """Generate with beam search and parse the raw text into the task's structured output."""
	    inputs = processor(text=task_prompt, images=image, return_tensors="pt")
	    generated_ids = model.generate(
	        input_ids=inputs["input_ids"].to(device),
	        pixel_values=inputs["pixel_values"].to(device),
	        max_new_tokens=max_new_tokens,
	        early_stopping=False,
	        do_sample=False,
	        num_beams=3,
	    )
	    # Keep special tokens: the post-processor needs them to recover boxes and labels
	    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
	    parsed_answer = processor.post_process_generation(
	        generated_text,
	        task=task_prompt,
	        image_size=(image.width, image.height),
	    )
	    return parsed_answer

	def plot_bbox(image, data):
	    """Draw each bounding box and its label on top of the image."""
	    fig, ax = plt.subplots()

	    # Display the image
	    ax.imshow(image)

	    # Plot each bounding box
	    for bbox, label in zip(data['bboxes'], data['labels']):
	        # Unpack the corner coordinates (x1, y1) and (x2, y2)
	        x1, y1, x2, y2 = bbox
	        # Create a Rectangle patch from the top-left corner, width, and height
	        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')
	        # Add the rectangle to the Axes
	        ax.add_patch(rect)
	        # Annotate the label at the top-left corner of the box
	        ax.text(x1, y1, label, color='white', fontsize=8, bbox=dict(facecolor='red', alpha=0.5))

	    # Remove the axis ticks and labels
	    ax.axis('off')

	    # Show the plot
	    plt.show()

	# Convert to RGB so grayscale or RGBA files are handled the same way
	image = Image.open('1.jpeg').convert('RGB')
	parsed_answer = run_example("<OD>", image=image)
	print(parsed_answer)
	plot_bbox(image, parsed_answer["<OD>"])
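For form classification, the predicted labels are often more useful than the boxes themselves. The following is a minimal sketch that tallies labels, assuming the parsed result has the `{"<OD>": {"bboxes": [...], "labels": [...]}}` shape that `plot_bbox` consumes above. The helper name `summarize_labels` and the sample dictionary, including its label strings, are hypothetical and not taken from the model's actual output.

```python
from collections import Counter

def summarize_labels(parsed_answer, task="<OD>"):
    """Return (label, count) pairs sorted by descending count."""
    # Missing keys yield an empty list rather than raising
    labels = parsed_answer.get(task, {}).get("labels", [])
    return Counter(labels).most_common()

# Hypothetical parsed output; real label strings depend on the training data
sample = {
    "<OD>": {
        "bboxes": [[10, 20, 200, 120], [15, 140, 210, 260], [30, 300, 180, 340]],
        "labels": ["form_a", "form_a", "form_b"],
    }
}

summary = summarize_labels(sample)
print(summary)
```

In practice, `summarize_labels(parsed_answer)` could be called on the value returned by `run_example("<OD>", image=image)`.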



	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-06
	- train_batch_size: 24
	- eval_batch_size: 24
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 10
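To make the `linear` scheduler concrete, here is a small sketch of how the learning rate would decay over the run. It assumes zero warmup steps (the card does not list any) and takes the total of 230 optimizer steps from the training results table; `linear_lr` is an illustrative helper, not the Trainer's own code.

```python
def linear_lr(step, base_lr=1e-6, total_steps=230, warmup_steps=0):
    """Learning rate at a given optimizer step under a linear schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup (unused when warmup_steps=0)
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, at the midpoint (end of epoch 5), and at the end
for step in (0, 115, 230):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 1e-06, is halved by the end of epoch 5, and reaches zero at step 230.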

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 0.0188        | 1.0   | 23   | 0.2151          |
	| 0.0127        | 2.0   | 46   | 0.2113          |
	| 0.0078        | 3.0   | 69   | 0.2061          |
	| 0.0047        | 4.0   | 92   | 0.2102          |
	| 0.0042        | 5.0   | 115  | 0.2078          |
	| 0.003         | 6.0   | 138  | 0.2108          |
	| 0.0022        | 7.0   | 161  | 0.2110          |
	| 0.0029        | 8.0   | 184  | 0.2117          |
	| 0.0019        | 9.0   | 207  | 0.2114          |
	| 0.0023        | 10.0  | 230  | 0.2107          |
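The table can be read more easily with a little arithmetic. The sketch below copies the rows from the table and finds the epoch with the lowest validation loss, then compares how much the training and validation losses moved between the first and last epoch. The variable names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss), copied from the results table
rows = [
    (0.0188, 1, 23, 0.2151),
    (0.0127, 2, 46, 0.2113),
    (0.0078, 3, 69, 0.2061),
    (0.0047, 4, 92, 0.2102),
    (0.0042, 5, 115, 0.2078),
    (0.003, 6, 138, 0.2108),
    (0.0022, 7, 161, 0.2110),
    (0.0029, 8, 184, 0.2117),
    (0.0019, 9, 207, 0.2114),
    (0.0023, 10, 230, 0.2107),
]

# Epoch with the lowest validation loss
best = min(rows, key=lambda r: r[3])

# How far each loss moved between the first and last epoch
first, last = rows[0], rows[-1]
train_ratio = first[0] / last[0]   # factor by which training loss shrank
val_change = first[3] - last[3]    # absolute drop in validation loss

print(f"best epoch: {best[1]} (validation loss {best[3]})")
print(f"training loss shrank by a factor of {train_ratio:.1f}")
print(f"validation loss dropped by {val_change:.4f}")
```

The lowest validation loss (0.2061) occurs at epoch 3, slightly below the reported final value of 0.2107. Training loss falls by roughly a factor of eight while validation loss stays nearly flat, which may indicate that the later epochs mostly fit the training set; the differences are small, so this is a tentative reading.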


	### Framework versions

	- Transformers 4.44.0.dev0
	- Pytorch 2.3.1+cu121
	- Datasets 2.20.0
	- Tokenizers 0.19.1