	---
	library_name: peft
	base_model: google/gemma-3-4b-it
	tags:
	- vision
	- image-classification
	- beans
	- plant-disease
	- gemma-3
	- lora
	- fine-tuned
	license: gemma
	---

	# Gemma-3-4B Fine-tuned for Bean Disease Classification

	This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) for classifying bean plant diseases.

	## Model Description

	- Base Model: Gemma-3-4B-IT (Vision)
	- Fine-tuning Method: LoRA (r=8, alpha=16)
	- Dataset: [beans](https://huggingface.co/datasets/beans) (100 samples)
	- Task: Image captioning / disease classification
	- Final Validation Loss: 0.001
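
To make the LoRA settings above concrete: in standard LoRA, a rank-`r` adapter on a frozen weight matrix of shape `d_out x d_in` adds `r * (d_in + d_out)` trainable parameters, and its update is scaled by `alpha / r`. The sketch below computes both for `r=8`, `alpha=16`. The 2560 x 2560 layer shape is an illustrative assumption, not a value taken from this card, and the helper names are hypothetical.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


r, alpha = 8, 16          # values from this model card
d_in = d_out = 2560       # ASSUMPTION: illustrative layer shape only

print(lora_scaling(r, alpha))              # 2.0
print(lora_param_count(d_in, d_out, r))    # 40960 trainable parameters
print(d_in * d_out)                        # 6553600 frozen parameters in the same layer
```

Under these assumptions the adapter trains well under 1% of the parameters of the layer it wraps, which is why a 4B model can be fine-tuned on a single T4.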

	## Classes

	1. Healthy bean plant
	2. Angular leaf spot disease
	3. Bean rust disease
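
Because the model produces free-text captions rather than a class index, downstream code may need to map a caption back to one of the three classes. A minimal sketch using keyword matching follows. The function name, the keyword table, and the label strings are assumptions for illustration (the labels are assumed to follow the `beans` dataset naming), and the matching is deliberately naive: it does not handle negations such as "not healthy".

```python
from typing import Optional

# ASSUMPTION: label strings and keywords are illustrative, not part of this repository.
# Disease classes are checked before "healthy" so a disease mention takes precedence.
CLASS_KEYWORDS = {
    "angular_leaf_spot": ("angular leaf spot",),
    "bean_rust": ("bean rust", "rust"),
    "healthy": ("healthy",),
}


def caption_to_label(caption: str) -> Optional[str]:
    """Return the first class whose keyword appears in the caption, or None."""
    text = caption.lower()
    for label, keywords in CLASS_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return None


print(caption_to_label("Angular leaf spot disease"))  # angular_leaf_spot
print(caption_to_label("Healthy bean plant"))         # healthy
print(caption_to_label("A photo of a tractor"))       # None
```

Returning `None` for unmatched captions keeps out-of-scope outputs visible instead of forcing them into one of the three classes.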

	## Usage

	```python
	from transformers import AutoProcessor, Gemma3ForConditionalGeneration
	from peft import PeftModel
	from PIL import Image
	import torch

	# Load the base model in bfloat16; device_map="auto" places it on the available device(s)
	base_model = Gemma3ForConditionalGeneration.from_pretrained(
	    "google/gemma-3-4b-it",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	# Attach the LoRA adapter and load the processor
	model = PeftModel.from_pretrained(base_model, "Nefflymicn/gemma3-4b-bean-captioning")
	model.eval()
	processor = AutoProcessor.from_pretrained("Nefflymicn/gemma3-4b-bean-captioning")

	# Prepare the input: one RGB image plus a text prompt in chat format
	image = Image.open("bean_plant.jpg").convert("RGB")
	messages = [
	    {
	        "role": "user",
	        "content": [
	            {"type": "image"},
	            {"type": "text", "text": "Describe this plant image."},
	        ],
	    }
	]

	text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

	# Generate greedily, then decode only the newly generated tokens (not the prompt)
	with torch.inference_mode():
	    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
	prompt_len = inputs["input_ids"].shape[-1]
	response = processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)
	print(response)
	```

	## Training Details

	- Epochs: 10
	- Batch Size: 1 (effective: 4 with gradient accumulation)
	- Learning Rate: 5e-5
	- Precision: FP16
	- Hardware: NVIDIA T4 GPU
	- Training Time: ~25 minutes
	- Max Sequence Length: 512 tokens
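
The effective batch size listed above is the per-device batch size multiplied by the number of gradient accumulation steps (1 x 4 = 4). The sketch below works through that arithmetic and the resulting number of optimizer updates. The training-split size of 80 is a hypothetical value chosen for illustration, since the card does not state how the 100 samples were split, and exact step counts depend on how the training loop handles a partial final batch.

```python
import math


def optimizer_steps(num_train_examples: int, per_device_batch: int,
                    grad_accum_steps: int, epochs: int) -> tuple[int, int]:
    """Return (effective batch size, total optimizer updates over all epochs)."""
    effective_batch = per_device_batch * grad_accum_steps
    steps_per_epoch = math.ceil(num_train_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


# Batch size 1, effective batch 4, and 10 epochs come from this card.
# ASSUMPTION: 80 training examples is a placeholder, not a reported figure.
effective, total_steps = optimizer_steps(
    num_train_examples=80, per_device_batch=1, grad_accum_steps=4, epochs=10
)
print(effective)    # 4
print(total_steps)  # 200
```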

	## Performance

	- Final Training Loss: 0.69
	- Final Validation Loss: 0.001
	- Accuracy: Not measured directly; the low validation loss suggests high accuracy on the validation split
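
A loss value does not translate directly into a classification accuracy, so a held-out evaluation is the more reliable check. The sketch below shows one way to compute overall accuracy and per-class recall from predicted and reference labels. The function names are hypothetical, and the label lists in the usage lines are made-up placeholders that only demonstrate the call; they are not results from this model.

```python
from collections import Counter


def accuracy(preds: list[str], refs: list[str]) -> float:
    """Fraction of predictions that exactly match the reference label."""
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def per_class_recall(preds: list[str], refs: list[str]) -> dict[str, float]:
    """For each reference class, the fraction of its examples predicted correctly."""
    totals = Counter(refs)
    hits = Counter(r for p, r in zip(preds, refs) if p == r)
    return {label: hits[label] / totals[label] for label in totals}


# Placeholder labels for illustration only (NOT measured results).
refs = ["healthy", "bean_rust", "angular_leaf_spot", "bean_rust"]
preds = ["healthy", "bean_rust", "bean_rust", "bean_rust"]

print(accuracy(preds, refs))          # 0.75
print(per_class_recall(preds, refs))  # {'healthy': 1.0, 'bean_rust': 1.0, 'angular_leaf_spot': 0.0}
```

Per-class recall is worth reporting alongside accuracy because a model can score well overall while missing one disease entirely, as the placeholder example shows.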

	## Limitations

	- Trained on 100 images for demonstration purposes
	- Best suited for the three classes in the training data (healthy, angular leaf spot, bean rust)
	- May not generalize to other bean varieties or diseases
	- Should be validated on real-world data before production use

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{gemma3-bean-captioning,
	author = {younaice},
	title = {Gemma-3-4B Fine-tuned for Bean Disease Classification},
	year = {2024},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/Nefflymicn/gemma3-4b-bean-captioning}}
	}
	```

	## License

	This model inherits the Gemma license from the base model. Please refer to the [Gemma license](https://ai.google.dev/gemma/terms) for usage terms.