arshjeevs/FinalTry

	---
	language:
	- en

	license: apache-2.0

	library_name: llama.cpp

	tags:
	- gguf
	- llama.cpp
	- multimodal
	- vision-language-model
	- smolvlm
	- cytology
	- medical-imaging

	pipeline_tag: image-text-to-text

	base_model:
	- HuggingFaceTB/SmolVLM-500M-Instruct

	---

	# SmolVLM Cytology GGUF

	A fine-tune of HuggingFaceTB/SmolVLM-500M-Instruct for cytology image analysis, packaged as GGUF files for use with llama.cpp.

	## Files

	- `SmolVLM-Cytology-Q4_K_M.gguf`: language model weights, Q4_K_M quantization
	- `mmproj-SmolVLM-Cytology-f16.gguf`: vision encoder (multimodal projector), f16

	## Usage

	```bash
	llama-mtmd-cli \
	-m SmolVLM-Cytology-Q4_K_M.gguf \
	--mmproj mmproj-SmolVLM-Cytology-f16.gguf \
	--image test.png \
	-p "<image> Describe this image"
	```
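The command above assumes both GGUF files and the image sit in the current directory. As a minimal sketch, the same invocation can be wrapped so that it fails early when an input is missing. The helper names `check_files` and `describe_image` are hypothetical and not part of llama.cpp; the file names are the ones listed under "Files".

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; file names come from the "Files" section.

MODEL="SmolVLM-Cytology-Q4_K_M.gguf"
MMPROJ="mmproj-SmolVLM-Cytology-f16.gguf"

# Return 0 if every given file exists; otherwise report each
# missing file on stderr and return 1.
check_files() {
  local rc=0 f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Run the model on one image, refusing to start if any input is missing.
describe_image() {
  local image="$1"
  check_files "$MODEL" "$MMPROJ" "$image" || return 1
  llama-mtmd-cli \
    -m "$MODEL" \
    --mmproj "$MMPROJ" \
    --image "$image" \
    -p "<image> Describe this image"
}
```

After sourcing the script, `describe_image test.png` runs the same command as in the usage block, or lists the missing files and returns a non-zero status.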

	## Notes

	- Language model quantized to Q4_K_M using llama.cpp
	- Compatible with `llama-mtmd-cli`
	- Vision encoder exported separately as an f16 mmproj GGUF; pass it with `--mmproj` alongside the main model
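The exact commands used to produce these files are not documented here. The following is a sketch of how a comparable pair of files is typically built with llama.cpp's `convert_hf_to_gguf.py` and `llama-quantize`, not a record of the author's procedure. The checkpoint directory, the intermediate file name `SmolVLM-Cytology-f16.gguf`, and the helpers `run` and `build_gguf` are all assumptions. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: the checkpoint path, the intermediate f16 file name, and the
# helper names are hypothetical; the author's actual steps are not documented.

# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Produce the two GGUF files from a fine-tuned Hugging Face checkpoint directory.
build_gguf() {
  local hf_dir="$1"
  # 1. Language model weights as an f16 GGUF.
  run python convert_hf_to_gguf.py "$hf_dir" --outtype f16 \
    --outfile SmolVLM-Cytology-f16.gguf || return 1
  # 2. Vision encoder and projector as a separate mmproj GGUF.
  run python convert_hf_to_gguf.py "$hf_dir" --mmproj --outtype f16 \
    --outfile mmproj-SmolVLM-Cytology-f16.gguf || return 1
  # 3. Quantize the language model to Q4_K_M.
  run llama-quantize SmolVLM-Cytology-f16.gguf SmolVLM-Cytology-Q4_K_M.gguf Q4_K_M
}
```

Calling `build_gguf ./checkpoint` prints the three commands for review; setting `DRY_RUN=0` executes them, which requires a llama.cpp checkout and build.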