	---
	language:
	- zh
	- en
	tags:
	- sweepgpm
	- sweepmm
	- chatglm
	- multimodal
	- sweeping-robot
	- lora
	- blip2
	license: mit
	---
	# SweepGPM

	SweepGPM is a multimodal dialogue model for sweeping robots in home scenarios, fine-tuned from [VisualGLM-6B](https://github.com/THUDM/VisualGLM-6B). The language model is [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) (6.2B parameters), whose base weights stay frozen, and the image encoder is [CLIP ViT-L/14](https://github.com/openai/CLIP), also frozen. Only the Q-Former, the fully connected projection layer, and the LoRA adapters (rank 4, applied to the last 2 layers only) are trained, which adapts the model to the sweeping-robot domain.
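
To give a sense of how small the trained LoRA component is, the sketch below counts adapter parameters for rank 4 on the last 2 layers. It assumes the ChatGLM-6B shape from its public configuration (28 transformer layers, hidden size 4096) and, hypothetically, that adapters sit on the fused query/key/value projection and the attention output projection of each selected layer. The card does not say which matrices are adapted, so the total is illustrative only.

```python
# Illustrative LoRA parameter count; layer and matrix choices are assumptions.
HIDDEN = 4096        # ChatGLM-6B hidden size (assumed from its public config)
NUM_LAYERS = 28      # ChatGLM-6B transformer layers (assumed)
RANK = 4             # LoRA rank stated in this card
LAST_N = 2           # "last 2 layers only", as stated in this card


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen weight."""
    return rank * d_in + d_out * rank


# Hypothetical targets per layer: fused QKV (4096 -> 12288)
# and attention output projection (4096 -> 4096).
targets = [(HIDDEN, 3 * HIDDEN), (HIDDEN, HIDDEN)]
adapted_layers = list(range(NUM_LAYERS - LAST_N, NUM_LAYERS))  # indices 26 and 27

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in targets)
total = per_layer * len(adapted_layers)
print(adapted_layers, per_layer, total)
```

Under these assumptions the adapters add roughly 0.2M trainable parameters, a tiny fraction of the 6.2B frozen language-model weights.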


	## Performance

	| Downstream Task | Metric | SweepGPM |
	|-----------------|--------|----------|
	| Room Type Classification | Mean Accuracy | 84.3% |
	| Obstacle Detection | mAP@0.5 | 86.1% |
	| Lost Item Search | Mean Recall | 80.2% |
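
For the obstacle detection row, mAP@0.5 conventionally counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that check follows. The `(x1, y1, x2, y2)` box format and the function names are assumptions for illustration, not the evaluation code behind the table.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth enough to count as a hit."""
    return iou(pred, truth) >= threshold
```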

	## Usage

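	```python
	from transformers import AutoTokenizer, AutoModel

	# trust_remote_code=True lets transformers run the custom model code shipped with the repository.
	tokenizer = AutoTokenizer.from_pretrained("bazaar-research/sweepgpm", trust_remote_code=True)
	# Load the weights in half precision and move them to a CUDA GPU.
	model = AutoModel.from_pretrained("bazaar-research/sweepgpm", trust_remote_code=True).half().cuda()

	image_path = "your_image.jpg"
	# First turn: start with an empty history.
	response, history = model.chat(tokenizer, image_path, "Give the room type in the image.", history=[])
	print(response)

	# Follow-up turn: pass the returned history to keep the dialogue context.
	response, history = model.chat(tokenizer, image_path, "Provide fine-grained bounding boxes for all objects in the image.", history=history)
	print(response)
	```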
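
When several questions are asked about the same image, the history returned by each `model.chat` call has to be passed into the next one. The helper below is a small sketch of that pattern. `ask_all` is a hypothetical name, not part of the model's API, and it assumes `model.chat` has the signature and `(response, history)` return value shown in the usage example.

```python
from typing import List, Tuple


def ask_all(model, tokenizer, image_path: str, prompts: List[str]) -> List[Tuple[str, str]]:
    """Ask each prompt in order about one image, carrying the dialogue history forward."""
    history = []   # empty history for the first turn
    answers = []
    for prompt in prompts:
        # Assumed signature: model.chat(tokenizer, image_path, prompt, history=...)
        response, history = model.chat(tokenizer, image_path, prompt, history=history)
        answers.append((prompt, response))
    return answers
```

With the model and tokenizer loaded as in the usage example, a call such as `ask_all(model, tokenizer, "your_image.jpg", ["Give the room type in the image."])` would return a list of prompt and response pairs.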

	## Dependencies

	```bash
	# Quote version specifiers so the shell does not treat ">" as output redirection.
	pip install "SwissArmyTransformer>=0.3.6" "torch>=2.0.1" torchvision "transformers>=4.31.0" cpm_kernels "peft>=0.4.0"
	```