	---
	base_model:
	- meta-llama/Llama-2-13b-chat-hf
	datasets:
	- NingLab/MMECInstruct
	license: cc-by-4.0
	library_name: transformers
	pipeline_tag: image-text-to-text
	---

	# CASLIE-L

	This repository contains the models for "[Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data](https://huggingface.co/papers/2410.17337)".

	Project Page: [https://ninglab.github.io/CASLIE/](https://ninglab.github.io/CASLIE/)

	Code Repository: [https://github.com/ninglab/CASLIE](https://github.com/ninglab/CASLIE)

	## Introduction
	Multimodal Foundation Models (MFMs) that use multimodal data are attracting growing attention for e-commerce applications. This work introduces [MMECInstruct](https://huggingface.co/datasets/NingLab/MMECInstruct), the first large-scale, high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information for e-commerce. Using MMECInstruct, we fine-tune a series of e-commerce MFMs within CASLIE, referred to as the CASLIE models.

	## CASLIE Models
	The CASLIE-L model is instruction-tuned from the large base model [Llama-2-13b-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf).

	## Sample Usage (Modality-unified Inference)
	To run inference with the CASLIE models, use the `inference.py` script. The example below is taken directly from the [official GitHub repository](https://github.com/ninglab/CASLIE#modality-unified-inference). The script takes three arguments:

	`--model_path` is the path of the instruction-tuned model.

	`--task` specifies the task to be tested.

	`--output_path` specifies where the inference output is saved.

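	Example (the repository's command uses the CASLIE-M checkpoint; to evaluate this model, set `--model_path` to the CASLIE-L checkpoint instead):
	```bash
	python inference.py --model_path NingLab/CASLIE-M --task answerability_prediction --output_path ap.json
	```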
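
When the same command has to be issued repeatedly, it can be convenient to assemble it programmatically. The following is a minimal sketch and not part of the CASLIE repository: the helper names `build_inference_command` and `run_inference` are hypothetical, and the sketch assumes that `inference.py` is in the current working directory and that only the three flags shown above are needed.

```python
import shlex
import subprocess
import sys
from typing import List


def build_inference_command(model_path: str, task: str, output_path: str,
                            script: str = "inference.py") -> List[str]:
    """Assemble the argument list for one inference run.

    Only the three flags documented above are used. `script` is assumed to be
    the path to inference.py in a checkout of the CASLIE code repository.
    """
    return [
        sys.executable, script,
        "--model_path", model_path,
        "--task", task,
        "--output_path", output_path,
    ]


def run_inference(model_path: str, task: str, output_path: str) -> None:
    """Run the script and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_inference_command(model_path, task, output_path), check=True)


# Same values as the example command above; printing does not start inference.
command = build_inference_command("NingLab/CASLIE-M", "answerability_prediction", "ap.json")
print(shlex.join(command))
```

Calling `run_inference` with the same three values executes the command. Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.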

	## Citation
	```bibtex
	@article{ling2024captions,
	title={Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data},
	author={Ling, Xinyi and Peng, Bo and Du, Hanwen and Zhu, Zhihui and Ning, Xia},
	journal={arXiv preprint arXiv:2410.17337},
	year={2024}
	}
	```