Improve model card: Add pipeline tag, library name, links, and sample usage
This PR significantly enhances the model card for **CASLIE-L** by:
- Adding the `pipeline_tag: image-text-to-text` metadata to accurately reflect its multimodal capabilities (image and text input, text output) and improve discoverability on the Hugging Face Hub.
- Specifying `library_name: transformers` metadata, as evidenced by the `config.json` (`LlamaForCausalLM`, `transformers_version`) and the GitHub requirements, enabling the automated "how to use" widget.
- Including direct links to the paper ([Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data](https://huggingface.co/papers/2410.17337)), the official [project page](https://ninglab.github.io/CASLIE/), and the [GitHub repository](https://github.com/ninglab/CASLIE).
- Integrating an "Introduction" section from the paper's abstract and GitHub README to provide more context about the model and the MMECInstruct dataset.
- Adding a "Sample Usage" section that reproduces the "Modality-unified Inference" command-line example verbatim from the GitHub README, so no code is invented and users have an accurate starting point for interacting with the model.
These improvements aim to make the model more informative, discoverable, and user-friendly for the community.
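The new front-matter keys can be sanity-checked locally before merging. A minimal sketch, assuming only flat `key: value` pairs need checking (which holds for the keys this PR adds); this is a plain-string parse, not the Hub's own card validator:

```python
# Quick local sanity check of the metadata keys this PR adds.
# The card text below is the front matter produced by this change.
card = """\
---
base_model:
- meta-llama/Llama-2-13b-chat-hf
datasets:
- NingLab/MMECInstruct
license: cc-by-4.0
library_name: transformers
pipeline_tag: image-text-to-text
---
"""

meta = {}
for line in card.splitlines():
    # Skip the '---' fences and YAML list items; keep flat `key: value` lines.
    if line.startswith(("---", "-")):
        continue
    key, sep, value = line.partition(":")
    if sep and value.strip():
        meta[key.strip()] = value.strip()

print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

For anything beyond flat keys (e.g. the `base_model` list), a real YAML parser or the Hub's card tooling would be the safer check.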
@@ -1,18 +1,40 @@
 ---
-license: cc-by-4.0
-datasets:
-- NingLab/MMECInstruct
 base_model:
 - meta-llama/Llama-2-13b-chat-hf
+datasets:
+- NingLab/MMECInstruct
+license: cc-by-4.0
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---

 # CASLIE-L

-This
+This repository contains the models for "[Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data](https://huggingface.co/papers/2410.17337)".
+
+**Project Page**: [https://ninglab.github.io/CASLIE/](https://ninglab.github.io/CASLIE/)
+**Code Repository**: [https://github.com/ninglab/CASLIE](https://github.com/ninglab/CASLIE)
+
+## Introduction
+Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention. This work introduces [MMECInstruct](https://huggingface.co/datasets/NingLab/MMECInstruct), the first-ever, large-scale, and high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information for e-commerce. Leveraging MMECInstruct, we fine-tune a series of e-commerce MFMs within CASLIE, denoted as CASLIE models.

 ## CASLIE Models
 The CASLIE-L model is instruction-tuned from the large base model [Llama-2-13b-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf).

+## Sample Usage (Modality-unified Inference)
+To conduct inference with the CASLIE models, refer to the following example directly from the [official GitHub repository](https://github.com/ninglab/CASLIE#modality-unified-inference).
+
+`$model_path` is the path of the instruction-tuned model.
+
+`$task` specifies the task to be tested.
+
+`$output_path` specifies the path where you want to save the inference output.
+
+Example:
+```
+python inference.py --model_path NingLab/CASLIE-M --task answerability_prediction --output_path ap.json
+```
+
 ## Citation
 ```bibtex
 @article{ling2024captions,