| | --- |
| | license: cc-by-4.0 |
| | datasets: |
| | - NingLab/MMECInstruct |
| | base_model: |
| | - meta-llama/Llama-2-13b-chat-hf |
| | --- |
| | |
| | # CASLIE-L |
| |
|
| | This repo contains the models for "Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data" |
| |
|
| | ## CASLIE Models |
| | The CASLIE-L model is instruction-tuned from the large base model [Llama-2-13b-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf). |
| |
|
| | ## Citation |
| | ```bibtex |
| | @article{ling2024captions, |
| | title={Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data}, |
| | author={Ling, Xinyi and Peng, Bo and Du, Hanwen and Zhu, Zhihui and Ning, Xia}, |
| | journal={arXiv preprint arXiv:2410.17337}, |
| | year={2024} |
| | } |
| | ``` |