Neroism8422
This is the basic LLaVA version of the original mol-instruct model: not fine-tuned, with only the CLIP vision encoder added on.
aea55e2 verified