DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions
Paper • 2305.16636 • Published
# Load model directly
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("viswavi/datafinder-scibert-nl-queries")
model = AutoModel.from_pretrained("viswavi/datafinder-scibert-nl-queries")

This is a version of the SciBERT encoder trained to retrieve datasets by their textual descriptions, given a natural language query.
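As a rough illustration of retrieval with this encoder, the sketch below embeds a query and a few dataset descriptions, then ranks the descriptions by similarity to the query. Several details are assumptions rather than anything documented here: taking the [CLS] hidden state as the text embedding, scoring with cosine similarity, and the example query and descriptions, which are made up. The helper names `embed`, `cosine`, `rank`, and `main` are hypothetical.

```python
import math


def embed(texts, model_name="viswavi/datafinder-scibert-nl-queries"):
    """Encode texts into vectors using the [CLS] hidden state (an assumed pooling choice)."""
    # Imported lazily so the ranking helpers below carry no heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        output = model(**batch)
    # last_hidden_state has shape (batch, seq_len, hidden); position 0 is [CLS].
    return output.last_hidden_state[:, 0, :].tolist()


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def main():
    # Hypothetical query and dataset descriptions, for illustration only.
    query = "I want to train a question answering model on Wikipedia articles."
    descriptions = [
        "A reading comprehension dataset of questions posed on Wikipedia articles.",
        "A collection of labeled images of handwritten digits.",
    ]
    vectors = embed([query] + descriptions)
    for i in rank(vectors[0], vectors[1:]):
        print(descriptions[i])
```

Calling `main()` downloads the model weights on first use. If the model was trained with a different pooling strategy or scoring function (for example, a plain dot product), swap those pieces accordingly.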
If you find this model useful, please cite:
@inproceedings{viswanathan23acl,
title = {DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions},
author = {Vijay Viswanathan and Luyu Gao and Tongshuang Wu and Pengfei Liu and Graham Neubig},
booktitle = {Annual Conference of the Association for Computational Linguistics (ACL)},
address = {Toronto, Canada},
month = {July},
url = {https://arxiv.org/abs/2305.16636},
year = {2023}
}
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="viswavi/datafinder-scibert-nl-queries")