This model converts a query, instruction, or prompt into a generic instruction template in the FineTemplates format.
The output is a JSON object.
Simple Usage Example
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('fineinstructions/query_templatizer', revision=None)
tokenizer.padding_side = 'left'  # left-pad so batched prompts end where generation begins
model = AutoModelForCausalLM.from_pretrained('fineinstructions/query_templatizer', revision=None)
# return_full_text=False returns only the generated continuation, not the prompt
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)
inputs = ["What volleyball exercises should I do I'm almost in high school and i do volleyball excellence five times a week (basically an advanced class in school with experienced volleyball coaches) , we have 2-3 skill training sessions a week which i feel like isn't enough for me as I would like to improve my skills almost every day.\n\n​\n\nWhat i wanted to know was what setting, digging, serving and spiking exercises could i do that would help me improve all of my skills (I have a large area to practice all these things so space isn't an issue)."]
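# Wrap each input in the model's chat template as a single user turn
prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs]
# Greedy decoding (sampling disabled) so the output is deterministic
generations = pipe(prompts, max_length=131072, truncation=True, temperature=None, top_p=None, do_sample=False)
# The pipeline returns one list of candidates per prompt; take the first candidate for the first prompt
output = generations[0][0]['generated_text']
print(output)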
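Since the output is described as a JSON object, the generated string can be parsed before further use. The following is a minimal sketch, not part of the model's documented API: the helper name `parse_template_output` is hypothetical, the sample string and its key are made up for illustration, and the fallback that extracts the text between the first `{` and the last `}` is an assumption about how stray text around the JSON might be handled.

```python
import json
from typing import Optional


def parse_template_output(text: str) -> Optional[dict]:
    """Parse generated text as a JSON object; return None if that fails."""
    candidate = text.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        # Assumed fallback: try the span between the first '{' and the last '}'.
        start, end = candidate.find("{"), candidate.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            parsed = json.loads(candidate[start:end + 1])
        except json.JSONDecodeError:
            return None
    # Only a JSON object (a Python dict) counts as a valid result here.
    return parsed if isinstance(parsed, dict) else None


# Hypothetical output string, used only to exercise the helper.
sample = ' {"example_key": "example value"} '
result = parse_template_output(sample)
print(sorted(result.keys()) if result else "not a JSON object")
```

In the usage example above, `parse_template_output(output)` would be called on the printed string. The actual keys of the returned object depend on the FineTemplates format and are not assumed here.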
This model was trained on a synthetic dataset using DataDreamer 🤖💤. The synthetic dataset card and model card can be found here. The training arguments can be found here.
This is a work in progress. If you use this project in your research, please cite:
@article{patel2025fineinstructions,
title = {FineInstructions: A Web-Scale Instructions Dataset},
author = {Patel, Ajay and Raffel, Colin and Callison-Burch, Chris},
year = {2025},
month = aug,
day = {11},
note = {Work in progress},
}