Add metadata and link to paper

#1
by nielsr (HF Staff), opened
Files changed (1)
  1. README.md +10 -3
README.md
@@ -3,6 +3,8 @@ base_model:
 - meta-llama/Llama-3.2-1B-Instruct
 datasets:
 - fineinstructions/templates_raw_subsample
+library_name: transformers
+pipeline_tag: text-generation
 tags:
 - datadreamer
 - datadreamer-0.39.0
@@ -16,6 +18,8 @@ tags:
 
 ----
 
+This model is part of the research presented in the paper [FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale](https://huggingface.co/papers/2601.22146).
+
 This model will convert a query / instruction / prompt into a generic, instruction template in the format of [FineTemplates](https://huggingface.co/datasets/fineinstructions/finetemplates).
 
 The output will be a JSON object.
@@ -32,7 +36,11 @@ model = AutoModelForCausalLM.from_pretrained('fineinstructions/query_templatizer
 pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)
 
 # Run inference to templatize the query
-inputs = ["What volleyball exercises should I do I'm almost in high school and i do volleyball excellence five times a week (basically an advanced class in school with experienced volleyball coaches) , we have 2-3 skill training sessions a week which i feel like isn't enough for me as I would like to improve my skills almost every day.\n\n​\n\nWhat i wanted to know was what setting, digging, serving and spiking exercises could i do that would help me improve all of my skills (I have a large area to practice all these things so space isn't an issue)."]
+inputs = ["What volleyball exercises should I do I'm almost in high school and i do volleyball excellence five times a week (basically an advanced class in school with experienced volleyball coaches) , we have 2-3 skill training sessions a week which i feel like isn't enough for me as I would like to improve my skills almost every day.
+
+​
+
+What i wanted to know was what setting, digging, serving and spiking exercises could i do that would help me improve all of my skills (I have a large area to practice all these things so space isn't an issue)."]
 prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs]
 generations = pipe(prompts, max_length=131072, truncation=True, temperature=None, top_p=None, do_sample=False)
 output = generations[0][0]['generated_text']
@@ -84,5 +92,4 @@ If you use this project in your research please cite:
 primaryClass={cs.CL},
 doi={10.48550/arXiv.2601.22146}
 }
-```
-
+```
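A note for anyone copying the inference snippet from the new revision: a plain Python string literal cannot contain bare newlines, so the multi-line query shown in the added lines needs either a triple-quoted string or the `\n` escapes used in the old revision. A minimal sketch of the two equivalent forms, with the query shortened for brevity:

```python
# A multi-line query must use a triple-quoted string; a bare newline
# inside a single-quoted literal is a SyntaxError (query abridged here).
inputs = [
    """What volleyball exercises should I do I'm almost in high school...

What i wanted to know was what setting, digging, serving and spiking exercises could i do..."""
]

# Equivalent single-line form using the "\n" escapes from the old revision:
inputs_escaped = [
    "What volleyball exercises should I do I'm almost in high school...\n\n"
    "What i wanted to know was what setting, digging, serving and spiking exercises could i do..."
]

assert inputs[0] == inputs_escaped[0]
```

Either form produces the same string, so the tokenizer's chat template receives identical content.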
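The model card in this diff states that the output will be a JSON object, returned as generated text. A minimal sketch of consuming such output with the standard library, using a hypothetical generated string (the `template` field name is an illustrative assumption, not the documented FineTemplates schema):

```python
import json

# Hypothetical generated_text for illustration; the real keys follow the
# FineTemplates format and may differ from this assumed "template" field.
generated_text = '{"template": "What {{sport}} exercises should I do?"}'

# The card states the output is a JSON object, so parse it before use.
parsed = json.loads(generated_text)
print(parsed["template"])
```

In practice `generated_text` would come from `generations[0][0]['generated_text']` in the snippet above, and a `json.JSONDecodeError` handler is prudent since the model emits free-form text.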