Add metadata and link to paper

#1
by nielsr (HF Staff), opened
Files changed (1)
  1. README.md +10 -3
README.md
@@ -3,6 +3,8 @@ base_model:
 - meta-llama/Llama-3.2-1B-Instruct
 datasets:
 - fineinstructions/templates_raw_subsample
+library_name: transformers
+pipeline_tag: text-generation
 tags:
 - datadreamer
 - datadreamer-0.39.0
@@ -16,6 +18,8 @@ tags:
 
 ----
 
+This model is part of the research presented in the paper [FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale](https://huggingface.co/papers/2601.22146).
+
 This model will convert a query / instruction / prompt into a generic, instruction template in the format of [FineTemplates](https://huggingface.co/datasets/fineinstructions/finetemplates).
 
 The output will be a JSON object.
@@ -32,7 +36,11 @@ model = AutoModelForCausalLM.from_pretrained('fineinstructions/query_templatizer
 pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)
 
 # Run inference to templatize the query
-inputs = ["What volleyball exercises should I do I'm almost in high school and i do volleyball excellence five times a week (basically an advanced class in school with experienced volleyball coaches) , we have 2-3 skill training sessions a week which i feel like isn't enough for me as I would like to improve my skills almost every day.\n\n​\n\nWhat i wanted to know was what setting, digging, serving and spiking exercises could i do that would help me improve all of my skills (I have a large area to practice all these things so space isn't an issue)."]
+inputs = ["What volleyball exercises should I do I'm almost in high school and i do volleyball excellence five times a week (basically an advanced class in school with experienced volleyball coaches) , we have 2-3 skill training sessions a week which i feel like isn't enough for me as I would like to improve my skills almost every day.
+
+​
+
+What i wanted to know was what setting, digging, serving and spiking exercises could i do that would help me improve all of my skills (I have a large area to practice all these things so space isn't an issue)."]
 prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs]
 generations = pipe(prompts, max_length=131072, truncation=True, temperature=None, top_p=None, do_sample=False)
 output = generations[0][0]['generated_text']
@@ -84,5 +92,4 @@ If you use this project in your research please cite:
 primaryClass={cs.CL},
 doi={10.48550/arXiv.2601.22146}
 }
-```
-
+```
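A note for anyone copying the inference snippet from the new revision: a plain Python string literal cannot contain bare newlines, so the multi-line query shown in the added lines needs either a triple-quoted string or the `\n` escapes used in the old revision. A minimal sketch of the two equivalent forms, with the query shortened for brevity:

```python
# A multi-line query must use a triple-quoted string; a bare newline
# inside a single-quoted literal is a SyntaxError (query abridged here).
inputs = [
    """What volleyball exercises should I do I'm almost in high school...

What i wanted to know was what setting, digging, serving and spiking exercises could i do..."""
]

# Equivalent single-line form using the "\n" escapes from the old revision:
inputs_escaped = [
    "What volleyball exercises should I do I'm almost in high school...\n\n"
    "What i wanted to know was what setting, digging, serving and spiking exercises could i do..."
]

assert inputs[0] == inputs_escaped[0]
```

Either form produces the same string, so the tokenizer's chat template receives identical content.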
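The model card in this diff states that the output will be a JSON object, returned as generated text. A minimal sketch of consuming such output with the standard library, using a hypothetical generated string (the `template` field name is an illustrative assumption, not the documented FineTemplates schema):

```python
import json

# Hypothetical generated_text for illustration; the real keys follow the
# FineTemplates format and may differ from this assumed "template" field.
generated_text = '{"template": "What {{sport}} exercises should I do?"}'

# The card states the output is a JSON object, so parse it before use.
parsed = json.loads(generated_text)
print(parsed["template"])
```

In practice `generated_text` would come from `generations[0][0]['generated_text']` in the snippet above, and a `json.JSONDecodeError` handler is prudent since the model emits free-form text.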