Instructions to use QuantFactory/instruction-synthesizer-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use QuantFactory/instruction-synthesizer-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/instruction-synthesizer-GGUF",
    filename="instruction-synthesizer.Q2_K.gguf",
)
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True
)
print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use QuantFactory/instruction-synthesizer-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
Use Docker
docker model run hf.co/QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use QuantFactory/instruction-synthesizer-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/instruction-synthesizer-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "QuantFactory/instruction-synthesizer-GGUF",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
- Ollama
How to use QuantFactory/instruction-synthesizer-GGUF with Ollama:
ollama run hf.co/QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
- Unsloth Studio
How to use QuantFactory/instruction-synthesizer-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/instruction-synthesizer-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/instruction-synthesizer-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/instruction-synthesizer-GGUF to start chatting
- Docker Model Runner
How to use QuantFactory/instruction-synthesizer-GGUF with Docker Model Runner:
docker model run hf.co/QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
- Lemonade
How to use QuantFactory/instruction-synthesizer-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/instruction-synthesizer-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.instruction-synthesizer-GGUF-Q4_K_M
List all available models
lemonade list
QuantFactory/instruction-synthesizer-GGUF
This is a quantized version of instruction-pretrain/instruction-synthesizer, created using llama.cpp.
Model Description
Instruction Pre-Training: Language Models are Supervised Multitask Learners
This repo contains the context-based instruction synthesizer in our paper Instruction Pre-Training: Language Models are Supervised Multitask Learners.
We explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augments massive raw corpora with instruction-response pairs to pre-train language models. The instruction-response pairs are generated by an efficient instruction synthesizer built on open-source models. In our experiments, we synthesize 200M instruction-response pairs covering 40+ task categories to verify the effectiveness of Instruction Pre-Training. Instruction Pre-Training outperforms Vanilla Pre-training in both general pre-training from scratch and domain-adaptive continual pre-training. In pre-training from scratch, Instruction Pre-Training not only improves pre-trained base models but also benefits more from further instruction tuning. In continual pre-training, Instruction Pre-Training enables Llama3-8B to be comparable to or even outperform Llama3-70B.
Resources
🤗 We share our data and models with example usages; feel free to open issues or discussions! 🤗
- Context-Based Instruction Synthesizer: instruction-synthesizer
- Fine-Tuning Data for the Synthesizer: ft-instruction-synthesizer-collection
- General Models Pre-Trained from Scratch:
- Domain-Specific Models Pre-Trained from Llama3-8B:
Synthesize Instruction-Response Pairs to Augment Any Raw Corpora
We conduct multitask fine-tuning on a language model to develop an instruction synthesizer capable of generating instruction-response pairs from any raw text. The fine-tuning data are available at ft-instruction-synthesizer-collection.
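To make the output format concrete: the usage code in this card expects each generated pair in the form `<QUE> question <ANS> answer </END>`. The sketch below splits a hand-written prediction string (invented for illustration, not real model output) into pairs with a regular expression. The name `split_pairs` is hypothetical, and this is a simplified variant of the card's `parse_pred`, not a replacement for it.

```python
import re

# One pair looks like: <QUE> question <ANS> answer </END>
PAIR_RE = re.compile(r"<QUE>(.*?)<ANS>(.*?)</END>", re.DOTALL)

def split_pairs(pred):
    """Return a list of {'Q': ..., 'A': ...} dicts, skipping empty or duplicate questions."""
    pairs = []
    seen = set()
    for question, answer in PAIR_RE.findall(pred):
        question, answer = question.strip(), answer.strip()
        if not question or not answer or question.lower() in seen:
            continue
        seen.add(question.lower())
        pairs.append({"Q": question, "A": answer})
    return pairs

# Hand-written sample; the last pair is truncated (no </END>) and is therefore dropped.
sample_pred = (
    "<QUE> When is the free fishing weekend? <ANS> June 28th-29th. </END>\n\n"
    "<QUE> Is a license required that weekend? <ANS> No. </END>\n\n"
    "<QUE> How many lakes and ponds"
)
pairs = split_pairs(sample_pred)
print(pairs)
```

Dropping the unterminated final pair mirrors what the card's parser does when generation stops at the token limit mid-pair.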
Basic Usage: Synthesize instruction-response pairs based on a given raw text
💗 Here is an amazing demo that implements our approach: davanstrien/instruction-synthesizer 💗
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("instruction-pretrain/instruction-synthesizer")
tokenizer = AutoTokenizer.from_pretrained("instruction-pretrain/instruction-synthesizer")
# Put your raw text here:
context = '''Free Fishing Weekend in NYS Slated
This weekend (June 28th-29th) New Yorkers may fish for free without a license in any of the state's 7,500 lakes and ponds or 50,000 miles of rivers and streams. In addition, there are a number of free events and fishing clinics taking place across the state to encourage New Yorkers to enjoy the great outdoors. For more information, visit'''
def parse_pred(pred):
    """Extract the list of instruction-response pairs from the prediction"""
    QA_str_list = pred.split('</END>')
    if not pred.endswith('</END>'):
        # Drop the trailing, unterminated pair
        QA_str_list = QA_str_list[:-1]
    QA_list = []
    raw_questions = []
    for QA_str in QA_str_list:
        try:
            assert len(QA_str.split('<ANS>')) == 2, f'invalid QA string: {QA_str}'
            Q_str, A_str = QA_str.split('<ANS>')
            Q_str, A_str = Q_str.strip(), A_str.strip()
            assert Q_str.startswith('<QUE>'), f'invalid question string: {Q_str} in QA_str: {QA_str}'
            assert len(A_str) > 0, f'invalid answer string in QA_str: {QA_str}'
            Q_str = Q_str.replace('<QUE>', '').strip()
            assert Q_str.lower() not in raw_questions, f'duplicate question: {Q_str}'
            QA_list.append({'Q': Q_str, 'A': A_str})
            raw_questions.append(Q_str.lower())
        except Exception:
            # Skip malformed or duplicate pairs
            pass
    return QA_list
def get_instruction_response_pairs(context):
    '''Prompt the synthesizer to generate instruction-response pairs based on the given context'''
    prompt = f'<s> <CON> {context} </CON>\n\n'
    inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
    outputs = model.generate(input_ids=inputs, max_new_tokens=400, do_sample=False)[0]
    pred_start = int(inputs.shape[-1])
    pred = tokenizer.decode(outputs[pred_start:], skip_special_tokens=True)
    return parse_pred(pred)
# Get the generated instruction-response pairs
instruction_response_pairs = get_instruction_response_pairs(context)
# Print out the results
print(f'# Context:\n{context}\n')
for index, pair in enumerate(instruction_response_pairs):
    print(f'## Instruction {index + 1}:\n{pair["Q"]}\n## Response {index + 1}:\n{pair["A"]}\n')
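The snippet above loads the original checkpoint with transformers. For the quantized files in this repo, a minimal sketch with llama-cpp-python might look like the following. It rests on several assumptions: the GGUF file expects the same `<CON> ... </CON>` prompt template, llama.cpp adds the BOS token itself (so the literal `<s>` is left out), and a 4096-token context is enough. The helper names `build_prompt` and `synthesize_with_gguf` are hypothetical.

```python
def build_prompt(context):
    """Wrap raw text in the <CON> ... </CON> template used by the synthesizer.

    Assumption: the literal '<s>' is omitted because llama.cpp is expected
    to prepend the BOS token during tokenization.
    """
    return f" <CON> {context} </CON>\n\n"

def synthesize_with_gguf(context, max_tokens=400):
    """Generate raw synthesizer output from a quantized GGUF file."""
    # Imported lazily so build_prompt works without llama-cpp-python installed.
    from llama_cpp import Llama

    # Loads the model on every call; cache the Llama object for repeated use.
    llm = Llama.from_pretrained(
        repo_id="QuantFactory/instruction-synthesizer-GGUF",
        filename="instruction-synthesizer.Q2_K.gguf",
        n_ctx=4096,  # assumed context size
    )
    # temperature=0.0 approximates the greedy decoding (do_sample=False) used above.
    output = llm(build_prompt(context), max_tokens=max_tokens, temperature=0.0)
    return output["choices"][0]["text"]
```

The returned text can then be passed to `parse_pred` from the snippet above. Lower-bit quantizations may produce more malformed pairs, which the parser silently skips.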
Advanced Usage: Synthesize Few-shot Examples
A one-shot example consists of a piece of raw text followed by its instruction-response pairs. You can run multi-round inference to synthesize a few-shot example in which the instruction-response pairs of the different raw texts share the same pattern.
To accelerate synthesis, we use the vLLM framework:
- Set up dependencies: Install vLLM with pip or from source:
pip install vllm
- Synthesize:
from vllm import LLM, SamplingParams
# Put your list of raw texts here;
# a list of M raw texts can be converted into an M-shot example:
text_list = [
"Genetically and medically susceptible workers.\nThe likelihood of an individual becoming ill from a hazardous material or condition is strongly influenced by both their genetic makeup and their underlying state of health. Although the past decade has seen great advances in understanding human variation in health and genetic polymorphisms and in the diagnosis and treatment of disease, much less progress has been made in effectively using this information to protect worker health. Scientific evidence for increased susceptibility often is weak and rarely satisfies legal thresholds for sufficient risk to warrant exclusion from a particular job. When public safety is a major concern, many legally mandated exclusions are not well justified. Medical opinions about fitness to work should be based upon a systematic and credible analysis of the condition, its relationship to ability and risk for a particular job, and knowledge of possible accommodations. Conclusions should reflect the limitations of scientific knowledge and guidance from antidiscrimination legislation.",
"Exclusive Breastfeeding for Twin Babies and Its Influencing Factors: A Study in East Java, Indonesia.\nThis study aimed to identify the factors that influence the success of exclusive breastfeeding in twins. This cross-sectional study was conducted on 184 mothers who had twins aged 6-23 months in Malang Raya, East Java, Indonesia and used the consecutive sampling technique. The data was collected through distributing questionnaires containing questions related to knowledge about exclusive breastfeeding, breastfeeding self-efficacy, and the support of family and certified health workers. Multinomial regression statistical test results show that the most influential factor for the success of exclusive breastfeeding with twins was breastfeeding self-efficacy (OR 0.111; 95% CI 0.033-0.387). A high level of breastfeeding self-efficacy can increase a mother's confidence to be able to provide exclusive breastfeeding for twins. This study suggests that nurses can provide breastfeeding counselling to improve breastfeeding self-efficacy."]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0, max_tokens=400)
# Load the model and tokenizer
llm = LLM(model="instruction-pretrain/instruction-synthesizer", max_model_len=4096)
# Templates (please do NOT change them)
context_template = ' <CON> {context} </CON>'
QA_template = '<QUE> {question} <ANS> {answer} </END>'
delimiter = '\n\n'
bos_token = '<s>'
eos_token = '</s>'
def cook_context(raw_context):
    """Format the context."""
    return bos_token + context_template.replace('{context}', raw_context) + delimiter

def cook_instruction_response_pairs(QA_list):
    """Format downstream instruction(Q)-response(A) pairs."""
    ins_res_list = []
    for qa_entry in QA_list:
        qa = QA_template.replace('{question}', qa_entry['Q']).replace('{answer}', qa_entry['A'])
        ins_res_list.append(qa)
    return delimiter.join(ins_res_list) + eos_token
def parse_pred(pred):
    """Extract the list of instruction-response pairs from the prediction"""
    QA_str_list = pred.split('</END>')
    if not pred.endswith('</END>'):
        # Drop the trailing, unterminated pair
        QA_str_list = QA_str_list[:-1]
    QA_list = []
    raw_questions = []
    for QA_str in QA_str_list:
        try:
            assert len(QA_str.split('<ANS>')) == 2, f'invalid QA string: {QA_str}'
            Q_str, A_str = QA_str.split('<ANS>')
            Q_str, A_str = Q_str.strip(), A_str.strip()
            assert Q_str.startswith('<QUE>'), f'invalid question string: {Q_str} in QA_str: {QA_str}'
            assert len(A_str) > 0, f'invalid answer string in QA_str: {QA_str}'
            Q_str = Q_str.replace('<QUE>', '').strip()
            assert Q_str.lower() not in raw_questions, f'duplicate question: {Q_str}'
            QA_list.append({'Q': Q_str, 'A': A_str})
            raw_questions.append(Q_str.lower())
        except Exception:
            # Skip malformed or duplicate pairs
            pass
    return QA_list
def get_instruction_response_pairs(context):
    '''Prompt the synthesizer to generate instruction-response pairs based on the given context'''
    outputs = llm.generate(context, sampling_params, use_tqdm=False)
    pred = outputs[0].outputs[0].text
    return parse_pred(pred)
# Process each text and generate instruction-response pairs in multi-round inference:
previous_examples = []
for cur_text in text_list:
    # Prepend raw texts and instruction-response pairs of previous examples to the current text
    context = ''
    for previous_example in previous_examples:
        context += cook_context(previous_example['text']) + cook_instruction_response_pairs(previous_example['instruction_response_pairs'])
    context += cook_context(cur_text)
    # Get the generated instruction-response pairs
    instruction_response_pairs = get_instruction_response_pairs(context)
    previous_examples.append({'text': cur_text, 'instruction_response_pairs': instruction_response_pairs})

# Concatenate the raw texts and instruction-response pairs of M rounds to constitute an M-shot example
for example in previous_examples:
    print(f'# Raw Text:\n{example["text"]}\n')
    for index, pair in enumerate(example['instruction_response_pairs']):
        print(f'## Instruction {index + 1}:\n{pair["Q"]}\n## Response {index + 1}:\n{pair["A"]}\n')
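To see what the multi-round context string looks like without loading a model, the sketch below rebuilds it from the same templates using toy data. The toy texts and pairs are invented for illustration, and the helper names (`format_context`, `format_pairs`, `build_few_shot_prompt`) are hypothetical stand-ins for the functions above.

```python
# Templates copied from the usage code above.
CONTEXT_TEMPLATE = " <CON> {context} </CON>"
QA_TEMPLATE = "<QUE> {question} <ANS> {answer} </END>"
DELIMITER = "\n\n"
BOS, EOS = "<s>", "</s>"

def format_context(text):
    """Wrap one raw text in the context template, preceded by BOS."""
    return BOS + CONTEXT_TEMPLATE.replace("{context}", text) + DELIMITER

def format_pairs(pairs):
    """Join formatted question/answer pairs and close the example with EOS."""
    body = DELIMITER.join(
        QA_TEMPLATE.replace("{question}", p["Q"]).replace("{answer}", p["A"])
        for p in pairs
    )
    return body + EOS

def build_few_shot_prompt(previous_examples, new_text):
    """Prepend earlier texts and their pairs, then append the new text's context."""
    prompt = ""
    for example in previous_examples:
        prompt += format_context(example["text"])
        prompt += format_pairs(example["instruction_response_pairs"])
    return prompt + format_context(new_text)

# Toy data standing in for one earlier round of synthesis.
toy_examples = [{
    "text": "Cats sleep a lot.",
    "instruction_response_pairs": [{"Q": "What do cats do a lot?", "A": "Sleep."}],
}]
prompt = build_few_shot_prompt(toy_examples, "Dogs bark.")
print(repr(prompt))
```

Each earlier round contributes a complete `<s> ... </s>` segment, so the model sees the new context directly after a finished example and tends to continue in the same pattern.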
Model Citation
If you find our work helpful, please cite us:
@inproceedings{cheng2024adapting,
  title={Adapting Large Language Models via Reading Comprehension},
  author={Daixuan Cheng and Shaohan Huang and Furu Wei},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=y886UXPEZ0}
}