euclaise committed on Mar 5, 2024

Commit 6c767d8 · verified · 1 Parent(s): f68a69a

Update README.md

Files changed (1)

README.md +32 -14

@@ -1,18 +1,36 @@
 ---
-{}
 ---
 ```
-pre_text = "The following is an interaction between a user and an AI assistant that is related to the above text."
-def ds_map_fn(row):
-    input =  f"[[[Title]]] {row['title'].strip()}\n[[[Content]]] {row['context'].strip()}\n\n" + pre_text + "\n\n[[[User]]] "
-    output = f"{row['question'].strip()}\n[[[Assistant]]] {row['answer'].strip()}"
-    input = tokenizer.encode(input, add_special_tokens=False)
-    output = tokenizer.encode(output, add_special_tokens=False)
-    input_ids = input + output + [tokenizer.eos_token_id]
-    labels = [-100]*len(input) + output + [tokenizer.eos_token_id]
-    return {'input_ids': input_ids, 'labels': labels}
-ds = ds.map(ds_map_fn, remove_columns=ds.column_names)
 ```

 ---
+license: apache-2.0
+language:
+- en
+library_name: transformers
 ---
+# Genstruct 7B
+Genstruct 7B is an instruction-generation model, inspired by [Ada-Instruct](https://arxiv.org/abs/2310.04484).
+Previous methods largely rely on in-context approaches to generate instructions, while Ada-Instruct trained a custom instruction-generation model.
+We take this approach further by grounding the generations in user-provided context passages.
+Further, the model is trained to generate questions involving complex scenarios that require detailed reasoning, so that models trained on the generated data learn to reason step by step.
+An example notebook is provided [here](https://gist.github.com/euclaise/bb7113b9596666cbf939484156375f29), which details how to load and sample from the model.
+Alternatively, here's a minimal example:
 ```
+from transformers import AutoModelForCausalLM, AutoTokenizer
+MODEL_NAME = 'NousResearch/Genstruct-7B'
+model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map='cuda', load_in_8bit=True)
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+msg =[{
+    'title': 'p-value',
+    'content': "The p-value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic T. The lower the p-value is, the lower the probability of getting that result if the null hypothesis were true. A result is said to be statistically significant if it allows us to reject the null hypothesis. All other things being equal, smaller p-values are taken as stronger evidence against the null hypothesis."
+}]
+inputs = tokenizer.apply_chat_template(msg, return_tensors='pt').cuda()
+print(tokenizer.decode(model.generate(inputs, max_new_tokens=512)[0]).split(tokenizer.eos_token)[0])
 ```
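
The `ds_map_fn` removed in this diff is the only place the commit shows the raw prompt layout, so a small sketch of it may help when reading the new example. The code below rebuilds the same `[[[Title]]]` / `[[[Content]]]` / `[[[User]]]` / `[[[Assistant]]]` string and the same label masking, where prompt tokens are set to `-100` so that only the question and answer contribute to the loss. The helper names (`build_prompt`, `build_target`, `encode_example`) and the character-level `toy_encode` tokenizer are hypothetical and exist only so the sketch runs without downloading the model; with the real tokenizer one would presumably pass something like `lambda s: tokenizer.encode(s, add_special_tokens=False)` instead.

```python
from typing import Callable, Dict, List

# Prompt text copied from the removed ds_map_fn in the diff.
PRE_TEXT = (
    "The following is an interaction between a user and an AI assistant "
    "that is related to the above text."
)
IGNORE_INDEX = -100  # label value used in the diff to mask prompt tokens


def build_prompt(title: str, content: str) -> str:
    """Context part of the prompt; it ends right where the question starts."""
    return (
        f"[[[Title]]] {title.strip()}\n"
        f"[[[Content]]] {content.strip()}\n\n"
        f"{PRE_TEXT}\n\n[[[User]]] "
    )


def build_target(question: str, answer: str) -> str:
    """Question and answer part that the model is trained to produce."""
    return f"{question.strip()}\n[[[Assistant]]] {answer.strip()}"


def encode_example(
    row: Dict[str, str],
    encode: Callable[[str], List[int]],
    eos_token_id: int,
) -> Dict[str, List[int]]:
    """Mirror ds_map_fn: concatenate prompt and target, mask the prompt in labels."""
    prompt_ids = encode(build_prompt(row["title"], row["context"]))
    target_ids = encode(build_target(row["question"], row["answer"]))
    input_ids = prompt_ids + target_ids + [eos_token_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids + [eos_token_id]
    return {"input_ids": input_ids, "labels": labels}


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in tokenizer: one id per character."""
    return [ord(ch) for ch in text]


# Illustrative row; the question and answer are made up for this sketch.
row = {
    "title": "p-value",
    "context": "Smaller p-values are taken as stronger evidence against the null hypothesis.",
    "question": "Which result is stronger evidence, p = 0.01 or p = 0.2?",
    "answer": "p = 0.01, because it is the smaller p-value.",
}
example = encode_example(row, toy_encode, eos_token_id=0)
masked = sum(label == IGNORE_INDEX for label in example["labels"])
print(f"{masked} of {len(example['labels'])} label positions are masked")
```

At inference time only the `build_prompt` half is needed: the model continues after `[[[User]]] ` with a question, then `[[[Assistant]]]` and an answer, which is what the `apply_chat_template` call in the new README example appears to set up from the `title` and `content` fields.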