aashish1904 committed
Commit 69c0928 · verified · 1 Parent(s): 6a62a9c

Upload README.md with huggingface_hub

  1. README.md +95 -0
README.md ADDED
@@ -0,0 +1,95 @@
1
+
2
+ ---
3
+
4
+ library_name: transformers
5
+ base_model: nvidia/Mistral-NeMo-Minitron-8B-Base
6
+ datasets:
7
+ - teknium/OpenHermes-2.5
8
+ pipeline_tag: text-generation
9
+ license: other
10
+ license_name: nvidia-open-model-license
11
+ license_link: >-
12
+ https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
13
+
14
+ ---
15
+
16
+ ![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)
17
+
18
+ # QuantFactory/Mistral-NeMo-Minitron-8B-Chat-GGUF
19
+ This is a quantized version of [rasyosef/Mistral-NeMo-Minitron-8B-Chat](https://huggingface.co/rasyosef/Mistral-NeMo-Minitron-8B-Chat) created using llama.cpp.
20
+
21
+ # Original Model Card
22
+
23
+
24
+ # Mistral-NeMo-Minitron-8B-Chat
25
+
26
+ This is an instruction-tuned version of [nvidia/Mistral-NeMo-Minitron-8B-Base](https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base) that has undergone **supervised fine-tuning** with 32k instruction-response pairs from the [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset.
27
+
28
+ ## How to use
29
+ ### Chat Format
30
+
31
+ Given the nature of the training data, this model is best suited for prompts that use the chat format shown below.
32
+ You can provide the prompt as a question with a generic template as follows:
33
+ ```markdown
34
+ <|im_start|>system
35
+ You are a helpful assistant.<|im_end|>
36
+ <|im_start|>user
37
+ Question?<|im_end|>
38
+ <|im_start|>assistant
39
+ ```
40
+
41
+ For example:
42
+ ```markdown
43
+ <|im_start|>system
44
+ You are a helpful assistant.<|im_end|>
45
+ <|im_start|>user
46
+ How would you explain the Internet to a medieval knight?<|im_end|>
47
+ <|im_start|>assistant
48
+ ```
49
+ where the model generates the text after `<|im_start|>assistant`.
50
+
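The template above can also be assembled programmatically. The following is a minimal sketch and not part of the original model card: `build_chatml_prompt` is a hypothetical helper name, and the trailing newline after the assistant tag is an assumption. The tokenizer's own chat template, applied with `tokenizer.apply_chat_template`, remains the authoritative source for the exact format.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the chat format shown above.

    Hypothetical helper for illustration only; prefer the tokenizer's chat
    template when it is available.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>ROLE\nCONTENT<|im_end|>
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        # The newline after "assistant" is an assumption.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Question?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

A string built this way could be passed as a raw prompt to a runtime that does not apply a chat template itself, provided the format matches what the model was trained on.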
51
+ ### Sample inference code
52
+
53
+ This code snippet shows how to quickly get started with running the model on a GPU:
54
+
55
+ ```python
56
+ import torch
57
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
58
+
59
+ torch.random.manual_seed(0)
60
+
61
+ model_id = "rasyosef/Mistral-NeMo-Minitron-8B-Chat"
62
+ model = AutoModelForCausalLM.from_pretrained(
63
+ model_id,
64
+ device_map="auto",
65
+ torch_dtype=torch.bfloat16
66
+ )
67
+
68
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
69
+
70
+ messages = [
71
+ {"role": "system", "content": "You are a helpful AI assistant."},
72
+ {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
73
+ {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
74
+ {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
75
+ ]
76
+
77
+ pipe = pipeline(
78
+ "text-generation",
79
+ model=model,
80
+ tokenizer=tokenizer,
81
+ )
82
+
83
+ generation_args = {
84
+ "max_new_tokens": 256,
85
+ "return_full_text": False,
86
+ "temperature": 0.0,  # not used when do_sample is False (greedy decoding)
87
+ "do_sample": False,
88
+ }
89
+
90
+ output = pipe(messages, **generation_args)
91
+ print(output[0]['generated_text'])
92
+ ```
93
+
94
+ Note: If you want to use flash attention, call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="flash_attention_2"`.
95
+
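As a hedged illustration of the note above, the sketch below passes `attn_implementation="flash_attention_2"` only when the `flash_attn` package can be found, and otherwise leaves the default attention implementation in place. The helper names `attention_kwargs` and `load_model` are hypothetical and not part of the original model card. It also assumes a GPU that FlashAttention-2 supports; half-precision weights such as the `torch.bfloat16` used in the sample code are required for it.

```python
import importlib.util


def attention_kwargs(prefer_flash=True):
    """Return extra from_pretrained() kwargs.

    Requests FlashAttention-2 only if the flash_attn package is installed;
    otherwise returns no extra arguments so the default is used.
    """
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return {"attn_implementation": "flash_attention_2"}
    return {}


def load_model(model_id="rasyosef/Mistral-NeMo-Minitron-8B-Chat"):
    # Imported lazily so attention_kwargs() can be used without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.bfloat16,  # FlashAttention-2 needs fp16 or bf16 weights
        **attention_kwargs(),
    )
```

Calling `load_model()` downloads the full-precision weights, so it is left uncalled here; the returned model can be passed to `pipeline(...)` exactly as in the sample inference code.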