saucam committed 22ca2c0 (verified) · 1 parent: 1385939

Create README.md

Files changed: README.md (+139 lines)
---
license: apache-2.0
language:
- en
- ja
---

![](https://raw.githubusercontent.com/saucam/models/main/hikari.png)

# 🏮 Hikari-4x7B

Hikari-4x7B is a MoE franken-model generated by merging the following models:

- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [NousResearch/Hermes-2-Pro-Mistral-7B](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)
- [Rakuten/RakutenAI-7B-chat](https://huggingface.co/Rakuten/RakutenAI-7B-chat)
- [NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story)
### Config used for merging

```yaml
base_model: Rakuten/RakutenAI-7B-chat
experts:
  - source_model: mistralai/Mistral-7B-Instruct-v0.2
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: NousResearch/Hermes-2-Pro-Mistral-7B
    positive_prompts:
      - "json"
      - "function calling"
      - "call a function"
      - "story"
      - "roleplay"
      - "logic"
      - "rationality"
      - "mathematics"
      - "solve"
  - source_model: Rakuten/RakutenAI-7B-chat
    positive_prompts:
      - "japanese"
      - "日本語"
      - "話"
      - "日本語で答えて"
      - "日本語版"
      - "日本語で"
      - "Japanese translation"
  - source_model: NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
    positive_prompts:
      - "story"
      - "storywriting"
      - "character"
      - "narrative"
      - "creative"
```
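Roughly speaking, the `positive_prompts` steer the router: each expert's gate is derived from representations of its prompts, so inputs that resemble those prompts get routed to that expert. The toy sketch below illustrates the idea with a bag-of-characters "embedding" as a stand-in for real hidden states; the expert names and routing function are purely illustrative and are not mergekit's actual implementation.

```python
import math

# Toy "embedding": bag-of-characters vector, a purely illustrative
# stand-in for hidden states produced by the base model.
def embed(text):
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Hypothetical expert names with a few positive prompts each,
# mirroring the structure of the config above.
experts = {
    "chat": ["chat", "assistant", "explain"],
    "functions": ["json", "function calling", "solve"],
    "story": ["storywriting", "narrative", "creative"],
}

# Each expert's gate vector is the mean embedding of its positive prompts.
gates = {}
for name, prompts in experts.items():
    embs = [embed(p) for p in prompts]
    gates[name] = [sum(col) / len(embs) for col in zip(*embs)]

def route(text):
    """Return softmax routing weights over experts for an input."""
    e = embed(text)
    scores = {n: sum(a * b for a, b in zip(g, e)) for n, g in gates.items()}
    m = max(scores.values())
    exps = {n: math.exp(s - m) for n, s in scores.items()}
    z = sum(exps.values())
    return {n: v / z for n, v in exps.items()}

print(route("tell me a story"))
```

The real merge computes gate parameters per layer from the base model's hidden states, but the shape of the computation (prompt-derived gate vectors, similarity scores, softmax weights) is the same.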
### Inference

Via transformers:

```python
# pip install -qU transformers bitsandbytes accelerate
from transformers import AutoTokenizer
import transformers
import torch

model = "saucam/Hikari-4x7B"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": False},
)

messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
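The snippet above keeps 4-bit loading disabled. If GPU memory is tight, the 4x7B weights can instead be loaded in 4-bit via bitsandbytes; a sketch (assuming a CUDA GPU with bitsandbytes installed; the compute dtype here is a common default, not tuned for this model):

```python
import torch
import transformers
from transformers import BitsAndBytesConfig

# Quantization config: load weights in 4-bit, compute in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

pipeline = transformers.pipeline(
    "text-generation",
    model="saucam/Hikari-4x7B",
    model_kwargs={"quantization_config": bnb_config},
)
```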
Sample responses:

```
No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.


<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Explain what a Mixture of Experts is in less than 100 words. [/INST] A Mixture of Experts (MoE) is a neural network architecture that was introduced in the 1990s. It is based on the idea of having a set of expert networks, each specialized in a particular part of the input space, and then combining their outputs to produce a final prediction.

The Mixture of Experts architecture is well-suited for tasks such as speech recognition and natural language processing, where the input data can be divided into distinct categories or classes, and each expert network can be trained to specialize in one of these categories.

The combining mechanism in a MoE network can be a simple weighted average or a more complex function, such as a Bayesian mixture model. The weights assigned to each expert network are learned during training, based on the accuracy of their predictions.

In summary, a Mixture of Experts is a neural network architecture that combines the strengths of multiple specialized networks to produce a more accurate prediction than any individual network alone.

I hope this helps! Let me know if you have any other questions. 😊

```
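The gate-weighted combination the model describes is simple enough to write down directly. A minimal sketch with two toy scalar experts and a softmax gate (purely illustrative; the gate and expert parameters are made up):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Two toy "experts": each is just a linear function of the scalar input.
experts = [lambda x: 2.0 * x, lambda x: -1.0 * x]

# Toy gate: scores each expert on the input, then normalizes with softmax.
def gate(x):
    return softmax([0.5 * x, -0.5 * x])

def moe(x):
    """Mixture output: gate-weighted sum of the expert outputs."""
    w = gate(x)
    return sum(wi * f(x) for wi, f in zip(w, experts))

print(moe(1.0))
```

In a real MoE LLM like this one, the experts are feed-forward blocks, the gate is a learned linear layer over hidden states, and often only the top-k experts are evaluated per token, but the weighted-sum structure is the same.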
Interestingly, when asked to answer in Japanese, the model responds in a mix of Japanese and English:

```
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Explain what a Mixture of Experts is in less than 100 words. Answer in japanese [/INST] 「Mixture of Experts」とは、機械学習アルGorithmの一種である「Experts」に複数の専門知識を持つモデルをを組み合わせる学習方法である。

この手法は especially useful in situations where、各分野 has a high degree of specialization and the data set is large、 and it can be used to improve the performance of the model by combining the knowledge of various experts.

The「Mixture of Experts」alGorithm works by dividing the input space into multiple partitions and assigning an "expert" to each partition. The "experts" are then trained on their respective partitions of the data, and at prediction time, the input is passed to all the "experts," and their predictions are combined to produce the final output.

The「Mixture of Experts」alGorithm is used in various applications, such as speech recognition、object recognition、 and natural language processing.

[1] Jordan, M. I. (1986). The use of mixtures of experts for adaptive parametric recognition. Cognitive Science, 10(2), 171-206.

```