Update README.md
README.md CHANGED
@@ -9,6 +9,12 @@ tags:
 - AdaptLLM/finance-chat
 - AdaptLLM/medicine-chat
 - AdaptLLM/law-chat
+datasets:
+- Open-Orca/OpenOrca
+- WizardLM/WizardLM_evol_instruct_V2_196k
+- EleutherAI/pile
+- GAIR/lima
+pipeline_tag: text-generation
 ---

 # AdaptLLM-4x7B-MoE
@@ -19,6 +25,37 @@ AdaptLLM-4x7B-MoE is a Mixture of Experts (MoE) made with the following models us
 * [AdaptLLM/medicine-chat](https://huggingface.co/AdaptLLM/medicine-chat)
 * [AdaptLLM/law-chat](https://huggingface.co/AdaptLLM/law-chat)

+## 💻 Usage
+
+```python
+!pip install -qU transformers bitsandbytes accelerate
+
+from transformers import AutoTokenizer
+import transformers
+import torch
+
+model = "Isotonic/AdaptLLM-4x7B-MoE"
+
+tokenizer = AutoTokenizer.from_pretrained(model)
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model,
+    model_kwargs={
+        "torch_dtype": torch.float16,
+        "low_cpu_mem_usage": True,
+        "use_cache": False,
+        "gradient_checkpointing": True,
+        "device_map": "auto",
+        "load_in_8bit": True
+    },
+)
+
+messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
+prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+outputs = pipeline(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+print(outputs[0]["generated_text"])
+```
+
 ## 🧩 Configuration

 ```yaml
@@ -93,28 +130,4 @@ experts:
 - "litigation"
 - "arbitration"
 - "mediation"
-```
-
-## 💻 Usage
-
-```python
-!pip install -qU transformers bitsandbytes accelerate
-
-from transformers import AutoTokenizer
-import transformers
-import torch
-
-model = "Isotonic/AdaptLLM-4x7B-MoE"
-
-tokenizer = AutoTokenizer.from_pretrained(model)
-pipeline = transformers.pipeline(
-    "text-generation",
-    model=model,
-    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
-)
-
-messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
-prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
-print(outputs[0]["generated_text"])
 ```
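
Two of the new `model_kwargs` above deserve a flag: `use_cache=False` and `gradient_checkpointing=True` are training-time settings that only slow plain generation down, and recent transformers releases deprecate the bare `load_in_8bit` kwarg in favour of an explicit `BitsAndBytesConfig`. A minimal 8-bit inference sketch under those assumptions; only the model id comes from the card, the rest is not the card author's code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Isotonic/AdaptLLM-4x7B-MoE"

# Explicit quantization config replaces the deprecated bare load_in_8bit kwarg.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,       # dtype of the non-quantized modules
    device_map="auto",               # let accelerate place the shards
    quantization_config=bnb_config,
)
# The KV cache stays on and gradient checkpointing stays off: both only
# matter for training and would slow generation if toggled at inference.

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.7)[0]["generated_text"])
```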
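
The diff only surfaces the tail of the `## 🧩 Configuration` YAML (the last three `positive_prompts`, presumably of the law expert). For orientation, a gate-prompt MoE definition of this kind generally takes the following shape in mergekit; this is a hedged reconstruction, not the card's actual config, and everything except the expert model names and the three visible prompts is an assumption:

```yaml
# Hedged sketch of the elided mergekit MoE config; the keys follow the
# usual mergekit-moe layout and are not copied from the card.
base_model: AdaptLLM/law-chat      # placeholder; the real base model is elided in the diff
gate_mode: hidden                  # assumed: route tokens by hidden-state affinity to the prompts
dtype: float16                     # assumed to match the torch_dtype used at load time
experts:
  - source_model: AdaptLLM/finance-chat
    positive_prompts:
      - "finance"                  # illustrative prompt, not shown in the diff
  - source_model: AdaptLLM/medicine-chat
    positive_prompts:
      - "diagnosis"                # illustrative prompt, not shown in the diff
  - source_model: AdaptLLM/law-chat
    positive_prompts:
      - "litigation"               # these three prompts are the ones visible in the hunk above
      - "arbitration"
      - "mediation"
```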