Instructions to use saucam/Skyro-4X8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use saucam/Skyro-4X8B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="saucam/Skyro-4X8B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("saucam/Skyro-4X8B")
model = AutoModelForCausalLM.from_pretrained("saucam/Skyro-4X8B")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use saucam/Skyro-4X8B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "saucam/Skyro-4X8B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "saucam/Skyro-4X8B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/saucam/Skyro-4X8B
- SGLang
How to use saucam/Skyro-4X8B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "saucam/Skyro-4X8B" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "saucam/Skyro-4X8B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "saucam/Skyro-4X8B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "saucam/Skyro-4X8B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use saucam/Skyro-4X8B with Docker Model Runner:
docker model run hf.co/saucam/Skyro-4X8B
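The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and it assumes a vLLM or SGLang server is already running at the given base URL and returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request


def build_completion_request(base_url, prompt, model="saucam/Skyro-4X8B",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the generated text (assumes a running server)."""
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "Once upon a time,")` would target the vLLM server started above, and `http://localhost:30000` the SGLang one.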
🚀 Skyro-4X8B
Skyro-4X8B is a Mixture of Experts (MoE) made with the following models using Mergekit:
- abacusai/Llama-3-Smaug-8B
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B
- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
🧩 Configuration
base_model: meta-llama/Meta-Llama-3-8B
gate_mode: hidden
experts:
  - source_model: abacusai/Llama-3-Smaug-8B
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: cognitivecomputations/dolphin-2.9-llama3-8b
    positive_prompts:
      - "math"
      - "mathematics"
      - "code"
      - "engineering"
      - "solve"
      - "logic"
      - "rationality"
      - "puzzle"
      - "solve"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
      - "science"
      - "medical"
      - "physics"
      - "engineering"
      - "math"
      - "logic"
      - "rationality"
      - "mathematics"
      - "solve"
  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
    positive_prompts:
      - "story"
      - "roleplay"
      - "role-play"
      - "storywriting"
      - "character"
      - "narrative"
      - "creative"
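To illustrate what the router built from this configuration does at inference time, here is a minimal sketch of Mixtral-style sparse routing for a single token. It is not the model's actual code: the gate logits are made-up numbers, `route_token` is a hypothetical helper, and `top_k=2` is an assumption based on Mixtral's usual default rather than something stated in this card.

```python
import math

# Expert order follows the configuration above.
EXPERTS = [
    "abacusai/Llama-3-Smaug-8B",
    "cognitivecomputations/dolphin-2.9-llama3-8b",
    "Weyaxi/Einstein-v6.1-Llama3-8B",
    "dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2",
]


def route_token(gate_logits, top_k=2):
    """Pick the top_k experts for one token and softmax-normalize their weights."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Made-up logits for a token from a math-flavored prompt (assumption).
example_logits = [0.1, 2.0, 1.5, -1.0]
for index, weight in route_token(example_logits):
    print(f"{EXPERTS[index]}: {weight:.3f}")
```

The token's output would then be the weighted sum of the selected experts' feed-forward outputs, so only two of the four experts run per token in each layer under this assumption.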
Evaluation
| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| 66.39 | 61.26 | 82.38 | 66.67 | 50.15 | 77.66 | 60.2 |
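As a quick sanity check, the reported average can be recomputed from the six benchmark scores in the table. This is only an arithmetic sketch; it assumes the average is the unweighted mean of the six columns.

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC": 61.26,
    "HellaSwag": 82.38,
    "MMLU": 66.67,
    "TruthfulQA": 50.15,
    "Winogrande": 77.66,
    "GSM8K": 60.2,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 66.39
```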
💻 Usage
# Notebook cell: install dependencies
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "saucam/Skyro-4X8B"
messages = [{"role": "user", "content": "In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?"}]

# Render the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and let accelerate place it across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a completion and print the full text (prompt plus answer)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
Sample output
config.json: 100%|██████████████████████████████████████████████████████████████| 878/878 [00:00<00:00, 4.18MB/s]
model.safetensors.index.json: 100%|██████████████████████████████████████████| 53.5k/53.5k [00:00<00:00, 101MB/s]
model-00001-of-00006.safetensors: 100%|█████████████████████████████████████| 9.89G/9.89G [03:47<00:00, 43.4MB/s]
model-00002-of-00006.safetensors: 100%|█████████████████████████████████████| 9.98G/9.98G [03:23<00:00, 49.0MB/s]
model-00003-of-00006.safetensors: 100%|█████████████████████████████████████| 9.98G/9.98G [03:44<00:00, 44.5MB/s]
model-00004-of-00006.safetensors: 100%|█████████████████████████████████████| 9.90G/9.90G [03:30<00:00, 46.9MB/s]
model-00005-of-00006.safetensors: 100%|█████████████████████████████████████| 9.08G/9.08G [03:08<00:00, 48.1MB/s]
model-00006-of-00006.safetensors: 100%|█████████████████████████████████████| 1.05G/1.05G [00:20<00:00, 51.3MB/s]
Downloading shards: 100%|█████████████████████████████████████████████████████████| 6/6 [17:58<00:00, 179.78s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 6/6 [01:27<00:00, 14.59s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|im_start|>user
In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?<|im_end|>
<|im_start|>assistant
Let's denote the number of votes candidate A got as \( A \).
Candidate B got 50% more votes than candidate A, so candidate B got \( A + 0.5A = 1.5A \) votes.
Candidate C got the rest of the votes, which means \( C = 100 - (A + 1.5A) \).
We know that candidate A got 20% of the votes, so \( A = 20\% \times 100 = 20 \).
Now we can calculate candidate C's votes:
\( C = 100 - (20 + 1.5 \times 20) \)
\( C = 100 - (20 + 30) \)
\( C = 100 - 50 \)
\( C = 50 \).
Therefore, candidate C got 50 votes.<|im_end|>
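The model's answer can be checked independently with a few lines of integer arithmetic. This sketch simply restates the word problem from the prompt; the variable names are illustrative.

```python
total_votes = 100

# Candidate A received 20% of all votes.
votes_a = total_votes * 20 // 100

# Candidate B received 50% more votes than candidate A.
votes_b = votes_a + votes_a // 2

# Candidate C received the remainder.
votes_c = total_votes - votes_a - votes_b

print(votes_a, votes_b, votes_c)  # prints 20 30 50
```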
