
Libraries

How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="indischepartij/TinyUltra-4x1.1B-Base-Alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("indischepartij/TinyUltra-4x1.1B-Base-Alpha")
model = AutoModelForCausalLM.from_pretrained("indischepartij/TinyUltra-4x1.1B-Base-Alpha")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "indischepartij/TinyUltra-4x1.1B-Base-Alpha"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "indischepartij/TinyUltra-4x1.1B-Base-Alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
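
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the default `base_url` assumes the vLLM server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "indischepartij/TinyUltra-4x1.1B-Base-Alpha"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes the server answers in the OpenAI chat completions response shape.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` should return the generated reply. Passing `base_url="http://localhost:30000"` targets an SGLang server on its port instead.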

Use Docker

docker model run hf.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha

SGLang

How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "indischepartij/TinyUltra-4x1.1B-Base-Alpha" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "indischepartij/TinyUltra-4x1.1B-Base-Alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "indischepartij/TinyUltra-4x1.1B-Base-Alpha" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "indischepartij/TinyUltra-4x1.1B-Base-Alpha",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with Docker Model Runner:
```
docker model run hf.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

TinyUltra-4x1.1B-Base-Alpha

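TinyUltra-4x1.1B-Base-Alpha is a Mixture of Experts (MoE) made with the following models using MergeKit:

- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- vihangd/DopeyTinyLlama-1.1B-v1
- cognitivecomputations/TinyDolphin-2.8.1-1.1b
- Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test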

Modelfile/Prompt format

SYSTEM You are a TinyUltra, helpful and lovely AI assistant.

TEMPLATE <|system|> {{ .System }}</s> <|user|> {{ .Prompt }}</s> <|assistant|>

PARAMETER stop <|system|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
PARAMETER stop </s>
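
For runtimes that do not read a Modelfile, the same prompt can be assembled by hand. This is a minimal sketch, not an official helper: `build_prompt` and `truncate_at_stop` are hypothetical names, and the newline separator is an assumption, since the template above is shown on a single line.

```python
DEFAULT_SYSTEM = "You are a TinyUltra, helpful and lovely AI assistant."
STOP_SEQUENCES = ("<|system|>", "<|user|>", "<|assistant|>", "</s>")


def build_prompt(user_prompt, system=DEFAULT_SYSTEM, sep="\n"):
    """Fill the TEMPLATE above; `sep` (assumed to be a newline) joins the parts."""
    return (
        f"<|system|>{sep}{system}</s>{sep}"
        f"<|user|>{sep}{user_prompt}</s>{sep}"
        f"<|assistant|>"
    )


def truncate_at_stop(text, stops=STOP_SEQUENCES):
    """Cut generated text at the earliest stop sequence, mirroring PARAMETER stop."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


prompt = build_prompt("What is the capital of France?")
```

`truncate_at_stop` is only needed when the runtime does not apply the stop sequences itself.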

🧩 Configuration

base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: float16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
    - "Help me debug this code."
    - "Rewrite this function in Python."
    - "Optimize this C# script."
    - "Implement this feature using JavaScript."
    - "Convert this HTML structure into a more efficient design."
    - "Assist me with writing a program that"
  - source_model: vihangd/DopeyTinyLlama-1.1B-v1
    positive_prompts:
    - "How do you"
    - "Explain the concept of"
    - "Give an overview of"
    - "Compare and contrast between"
    - "Provide information about"
    - "Help me understand"
    - "Summarize"
    - "Make a recommendation on"
    - "Answer this question"
  - source_model: cognitivecomputations/TinyDolphin-2.8.1-1.1b
    positive_prompts:
    - "Write a program to solve this problem"
    - "Modify this function to improve its performance"
    - "Refactor this code to enhance readability"
    - "Create a custom function for this specific use case"
    - "Optimize this algorithm to reduce computational complexity"
    - "Implement this feature by extending existing codebase"
    - "Integrate this API call into the application"
    - "Help me troubleshoot and fix this bug"
    - "Review and test this code snippet before deployment"
    - "Analyze this error log to identify potential issues"
    - "Generate a set of unit tests for this module"
    - "Evaluate different approaches to solving this problem"
    - "Do a web search for"
    - "Use the plugin to"
  - source_model: Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test
    positive_prompts:
    - "add these numbers"
    - "whats 2+2"
    - "subtraction"
    - "division"
    - "multiplication"
    - "addition"
    - "I need help with a math problem"
    - "Solve for x"
    - "Add these two numbers together: 4 + 3 = 7"
    - "Multiply 5 by 6: 5 * 6 = 30"
    - "Divide 8 by 2: 8 / 2 = 4"
    - "Find the remainder when 9 is divided by 3: 9 % 3 = 0"
    - "Calculate the square root of 16: sqrt(16) = 4"
    - "Simplify the expression (a+b)/(c-d): (a+b)/(c-d)"
    - "Factor out the common factor of 2 from 4x + 6y: 2(2x + 3y)"
    - "Solve for x in the equation 3x - 7 = 2x + 5: x = 12"
    - "Graph the line y = 2x + 3"
    - "Approximate pi to three decimal places: 3.142"
    - "Find the derivative of f(x) = sin(x): f'(x) = cos(x)"
    - "Integrate g(x) = x^2 over the interval [0, 1]: g(1) - g(0) = 1/3"
    - "Calculate the determinant of the matrix A = [[2, 3], [4, 5]]: det(A) = 2*5 - 3*4 = -2"
    - "Solve the system of equations Ax = b: x = [-5, 10]"
    - "Calculate the sum of the first n natural numbers using the formula Sn = n*(n+1)/2: sum(n=1 to 5) = 15"
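
To give a feel for what `gate_mode: hidden` and `positive_prompts` are for, here is a toy sketch of prompt-driven routing. It is an illustration under stated assumptions, not MergeKit's code: the 3-dimensional vectors stand in for real hidden states, the expert labels are hypothetical, and `top_k=2` is assumed rather than taken from this card. Each expert's gate row is the normalized mean of its prompt vectors, and a token is routed to the experts whose rows score highest against its hidden state.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def route(hidden_state, gate_rows, top_k=2):
    """Return softmax weights over the top_k experts by dot-product score."""
    logits = {
        name: sum(h * w for h, w in zip(hidden_state, row))
        for name, row in gate_rows.items()
    }
    chosen = sorted(logits, key=logits.get, reverse=True)[:top_k]
    peak = max(logits[name] for name in chosen)
    exps = {name: math.exp(logits[name] - peak) for name in chosen}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Made-up stand-ins for the hidden states of each expert's positive prompts.
prompt_states = {
    "code": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "explain": [[0.1, 1.0, 0.0], [0.2, 0.8, 0.1]],
    "dev_tools": [[0.6, 0.0, 0.6], [0.7, 0.1, 0.5]],
    "math": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
gate_rows = {name: normalize(mean_vector(vs)) for name, vs in prompt_states.items()}

# A made-up hidden state that resembles the "math" prompts.
weights = route([0.05, 0.1, 0.95], gate_rows)
```

In this toy setup the math-like input is routed mostly to the `math` expert, which is the behavior the prompt lists in the configuration are meant to encourage.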

💻 Usage

!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "indischepartij/TinyUltra-4x1.1B-Base-Alpha"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

GGUF: https://huggingface.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha-GGUF

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 37.94 |
| AI2 Reasoning Challenge (25-Shot) | 34.90 |
| HellaSwag (10-Shot) | 61.42 |
| MMLU (5-Shot) | 25.42 |
| TruthfulQA (0-shot) | 37.59 |
| Winogrande (5-shot) | 65.75 |
| GSM8k (5-shot) | 2.58 |
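
The reported average is the unweighted mean of the six benchmark scores. A short check, using only the values listed above:

```python
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 34.90,
    "HellaSwag (10-Shot)": 61.42,
    "MMLU (5-Shot)": 25.42,
    "TruthfulQA (0-shot)": 37.59,
    "Winogrande (5-shot)": 65.75,
    "GSM8k (5-shot)": 2.58,
}

# Unweighted mean across all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 37.94
```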


Safetensors: model size 3B params, tensor type F16
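
The name says 4x1.1B, yet the reported size is about 3B rather than 4.4B. A rough estimate suggests why: in a Mixtral-style merge only the feed-forward (MLP) blocks are duplicated per expert, while attention and embeddings are shared. The sketch below rests on assumptions not stated in this card: the dimensions are those of TinyLlama-1.1B, every expert is assumed to share them, and small normalization weights are ignored.

```python
# Assumed TinyLlama-1.1B dimensions (not taken from this card).
HIDDEN = 2048
INTERMEDIATE = 5632
LAYERS = 22
VOCAB = 32000
HEADS = 32
KV_HEADS = 4
HEAD_DIM = HIDDEN // HEADS

# Per-layer attention: query and output projections plus smaller key/value projections.
ATTENTION = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * (KV_HEADS * HEAD_DIM)
# Per-layer MLP: gate, up, and down projections.
MLP = 3 * HIDDEN * INTERMEDIATE
# Input embeddings plus output head.
EMBEDDINGS = 2 * VOCAB * HIDDEN


def estimate_params(num_experts):
    """Approximate parameter count with the MLP repeated once per expert."""
    router = HIDDEN * num_experts if num_experts > 1 else 0
    return LAYERS * (ATTENTION + num_experts * MLP + router) + EMBEDDINGS


print(f"dense: {estimate_params(1) / 1e9:.2f}B")
print(f"4 experts: {estimate_params(4) / 1e9:.2f}B")
```

Under these assumptions the four-expert estimate lands a little above 3B, in line with the size shown above.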


Evaluation results

| Benchmark | Metric | Split | Source | Value |
|---|---|---|---|---|
| AI2 Reasoning Challenge (25-Shot) | normalized accuracy | test set | Open LLM Leaderboard | 34.900 |
| HellaSwag (10-Shot) | normalized accuracy | validation set | Open LLM Leaderboard | 61.420 |
| MMLU (5-Shot) | accuracy | test set | Open LLM Leaderboard | 25.420 |
| TruthfulQA (0-shot) | mc2 | validation set | Open LLM Leaderboard | 37.590 |
| Winogrande (5-shot) | accuracy | validation set | Open LLM Leaderboard | 65.750 |
| GSM8k (5-shot) | accuracy | test set | Open LLM Leaderboard | 2.580 |