How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="indischepartij/TinyUltra-4x1.1B-Base-Alpha")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("indischepartij/TinyUltra-4x1.1B-Base-Alpha")
model = AutoModelForCausalLM.from_pretrained("indischepartij/TinyUltra-4x1.1B-Base-Alpha")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "indischepartij/TinyUltra-4x1.1B-Base-Alpha"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "indischepartij/TinyUltra-4x1.1B-Base-Alpha",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, run the model with Docker:
docker model run hf.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha
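The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the default base URL assumes the vLLM server from the snippet above is listening on port 8000. The response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "indischepartij/TinyUltra-4x1.1B-Base-Alpha"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text.

    Assumes a server is already running and answers in the
    OpenAI-compatible chat completion format.
    """
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` would return the generated text; for an SGLang server, pass `base_url="http://localhost:30000"` instead.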
How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "indischepartij/TinyUltra-4x1.1B-Base-Alpha" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "indischepartij/TinyUltra-4x1.1B-Base-Alpha",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "indischepartij/TinyUltra-4x1.1B-Base-Alpha" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "indischepartij/TinyUltra-4x1.1B-Base-Alpha",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use indischepartij/TinyUltra-4x1.1B-Base-Alpha with Docker Model Runner:
docker model run hf.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha
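TinyUltra-4x1.1B-Base-Alpha is a Mixture of Experts (MoE) made with the following models using MergeKit:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- vihangd/DopeyTinyLlama-1.1B-v1
- cognitivecomputations/TinyDolphin-2.8.1-1.1b
- Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test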
SYSTEM You are a TinyUltra, helpful and lovely AI assistant.
TEMPLATE <|system|> {{ .System }}</s> <|user|> {{ .Prompt }}</s> <|assistant|>
PARAMETER stop <|system|>
PARAMETER stop <|user|>
PARAMETER stop <|assistant|>
PARAMETER stop </s>
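For use outside a Modelfile, the same prompt layout can be assembled by hand. This is a minimal sketch: the function names `build_prompt` and `truncate_at_stop` are hypothetical, and the newlines between segments are an assumption, since the whitespace of the template above is not fully recoverable from this page.

```python
SYSTEM = "You are a TinyUltra, helpful and lovely AI assistant."
STOP_SEQUENCES = ["<|system|>", "<|user|>", "<|assistant|>", "</s>"]


def build_prompt(user_prompt, system=SYSTEM):
    """Fill the template: system turn, user turn, then an open assistant turn.

    Assumption: segments are separated by newlines.
    """
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user_prompt}</s>\n"
        f"<|assistant|>\n"
    )


def truncate_at_stop(text, stops=STOP_SEQUENCES):
    """Cut generated text at the earliest stop sequence, if any occurs."""
    cut = len(text)
    for stop in stops:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

`truncate_at_stop` mimics what the `PARAMETER stop` lines do: generation output is cut as soon as one of the listed markers appears.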
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
gate_mode: hidden
dtype: float16
experts:
  - source_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
    positive_prompts:
      - "Help me debug this code."
      - "Rewrite this function in Python."
      - "Optimize this C# script."
      - "Implement this feature using JavaScript."
      - "Convert this HTML structure into a more efficient design."
      - "Assist me with writing a program that"
  - source_model: vihangd/DopeyTinyLlama-1.1B-v1
    positive_prompts:
      - "How do you"
      - "Explain the concept of"
      - "Give an overview of"
      - "Compare and contrast between"
      - "Provide information about"
      - "Help me understand"
      - "Summarize"
      - "Make a recommendation on"
      - "Answer this question"
  - source_model: cognitivecomputations/TinyDolphin-2.8.1-1.1b
    positive_prompts:
      - "Write a program to solve this problem"
      - "Modify this function to improve its performance"
      - "Refactor this code to enhance readability"
      - "Create a custom function for this specific use case"
      - "Optimize this algorithm to reduce computational complexity"
      - "Implement this feature by extending existing codebase"
      - "Integrate this API call into the application"
      - "Help me troubleshoot and fix this bug"
      - "Review and test this code snippet before deployment"
      - "Analyze this error log to identify potential issues"
      - "Generate a set of unit tests for this module"
      - "Evaluate different approaches to solving this problem"
      - "Do a web search for"
      - "Use the plugin to"
  - source_model: Josephgflowers/Tinyllama-Cinder-1.3B-Reason-Test
    positive_prompts:
      - "add these numbers"
      - "whats 2+2"
      - "subtraction"
      - "division"
      - "multiplication"
      - "addition"
      - "I need help with a math problem"
      - "Solve for x"
      - "Add these two numbers together: 4 + 3 = 7"
      - "Multiply 5 by 6: 5 * 6 = 30"
      - "Divide 8 by 2: 8 / 2 = 4"
      - "Find the remainder when 9 is divided by 3: 9 % 3 = 0"
      - "Calculate the square root of 16: sqrt(16) = 4"
      - "Simplify the expression (a+b)/(c-d): (a+b)/(c-d)"
      - "Factor out the common factor of 2 from 4x + 6y: 2(2x + 3y)"
      - "Solve for x in the equation 3x - 7 = 2x + 5: x = 12"
      - "Graph the line y = 2x + 3"
      - "Approximate pi to three decimal places: 3.142"
      - "Find the derivative of f(x) = sin(x): f'(x) = cos(x)"
      - "Integrate g(x) = x^2 over the interval [0, 1]: g(1) - g(0) = 1/3"
      - "Calculate the determinant of the matrix A = [[2, 3], [4, 5]]: det(A) = 2*5 - 3*4 = -2"
      - "Solve the system of equations Ax = b: x = [-5, 10]"
      - "Calculate the sum of the first n natural numbers using the formula Sn = n*(n+1)/2: sum(n=1 to 5) = 15"
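To illustrate how a merged model of this kind might use its four experts at inference time, here is a toy routing sketch in plain Python. It is not MergeKit's code: it assumes a Mixtral-style layer in which a gate scores every expert for each token, the top two experts are selected, and their outputs are mixed with softmax weights. The value `top_k=2`, the scalar expert outputs, and the function names are all illustrative assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, top_k=2):
    """Pick the top_k experts by gate score.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:top_k]
    weights = softmax([gate_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_output(gate_logits, expert_outputs, top_k=2):
    """Weighted sum of the selected experts' outputs (scalars for simplicity)."""
    return sum(w * expert_outputs[i] for i, w in route(gate_logits, top_k))
```

In the configuration above, `gate_mode: hidden` and the `positive_prompts` determine how the gate is initialized, so that prompts resembling an expert's list tend to score that expert higher.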
!pip install -qU transformers bitsandbytes accelerate
from transformers import AutoTokenizer
import transformers
import torch
model = "gmonsoon/TinyUltra-4x1.1B-Base-Alpha"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
"text-generation",
model=model,
model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
GGUF: https://huggingface.co/indischepartij/TinyUltra-4x1.1B-Base-Alpha-GGUF
Open LLM Leaderboard evaluation results (detailed results are available on the leaderboard):
| Metric | Value |
|---|---|
| Avg. | 37.94 |
| AI2 Reasoning Challenge (25-Shot) | 34.90 |
| HellaSwag (10-Shot) | 61.42 |
| MMLU (5-Shot) | 25.42 |
| TruthfulQA (0-Shot) | 37.59 |
| Winogrande (5-Shot) | 65.75 |
| GSM8k (5-Shot) | 2.58 |
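As a quick check, the "Avg." row is consistent with the unweighted mean of the six benchmark scores. The snippet below recomputes it from the values in the table; the dictionary keys are shortened labels chosen for this example.

```python
# Scores copied from the table above.
scores = {
    "ARC": 34.90,
    "HellaSwag": 61.42,
    "MMLU": 25.42,
    "TruthfulQA": 37.59,
    "Winogrande": 65.75,
    "GSM8k": 2.58,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```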