Instructions to use QuantFactory/Triangulum-1B-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use QuantFactory/Triangulum-1B-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="QuantFactory/Triangulum-1B-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

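# Load model directly (a GGUF repo needs gguf_file to select one quantized file)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "QuantFactory/Triangulum-1B-GGUF", gguf_file="Triangulum-1B.Q2_K.gguf", dtype="auto"
)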

llama-cpp-python

How to use QuantFactory/Triangulum-1B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Triangulum-1B-GGUF",
	filename="Triangulum-1B.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
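
The chat completion call above returns an OpenAI-style dictionary rather than a plain string. The sketch below shows one way to pull the assistant text out of it. The helper name `extract_reply` is hypothetical, and `sample_response` is a hand-written stand-in with the same shape, not real model output.

```python
def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


# Hand-written placeholder with the same layout as a chat completion result
# (assumption: not produced by the model).
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply goes here>"},
            "finish_reason": "stop",
        }
    ]
}

print(extract_reply(sample_response))
```

In practice you would pass the return value of `llm.create_chat_completion(...)` instead of `sample_response`.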


llama.cpp

How to use QuantFactory/Triangulum-1B-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Triangulum-1B-GGUF:Q4_K_M


vLLM

How to use QuantFactory/Triangulum-1B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "QuantFactory/Triangulum-1B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Triangulum-1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
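
The same request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl call above. `build_chat_request` is a hypothetical helper name, and the request is only constructed here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000",
    "QuantFactory/Triangulum-1B-GGUF",
    "What is the capital of France?",
)
print(request.full_url)

# To send it, the server from the previous step must be running:
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, changing `base_url` (for example to a server on port 30000) should be the only adjustment needed for another local server.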

Use Docker

docker model run hf.co/QuantFactory/Triangulum-1B-GGUF:Q4_K_M

SGLang

How to use QuantFactory/Triangulum-1B-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "QuantFactory/Triangulum-1B-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Triangulum-1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "QuantFactory/Triangulum-1B-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "QuantFactory/Triangulum-1B-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use QuantFactory/Triangulum-1B-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Triangulum-1B-GGUF:Q4_K_M
```
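
The commands on this page refer to the model as `REPO:QUANT`, sometimes with an `hf.co/` prefix. The sketch below splits such a reference into its parts. It reflects only the pattern visible in these commands and is not an official parser; `split_model_ref` is a hypothetical helper name.

```python
def split_model_ref(ref):
    """Split 'hf.co/ORG/REPO:QUANT' into (repo_id, quant_tag or None)."""
    ref = ref.removeprefix("hf.co/")  # Python 3.9+
    repo_id, _, quant = ref.partition(":")
    return repo_id, (quant or None)


repo_id, quant = split_model_ref("hf.co/QuantFactory/Triangulum-1B-GGUF:Q4_K_M")
print(repo_id, quant)
```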

Unsloth Studio

How to use QuantFactory/Triangulum-1B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Triangulum-1B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Triangulum-1B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Triangulum-1B-GGUF to start chatting

Pi

How to use QuantFactory/Triangulum-1B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "QuantFactory/Triangulum-1B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
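
If you prefer to generate this configuration rather than type it, the following sketch builds the same JSON with the standard library. `pi_models_config` is a hypothetical helper; the keys and values are copied from the snippet above, and writing the result to `~/.pi/agent/models.json` is left to you.

```python
import json


def pi_models_config(base_url, model_id):
    """Return the provider configuration shown above as a Python dict."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",
                "models": [{"id": model_id}],
            }
        }
    }


config_text = json.dumps(
    pi_models_config(
        "http://localhost:8080/v1",
        "QuantFactory/Triangulum-1B-GGUF:Q4_K_M",
    ),
    indent=2,
)
print(config_text)
```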

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use QuantFactory/Triangulum-1B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Run Hermes

hermes

OpenClaw

How to use QuantFactory/Triangulum-1B-GGUF with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "QuantFactory/Triangulum-1B-GGUF:Q4_K_M" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use QuantFactory/Triangulum-1B-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Triangulum-1B-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Triangulum-1B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Triangulum-1B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Triangulum-1B-GGUF-Q4_K_M

List all available models

lemonade list

QuantFactory/Triangulum-1B-GGUF

This is a quantized version of prithivMLmods/Triangulum-1B, created using llama.cpp.

Original Model Card

  __           .__                                .__                   
_/  |_ _______ |__|_____     ____    ____   __ __ |  |   __ __   _____  
\   __\\_  __ \|  |\__  \   /    \  / ___\ |  |  \|  |  |  |  \ /     \ 
 |  |   |  | \/|  | / __ \_|   |  \/ /_/  >|  |  /|  |__|  |  /|  Y Y  \
 |__|   |__|   |__|(____  /|___|  /\___  / |____/ |____/|____/ |__|_|  /
                        \/      \//_____/                            \/

Triangulum 1B: Multilingual Large Language Models (LLMs)

Triangulum 1B is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

Key Features & Model Architecture

Foundation Model: Built upon LLaMA's autoregressive language model, leveraging an optimized transformer architecture for enhanced performance.
Instruction Tuning: Includes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align model outputs with human preferences for helpfulness and safety.
Multilingual Support: Designed to handle multiple languages, ensuring broad applicability across diverse linguistic contexts.

Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Training Approach

Synthetic Datasets: Utilizes long chain-of-thought synthetic data to enhance reasoning capabilities.
Supervised Fine-Tuning (SFT): Aligns the model to specific tasks through curated datasets.
Reinforcement Learning with Human Feedback (RLHF): Ensures the model adheres to human values and safety guidelines through iterative training processes.

How to use with transformers

With transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or the Auto classes with the generate() function.

Make sure to update your transformers installation via pip install --upgrade transformers.

import torch
from transformers import pipeline

model_id = "prithivMLmods/Triangulum-1B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
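
When the pipeline is called with a list of chat messages, `generated_text` holds the whole conversation, and its last element is the assistant message as a dictionary with `role` and `content` keys. The sketch below extracts just the text. `last_assistant_text` is a hypothetical helper, and `sample_outputs` is a hand-written placeholder with the same shape, not real model output.

```python
def last_assistant_text(outputs):
    """Return the content of the final message in a chat pipeline result."""
    message = outputs[0]["generated_text"][-1]
    return message["content"]


# Placeholder mirroring the structure of the pipeline result
# (assumption: the reply text is a stand-in).
sample_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "<model reply goes here>"},
        ]
    }
]

print(last_assistant_text(sample_outputs))
```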

Demo Inference LlamaForCausalLM

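import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Triangulum-1B", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    "prithivMLmods/Triangulum-1B",
    torch_dtype=torch.float16,
    device_map="auto",
    # 4-bit loading via bitsandbytes; replaces the deprecated load_in_4bit=True argument
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    # Replaces the deprecated use_flash_attention_2=True; requires the flash-attn package
    attn_implementation="flash_attention_2",
)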

# Define a list of system and user prompts
prompts = [
    """<|im_start|>system
You are the kind and tri-intelligent assistant helping people to understand complex concepts.<|im_end|>
<|im_start|>user
Can you explain the concept of eigenvalues and eigenvectors in a simple way?<|im_end|>
<|im_start|>assistant"""
]

# Generate responses for each prompt
for chat in prompts:
    print(f"Prompt:\n{chat}\n")
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to("cuda")
    generated_ids = model.generate(input_ids, max_new_tokens=750, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
    response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(f"Response:\n{response}\n{'-'*80}\n")
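
The prompt in the demo is written by hand in the `<|im_start|>` / `<|im_end|>` layout. As a minimal sketch, assuming that layout is what you want to reproduce, the helper below builds the same kind of string from a list of role/content messages. `build_chatml_prompt` is a hypothetical name, not part of transformers.

```python
def build_chatml_prompt(messages):
    """Format messages in the <|im_start|>/<|im_end|> layout used in the demo."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)


prompt = build_chatml_prompt(
    [
        {
            "role": "system",
            "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts.",
        },
        {
            "role": "user",
            "content": "Can you explain the concept of eigenvalues and eigenvectors in a simple way?",
        },
    ]
)
print(prompt)
```

Each string returned this way can be appended to the `prompts` list in the demo.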

Key Adjustments

System Prompts: Each prompt defines a different role or persona for the AI to adopt.
User Prompts: These specify the context or task for the assistant, ranging from teaching to storytelling or career advice.
Looping Through Prompts: Each prompt is processed in a loop to showcase the model's versatility.

You can expand the list of prompts to explore a variety of scenarios and responses.

Use Cases for Triangulum 1B

Multilingual content generation
Question answering and dialogue systems
Text summarization and analysis
Translation and localization tasks

Technical Details

Triangulum 1B employs a state-of-the-art autoregressive architecture inspired by LLaMA. The optimized transformer framework ensures both efficiency and scalability, making it suitable for a variety of use cases.

How to Run Triangulum 1B on Ollama Locally

# How to Run Ollama Locally

This guide demonstrates the power of using open-source LLMs locally, showcasing examples with different open-source models for various use cases. By the end, you'll be equipped to run any future open-source LLM models with ease.

---

## Example 1: How to Run the Triangulum-1B Model

The **Triangulum-1B** model is an open-source LLM known for its capabilities across text-based tasks. We'll interact with it similarly to ChatGPT, but run it locally with support for quants.

### Step 1: Download the Model

First, download the **Triangulum-1B-F16.gguf** model using the following command:

```bash
ollama run triangulum-1b-f16.gguf
```

### Step 2: Model Initialization and Download

Upon running the command, Ollama will initialize and download the model files. You should see output similar to the following:

pulling manifest
pulling 8934d96d3f08... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏   59 B
pulling fa304d675061... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏   91 B
pulling 42ba7f8a01dd... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)

### Step 3: Interact with the Model

Once the model is ready, you can send a prompt. For example, let's ask:

>>> What can you do for me?

If you asked the same question, you should receive a response like this:

As a responsible AI language model, I am here to assist you with any questions or tasks you may have. Here are some examples of things I can help with:

1. Answering questions: I can provide information on a wide range of topics, from science and technology to history and culture.
2. Generating ideas: I can help you brainstorm ideas for creative projects, or provide suggestions for solving problems.
3. Writing assistance: I can help you with writing tasks such as proofreading, editing, and suggesting alternative words or phrases.
4. Translation: I can translate text from one language to another.
5. Summarizing content: I can summarize long pieces of text, such as articles or documents, into shorter, more digestible versions.
6. Creativity: I can help you generate creative ideas for stories, poems, or other forms of writing.
7. Language learning: I can assist you in learning a new language by providing grammar explanations, vocabulary lists, and practice exercises.
8. Chatting: I'm here to chat with you and provide a response to any question or topic you'd like to discuss.

Please let me know if there is anything specific you would like me to help you with.

### Step 4: Exit the Program

To exit the program, simply type:

/exit

## Example 2: Running Multi-Modal Models (Future Use)

Ollama supports running multi-modal models where you can send images and ask questions based on them. This section will be updated as more models become available.

Notes on Using Quantized Models

GGUF builds of the model, from the half-precision triangulum-1b-f16.gguf down to the lower-bit quantizations, reduce memory use so that local inference is practical on resource-constrained hardware.

Ensure your system has sufficient VRAM or CPU resources.
Use the .gguf model format for compatibility with Ollama.
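
To judge whether a given file will fit in memory, a rough estimate is parameters times bits per weight divided by 8. The sketch below applies that arithmetic to a nominal 1B parameters. It is a back-of-the-envelope figure only: it assumes every weight uses the same bit width and ignores file metadata, the KV cache, and runtime overhead, so real files and memory use will be larger. `estimate_weights_gb` is a hypothetical helper.

```python
def estimate_weights_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 1_000_000_000  # nominal "1B" figure; an assumption, not an exact count

for bits in (2, 3, 4, 5, 6, 8, 16):
    print(f"{bits:>2}-bit: ~{estimate_weights_gb(N_PARAMS, bits):.2f} GB")
```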

Conclusion

Running the Triangulum-1B model with Ollama provides a robust way to leverage open-source LLMs locally for diverse use cases. By following these steps, you can explore the capabilities of other open-source models in the future.

GGUF

Model size: 1B params
Architecture: llama
Available quantizations (hardware compatibility): 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit