- QuantFactory/Triangulum-1B-GGUF
- Original Model Card
- Triangulum 1B: Multilingual Large Language Models (LLMs)
- Key Features & Model Architecture
- Training Approach
- How to use with transformers
- Demo Inference LlamaForCausalLM
- Key Adjustments
- Use Cases for Triangulum 1B
- Technical Details
- How to Run Triangulum 1B on Ollama Locally
- Conclusion
QuantFactory/Triangulum-1B-GGUF
This is a quantized version of prithivMLmods/Triangulum-1B, created using llama.cpp.
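Because the files in this repository are GGUF quantizations, one way to run them from Python is llama-cpp-python. The following is a minimal sketch, not part of the original card. It assumes the package is installed (`pip install llama-cpp-python huggingface_hub`) and that the files follow the `Triangulum-1B.<QUANT>.gguf` naming pattern (for example `Triangulum-1B.Q4_K_M.gguf`); check the repository's file list before relying on that. The helper names `gguf_filename`, `load_triangulum`, and `ask` are hypothetical.

```python
REPO_ID = "QuantFactory/Triangulum-1B-GGUF"


def gguf_filename(quant: str = "Q4_K_M") -> str:
    """Build the GGUF file name for a quantization level.

    Assumption: files are named 'Triangulum-1B.<QUANT>.gguf'.
    """
    return f"Triangulum-1B.{quant}.gguf"


def load_triangulum(quant: str = "Q4_K_M", n_ctx: int = 2048):
    """Download (if needed) and load one quantized file from the Hub."""
    # Imported lazily so gguf_filename() works without the package installed.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        n_ctx=n_ctx,  # context window size passed through to Llama()
    )


def ask(llm, question: str) -> str:
    """Send a single user message and return the assistant's reply text."""
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}]
    )
    # The result follows the OpenAI chat-completion layout.
    return result["choices"][0]["message"]["content"]
```

A typical session would call `llm = load_triangulum("Q4_K_M")` once and then `ask(llm, "What is the capital of France?")` as often as needed. Smaller quantization levels reduce memory use at some cost in output quality.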
Original Model Card
__ .__ .__
_/ |_ _______ |__|_____ ____ ____ __ __ | | __ __ _____
\ __\\_ __ \| |\__ \ / \ / ___\ | | \| | | | \ / \
| | | | \/| | / __ \_| | \/ /_/ >| | /| |__| | /| Y Y \
|__| |__| |__|(____ /|___| /\___ / |____/ |____/|____/ |__|_| /
\/ \//_____/ \/
Triangulum 1B: Multilingual Large Language Models (LLMs)
Triangulum 1B is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.
Key Features & Model Architecture
Foundation Model: Built upon LLaMA's autoregressive language model, leveraging an optimized transformer architecture for enhanced performance.
Instruction Tuning: Includes supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align model outputs with human preferences for helpfulness and safety.
Multilingual Support: Designed to handle multiple languages, ensuring broad applicability across diverse linguistic contexts.
- Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
Training Approach
- Synthetic Datasets: Utilizes long chain-of-thought synthetic data to enhance reasoning capabilities.
- Supervised Fine-Tuning (SFT): Aligns the model to specific tasks through curated datasets.
- Reinforcement Learning with Human Feedback (RLHF): Ensures the model adheres to human values and safety guidelines through iterative training processes.
How to use with transformers
With transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.
Make sure to update your transformers installation via pip install --upgrade transformers.
import torch
from transformers import pipeline

model_id = "prithivMLmods/Triangulum-1B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are the kind and tri-intelligent assistant helping people to understand complex concepts."},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
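The final `print` in the pipeline example shows the last message as a dictionary with `role` and `content` keys. If only the reply text is needed, a small helper can pull it out. The sketch below assumes the chat-style result layout used in that example; `last_reply` is a hypothetical name, and `example_outputs` is a hand-written stand-in rather than real model output.

```python
def last_reply(outputs):
    """Return the text of the final message from a chat-style pipeline result.

    Assumes a list with one dict whose "generated_text" holds the full
    conversation as a list of {"role": ..., "content": ...} messages.
    """
    conversation = outputs[0]["generated_text"]
    final_message = conversation[-1]
    return final_message["content"]


# Hand-written stand-in for a pipeline result, so the helper can be tried offline.
example_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply goes here)"},
        ]
    }
]

print(last_reply(example_outputs))  # prints: (model reply goes here)
```

With a real pipeline, `last_reply(outputs)` can replace the final `print` call in the example.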
Demo Inference LlamaForCausalLM
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('prithivMLmods/Triangulum-1B', trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained(
    "prithivMLmods/Triangulum-1B",
    torch_dtype=torch.float16,
    device_map="auto",
    # 4-bit loading requires the bitsandbytes package
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    # Requires the flash-attn package; remove this line if it is not installed
    attn_implementation="flash_attention_2",
)

# Define a list of system and user prompts
prompts = [
    """<|im_start|>system
You are the kind and tri-intelligent assistant helping people to understand complex concepts.<|im_end|>
<|im_start|>user
Can you explain the concept of eigenvalues and eigenvectors in a simple way?<|im_end|>
<|im_start|>assistant"""
]

# Generate responses for each prompt
for chat in prompts:
    print(f"Prompt:\n{chat}\n")
    # Move the inputs to whichever device the model was placed on
    input_ids = tokenizer(chat, return_tensors="pt").input_ids.to(model.device)
    generated_ids = model.generate(
        input_ids,
        max_new_tokens=750,
        temperature=0.8,
        repetition_penalty=1.1,
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, not the prompt
    response = tokenizer.decode(
        generated_ids[0][input_ids.shape[-1]:],
        skip_special_tokens=True,
        clean_up_tokenization_spaces=True,
    )
    print(f"Response:\n{response}\n{'-'*80}\n")
Key Adjustments
- System Prompts: Each prompt defines a different role or persona for the AI to adopt.
- User Prompts: These specify the context or task for the assistant, ranging from teaching to storytelling or career advice.
- Looping Through Prompts: Each prompt is processed in a loop to showcase the model's versatility.
You can expand the list of prompts to explore a variety of scenarios and responses.
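To expand the prompt list without hand-writing the delimiters each time, the prompts can be generated from (system, user) pairs. This is a sketch that reproduces the prompt layout used in the demo. Whether that layout matches the template the model was trained with is not confirmed here, and `tokenizer.apply_chat_template` is an alternative worth checking. The name `build_prompt` and the example scenarios are illustrative assumptions.

```python
def build_prompt(system: str, user: str) -> str:
    """Format one system/user pair in the layout used by the demo prompts."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant"
    )


# Hypothetical scenarios; replace with your own roles and tasks.
scenarios = [
    (
        "You are a patient math tutor.",
        "Can you explain the concept of eigenvalues and eigenvectors in a simple way?",
    ),
    (
        "You are a creative storyteller.",
        "Tell a short story about a lighthouse keeper.",
    ),
]

prompts = [build_prompt(system, user) for system, user in scenarios]

for prompt in prompts:
    print(prompt)
    print("-" * 40)
```

The resulting `prompts` list can be passed to the same generation loop as in the demo.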
Use Cases for Triangulum 1B
- Multilingual content generation
- Question answering and dialogue systems
- Text summarization and analysis
- Translation and localization tasks
Technical Details
Triangulum 1B employs a state-of-the-art autoregressive architecture inspired by LLaMA. The optimized transformer framework ensures both efficiency and scalability, making it suitable for a variety of use cases.
How to Run Triangulum 1B on Ollama Locally
# How to Run Ollama Locally
This guide shows how to run open-source LLMs locally with Ollama, using Triangulum-1B as the example. By the end, you'll be equipped to run other open-source LLMs with ease.
---
## Example 1: How to Run the Triangulum-1B Model
The **Triangulum-1B** model is an open-source LLM known for its capabilities across text-based tasks. We'll interact with it similarly to ChatGPT, but run it locally with support for quants.
### Step 1: Download the Model
First, download the **Triangulum-1B-F16.gguf** model using the following command:
```bash
ollama run triangulum-1b-f16.gguf
```
### Step 2: Model Initialization and Download
Upon running the command, Ollama will initialize and download the model files. You should see output similar to the following:
pulling manifest
pulling 8934d96d3f08... 100% ▕████████████████▏ 3.8 GB
pulling 8c17c2ebb0ea... 100% ▕████████████████▏ 7.0 KB
pulling 7c23fb36d801... 100% ▕████████████████▏ 4.8 KB
pulling 2e0493f67d0c... 100% ▕████████████████▏ 59 B
pulling fa304d675061... 100% ▕████████████████▏ 91 B
pulling 42ba7f8a01dd... 100% ▕████████████████▏ 557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
### Step 3: Interact with the Model
Once the model is ready, you can send a prompt. For example, let's ask:
>>> What can you do for me?
If you ask the same question, you should receive a response similar to this:
As a responsible AI language model, I am here to assist you with any questions or tasks you may have. Here are some examples of things I can help with:
1. Answering questions: I can provide information on a wide range of topics, from science and technology to history and culture.
2. Generating ideas: I can help you brainstorm ideas for creative projects, or provide suggestions for solving problems.
3. Writing assistance: I can help you with writing tasks such as proofreading, editing, and suggesting alternative words or phrases.
4. Translation: I can translate text from one language to another.
5. Summarizing content: I can summarize long pieces of text, such as articles or documents, into shorter, more digestible versions.
6. Creativity: I can help you generate creative ideas for stories, poems, or other forms of writing.
7. Language learning: I can assist you in learning a new language by providing grammar explanations, vocabulary lists, and practice exercises.
8. Chatting: I'm here to chat with you and provide a response to any question or topic you'd like to discuss.
Please let me know if there is anything specific you would like me to help you with.
### Step 4: Exit the Program
To exit the program, simply type:
/exit
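Besides the interactive prompt, a running Ollama instance can also be queried over its local HTTP API. The sketch below uses only the Python standard library and assumes Ollama is listening on its default address, `http://localhost:11434`, and that the model is available under the name used in Step 1. The function names `build_chat_payload` and `chat` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_payload(model: str, user_message: str) -> dict:
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def chat(model: str, user_message: str, url: str = OLLAMA_URL) -> str:
    """Send one message to a local Ollama server and return the reply text."""
    payload = json.dumps(build_chat_payload(model, user_message)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the reply is in body["message"]["content"].
    return body["message"]["content"]
```

For example, `chat("triangulum-1b-f16.gguf", "What can you do for me?")` would send the same question as in Step 3, provided the server is running and the model name matches what `ollama list` reports.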
## Example 2: Running Multi-Modal Models (Future Use)
Ollama supports running multi-modal models where you can send images and ask questions based on them. This section will be updated as more models become available.
## Notes on Using Quantized Models
Quantized models like triangulum-1b-f16.gguf are optimized for performance on resource-constrained hardware, making them accessible for local inference.
- Ensure your system has sufficient VRAM or CPU resources.
- Use the `.gguf` model format for compatibility with Ollama.
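To get a feel for what "sufficient VRAM or CPU resources" means, the weight storage can be estimated from the parameter count and the bits per weight. This is a rough back-of-the-envelope sketch. It assumes about one billion parameters, uses nominal bit widths, and ignores the KV cache, runtime buffers, and file metadata, so real GGUF files and memory use will differ somewhat. `estimate_weight_gib` is a hypothetical helper.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB.

    size = parameters * bits per weight / 8 bits per byte / 2**30 bytes per GiB
    """
    return n_params * bits_per_weight / 8 / (1024 ** 3)


N_PARAMS = 1.0e9  # assumption: roughly one billion parameters

for bits in (2, 3, 4, 5, 6, 8, 16):
    size = estimate_weight_gib(N_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{size:.2f} GiB")
```

The estimate scales linearly, so halving the bits per weight roughly halves the memory needed for the weights.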
Conclusion
Running the Triangulum-1B model with Ollama provides a robust way to leverage open-source LLMs locally for diverse use cases. By following these steps, you can explore the capabilities of other open-source models in the future.