Transformers
GGUF
conversational

Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +185 -174
README.md CHANGED
@@ -1,174 +1,185 @@
-
- ---
-
- library_name: transformers
- license: apache-2.0
- base_model: Qwen/Qwen2.5-7B
- datasets:
- - allenai/tulu-3-sft-mixture
-
- ---
-
- [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
-
-
- # QuantFactory/Teleut-7b-GGUF
- This is a quantized version of [allura-org/Teleut-7b](https://huggingface.co/allura-org/Teleut-7b) created using llama.cpp.
-
- # Original Model Card
-
-
- # Teleut 7b
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/UqIi8eztdptvt52Mak_1K.png)
-
- A replication attempt of Tulu 3 on the Qwen 2.5 base models.
-
- ## Evals (so far)
- | Benchmark | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported)
- |-------------------------|----------------------|--------------------------|---------------------------------|-------------------------|---------------------------
- |BBH (3 shot, CoT) |*64.4%* |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
- |GSM8K (8 shot, CoT) |78.5% |76.2% |**83.8%** |*80.0%* |xx.x%
- |IFEval (prompt loose) |66.3% |*72.8%* |**74.7%** |56.4% |53.0%
- |MMLU (0 shot, CoT) |*73.2%* |65.9% |**76.6%** |68.5% |30.7%<sup>5-shot</sup>
- |MMLU Pro (0 shot, CoT) |*48.3%* |44.3% |**56.3%**<sup>Unknown</sup> |32.9%<sup>5-shot</sup> |30.7%<sup>5-shot</sup>
- |PopQA (15 shot) |18.9% |**29.3%** |18.1% |*20.2%* |xx.x%
- |TruthfulQA |47.2% |46.8% |**63.1%** |*55.5%* |xx.x%
-
- ## Credits
- Big thanks to Retis Labs for providing the 8xH100 polycule used to train and test this model!
- Another big thanks to AllenAI for publishing the Tülu 3 data and model series (as well as the paper and training details), and to Alibaba for training the original Qwen 2.5 base model series!
-
- ```
- @article{lambert2024tulu3,
- title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
- author = {
- Nathan Lambert and
- Jacob Morrison and
- Valentina Pyatkin and
- Shengyi Huang and
- Hamish Ivison and
- Faeze Brahman and
- Lester James V. Miranda and
- Alisa Liu and
- Nouha Dziri and
- Shane Lyu and
- Yuling Gu and
- Saumya Malik and
- Victoria Graf and
- Jena D. Hwang and
- Jiangjiang Yang and
- Ronan Le Bras and
- Oyvind Tafjord and
- Chris Wilhelm and
- Luca Soldaini and
- Noah A. Smith and
- Yizhong Wang and
- Pradeep Dasigi and
- Hannaneh Hajishirzi
- },
- year = {2024},
- email = {tulu@allenai.org}
- }
- ```
-
- ## Training procedure
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3.5e-06
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 8
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 128
- - total_eval_batch_size: 64
- - optimizer: paged_ademamix_8bit
-   (no additional optimizer arguments)
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 370
- - num_epochs: 1
-
- ### Framework versions
-
- - Transformers 4.46.3
- - Pytorch 2.5.1+cu124
- - Datasets 3.1.0
- - Tokenizers 0.20.3
-
- ### Configuration
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.5.2`
- ```yaml
- base_model: Qwen/Qwen2.5-7B
-
- plugins:
-   - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_glu_activation: true
- liger_fused_linear_cross_entropy: true
-
- strict: false
-
- chat_template: chatml
- datasets:
-   - path: allenai/tulu-3-sft-mixture
-     type: chat_template
-     split: train
-     field_messages: messages
-
- dataset_prepared_path: last_run_prepared
- #val_set_size: 0.02
- output_dir: ./ckpts
-
- sequence_len: 8192
- #sample_packing: true
- pad_to_sequence_len: true
-
- wandb_project: qwen-2.5-7b-sft
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 2
- micro_batch_size: 8
- num_epochs: 1
- optimizer: paged_ademamix_8bit
- lr_scheduler: cosine
- learning_rate: 3.5e-6
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- deepspeed: deepspeed_configs/zero3_bf16.json
-
- warmup_steps: 370
- #evals_per_epoch: 4
- eval_table_size:
- saves_per_epoch: 2
- debug:
- weight_decay: 0.0
-
- ```
-
- </details><br>
+ ---
+ library_name: transformers
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B
+ datasets:
+ - allenai/tulu-3-sft-mixture
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+
+
+ # QuantFactory/Teleut-7b-GGUF
+ This is a quantized version of [allura-org/Teleut-7b](https://huggingface.co/allura-org/Teleut-7b) created using llama.cpp.
+
+ # Original Model Card
+
+
+ # Teleut 7b
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/634262af8d8089ebaefd410e/UqIi8eztdptvt52Mak_1K.png)
+
+ A replication attempt of Tulu 3 on the Qwen 2.5 base models.
+
+ ## Evals (so far)
+ | Benchmark | Teleut 7B (measured) | Tülu 3 SFT 8B (reported) | Qwen 2.5 7B Instruct (reported) | Ministral 8B (reported) | Mistral 7B v0.3 (reported)
+ |-------------------------|----------------------|--------------------------|---------------------------------|-------------------------|---------------------------
+ |BBH (3 shot, CoT) |*64.4%* |**67.9%** |21.7% |56.2% |47.0%<sup>NLL</sup>
+ |GSM8K (8 shot, CoT) |78.5% |76.2% |**83.8%** |*80.0%* |xx.x%
+ |IFEval (prompt loose) |66.3% |*72.8%* |**74.7%** |56.4% |53.0%
+ |MMLU (0 shot, CoT) |*73.2%* |65.9% |**76.6%** |68.5% |30.7%<sup>5-shot</sup>
+ |MMLU Pro (0 shot, CoT) |*48.3%* |44.3% |**56.3%**<sup>Unknown</sup> |32.9%<sup>5-shot</sup> |30.7%<sup>5-shot</sup>
+ |PopQA (15 shot) |18.9% |**29.3%** |18.1% |*20.2%* |xx.x%
+ |TruthfulQA |47.2% |46.8% |**63.1%** |*55.5%* |xx.x%
+
+ ## Credits
+ Big thanks to Retis Labs for providing the 8xH100 polycule used to train and test this model!
+ Another big thanks to AllenAI for publishing the Tülu 3 data and model series (as well as the paper and training details), and to Alibaba for training the original Qwen 2.5 base model series!
+
+ ```
+ @article{lambert2024tulu3,
+ title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
+ author = {
+ Nathan Lambert and
+ Jacob Morrison and
+ Valentina Pyatkin and
+ Shengyi Huang and
+ Hamish Ivison and
+ Faeze Brahman and
+ Lester James V. Miranda and
+ Alisa Liu and
+ Nouha Dziri and
+ Shane Lyu and
+ Yuling Gu and
+ Saumya Malik and
+ Victoria Graf and
+ Jena D. Hwang and
+ Jiangjiang Yang and
+ Ronan Le Bras and
+ Oyvind Tafjord and
+ Chris Wilhelm and
+ Luca Soldaini and
+ Noah A. Smith and
+ Yizhong Wang and
+ Pradeep Dasigi and
+ Hannaneh Hajishirzi
+ },
+ year = {2024},
+ email = {tulu@allenai.org}
+ }
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3.5e-06
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
+ - optimizer: paged_ademamix_8bit
+   (no additional optimizer arguments)
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 370
+ - num_epochs: 1
+
+ ### Framework versions
+
+ - Transformers 4.46.3
+ - Pytorch 2.5.1+cu124
+ - Datasets 3.1.0
+ - Tokenizers 0.20.3
+
+ ### Configuration
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.5.2`
+ ```yaml
+ base_model: Qwen/Qwen2.5-7B
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_glu_activation: true
+ liger_fused_linear_cross_entropy: true
+
+ strict: false
+
+ chat_template: chatml
+ datasets:
+   - path: allenai/tulu-3-sft-mixture
+     type: chat_template
+     split: train
+     field_messages: messages
+
+ dataset_prepared_path: last_run_prepared
+ #val_set_size: 0.02
+ output_dir: ./ckpts
+
+ sequence_len: 8192
+ #sample_packing: true
+ pad_to_sequence_len: true
+
+ wandb_project: qwen-2.5-7b-sft
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 2
+ micro_batch_size: 8
+ num_epochs: 1
+ optimizer: paged_ademamix_8bit
+ lr_scheduler: cosine
+ learning_rate: 3.5e-6
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ deepspeed: deepspeed_configs/zero3_bf16.json
+
+ warmup_steps: 370
+ #evals_per_epoch: 4
+ eval_table_size:
+ saves_per_epoch: 2
+ debug:
+ weight_decay: 0.0
+
+ ```
+
+ </details><br>
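
For reference, a minimal sketch of running one of the GGUF quants in this repo with `llama-cpp-python`; the quant filename pattern and the prompt are assumptions, so substitute whichever file you actually download:

```python
# Hedged sketch: load a GGUF quant of Teleut-7b via llama-cpp-python.
# The filename glob below is an assumption -- pick the quant you downloaded.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/Teleut-7b-GGUF",
    filename="*Q4_K_M.gguf",   # hypothetical quant choice
    n_ctx=8192,                # matches the sequence_len used for SFT
    chat_format="chatml",      # the card trains with chat_template: chatml
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize Tülu 3 SFT in two sentences."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

The same GGUF file can also be loaded directly by the llama.cpp CLI (`llama-cli -m <file>.gguf`).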
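Since the card fine-tunes with a ChatML template, here is a similarly hedged sketch of querying the unquantized [allura-org/Teleut-7b](https://huggingface.co/allura-org/Teleut-7b) checkpoint with Transformers; the prompt and generation settings are illustrative only:

```python
# Hedged sketch: chat with the unquantized Teleut-7b via Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allura-org/Teleut-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give one-sentence summaries of BBH and IFEval."}]
# apply_chat_template renders the ChatML format the model was fine-tuned on
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```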