data: add raw benchmark results JSON (KMMLU + HAE-RAE, 3-model comparison) de3e104 verified intrect committed on 2 days ago
docs: add Korean LLM benchmark results (KMMLU + HAE-RAE, 3-model comparison) 4043357 verified intrect committed on 2 days ago
fix: replace Qwen default system prompt with VELA identity in chat templates 8fa0fb0 verified intrect committed on 3 days ago
docs: add llama-cpp-python, vllm, ollama tags for library discovery 6eb484c verified intrect committed on 3 days ago
docs: add MLX 4-bit format, update quant links in model card 8bdb2c7 verified intrect committed on 3 days ago
feat: add MLX 4-bit quantized model (Apple Silicon optimized) d7f1985 verified intrect committed on 3 days ago
feat: update to DPO v6 merged model (BF16 safetensors) ffbdde5 verified intrect committed on 3 days ago
fix: generation_config rep_penalty=1.05, top_k=20, top_p=0.8 (vela-v5-merged) a0a3973 verified intrect committed on Feb 18
docs: add recommended inference settings and backend configuration guide 3e8a713 verified intrect committed on Feb 17
fix: match llama-cpp-python defaults (top_k=40, top_p=0.95, rep_penalty=1.0) dea4e49 verified intrect committed on Feb 17
fix: rollback generation_config to safe Qwen2.5 defaults (rep_penalty 1.3→1.1) bfc3043 verified intrect committed on Feb 16
fix: update generation params: repetition_penalty 1.05→1.3, top_k 20→50, top_p 0.8→0.92 d4a1479 verified intrect committed on Feb 16
docs: add real output example (RT + Quick Assessment + Analysis Report) c66f3a2 verified intrect committed on Feb 16
docs: update model card with v3 training data (58K SFT, 26K DPO), MLX benchmark, Markdown RT format 7f641b5 verified intrect committed on Feb 15
feat: update to DPO v4 merged model (SFT + DPO v4 language leak fix) c34c4f4 verified intrect committed on Feb 15
fix: change tokenizer_class to Qwen2TokenizerFast for vLLM compatibility ecf8e6b verified intrect committed on Feb 13
docs: update training data distribution with accurate numbers (SFT 36,713 + DPO 24,779) d35ad96 verified intrect committed on Feb 12
docs: update model card with GGUF formats, benchmarks, usage examples 2448e8a verified intrect committed on Feb 12
Fix tokenizer_config.json - remove extra_special_tokens list causing vLLM error f650ef7 verified intrect committed on Jan 28
Fix config.json for vLLM compatibility (remove layer_types, fix rope_parameters) 7dc1232 verified intrect committed on Jan 28
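Several of the commits above tune `generation_config.json` (a0a3973, dea4e49, bfc3043, d4a1479). A minimal sketch of the final sampling config from commit a0a3973 (rep_penalty=1.05, top_k=20, top_p=0.8), using the Hugging Face `generation_config.json` field names; `do_sample` and `temperature` are assumptions not stated in the log:

```python
import json

# Hedged sketch of generation_config.json after commit a0a3973.
# top_k / top_p / repetition_penalty come from the commit message;
# do_sample and temperature are illustrative assumptions, and other
# fields (eos_token_id, pad_token_id, ...) are omitted.
generation_config = {
    "do_sample": True,           # assumed: sampling rather than greedy decoding
    "temperature": 0.7,          # assumed; not stated in the commit log
    "top_k": 20,                 # from commit a0a3973
    "top_p": 0.8,                # from commit a0a3973
    "repetition_penalty": 1.05,  # from commit a0a3973
}

# Serialize in the same shape a model repo would ship.
print(json.dumps(generation_config, indent=2))
```

Note the back-and-forth in the log (1.05 → 1.3 → 1.1 → 1.0 → 1.05): repetition_penalty interacts with top_k/top_p, so these values were evidently tuned as a set rather than independently.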