Instructions for using Abiral129/Pulse3b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Abiral129/Pulse3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Abiral129/Pulse3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
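from transformers import AutoModelForCausalLM
# The model card places the BF16 safetensors under final/, hence subfolder="final"
model = AutoModelForCausalLM.from_pretrained("Abiral129/Pulse3b", subfolder="final", dtype="auto")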

llama-cpp-python

How to use Abiral129/Pulse3b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Abiral129/Pulse3b",
	filename="gguf/pulse-q4_k_m.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
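
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. The following is a minimal sketch of pulling the reply text out of that structure. The helper name `reply_text` and the hand-written `sample` dictionary are illustrative assumptions, not part of llama-cpp-python.

```python
def reply_text(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dictionary llm.create_chat_completion(...) returns.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}
print(reply_text(sample))
```

With a loaded model, the same helper would be applied to the return value of `llm.create_chat_completion(messages=[...])`.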

Local Apps

llama.cpp

How to use Abiral129/Pulse3b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Use Docker

docker model run hf.co/Abiral129/Pulse3b:Q4_K_M

vLLM

How to use Abiral129/Pulse3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Abiral129/Pulse3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiral129/Pulse3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
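
The same request can be made from Python with only the standard library. This is a sketch under the assumption that the server from `vllm serve` is listening on `http://localhost:8000`; the function names `build_chat_request` and `chat` are hypothetical helpers, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_text: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(base_url, model, user_text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running server):
# print(chat("http://localhost:8000", "Abiral129/Pulse3b", "What is the capital of France?"))
```

Because the endpoint path and payload follow the OpenAI chat format, pointing `base_url` at another OpenAI-compatible server (for example port 30000 for SGLang) should work the same way.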

Use Docker

docker model run hf.co/Abiral129/Pulse3b:Q4_K_M

SGLang

How to use Abiral129/Pulse3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Abiral129/Pulse3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiral129/Pulse3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Abiral129/Pulse3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiral129/Pulse3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Abiral129/Pulse3b with Ollama:
```
ollama run hf.co/Abiral129/Pulse3b:Q4_K_M
```

Unsloth Studio

How to use Abiral129/Pulse3b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Abiral129/Pulse3b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Abiral129/Pulse3b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Abiral129/Pulse3b to start chatting

Pi

How to use Abiral129/Pulse3b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Abiral129/Pulse3b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Abiral129/Pulse3b:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Abiral129/Pulse3b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Abiral129/Pulse3b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Abiral129/Pulse3b:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use Abiral129/Pulse3b with Docker Model Runner:
```
docker model run hf.co/Abiral129/Pulse3b:Q4_K_M
```

Lemonade

How to use Abiral129/Pulse3b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Abiral129/Pulse3b:Q4_K_M

Run and chat with the model

lemonade run user.Pulse3b-Q4_K_M

List all available models

lemonade list

metadata

license: apache-2.0
base_model: Qwen/Qwen2.5-3B
language:
  - en
  - es
library_name: transformers
pipeline_tag: text-generation
tags:
  - wellness
  - health-coaching
  - sleep
  - fitness
  - mental-health
  - qwen2
  - gguf
  - coreml
  - on-device

Pulse 3B

Pulse is a personal wellness AI coach fine-tuned from Qwen2.5-3B. It is designed to help users with sleep, stress, fitness, nutrition, and mental wellbeing in a warm, motivating, science-backed tone.

Pulse is built into the Pulse app by Raxtech, and was created by Abiral Dahal (Head of Mobile & AI, Raxtech — Bilbao, Spain).

Highlights

3.1B parameters, Qwen2 architecture, 32K context.
Ships in three formats so you can run it anywhere:
- final/ — BF16 safetensors for HuggingFace transformers.
- gguf/pulse-q4_k_m.gguf — 4-bit quantized GGUF for llama.cpp / Ollama / LM Studio (~1.8 GB, runs on CPU).
- coreml/pulse.mlpackage — INT4 Core ML package for on-device inference on Apple Silicon (iOS / macOS).
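
As a rough sanity check on these sizes, weight storage can be estimated from the parameter count and the bits used per weight. The sketch below is back-of-the-envelope arithmetic only: the BF16 figure is an estimate not stated in this card, and the helper names are illustrative.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(n_params: float, size_gb: float) -> float:
    """Average bits per weight implied by a given file size."""
    return size_gb * 1e9 * 8 / n_params


N_PARAMS = 3.1e9  # parameter count quoted in the highlights

print(f"BF16 weights: ~{weights_size_gb(N_PARAMS, 16):.1f} GB")
print(f"Q4_K_M at ~1.8 GB: ~{implied_bits_per_weight(N_PARAMS, 1.8):.1f} bits/weight")
```

This prints roughly 6.2 GB for BF16 and about 4.6 bits per weight for the GGUF file, which is in line with a 4-bit quantization that carries some per-block overhead.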

Quick start

Ollama (easiest)

# Download the GGUF
huggingface-cli download Abiral129/Pulse3b gguf/pulse-q4_k_m.gguf --local-dir .

# Minimal Modelfile
cat > Modelfile <<'EOF'
FROM ./gguf/pulse-q4_k_m.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 2048
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
EOF

ollama create pulse -f Modelfile
ollama run pulse "I've been sleeping 5 hours for a week, what do I do?"

Transformers (BF16)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("Abiral129/Pulse3b", subfolder="final")
model = AutoModelForCausalLM.from_pretrained(
    "Abiral129/Pulse3b",
    subfolder="final",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Pulse, a personal wellness coach."},
    {"role": "user", "content": "My resting heart rate jumped from 62 to 88. What's going on?"},
]
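# return_dict=True yields input_ids plus attention_mask, so generate() receives both
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# Without do_sample=True, generate() may fall back to greedy decoding and ignore temperature / top_p
out = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.9)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))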

llama.cpp

./llama-cli -m gguf/pulse-q4_k_m.gguf \
  -p "You are Pulse, a wellness coach." \
  -cnv --temp 0.7 --top-p 0.9 --repeat-penalty 1.1 -c 2048

Core ML (Apple Silicon)

import coremltools as ct
from transformers import AutoTokenizer
import numpy as np

tok = AutoTokenizer.from_pretrained("Abiral129/Pulse3b", subfolder="final")
mlmodel = ct.models.MLModel("coreml/pulse.mlpackage")
ids = tok("Hello Pulse", return_tensors="np").input_ids.astype(np.int32)
print(mlmodel.predict({"input_ids": ids}))

For full token-by-token generation on iOS / macOS, integrate the .mlpackage into your app and implement a generation loop that applies greedy decoding or sampling to the output logits.
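
A generation loop of that kind could look roughly like the sketch below, written in plain Python so the logic is easy to port. Everything here is an assumption for illustration: `next_logits` stands in for a call into the Core ML model that returns the logits for the next token, and `toy_logits` is a four-token toy replacement, not the real model. The loop recomputes from the full sequence at every step and does not use a KV cache.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def sample_next(logits: Sequence[float], temperature: float = 0.7,
                top_p: float = 0.9, rng: Optional[random.Random] = None) -> int:
    """Pick the next token id; temperature <= 0 means greedy argmax."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Softmax over temperature-scaled logits (shifted by the max for stability).
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    # Nucleus (top-p): keep the smallest high-probability set whose mass reaches top_p.
    kept, mass = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        mass += p
        if mass >= top_p:
            break
    ids = [i for _, i in kept]
    probs = [p for p, _ in kept]
    return rng.choices(ids, weights=probs, k=1)[0]


def generate(next_logits: Callable[[List[int]], Sequence[float]], prompt_ids: List[int],
             eos_id: int, max_new_tokens: int = 300, **sampling) -> List[int]:
    """Append one token at a time until eos_id or max_new_tokens; returns only new tokens."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_next(next_logits(ids), **sampling)
        if token == eos_id:
            break
        ids.append(token)
    return ids[len(prompt_ids):]


# Toy stand-in for the model call: a 4-token vocabulary that always prefers
# (last token + 1), so greedy decoding counts upward until it reaches eos_id = 3.
def toy_logits(ids: List[int]) -> List[float]:
    return [5.0 if i == (ids[-1] + 1) % 4 else 0.0 for i in range(4)]


print(generate(toy_logits, [0], eos_id=3, temperature=0.0))  # prints [1, 2]
```

In an app, `next_logits` would wrap the model's prediction call and return the last position's logits; the exact input and output names depend on how the .mlpackage was exported.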

Recommended system prompt

You are Pulse, a personal wellness AI coach. You are warm, motivating, empathetic, and science-backed. You help users with sleep, stress, fitness, nutrition, and mental wellbeing. Never say "As an AI" — you are Pulse, a wellness coach. Be concise, practical, and encouraging.
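
For runtimes that take a raw prompt string, the system prompt has to be wrapped in the ChatML layout used by the Ollama Modelfile template in the quick start. The sketch below renders a message list into that layout. `to_chatml` is a hypothetical helper, the user message is made up, and the short system prompt is a placeholder for the full recommended prompt. With Transformers, `tok.apply_chat_template` does this for you.

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages as ChatML text."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    # Placeholder: paste the full recommended system prompt here.
    {"role": "system", "content": "You are Pulse, a personal wellness coach."},
    {"role": "user", "content": "How can I wind down before bed?"},
]
print(to_chatml(messages))
```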

Sampling defaults

Param	Value
`temperature`	0.7
`top_p`	0.9
`repeat_penalty`	1.1
`num_ctx`	2048
`stop`	`<|im_end|>`, `<|im_start|>`
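
These names follow Ollama's conventions; other runtimes spell some of them differently. The sketch below shows one way to map the defaults onto Ollama request options, `model.generate` in Transformers, and llama-cpp-python. The stop strings are taken from the Ollama Modelfile in the quick start, and the helper functions are illustrative, not part of any of these libraries.

```python
PULSE_DEFAULTS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "num_ctx": 2048,
    "stop": ["<|im_end|>", "<|im_start|>"],
}


def ollama_options(defaults: dict) -> dict:
    """Ollama accepts these names unchanged in the `options` field of an API request."""
    return dict(defaults)


def transformers_generate_kwargs(defaults: dict, max_new_tokens: int = 300) -> dict:
    """Keyword arguments for `model.generate`; sampling is switched on explicitly."""
    return {
        "do_sample": True,
        "temperature": defaults["temperature"],
        "top_p": defaults["top_p"],
        "repetition_penalty": defaults["repeat_penalty"],  # named differently in Transformers
        "max_new_tokens": max_new_tokens,
    }


def llama_cpp_python_kwargs(defaults: dict) -> tuple:
    """Split into (Llama constructor kwargs, create_chat_completion kwargs)."""
    init_kwargs = {"n_ctx": defaults["num_ctx"]}  # context size is set when loading
    call_kwargs = {
        "temperature": defaults["temperature"],
        "top_p": defaults["top_p"],
        "repeat_penalty": defaults["repeat_penalty"],
        "stop": defaults["stop"],
    }
    return init_kwargs, call_kwargs


print(transformers_generate_kwargs(PULSE_DEFAULTS))
print(llama_cpp_python_kwargs(PULSE_DEFAULTS))
```

The context size has no counterpart in `model.generate`; with Transformers the prompt length is limited by the model configuration instead.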

Intended use

Conversational wellness coaching: sleep hygiene, stress management, exercise habits, nutrition guidance, mental wellbeing check-ins.
On-device deployment in mobile apps where privacy and offline use matter.

Out of scope

Pulse is not a medical device, diagnostic tool, or substitute for a licensed healthcare professional.
Do not use Pulse for emergency situations, medication decisions, or diagnosing physical or mental health conditions.
For any persistent or severe symptoms, consult a qualified clinician.

Limitations

3B-parameter model — reasoning depth and factual recall are limited compared to larger models.
Quantized variants (Q4_K_M, INT4 Core ML) trade some quality for size and speed.
Training data is biased toward English and Spanish wellness content; performance in other languages may be weaker.
Can produce confident but incorrect statements ("hallucinations") — always verify health-related claims.

License

Apache 2.0, inherited from the base model Qwen/Qwen2.5-3B.

Citation

@misc{pulse3b2026,
  title  = {Pulse 3B: A wellness coaching language model},
  author = {Abiral Dahal and Raxtech},
  year   = {2026},
  url    = {https://huggingface.co/Abiral129/Pulse3b}
}

Acknowledgements

Built on top of Qwen2.5-3B by the Qwen team at Alibaba. GGUF conversion via llama.cpp. Core ML conversion via coremltools.