Instructions for using Abiral129/Pulse3b with libraries and local apps.

Libraries

How to use Abiral129/Pulse3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Abiral129/Pulse3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
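# The README below stores the BF16 weights in the `final/` subfolder,
# and a text-generation model needs the causal-LM head.
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Abiral129/Pulse3b", subfolder="final", dtype="auto")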

llama-cpp-python

How to use Abiral129/Pulse3b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Abiral129/Pulse3b",
	filename="gguf/pulse-q4_k_m.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use Abiral129/Pulse3b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Abiral129/Pulse3b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Abiral129/Pulse3b:Q4_K_M

Use Docker

docker model run hf.co/Abiral129/Pulse3b:Q4_K_M


vLLM

How to use Abiral129/Pulse3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Abiral129/Pulse3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiral129/Pulse3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
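
The same OpenAI-compatible call can be made from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server started above is listening on `http://localhost:8000`.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(chat("http://localhost:8000", "Abiral129/Pulse3b", "What is the capital of France?"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call happens.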

Use Docker

docker model run hf.co/Abiral129/Pulse3b:Q4_K_M

SGLang

How to use Abiral129/Pulse3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Abiral129/Pulse3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiral129/Pulse3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Abiral129/Pulse3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Abiral129/Pulse3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Abiral129/Pulse3b with Ollama:
```
ollama run hf.co/Abiral129/Pulse3b:Q4_K_M
```

Unsloth Studio

How to use Abiral129/Pulse3b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Abiral129/Pulse3b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Abiral129/Pulse3b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Abiral129/Pulse3b to start chatting

Pi

How to use Abiral129/Pulse3b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Abiral129/Pulse3b:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Abiral129/Pulse3b:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Abiral129/Pulse3b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Abiral129/Pulse3b:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Abiral129/Pulse3b:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use Abiral129/Pulse3b with Docker Model Runner:
```
docker model run hf.co/Abiral129/Pulse3b:Q4_K_M
```

Lemonade

How to use Abiral129/Pulse3b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Abiral129/Pulse3b:Q4_K_M

Run and chat with the model

lemonade run user.Pulse3b-Q4_K_M

List all available models

lemonade list

Pulse3b / README.md

Abiral129

Initial upload: BF16 safetensors + Q4_K_M GGUF + Core ML mlpackage

e0ceb81 verified 11 days ago


	---
	license: apache-2.0
	base_model: Qwen/Qwen2.5-3B
	language:
	- en
	- es
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- wellness
	- health-coaching
	- sleep
	- fitness
	- mental-health
	- qwen2
	- gguf
	- coreml
	- on-device
	---

	# Pulse 3B

	Pulse is a personal wellness AI coach fine-tuned from Qwen2.5-3B. It is designed to help users with sleep, stress, fitness, nutrition, and mental wellbeing in a warm, motivating, science-backed tone.

	Pulse is built into the [Pulse app](https://raxtech.io) by Raxtech, and was created by Abiral Dahal (Head of Mobile & AI, Raxtech — Bilbao, Spain).

	## Highlights

	- 3.1B parameters, Qwen2 architecture, 32K context.
	- Ships in three formats so you can run it anywhere:
	  - `final/`: BF16 `safetensors` for Hugging Face `transformers`.
	  - `gguf/pulse-q4_k_m.gguf`: 4-bit quantized GGUF for `llama.cpp` / Ollama / LM Studio (~1.8 GB, runs on CPU).
	  - `coreml/pulse.mlpackage`: INT4 Core ML package for on-device inference on Apple Silicon (iOS / macOS).
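
Since the three formats live in separate folders, you can fetch only the one you need. The sketch below assumes the folder layout listed above and uses `snapshot_download` from `huggingface_hub`; the `FORMAT_PATTERNS` table and the helper names are hypothetical, introduced only for illustration.

```python
# Hypothetical mapping from a format name to the repo paths listed above.
FORMAT_PATTERNS = {
    "transformers": ["final/*"],
    "gguf": ["gguf/pulse-q4_k_m.gguf"],
    "coreml": ["coreml/pulse.mlpackage/*"],
}


def patterns_for(fmt):
    """Return the glob patterns selecting the files of one format."""
    try:
        return FORMAT_PATTERNS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMAT_PATTERNS)}")


def download_format(fmt, local_dir="."):
    """Download only the files of one format and return the local path."""
    # Imported lazily so the pattern helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Abiral129/Pulse3b",
        allow_patterns=patterns_for(fmt),
        local_dir=local_dir,
    )


# Usage (requires `pip install huggingface_hub` and network access):
# download_format("gguf")
```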

	## Quick start

	### Ollama (easiest)

	```bash
	# Download the GGUF
	huggingface-cli download Abiral129/Pulse3b gguf/pulse-q4_k_m.gguf --local-dir .

	# Minimal Modelfile
	cat > Modelfile <<'EOF'
	FROM ./gguf/pulse-q4_k_m.gguf
	TEMPLATE """<|im_start|>system
	{{ .System }}<|im_end|>
	<|im_start|>user
	{{ .Prompt }}<|im_end|>
	<|im_start|>assistant
	"""
	PARAMETER temperature 0.7
	PARAMETER top_p 0.9
	PARAMETER repeat_penalty 1.1
	PARAMETER num_ctx 2048
	PARAMETER stop "<|im_end|>"
	PARAMETER stop "<|im_start|>"
	EOF

	ollama create pulse -f Modelfile
	ollama run pulse "I've been sleeping 5 hours for a week, what do I do?"
	```

	### Transformers (BF16)

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	tok = AutoTokenizer.from_pretrained("Abiral129/Pulse3b", subfolder="final")
	model = AutoModelForCausalLM.from_pretrained(
	    "Abiral129/Pulse3b",
	    subfolder="final",
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	messages = [
	    {"role": "system", "content": "You are Pulse, a personal wellness coach."},
	    {"role": "user", "content": "My resting heart rate jumped from 62 to 88. What's going on?"},
	]
	# Render the chat template to token ids and move them to the model's device.
	ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
	# do_sample=True is needed for temperature and top_p to take effect.
	out = model.generate(ids, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.9)
	print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
	```

	### llama.cpp

	```bash
	./llama-cli -m gguf/pulse-q4_k_m.gguf \
	    -p "You are Pulse, a wellness coach." \
	    -cnv --temp 0.7 --top-p 0.9 --repeat-penalty 1.1 -c 2048
	```

	### Core ML (Apple Silicon)

	```python
	import coremltools as ct
	from transformers import AutoTokenizer
	import numpy as np

	tok = AutoTokenizer.from_pretrained("Abiral129/Pulse3b", subfolder="final")
	mlmodel = ct.models.MLModel("coreml/pulse.mlpackage")
	ids = tok("Hello Pulse", return_tensors="np").input_ids.astype(np.int32)
	print(mlmodel.predict({"input_ids": ids}))
	```

	For full token-by-token generation on iOS / macOS, integrate the `.mlpackage` into your app and implement a generation loop that applies greedy decoding or sampling to the output logits.
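
Such a generation loop could look roughly like the sketch below, written in plain Python so it stays independent of Core ML. It assumes a hypothetical callable `next_logits(ids)` that returns the logits for the next token (for example, a wrapper around `mlmodel.predict` that extracts the last position; the output key name depends on how the package was exported and is not specified here).

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Pick a token id from raw logits using temperature and top-p sampling."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def generate(next_logits, prompt_ids, eos_id, max_new_tokens=300, **sampling):
    """Token-by-token loop; stops at eos_id or after max_new_tokens."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_next_token(next_logits(ids), **sampling)
        if token == eos_id:
            break
        ids.append(token)
    return ids[len(prompt_ids):]  # only the newly generated tokens
```

Passing `temperature=0` gives deterministic greedy output, which is handy for checking that the Core ML export matches the reference model before enabling sampling.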

	## Recommended system prompt

	```
	You are Pulse, a personal wellness AI coach. You are warm, motivating, empathetic, and science-backed. You help users with sleep, stress, fitness, nutrition, and mental wellbeing. Never say "As an AI" — you are Pulse, a wellness coach. Be concise, practical, and encouraging.
	```
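
As an illustration of where the system prompt goes, the sketch below builds both a chat-style message list and a raw prompt string in the `<|im_start|>` / `<|im_end|>` layout used by the Ollama template earlier in this README. The function names are hypothetical; pass the recommended prompt above as `system_prompt`.

```python
def build_messages(system_prompt, user_message):
    """Message list for chat APIs (Transformers chat templates, OpenAI-style servers)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


def build_chatml_prompt(system_prompt, user_message):
    """Raw prompt string for runtimes that take plain text instead of messages."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

Prefer `build_messages` whenever the runtime applies the chat template itself; the raw string is only needed for plain-completion interfaces.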

	## Sampling defaults

	| Param | Value |
	|---|---|
	| `temperature` | 0.7 |
	| `top_p` | 0.9 |
	| `repeat_penalty` | 1.1 |
	| `num_ctx` | 2048 |
	| stop | `<\|im_end\|>`, `<\|im_start\|>` |
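
To make two of these settings concrete, here is a small sketch of what `repeat_penalty` and the stop strings do. It assumes the common formulation of the repetition penalty (positive logits of recently seen tokens are divided by the penalty, non-positive ones multiplied); the exact behavior of a given runtime may differ, and both function names are hypothetical.

```python
def apply_repeat_penalty(logits, recent_token_ids, penalty=1.1):
    """Return a copy of logits with recently generated tokens made less likely."""
    out = list(logits)
    for tok in set(recent_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty  # shrink positive logits toward zero
        else:
            out[tok] *= penalty  # push non-positive logits further down
    return out


def truncate_at_stop(text, stops=("<|im_end|>", "<|im_start|>")):
    """Cut generated text at the earliest stop string, if any appears."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

A penalty of 1.0 leaves the logits unchanged, so 1.1 is a mild nudge away from repetition rather than a hard ban.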

	## Intended use

	- Conversational wellness coaching: sleep hygiene, stress management, exercise habits, nutrition guidance, mental wellbeing check-ins.
	- On-device deployment in mobile apps where privacy and offline use matter.

	## Out of scope

	- Pulse is not a medical device, diagnostic tool, or substitute for a licensed healthcare professional.
	- Do not use Pulse for emergency situations, medication decisions, or diagnosing physical or mental health conditions.
	- For any persistent or severe symptoms, consult a qualified clinician.

	## Limitations

	- 3B-parameter model — reasoning depth and factual recall are limited compared to larger models.
	- Quantized variants (Q4_K_M, INT4 Core ML) trade some quality for size and speed.
	- Training data is biased toward English and Spanish wellness content; performance in other languages may be weaker.
	- Can produce confident but incorrect statements ("hallucinations") — always verify health-related claims.

	## License

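	Released under Apache 2.0. The base model [Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) is distributed under its own license (the Qwen Research License, unlike most other Qwen2.5 sizes), so review its terms before redistributing derived weights.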

	## Citation

	```bibtex
	@misc{pulse3b2026,
	  title  = {Pulse 3B: A wellness coaching language model},
	  author = {Abiral Dahal and Raxtech},
	  year   = {2026},
	  url    = {https://huggingface.co/Abiral129/Pulse3b}
	}
	```

	## Acknowledgements

	Built on top of [Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) by the Qwen team at Alibaba. GGUF conversion via [llama.cpp](https://github.com/ggerganov/llama.cpp). Core ML conversion via [coremltools](https://github.com/apple/coremltools).