med-llm-es: Spanish Medical Triage LLM

End-to-end pipeline to build a fine-tuned Spanish medical triage model for offline/edge deployment.


⚠️ MEDICAL DISCLAIMER - IMPORTANT

THIS PROJECT IS FOR EDUCATIONAL AND RESEARCH PURPOSES ONLY.

  • ❌ This is NOT a medical device
  • ❌ This does NOT provide medical advice
  • ❌ Do NOT use for actual patient triage
  • ❌ The model may produce incorrect, incomplete, or harmful outputs

Required Actions:

  • Always recommend professional medical consultation
  • In real emergencies, call emergency services (112 in Europe, 911 in US)
  • This model should only be used for learning about LLM fine-tuning
  • Researchers: validate thoroughly before any downstream applications
  • Deployers: assume full liability for any use cases

Project Status

| Phase | Status | Details |
|-------|--------|---------|
| Data Preparation | ✅ Complete | 5,000+ Spanish medical prompts |
| Continued Pre-Training (CPT) | ✅ Complete | Medical domain adaptation |
| Supervised Fine-Tuning (SFT) | ✅ Complete | Triage instruction tuning |
| Knowledge Distillation | ✅ Complete | MiniMax-M2.5 teacher outputs |
| GRPO Training | ✅ Complete | Reward-based optimization |
| DPO Training | ✅ Complete | Preference alignment |
| GGUF Quantization | ✅ Complete | Multiple quantization levels |

What This Project Provides

Working Models

| Model File | Size | Use Case |
|------------|------|----------|
| med-llm-es-triage-balanced-Q5_K_M.gguf | ~800MB | Recommended: best quality/size balance |
| med-llm-es-triage-balanced-Q4_K_M.gguf | ~700MB | Mobile devices |
| med-llm-es-triage-balanced-Q2_K.gguf | ~460MB | Low-resource devices |
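
The file sizes above follow roughly from parameter count times average bits per weight. A quick sketch of that arithmetic (the bits-per-weight figures for llama.cpp k-quants are approximations, not measured values from these files):

```python
def gguf_size_mb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size estimate: params x avg bits per weight, in MB."""
    return n_params * bits_per_weight / 8 / 1e6

N = 1.2e9  # LFM2.5-1.2B parameter count
# Approximate average bits/weight for llama.cpp k-quants (assumed values):
for name, bpw in [("Q5_K_M", 5.5), ("Q4_K_M", 4.8), ("Q2_K", 3.35)]:
    print(f"{name}: ~{gguf_size_mb(N, bpw):.0f} MB")
```

Actual files are somewhat smaller or larger because some tensors (e.g. embeddings) are kept at higher precision and metadata adds overhead.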

Training Datasets (Available)

  • Distilled data: 5000+ examples from MiniMax-M2.5
  • Preference data: 10K+ DPO training pairs
  • Balanced data: Enhanced training sets
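
A preference pair for DPO training typically looks like the record below. This is a hypothetical example of the shape expected by TRL's `DPOTrainer` (prompt/chosen/rejected); the repo's exact schema and field names may differ:

```python
import json

# Hypothetical DPO preference pair: a prompt plus a "chosen" response
# (correct triage level, safety disclaimer) and a "rejected" one
# (under-triaged, unsafe advice). Content is illustrative only.
pair = {
    "prompt": "Paciente con dolor torácico opresivo y sudoración. ¿Nivel de triaje?",
    "chosen": "NARANJA (muy urgente): posible síndrome coronario agudo. "
              "Acuda a urgencias de inmediato; esto no sustituye atención médica.",
    "rejected": "VERDE: no parece grave, tome un analgésico y descanse.",
}
line = json.dumps(pair, ensure_ascii=False)  # one JSONL line of training data
print(line)
```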

Technical Achievements

  • Full RLHF pipeline (CPT → SFT → GRPO → DPO)
  • Offline-capable quantized models
  • Spanish medical language specialization
  • Manchester Triage System (MTS) implementation

Use Cases (Educational)

This project demonstrates how to:

  1. Build domain-specific LLMs - Medical Spanish fine-tuning
  2. Implement knowledge distillation - Using powerful teacher models
  3. Apply RLHF techniques - GRPO and DPO for alignment
  4. Optimize for edge deployment - GGUF quantization
  5. Create safety-aligned models - Medical disclaimers and urgency levels
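
To make point 3 concrete, a GRPO-style reward for triage might score outputs on label correctness and safety language. This is a toy illustration, not the reward function the repo actually uses:

```python
import re

LEVELS = {"ROJO", "NARANJA", "AMARILLO", "VERDE", "AZUL"}

def triage_reward(output: str, gold_level: str) -> float:
    """Toy GRPO-style reward (illustrative only): +1.0 for exactly the
    correct MTS level, -1.0 for a wrong one, +0.5 for recommending
    professional medical care."""
    found = {lvl for lvl in LEVELS if lvl in output.upper()}
    score = 0.0
    if gold_level in found and len(found) == 1:
        score += 1.0
    elif found:
        score -= 1.0
    if re.search(r"m[ée]dic|urgencias|profesional", output, re.IGNORECASE):
        score += 0.5  # safety-alignment bonus for the disclaimer
    return score

print(triage_reward("NARANJA: acuda a urgencias de inmediato.", "NARANJA"))
```

GRPO then normalizes such rewards across a group of sampled completions per prompt to form the training advantage.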

Pipeline

┌─────────────────────────────────────────────────────────────────────────┐
│  1. Data Prep   →  2. CPT        →  3. SFT        →  4. Distill        │
│  OpenMed + MTS     Spanish Med      Triage SFT       MiniMax-M2.5      │
│                                                                         │
│  5. GRPO        →  6. DPO        →  7. Quantize   →  8. Deploy         │
│  Rewards           Preference       GGUF Q5          Offline App       │
└─────────────────────────────────────────────────────────────────────────┘

Directory Structure

med-llm-es/
├── configs/
│   └── config.py              # Configuration settings
├── data/
│   ├── raw/                   # Downloaded OpenMed datasets
│   ├── translated/            # Spanish translations
│   ├── triage/                # Generated triage prompts
│   ├── distilled/             # Teacher-generated data (~10MB)
│   └── preference/            # DPO preference pairs (~10MB)
├── models/
│   ├── cpt-spanish-medical-v1/    # CPT model
│   ├── sft-spanish-triage-v1/     # SFT model
│   ├── grpo-spanish-triage-v1/    # GRPO model
│   ├── dpo-spanish-triage-v1/     # DPO model
│   └── gguf/                      # Quantized models (~2GB total)
├── scripts/
│   ├── 01_download_opendmed.py      # Download datasets
│   ├── 02_translate_to_spanish.py   # Translate to Spanish
│   ├── 03_generate_triage_data.py   # Create triage prompts
│   ├── 04_cpt_spanish_medical.py    # Continued Pre-Training
│   ├── 05_sft_triage.py             # Supervised Fine-Tuning
│   ├── 06_distillation_generate.py  # Knowledge Distillation
│   ├── 07_create_preference_data.py # Create DPO dataset
│   ├── 08_grpo_triage.py            # GRPO training
│   ├── 09_dpo_triage.py             # DPO training
│   ├── 10_quantize_gguf.py          # Quantization
│   └── 11_monitor_grpo.py           # Passive GRPO run monitor
├── checkpoints/               # Training checkpoints
├── reports/                   # Documentation
├── DEPLOYMENT_GUIDES.md       # Edge deployment instructions
└── README.md

Quick Start

Prerequisites

  1. Google Colab Pro (for A100 GPU access) or local GPU (16GB+ VRAM)
  2. MiniMax API Key (for distillation)
  3. Google Drive (for storage)

Execution Order

  1. Data Preparation

    python scripts/01_download_opendmed.py
    python scripts/02_translate_to_spanish.py
    python scripts/03_generate_triage_data.py
    
  2. Training (on Colab)

    python scripts/04_cpt_spanish_medical.py  # CPT
    python scripts/05_sft_triage.py           # SFT
    python scripts/06_distillation_generate.py # Distillation
    python scripts/07_create_preference_data.py # Preference data
    python scripts/08_grpo_triage.py          # GRPO
    python scripts/09_dpo_triage.py           # DPO
    
  3. Quantization

    python scripts/10_quantize_gguf.py
    

Triage System

Uses Manchester Triage System (MTS):

| Level | Color | Meaning | Response Time |
|-------|-------|---------|---------------|
| ROJO | Red | Emergency | Immediate |
| NARANJA | Orange | Very Urgent | 10 min |
| AMARILLO | Yellow | Urgent | 60 min |
| VERDE | Green | Less Urgent | 120 min |
| AZUL | Blue | Non-urgent | 240 min |
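
The table above translates directly into a lookup plus a small parser for pulling the assigned level out of a model response. The parsing regex is illustrative; the model's exact output format may vary:

```python
import re

# MTS levels (the model's Spanish labels) mapped to maximum response
# times in minutes (0 = immediate), per the table above.
MTS = {"ROJO": 0, "NARANJA": 10, "AMARILLO": 60, "VERDE": 120, "AZUL": 240}

def extract_level(model_output):
    """Return the first MTS level mentioned in a response, or None.
    (Illustrative parsing, not the repo's canonical output format.)"""
    m = re.search(r"\b(ROJO|NARANJA|AMARILLO|VERDE|AZUL)\b", model_output.upper())
    return m.group(1) if m else None

level = extract_level("Nivel de triaje: AMARILLO. Consulte a un médico.")
print(level, MTS[level])  # AMARILLO 60
```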

Configuration

Edit configs/config.py:

BASE_MODEL = "LiquidAI/LFM2.5-1.2B-Base"
TEACHER_MODEL = "MiniMaxAI/MiniMax-M2.5"
MINIMAX_API_KEY = "your-api-key-here"

# Paths (use your drive)
DATA_DIR = "E:/med-llm-es/data"
MODELS_DIR = "E:/med-llm-es/models"
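
Rather than hardcoding the API key in `configs/config.py`, it is safer to read it from the environment. A minimal sketch, assuming the scripts accept an environment variable of the same name (the variable name is an assumption, not confirmed by the repo):

```python
import os

# Read the MiniMax key from the environment instead of committing it to
# version control (MINIMAX_API_KEY as the variable name is an assumption).
MINIMAX_API_KEY = os.environ.get("MINIMAX_API_KEY", "")
if not MINIMAX_API_KEY:
    print("Warning: MINIMAX_API_KEY is not set; distillation calls will fail.")
```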

Deployment

See DEPLOYMENT_GUIDES.md for:

  • Android (Termux)
  • iOS (MLX)
  • Desktop
  • Raspberry Pi
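
For a quick offline smoke test on desktop before following the full guides, the quantized model can be run with llama.cpp's CLI. The model path and prompt below are examples, and sampling flags are a suggestion, not the project's recommended settings:

```shell
# Run the recommended Q5_K_M quantization fully offline with llama.cpp
# (adjust the model path to where your GGUF file lives).
./llama-cli -m models/gguf/med-llm-es-triage-balanced-Q5_K_M.gguf \
  -p "Paciente con fiebre de 39°C y rigidez de nuca. ¿Nivel de triaje?" \
  -n 256 --temp 0.2
```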

Cost Estimate

| Item | Cost |
|------|------|
| Colab Pro (80 hours) | ~$100-150 |
| MiniMax API (distillation) | ~$50-100 |
| Total | ~$150-250 |

Limitations & Risks

  1. Model may hallucinate - Incorrect medical information
  2. Limited training data - Not comprehensive medical coverage
  3. No clinical validation - Never tested in real settings
  4. Language bias - Trained on specific Spanish variants
  5. Quantization losses - Accuracy trade-offs from compression

License

This project is for educational/research purposes only.

  • Base models: LFM2.5 (Liquid AI), MiniMax-M2.5 (MiniMax)
  • Training framework: TRL (Apache 2.0)
