Instructions to use Elixpo/LlamaMedicine with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Elixpo/LlamaMedicine with Transformers:
```python
# Load model directly (use the causal-LM class so the generation head is included)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Elixpo/LlamaMedicine", dtype="auto")
```
- llama-cpp-python
How to use Elixpo/LlamaMedicine with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Elixpo/LlamaMedicine",
    filename="unsloth.Q8_0.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What are the symptoms of diabetes?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Elixpo/LlamaMedicine with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Elixpo/LlamaMedicine:Q8_0

# Run inference directly in the terminal:
llama-cli -hf Elixpo/LlamaMedicine:Q8_0
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Elixpo/LlamaMedicine:Q8_0

# Run inference directly in the terminal:
llama-cli -hf Elixpo/LlamaMedicine:Q8_0
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Elixpo/LlamaMedicine:Q8_0

# Run inference directly in the terminal:
./llama-cli -hf Elixpo/LlamaMedicine:Q8_0
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Elixpo/LlamaMedicine:Q8_0

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Elixpo/LlamaMedicine:Q8_0
```
Use Docker
```sh
docker model run hf.co/Elixpo/LlamaMedicine:Q8_0
```
- LM Studio
- Jan
- Ollama
How to use Elixpo/LlamaMedicine with Ollama:
```sh
ollama run hf.co/Elixpo/LlamaMedicine:Q8_0
```
- Unsloth Studio
How to use Elixpo/LlamaMedicine with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Elixpo/LlamaMedicine to start chatting
```
Install Unsloth Studio (Windows)
```sh
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Elixpo/LlamaMedicine to start chatting
```
Using Hugging Face Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Elixpo/LlamaMedicine to start chatting
```
- Pi
How to use Elixpo/LlamaMedicine with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Elixpo/LlamaMedicine:Q8_0
```
Configure the model in Pi
```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "Elixpo/LlamaMedicine:Q8_0" }
      ]
    }
  }
}
```

Run Pi

```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use Elixpo/LlamaMedicine with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf Elixpo/LlamaMedicine:Q8_0
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Elixpo/LlamaMedicine:Q8_0
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use Elixpo/LlamaMedicine with Docker Model Runner:
```sh
docker model run hf.co/Elixpo/LlamaMedicine:Q8_0
```
- Lemonade
How to use Elixpo/LlamaMedicine with Lemonade:
Pull the model

```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Elixpo/LlamaMedicine:Q8_0
```

Run and chat with the model

```sh
lemonade run user.LlamaMedicine-Q8_0
```

List all available models

```sh
lemonade list
```
Uploaded model
- Developed by: Elixpo
- License: apache-2.0
- Finetuned from model: unsloth/llama-3.2-1b-instruct-bnb-4bit
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
MediTalk is a medical assistant model designed to provide general health-related information grounded in research and expert knowledge. Trained using LLAMA 3.1, the model responds to queries with empathy, clarity, and professionalism, and it handles sensitive or inappropriate questions with polite refusals. It has been fine-tuned on the lavita/ChatDoctor-HealthCareMagic-100k dataset from Hugging Face, making it well suited to medical-related tasks.
The model is available on Ollama and Hugging Face, and can be interacted with much like GPT-style models.
Features
- General Medical Information: MediTalk provides clear and concise responses to common health-related queries.
- Polite Negation for Inappropriate Questions: If a user asks an awkward or inappropriate question, MediTalk responds with a polite negation like "I'm afraid I can't answer that. Please ask something else related to health."
- Fine-Tuned for Medical Content: The model is fine-tuned using the lavita/ChatDoctor-HealthCareMagic-100k dataset for more relevant medical responses.
- Maximum Tokens per Response: The model provides responses with a maximum of 512 tokens.
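Because the model is served through an OpenAI-compatible API when run with `llama-server` (see the llama.cpp section below), the 512-token cap can be enforced per request. As a minimal sketch, assuming the default `llama-server` port of 8080 (the `build_chat_request` helper is illustrative, not part of any library):

```python
# Sketch: build an OpenAI-style chat request for MediTalk with the
# documented 512-token response limit. The helper function and the
# server URL are illustrative assumptions, not part of the model card.
import json

LLAMA_SERVER_URL = "http://localhost:8080/v1/chat/completions"  # llama-server default

def build_chat_request(question: str) -> dict:
    """Build an OpenAI-style chat completion payload for MediTalk."""
    return {
        "model": "Elixpo/LlamaMedicine:Q8_0",
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 512,  # matches the model's documented response cap
    }

payload = build_chat_request("What are the symptoms of diabetes?")
print(json.dumps(payload, indent=2))

# To actually send the request (requires a running llama-server and the
# `requests` package):
# import requests
# reply = requests.post(LLAMA_SERVER_URL, json=payload).json()
# print(reply["choices"][0]["message"]["content"])
```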
Setup and Installation
Prerequisites
- Ollama: Ensure you have Ollama installed on your system.
- Run: Converse with the model by running the following command:

```sh
ollama run Elixpo/LlamaMedicine
```
To run the code on Kaggle:
```python
# Run this before testing:
# pip install unsloth bitsandbytes transformers

import torch
from unsloth import FastLanguageModel
from transformers import AutoTokenizer

# Load the quantized model from Hugging Face
model_name = "Elixpo/llamaMED"
model, _ = FastLanguageModel.from_pretrained(model_name)

# Prepare the model for inference
model = FastLanguageModel.for_inference(model)

# Move the model to the GPU if available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Load the tokenizer from Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Test the model
input_text = "What are the symptoms of diabetes?"
inputs = tokenizer(input_text, return_tensors="pt")

# Move the input tensors to the same device as the model
inputs = {key: value.to(device) for key, value in inputs.items()}

# Generate output (unpacking passes the attention mask along with the input IDs)
outputs = model.generate(**inputs, max_length=100)

# Decode the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
- Select T4 x2 GPUs for faster output.
Scientific Parameters:
- Parameters: 760M
- Layers: 16
- Size: 4.9GB
- Precision: 4bits
- Train Precision: bf16
- Chat Template: Llama 3.1
- Mother Model: unsloth/llama-3.2-1b-instruct-bnb-4bit
- Epochs: 60
- GPU: T4 x2
- System Requirements: GPU > 4 GB | CUDA Pipelined
- Learning Rate: 2e-4
- Warmup Steps: 5
- Dataset Format: ShareGPT
- Trainer: SFTTrainer
- Primary Dataset (1/5): Elixpo/llamaMediTalk
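The Llama 3.1 chat template noted above wraps every turn in header and end-of-turn tokens. As a rough sketch (hand-built here for illustration; in practice `tokenizer.apply_chat_template` produces this string, and the official template also injects a date preamble into the system turn), a single-turn prompt looks like this:

```python
# Sketch of the Llama 3.1-style chat format used by the model.
# Hand-built for illustration only; normally the tokenizer's
# apply_chat_template method generates this from a message list.

def format_llama3_prompt(system: str, user: str) -> str:
    """Assemble one system + user turn, ready for the assistant to complete."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt(
    "You are a helpful medical assistant.",
    "What are the symptoms of diabetes?",
)
print(prompt)
```

The trailing assistant header is what cues the model to generate its reply; generation stops when it emits its own `<|eot_id|>`.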
Model Availability
- Ollama Facility: The model is available for use on Ollama. You can access and run the model at https://ollama.com/Elixpo/LlamaMedicine.
- Hugging Face: The model is also available for download or direct use on Hugging Face at Elixpo/LlamaMedicine.
Customization
- System Instructions: Modify the system instructions in the Modelfile to adjust the assistant's behavior.
- Fine-Tuning: The model has been fine-tuned using the lavita/ChatDoctor-HealthCareMagic-100k dataset but can be further fine-tuned with other datasets to specialize in specific medical fields.
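As a minimal sketch of such a customization (the SYSTEM text and parameter value here are illustrative assumptions, not the model's shipped configuration):

```
FROM hf.co/Elixpo/LlamaMedicine:Q8_0

# Illustrative system prompt; replace with your own instructions
SYSTEM """You are MediTalk, a medical assistant. Answer health questions with empathy and clarity, and politely decline inappropriate requests."""

# Cap responses at 512 tokens, matching the model's documented limit
PARAMETER num_predict 512
```

Save this as `Modelfile` and build a local variant with `ollama create meditalk-custom -f Modelfile`, then run it with `ollama run meditalk-custom`.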
Acknowledgments
- LLAMA 3.1: Powered by LLAMA models.
- Ollama: Used for model deployment and management.
- Hugging Face: Fine-tuned using lavita/ChatDoctor-HealthCareMagic-100k dataset.
- Medical Databases: The model provides general knowledge, but it does not replace professional medical advice.