Instructions to use exafluence/EXF-Medistral-Nemo-12B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use exafluence/EXF-Medistral-Nemo-12B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="exafluence/EXF-Medistral-Nemo-12B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("exafluence/EXF-Medistral-Nemo-12B")
model = AutoModelForCausalLM.from_pretrained("exafluence/EXF-Medistral-Nemo-12B")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use exafluence/EXF-Medistral-Nemo-12B with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "exafluence/EXF-Medistral-Nemo-12B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "exafluence/EXF-Medistral-Nemo-12B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/exafluence/EXF-Medistral-Nemo-12B
```
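Both routes expose the same OpenAI-compatible endpoint, so the server can also be called from Python. A minimal sketch using only the standard library; the host, port, and model name mirror the curl example above, so adjust them to your deployment (sending the request, commented out, requires a running server):

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, user_content: str) -> request.Request:
    """Build an OpenAI-compatible /v1/chat/completions request for a vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://localhost:8000",
    "exafluence/EXF-Medistral-Nemo-12B",
    "What is the capital of France?",
)

# With a server running:
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```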
- SGLang
How to use exafluence/EXF-Medistral-Nemo-12B with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "exafluence/EXF-Medistral-Nemo-12B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "exafluence/EXF-Medistral-Nemo-12B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "exafluence/EXF-Medistral-Nemo-12B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "exafluence/EXF-Medistral-Nemo-12B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Unsloth Studio
How to use exafluence/EXF-Medistral-Nemo-12B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
# Install Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for exafluence/EXF-Medistral-Nemo-12B to start chatting
```
Install Unsloth Studio (Windows)
```powershell
# Install Unsloth Studio:
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# and search for exafluence/EXF-Medistral-Nemo-12B to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for exafluence/EXF-Medistral-Nemo-12B to start chatting.
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="exafluence/EXF-Medistral-Nemo-12B",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use exafluence/EXF-Medistral-Nemo-12B with Docker Model Runner:
```shell
docker model run hf.co/exafluence/EXF-Medistral-Nemo-12B
```
EXF-Medistral-Nemo-12B
Model Description
EXF-Medistral-Nemo-12B is a fine-tuned version of the Mistral-Nemo-12B model, optimized for tasks in the medical domain. It was trained on the Open-Nexus-MedQA dataset, which integrates a wide range of medical knowledge from public datasets such as ChatDoctor and icliniq, to improve the model's ability to answer medical questions accurately and reliably. The model is designed to assist with clinical decision support, medical coding, and patient care by generating responses grounded in comprehensive medical knowledge.
Model Architecture
- Base Model: Mistral-Nemo-12B
- Parameters: 12 billion
- Fine-tuning Dataset: Open-Nexus-MedQA
- Task: Medical question-answering (QA), medical coding, and healthcare information retrieval.
Training Data
The model was fine-tuned on the Open-Nexus-MedQA dataset, which aggregates data from multiple medical QA sources such as:
- ChatDoctor
- icliniq.com
- HealthCareMagic
- CareQA
- MedInstruct
The dataset contains medical queries ranging from simple conditions to complex diagnoses, accompanied by accurate, domain-specific responses, making it a robust training source for real-world medical applications.
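Datasets like this are typically prepared for instruction tuning by mapping each QA record into chat-style messages before applying the model's chat template. A minimal, hypothetical sketch; the `question`/`answer` field names are assumptions for illustration, not the actual Open-Nexus-MedQA schema:

```python
def qa_to_messages(record: dict) -> list[dict]:
    """Convert a {question, answer} record into chat-format messages
    suitable for chat-template-based fine-tuning (e.g. with TRL)."""
    return [
        {"role": "user", "content": record["question"]},
        {"role": "assistant", "content": record["answer"]},
    ]

example = {
    "question": "What are common symptoms of anemia?",
    "answer": "Fatigue, pallor, and shortness of breath are typical symptoms.",
}
messages = qa_to_messages(example)
```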
Intended Use
EXF-Medistral-Nemo-12B is ideal for:
- Medical Question-Answering: It can be used for generating responses to patient queries or supporting healthcare professionals with clinical information.
- Medical Coding: The model supports tasks related to CMS, OASIS, ICD-10, and other coding systems.
- Clinical Decision Support: Assisting doctors and healthcare providers by offering evidence-based suggestions or answers.
- Patient Care Tools: Powering medical chatbots or virtual assistants for patients seeking health information.
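A patient-facing assistant built on this model can be structured as a thin wrapper that prepends a safety-oriented system prompt before calling the model. A minimal sketch: the `generate` callable stands in for whatever inference backend is used (Transformers, vLLM, etc.), and the system prompt wording is an assumption, not part of the model's training:

```python
from typing import Callable

SYSTEM_PROMPT = (
    "You are a medical information assistant. Provide general health "
    "information and always advise consulting a licensed clinician."
)

def answer_patient_query(query: str, generate: Callable[[list[dict]], str]) -> str:
    """Wrap a raw patient query in chat messages with a safety system prompt."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": query},
    ]
    return generate(messages)

# Stub backend for illustration; replace with a real model call:
reply = answer_patient_query(
    "What are the symptoms of type 2 diabetes?",
    generate=lambda messages: f"(model reply to: {messages[-1]['content']})",
)
```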
Performance
The model has been fine-tuned for precision in the medical domain, demonstrating high accuracy in understanding and generating responses to complex medical queries. It excels in:
- Medical terminology comprehension
- Providing accurate ICD-10 and CMS codes
- Generating medically relevant and safe answers
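Generated codes should not be taken at face value, so model output that claims to contain ICD-10 codes is worth a syntactic sanity check before downstream use. A minimal sketch using a simplified format check; real ICD-10-CM validity also requires a lookup against the official code set, which this regex does not do:

```python
import re

# Simplified ICD-10 shape: a letter, two digits, optional dot + 1-4 alphanumerics.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?$")

def looks_like_icd10(code: str) -> bool:
    """Syntactic sanity check only -- not a lookup in the official code set."""
    return bool(ICD10_PATTERN.match(code.strip()))

print(looks_like_icd10("E11.9"))   # type 2 diabetes without complications -> True
print(looks_like_icd10("banana"))  # -> False
```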
Limitations
- Not a Diagnostic Tool: This model is not intended to replace medical professionals or provide definitive medical diagnoses. Always consult with a licensed healthcare provider for medical advice.
- Training Data Bias: The dataset is based on publicly available medical QA data, which might not cover all edge cases or international healthcare systems.
How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("exafluence/EXF-Medistral-Nemo-12B")
model = AutoModelForCausalLM.from_pretrained("exafluence/EXF-Medistral-Nemo-12B")

input_text = "What are the symptoms of type 2 diabetes?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)

# decode() expects a 1-D sequence of token IDs, so index the first batch element
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
License
This model is provided under a proprietary license. Usage is restricted to non-commercial purposes unless explicit permission is granted.
Citation
If you use this model, please cite:
```bibtex
@inproceedings{exafluence2024EXFMedistralNemo12B,
  title={EXF-Medistral-Nemo-12B: A Fine-Tuned Medical Language Model for Healthcare Applications},
  author={Exafluence Inc.},
  year={2024},
  url={https://huggingface.co/exafluence/EXF-Medistral-Nemo-12B},
  doi={https://doi.org/10.57967/hf/3284}
}
```
Contact
For questions regarding usage, licensing, or access, please contact Exafluence Inc.
Uploaded model
- Developed by: exafluence
- License: apache-2.0
- Finetuned from model: unsloth/mistral-nemo-instruct-2407-bnb-4bit
This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.