How to use BiMediX/BiMediX-Eng with libraries and local apps.

Libraries

How to use BiMediX/BiMediX-Eng with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="BiMediX/BiMediX-Eng")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BiMediX/BiMediX-Eng")
model = AutoModelForCausalLM.from_pretrained("BiMediX/BiMediX-Eng")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use BiMediX/BiMediX-Eng with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BiMediX/BiMediX-Eng"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BiMediX/BiMediX-Eng",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
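
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `ask` are hypothetical, introduced here for illustration, and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="BiMediX/BiMediX-Eng",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` returns the reply as a string. Passing `base_url="http://localhost:30000"` targets a server on a different port, such as the SGLang setup described further down.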

Use Docker

docker model run hf.co/BiMediX/BiMediX-Eng

SGLang

How to use BiMediX/BiMediX-Eng with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "BiMediX/BiMediX-Eng" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BiMediX/BiMediX-Eng",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "BiMediX/BiMediX-Eng" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BiMediX/BiMediX-Eng",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use BiMediX/BiMediX-Eng with Docker Model Runner:
```
docker model run hf.co/BiMediX/BiMediX-Eng
```

Model Card for BiMediX-Eng

Model Details

Name: BiMediX
Version: 1.0
Type: Bilingual Medical Mixture of Experts Large Language Model (LLM)
Languages: English
Model Architecture: Mixtral-8x7B-Instruct-v0.1
Training Data: BiMed1.3M-English, the English portion of the bilingual BiMed1.3M dataset of diverse medical interactions.

Intended Use

Primary Use: Medical interactions in English (the bilingual BiMediX variant also covers Arabic).
Capabilities: MCQA, closed QA and chats.

Getting Started

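from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BiMediX/BiMediX-Eng"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (requires the accelerate package) places the weights on the available devices
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

text = "Hello BiMediX! I've been experiencing increased tiredness in the past week."
# Move the tokenized prompt to the same device as the model before generating
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=500)
# The decoded text contains the prompt followed by the model's continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))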
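
Before running the snippet above, it helps to estimate how much memory the weights alone need. The sketch below is a back-of-the-envelope calculation, assuming roughly 46.7 billion total parameters for the Mixtral-8x7B architecture (a figure that is not stated in this card) and ignoring activations and the KV cache. The function name is hypothetical.

```python
# Assumption: approximate total parameter count of Mixtral-8x7B (not stated in this card)
APPROX_PARAMS = 46.7e9

# Bytes used per parameter for some common weight precisions
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "4-bit": 0.5}


def estimate_weight_memory_gb(num_params, bytes_per_param):
    """Return the weight-only memory footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, size in BYTES_PER_PARAM.items():
    gb = estimate_weight_memory_gb(APPROX_PARAMS, size)
    print(f"{precision:>8}: about {gb:.1f} GB for the weights alone")
```

Actual usage is higher once activations, the KV cache, and framework overhead are included, so these numbers are a lower bound rather than a hardware requirement.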

Training Procedure

Dataset: BiMed1.3M-English, a corpus of millions of healthcare-specialized tokens.
QLoRA Adaptation: Adds learnable low-rank adapter weights to the experts and the routing network while the base weights stay frozen, so only about 4% of the original parameters are trained.
Training Resources: The model was trained on approximately 288 million tokens from the BiMed1.3M-English corpus.
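
To illustrate why low-rank adaptation trains only a small share of the parameters, the sketch below counts adapter weights for a single projection matrix. It assumes the standard LoRA form, where a frozen `d_out x d_in` matrix gains two trainable factors of shapes `rank x d_in` and `d_out x rank`. The dimensions match one expert feed-forward projection in Mixtral-8x7B (hidden size 4096, intermediate size 14336). The ranks are hypothetical values chosen for illustration and are not the authors' configuration, which this card does not state.

```python
def lora_adapter_params(d_in, d_out, rank):
    """Trainable parameters in a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """Adapter parameters relative to the frozen d_out x d_in matrix they adapt."""
    return lora_adapter_params(d_in, d_out, rank) / (d_in * d_out)


# One expert feed-forward projection in Mixtral-8x7B
D_IN, D_OUT = 4096, 14336

for rank in (8, 32, 128):  # hypothetical ranks, for illustration only
    count = lora_adapter_params(D_IN, D_OUT, rank)
    share = trainable_fraction(D_IN, D_OUT, rank)
    print(f"rank={rank:>3}: {count:>9,} adapter params ({share:.2%} of the frozen matrix)")
```

The share grows linearly with the rank, so the overall trainable fraction of a model depends on the rank chosen and on which modules receive adapters.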

Model Performance

Benchmarks: BiMediX achieves the highest average score (75.4) across the nine medical benchmarks below, ahead of Med42-70B (72.9) and Meditron-70B (71.3). It ranks first on seven of the nine benchmarks; Med42-70B leads on CMed and Meditron-70B leads on MedMCQA. The authors attribute the improvement to the training procedure and the dataset described above.

Model	CKG	CBio	CMed	MedGen	ProMed	Ana	MedMCQA	MedQA	PubmedQA	AVG
PMC-LLaMA-13B	63.0	59.7	52.6	70.0	64.3	61.5	50.5	47.2	75.6	60.5
Med42-70B	75.9	84.0	69.9	83.0	78.7	64.4	61.9	61.3	77.2	72.9
Clinical Camel-70B	69.8	79.2	67.0	69.0	71.3	62.2	47.0	53.4	74.3	65.9
Meditron-70B	72.3	82.5	62.8	77.8	77.9	62.7	65.1	60.7	80.0	71.3
BiMediX	78.9	86.1	68.2	85.0	80.5	74.1	62.7	62.8	80.2	75.4
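
The AVG column can be checked against the nine per-benchmark scores. The sketch below, which assumes AVG is the unweighted mean rounded to one decimal place, recomputes it from the values copied from the table. The `reported` dictionary and `mean_score` helper are names introduced here for illustration.

```python
# Per-benchmark scores in table order (CKG ... PubmedQA), followed by the reported AVG
reported = {
    "PMC-LLaMA-13B": ([63.0, 59.7, 52.6, 70.0, 64.3, 61.5, 50.5, 47.2, 75.6], 60.5),
    "Med42-70B": ([75.9, 84.0, 69.9, 83.0, 78.7, 64.4, 61.9, 61.3, 77.2], 72.9),
    "Clinical Camel-70B": ([69.8, 79.2, 67.0, 69.0, 71.3, 62.2, 47.0, 53.4, 74.3], 65.9),
    "Meditron-70B": ([72.3, 82.5, 62.8, 77.8, 77.9, 62.7, 65.1, 60.7, 80.0], 71.3),
    "BiMediX": ([78.9, 86.1, 68.2, 85.0, 80.5, 74.1, 62.7, 62.8, 80.2], 75.4),
}


def mean_score(scores):
    """Unweighted mean of the benchmark scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


for model, (scores, avg) in reported.items():
    recomputed = mean_score(scores)
    status = "matches" if recomputed == avg else "differs from"
    print(f"{model}: recomputed {recomputed} {status} reported {avg}")
```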

Safety and Ethical Considerations

Potential issues: hallucinations, toxicity, stereotypes.
Usage: Research purposes only.

Accessibility

Availability: BiMediX GitHub Repository.
Paper: arxiv.org/abs/2402.13253

Authors

Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, Hisham Cholakkal
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)

Paper for BiMediX/BiMediX-Eng

BiMediX: Bilingual Medical Mixture of Experts LLM

Paper • 2402.13253 • Published Feb 20, 2024