Image-Text-to-Text
Transformers
PyTorch
English
llava
text-generation
dental
medical
multimodal
vision-language
clip
sam
lora
orthopantomography
opg
x-ray
diagnosis
Instructions to use jeffrey423/ToothXpert with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use jeffrey423/ToothXpert with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="jeffrey423/ToothXpert")# Load model directly from transformers import AutoProcessor, AutoModelForCausalLM processor = AutoProcessor.from_pretrained("jeffrey423/ToothXpert") model = AutoModelForCausalLM.from_pretrained("jeffrey423/ToothXpert") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use jeffrey423/ToothXpert with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "jeffrey423/ToothXpert" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "jeffrey423/ToothXpert", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/jeffrey423/ToothXpert
- SGLang
How to use jeffrey423/ToothXpert with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "jeffrey423/ToothXpert" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "jeffrey423/ToothXpert", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "jeffrey423/ToothXpert" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "jeffrey423/ToothXpert", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use jeffrey423/ToothXpert with Docker Model Runner:
docker model run hf.co/jeffrey423/ToothXpert
| license: apache-2.0 | |
| datasets: | |
| - jeffrey423/ToothXpert.MM-OPG-Annotations | |
| language: | |
| - en | |
| tags: | |
| - dental | |
| - medical | |
| - multimodal | |
| - vision-language | |
| - llava | |
| - clip | |
| - sam | |
| - lora | |
| - orthopantomography | |
| - opg | |
| - x-ray | |
| - diagnosis | |
| base_model: liuhaotian/llava-v1.5-7b | |
| pipeline_tag: image-text-to-text | |
| library_name: transformers | |
| # ToothXpert Model | |
| ToothXpert is a multimodal AI model for comprehensive dental X-ray (OPG) analysis, combining vision and language understanding for automatic diagnosis and condition detection. | |
| ## Quick Start | |
| ### Installation | |
| ```bash | |
| pip install torch torchvision transformers | |
| pip install opencv-python einops peft medpy | |
| pip install "numpy<2.0" # Important for compatibility | |
| ``` | |
| ### Download Model | |
| ```python | |
| from huggingface_hub import snapshot_download | |
| model_path = snapshot_download( | |
| repo_id='jeffrey423/ToothXpert', | |
| local_dir='./ToothXpert_pretrained' | |
| ) | |
| ``` | |
| ### Simple Inference | |
| ```python | |
| import cv2 | |
| import torch | |
| import torch.nn.functional as F | |
| from transformers import AutoTokenizer, CLIPImageProcessor | |
| from model.ToothXpert_MOE import ToothXpertForCausalLMMOE | |
| from model.llava import conversation as conversation_lib | |
| from model.llava.mm_utils import tokenizer_image_token | |
| from model.segment_anything.utils.transforms import ResizeLongestSide | |
| from utils.utils import (DEFAULT_IM_END_TOKEN, DEFAULT_IM_START_TOKEN, | |
| DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX) | |
| # Preprocessing function | |
| def preprocess(x, pixel_mean=torch.Tensor([123.675, 116.28, 103.53]).view(-1, 1, 1), | |
| pixel_std=torch.Tensor([58.395, 57.12, 57.375]).view(-1, 1, 1), img_size=1024): | |
| x = (x - pixel_mean) / pixel_std | |
| h, w = x.shape[-2:] | |
| padh = img_size - h | |
| padw = img_size - w | |
| x = F.pad(x, (0, padw, 0, padh)) | |
| return x | |
| # Load model | |
| model_path = "./ToothXpert_pretrained" | |
| device = "cuda:0" | |
| tokenizer = AutoTokenizer.from_pretrained( | |
| model_path, | |
| model_max_length=512, | |
| padding_side="right", | |
| use_fast=False, | |
| ) | |
| tokenizer.pad_token = tokenizer.unk_token | |
| tokenizer.add_tokens("[SEG]") | |
| seg_token_idx = tokenizer("[SEG]", add_special_tokens=False).input_ids[0] | |
| tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True) | |
| moe_lora_args = { | |
| "lora_r": 8, | |
| "lora_alpha": 16, | |
| "lora_dropout": 0.05, | |
| "lora_target_modules": "q_proj,v_proj", | |
| "moe_lora": False, | |
| "expert_num": 3, | |
| "guide": True, | |
| "guide_mode": "smmulsm", | |
| "vocab_size": len(tokenizer), | |
| } | |
| model = ToothXpertForCausalLMMOE.from_pretrained( | |
| model_path, | |
| low_cpu_mem_usage=True, | |
| vision_tower="openai/clip-vit-large-patch14", | |
| seg_token_idx=seg_token_idx, | |
| torch_dtype=torch.bfloat16, | |
| train_mask_decoder=True, | |
| out_dim=256, | |
| moe_lora_args=moe_lora_args, | |
| ) | |
| model.config.eos_token_id = tokenizer.eos_token_id | |
| model.config.bos_token_id = tokenizer.bos_token_id | |
| model.config.pad_token_id = tokenizer.pad_token_id | |
| model.get_model().initialize_vision_modules(model.get_model().config) | |
| vision_tower = model.get_model().get_vision_tower() | |
| vision_tower.to(dtype=torch.bfloat16, device=device) | |
| model = model.bfloat16().to(device) | |
| model.eval() | |
| # Load and process image | |
| image_path = "your_dental_xray.png" | |
| image_np = cv2.imread(image_path) | |
| image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB) | |
| original_size_list = [image_np.shape[:2]] | |
| clip_image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14") | |
| transform = ResizeLongestSide(1024) | |
| image_clip = ( | |
| clip_image_processor.preprocess(image_np, return_tensors="pt")["pixel_values"][0] | |
| .unsqueeze(0).to(device).bfloat16() | |
| ) | |
| image = transform.apply_image(image_np) | |
| resize_list = [image.shape[:2]] | |
| image = ( | |
| preprocess(torch.from_numpy(image).permute(2, 0, 1).contiguous()) | |
| .unsqueeze(0).to(device).bfloat16() | |
| ) | |
| # Prepare prompt | |
| question = "Can you describe the image for me?" | |
| conv = conversation_lib.conv_templates["llava_v1"].copy() | |
| conv.messages = [] | |
| prompt = DEFAULT_IMAGE_TOKEN + "\n" + question | |
| prompt = prompt.replace(DEFAULT_IMAGE_TOKEN, | |
| DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN) | |
| conv.append_message(conv.roles[0], prompt) | |
| conv.append_message(conv.roles[1], "") | |
| prompt = conv.get_prompt() | |
| input_ids = tokenizer_image_token(prompt, tokenizer, return_tensors="pt") | |
| input_ids = input_ids.unsqueeze(0).to(device) | |
| # Run inference | |
| with torch.no_grad(): | |
| output_ids, pred_masks = model.evaluate( | |
| image_clip, | |
| image, | |
| input_ids, | |
| resize_list, | |
| original_size_list, | |
| max_new_tokens=512, | |
| tokenizer=tokenizer, | |
| ) | |
| output_ids = output_ids[0][output_ids[0] != IMAGE_TOKEN_INDEX] | |
| text_output = tokenizer.decode(output_ids, skip_special_tokens=False) | |
| text_output = text_output.split('ASSISTANT:')[-1].replace('</s>', '').strip() | |
| print(f"Question: {question}") | |
| print(f"Answer: {text_output}") | |
| ``` | |
| ## Example Questions | |
| **General Description:** | |
| - "Can you describe the image for me?" | |
| **Specific Conditions:** | |
| - "Is there any amalgam restorations in the image?" | |
| - "Any R/L suggestive of caries present?" | |
| - "Is there any dental implant present?" | |
| - "Is there any root canal treated teeth?" | |
| ## Supported Conditions | |
| ToothXpert can detect 11 dental conditions: | |
| 1. Amalgam restorations | |
| 2. Caries (R/L) | |
| 3. Crestal bone loss (mandible) | |
| 4. Crestal bone loss (maxillary) | |
| 5. Implant-supported bridge | |
| 6. Dental implant | |
| 7. Metallic/non-metallic post | |
| 8. Non-metallic restorations | |
| 9. Periapical radiolucency | |
| 10. Root canal treated teeth | |
| 11. Tooth-supported bridge | |
| ## Requirements | |
| - **GPU**: NVIDIA GPU with at least 16GB VRAM | |
| - **Python**: 3.11 (recommended) | |
| - **CUDA**: 12.1 or compatible | |
| ## Model Details | |
| - **Base Model**: LLaVA-1.5-7B | |
| - **Vision Encoder**: CLIP ViT-L/14 | |
| - **Segmentation**: SAM (Segment Anything Model) ViT-H | |
| - **Adaptation**: Guided Mixture of LoRA Experts (G-MoLE) | |
| - **Model Size**: ~15GB | |
| ## Citation | |
| If you use ToothXpert in your research, please cite: | |
| ```bibtex | |
| @article{liu2026toothxpert, | |
| title={Developing and Evaluating Multimodal Large Language Model for Orthopantomography Analysis to Support Clinical Dentistry}, | |
| author={Liu, Xinyu and Hung, Kuo Feng and Yu, Weihao and Ng, Ray Anthony W T and Li, Wuyang and Niu, Tianye and Chen, Hui and Yuan, Yixuan}, | |
| journal={Cell Reports Medicine}, | |
| year={2026} | |
| } | |
| ``` | |
| ## Links | |
| - **GitHub Repository**: [CUHK-AIM-Group/ToothXpert](https://github.com/CUHK-AIM-Group/ToothXpert) | |
| - **Dataset**: [jeffrey423/ToothXpert.MM-OPG-Annotations](https://huggingface.co/datasets/jeffrey423/ToothXpert.MM-OPG-Annotations) | |
| ## License | |
| Apache License 2.0 | |