---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- vision-language-model
- medical-imaging
- brain-ct
- stroke
- region-classification
- lora
language:
- en
- ko
---
<div align="center">
# JOOMED · Brain-CT Lesion Region Classifier
A stroke-specific region classification model, fine-tuned with LoRA from **Qwen3.6-35B-A3B** (an MoE vision-language model)
[Base model: Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) · [License: Apache-2.0](./LICENSE)
</div>
---
Given a CT summary image, the model classifies where lesions are located along two axes, the **anatomical region** and the **vascular territory**, each with a left/right (L/R) distinction, and returns the result as **structured JSON**.
```json
{"anatomical_regions": ["basal_ganglia_thalamus_right"],
"vascular_territories": ["MCA_right", "PCA_right"]}
```
## Model Overview

| | |
|---|---|
| **Base model** | `Qwen/Qwen3.6-35B-A3B`: MoE VLM, 35B total / ~3B active (`qwen3_5_moe`) |
| **Adaptation** | LoRA (r=8, α=16, attention `q/k/v/o_proj`), released as a **full model with the adapter merged into the base** |
| **Input → Output** | PNG (CT summary) → region-classification JSON |
| **Label space** | 23 anatomical groups · 14 vascular groups (coarse, left/right preserved) |
| **Project** | RQT-25-090047: multimodal AI stroke support LLM (JLK Inc.) |
## Performance
Region extraction accuracy (set-based per-sample F1, macro-averaged):

| Eval set | n | Anatomical F1 | Vascular F1 | Mean |
|---|---:|---:|---:|---:|
| Full test set | 12,140 | **0.741** | **0.802** | **0.771** |

This is more than a **2.5×** improvement over the base model without fine-tuning (0.29 / 0.30). The gain comes mainly from higher recall (anatomical +0.17, vascular +0.13), which substantially reduces missed lesions.
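
The set-based per-sample F1 named above can be sketched as follows. This is one plausible reading of the metric, not the evaluation script used for the reported numbers: each sample's predicted and reference labels are treated as sets, F1 is computed per sample, and the scores are averaged over samples. The function names `set_f1` and `mean_sample_f1` are hypothetical, and scoring two empty sets as 1.0 is an assumption.

```python
from typing import Iterable, List, Sequence, Set


def set_f1(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """F1 between the predicted and reference label sets of one sample."""
    pred: Set[str] = set(predicted)
    ref: Set[str] = set(reference)
    if not pred and not ref:
        return 1.0  # assumption: two empty label sets count as a perfect match
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_sample_f1(
    predictions: Sequence[Iterable[str]],
    references: Sequence[Iterable[str]],
) -> float:
    """Average the per-sample set F1 over all samples."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one sample is required")
    scores: List[float] = [set_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    preds = [["MCA_right", "PCA_right"], ["ACA_left"]]
    refs = [["MCA_right"], ["ACA_left"]]
    print(round(mean_sample_f1(preds, refs), 3))
```

Under this reading, an extra predicted label lowers precision and a missed label lowers recall, so the recall gains reported above correspond directly to fewer missed regions per sample.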
<details>
<summary>Auxiliary text metrics (computed on region labels serialized as text; not a measure of report quality)</summary>
| BLEU-1 | METEOR | BERTScore F1 | G-Eval |
|---:|---:|---:|---:|
| 0.798 | 0.794 | 0.952 | 3.68 |
> The output is a short label sentence, so these scores measure something different from metrics for free-text reports. Compare with care.
</details>
## Usage
```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
model_id = "JLKGroup/JOOMED"
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="bfloat16", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)
img = Image.open("ct_summary.png").convert("RGB")
prompt = "์ฃผ์ด์ง CT summary ์ด๋ฏธ์ง์์ ๋ณ๋ณ์ด ์ํ๋ anatomical region๊ณผ vascular territory๋ฅผ JSON์ผ๋ก ๋ตํ๋ผ."
messages = [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": prompt}]}]
text = processor.apply_chat_template(
messages, tokenize=False, add_generation_prompt=True, enable_thinking=False # ํ์ต ํ
ํ๋ฆฟ๊ณผ ์ผ์น
)
inputs = processor(text=[text], images=[img], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
> **Tip**: Always set `enable_thinking=False` at inference time so that the prompt matches the template used during training.
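
The decoded string still has to be turned into the two label lists. Below is a minimal sketch of that step, assuming the model emits a single JSON object with the keys `anatomical_regions` and `vascular_territories`, as in the example near the top of this card. The helper name `parse_prediction` is hypothetical, and treating a missing key as an empty list is an assumption rather than documented behavior.

```python
import json
from typing import Dict, List

EXPECTED_KEYS = ("anatomical_regions", "vascular_territories")


def parse_prediction(generated: str) -> Dict[str, List[str]]:
    """Extract the first JSON object from generated text and return both label lists."""
    start = generated.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode parses one JSON value and tolerates trailing text after it.
    obj, _ = json.JSONDecoder().raw_decode(generated[start:])
    if not isinstance(obj, dict):
        raise ValueError("model output is not a JSON object")
    result: Dict[str, List[str]] = {}
    for key in EXPECTED_KEYS:
        value = obj.get(key, [])  # assumption: a missing key means no labels
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key!r} must be a list of strings")
        result[key] = value
    return result


if __name__ == "__main__":
    sample = '{"anatomical_regions": ["basal_ganglia_thalamus_right"],\n "vascular_territories": ["MCA_right", "PCA_right"]}'
    print(parse_prediction(sample))
```

Malformed JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so a caller can catch one exception type for every failure case.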
## Label Scheme

| Axis | Groups |
|---|---|
| Anatomical (×L/R) | frontal · parietal · temporal · occipital · insula · limbic · basal_ganglia_thalamus · cerebellum · brainstem · ventricle · white_matter_other |
| Vascular (×L/R) | ACA · MCA · PCA · basilar · cerebellar · anterior_choroidal · lateral_ventricle |
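
A small check against this table can catch labels the model invents. The sketch below assumes every label has the form `<group>_<side>`: the `_right` suffix appears in this card's example output, while `_left` is an assumption. Expanding the 11 anatomical groups listed here gives 22 labels, whereas the overview states 23, so the actual vocabulary may contain a label not covered by this simple pattern. The names `expand` and `unknown_labels` are hypothetical.

```python
from typing import Dict, Iterable, List, Set

ANATOMICAL_BASES = [
    "frontal", "parietal", "temporal", "occipital", "insula", "limbic",
    "basal_ganglia_thalamus", "cerebellum", "brainstem", "ventricle",
    "white_matter_other",
]
VASCULAR_BASES = [
    "ACA", "MCA", "PCA", "basilar", "cerebellar",
    "anterior_choroidal", "lateral_ventricle",
]
SIDES = ("left", "right")  # "_right" is from the card's example; "_left" is assumed


def expand(bases: Iterable[str]) -> Set[str]:
    """Combine every group name with every side suffix."""
    return {f"{base}_{side}" for base in bases for side in SIDES}


ANATOMICAL_LABELS = expand(ANATOMICAL_BASES)
VASCULAR_LABELS = expand(VASCULAR_BASES)


def unknown_labels(prediction: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per axis, the predicted labels that are not in the assumed vocabulary."""
    return {
        "anatomical_regions": [
            label for label in prediction.get("anatomical_regions", [])
            if label not in ANATOMICAL_LABELS
        ],
        "vascular_territories": [
            label for label in prediction.get("vascular_territories", [])
            if label not in VASCULAR_LABELS
        ],
    }


if __name__ == "__main__":
    pred = {
        "anatomical_regions": ["basal_ganglia_thalamus_right"],
        "vascular_territories": ["MCA_right", "PCA_right"],
    }
    print(unknown_labels(pred))
```

Reporting unknown labels instead of silently dropping them keeps any mismatch between this assumed vocabulary and the real one visible.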
## Limitations and Cautions
- This is a **model for medical research**. Do not use it as the sole basis for clinical decisions.
## License
This model follows the Apache-2.0 license of the base model `Qwen3.6-35B-A3B`.