---
base_model: unsloth/Qwen3-32B
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---

# Preferred-MedRECT-32B

## Model Description

Preferred-MedRECT-32B is a fine-tuned model based on [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B), optimized for medical error detection and correction tasks using LoRA (Low-Rank Adaptation).

The model was trained on bilingual (Japanese/English) medical reasoning data with explicit reasoning processes, enabling it to detect errors, extract erroneous sentences, and provide corrections in clinical texts.

The model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
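
The model can be queried through the standard `transformers` API. Below is a minimal sketch; the repository id and the instruction wording are assumptions for illustration and are not specified by this card.

```python
# Minimal usage sketch. MODEL_ID and the prompt wording are hypothetical;
# adjust them to the actual repository and the prompt format used in training.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pfnet/Preferred-MedRECT-32B"  # hypothetical repository id


def build_prompt(clinical_text: str) -> str:
    # Hypothetical instruction covering the three MedRECT subtasks
    # (detection, sentence extraction, correction).
    return (
        "Review the following clinical text. State whether it contains a "
        "medical error; if so, quote the erroneous sentence and provide a "
        "corrected version.\n\n" + clinical_text
    )


def correct_text(clinical_text: str, max_new_tokens: int = 1024) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": build_prompt(clinical_text)}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```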

## Model Performance

The table below compares cross-lingual performance on the MedRECT-ja (Japanese) and MedRECT-en (English) benchmarks. MedRECT evaluates models on three subtasks: error detection (F1), erroneous-sentence extraction (accuracy), and error correction (average score).

| Model | MedRECT-ja Error Det. F1 | MedRECT-ja Sent. Ext. Acc. | MedRECT-ja EC Avg. Score | MedRECT-en Error Det. F1 | MedRECT-en Sent. Ext. Acc. | MedRECT-en EC Avg. Score |
|:------|:------------------------:|:--------------------------:|:------------------------:|:------------------------:|:--------------------------:|:------------------------:|
| Preferred-MedRECT-32B | **0.743** | **81.5%** | **0.627** | 0.728 | **90.9%** | **0.718** |
| Qwen3-32B (think) | 0.723 | 72.5% | 0.549 | 0.740 | 83.5% | 0.550 |
| gpt-oss-120b (medium) | 0.721 | 77.4% | 0.581 | 0.777 | 88.1% | 0.630 |
| gpt-oss-20b (medium) | 0.718 | 64.3% | 0.543 | 0.762 | 87.2% | 0.590 |
| GPT-4.1 | 0.658 | 52.6% | 0.655 | **0.789** | 72.8% | 0.710 |
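
The detection scores above are F1 over the binary "contains an error" decision. As a reference for how that metric is computed, here is a self-contained sketch with illustrative counts (not benchmark data):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (error-present) class:
    harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not MedRECT results:
print(round(f1_score(tp=80, fp=25, fn=30), 3))  # → 0.744
```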

## Training Details

- **Base Model**: unsloth/Qwen3-32B
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**:
  - Japanese: 5,538 samples from JMLE (2018-2023)
  - English: 2,439 samples from the MEDEC MS Subset
  - All samples include reasoning processes generated by DeepSeek-R1-0528
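
LoRA keeps the base weight matrix `W` frozen and learns a low-rank update `ΔW = (α/r)·B·A`, training only the small matrices `A` and `B`. A minimal numerical sketch with NumPy (the dimensions below are illustrative, not the actual training configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # illustrative shapes, not the real LoRA config

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

# At initialization B = 0, so the adapted weight equals the base weight.
W_eff = W + (alpha / r) * B @ A
assert np.allclose(W_eff, W)

# After training B is nonzero; the update's rank is bounded by r.
B = rng.standard_normal((d_out, r))
delta = (alpha / r) * B @ A
print(np.linalg.matrix_rank(delta))  # → 2 (= r)
```

The rank bound is what makes the method cheap: only `r·(d_in + d_out)` parameters per adapted matrix are trained instead of `d_in·d_out`.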

## Limitations

The model was developed for research purposes and is not intended for clinical diagnosis. Users are responsible for ensuring compliance with applicable rules and regulations.

## Contributors

Preferred Networks, Inc.

- Naoto Iwase
- Hiroki Okuyama
- Junichiro Iwasawa

## Publications

Detailed evaluation results are presented in the [research paper](https://arxiv.org/abs/2511.00421).

## Citations

```bibtex
@article{medrect2025,
  title={MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts},
  author={Iwase, Naoto and Okuyama, Hiroki and Iwasawa, Junichiro},
  journal={arXiv preprint arXiv:2511.00421},
  year={2025}
}
```

## License

[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)