---
base_model: unsloth/Qwen3-32B
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
---

# Preferred-MedRECT-32B

## Model Description

Preferred-MedRECT-32B is a fine-tuned model based on [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B), optimized for medical error detection and correction tasks using LoRA (Low-Rank Adaptation).

The model was trained on bilingual (Japanese/English) medical reasoning data with explicit reasoning processes, enabling it to detect errors, extract erroneous sentences, and provide corrections in clinical texts.

The model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
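
The model can be queried through the standard `transformers` API. Below is a minimal sketch; the repository id and the instruction wording are assumptions for illustration and are not specified by this card.

```python
# Minimal usage sketch. MODEL_ID and the prompt wording are hypothetical;
# adjust them to the actual repository and the prompt format used in training.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pfnet/Preferred-MedRECT-32B"  # hypothetical repository id


def build_prompt(clinical_text: str) -> str:
    # Hypothetical instruction covering the three MedRECT subtasks
    # (detection, sentence extraction, correction).
    return (
        "Review the following clinical text. State whether it contains a "
        "medical error; if so, quote the erroneous sentence and provide a "
        "corrected version.\n\n" + clinical_text
    )


def correct_text(clinical_text: str, max_new_tokens: int = 1024) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": build_prompt(clinical_text)}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```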

## Model Performance

The table below compares cross-lingual performance on the MedRECT-ja (Japanese) and MedRECT-en (English) benchmarks. MedRECT evaluates models on three subtasks: error detection (F1), erroneous-sentence extraction (accuracy), and error correction (average score).

| Model | MedRECT-ja Error Det. F1 | MedRECT-ja Sent. Ext. Acc. | MedRECT-ja EC Avg. Score | MedRECT-en Error Det. F1 | MedRECT-en Sent. Ext. Acc. | MedRECT-en EC Avg. Score |
|:------|:------------------------:|:--------------------------:|:------------------------:|:------------------------:|:--------------------------:|:------------------------:|
| Preferred-MedRECT-32B | **0.743** | **81.5%** | **0.627** | 0.728 | **90.9%** | **0.718** |
| Qwen3-32B (think) | 0.723 | 72.5% | 0.549 | 0.740 | 83.5% | 0.550 |
| gpt-oss-120b (medium) | 0.721 | 77.4% | 0.581 | 0.777 | 88.1% | 0.630 |
| gpt-oss-20b (medium) | 0.718 | 64.3% | 0.543 | 0.762 | 87.2% | 0.590 |
| GPT-4.1 | 0.658 | 52.6% | 0.655 | **0.789** | 72.8% | 0.710 |
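
The detection scores above are F1 over the binary "contains an error" decision. As a reference for how that metric is computed, here is a self-contained sketch with illustrative counts (not benchmark data):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (error-present) class:
    harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not MedRECT results:
print(round(f1_score(tp=80, fp=25, fn=30), 3))  # → 0.744
```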

## Training Details

- **Base Model**: unsloth/Qwen3-32B
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Data**:
  - Japanese: 5,538 samples from JMLE (2018-2023)
  - English: 2,439 samples from the MEDEC MS Subset
  - All samples include reasoning processes generated by DeepSeek-R1-0528
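
LoRA keeps the base weight matrix `W` frozen and learns a low-rank update `ΔW = (α/r)·B·A`, training only the small matrices `A` and `B`. A minimal numerical sketch with NumPy (the dimensions below are illustrative, not the actual training configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 16  # illustrative shapes, not the real LoRA config

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

# At initialization B = 0, so the adapted weight equals the base weight.
W_eff = W + (alpha / r) * B @ A
assert np.allclose(W_eff, W)

# After training B is nonzero; the update's rank is bounded by r.
B = rng.standard_normal((d_out, r))
delta = (alpha / r) * B @ A
print(np.linalg.matrix_rank(delta))  # → 2 (= r)
```

The rank bound is what makes the method cheap: only `r·(d_in + d_out)` parameters per adapted matrix are trained instead of `d_in·d_out`.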

## Limitations

The model was developed for research purposes and is not intended for clinical diagnosis. Users are responsible for ensuring compliance with applicable rules and regulations.

## Contributors

Preferred Networks, Inc.

- Naoto Iwase
- Hiroki Okuyama
- Junichiro Iwasawa

## Publications

Detailed evaluation results are presented in the [research paper](https://arxiv.org/abs/2511.00421).

## Citations

```bibtex
@article{medrect2025,
  title={MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts},
  author={Iwase, Naoto and Okuyama, Hiroki and Iwasawa, Junichiro},
  journal={arXiv preprint arXiv:2511.00421},
  year={2025}
}
```

## License

[Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0)