	---
	license: apache-2.0
	language:
	- en
	base_model:
	- trillionlabs/Gravity-16B-A3B-Base
	tags:
	- medical
	- clinical
	- mixture-of-experts
	- conversational
	- sft
	library_name: transformers
	pipeline_tag: text-generation
	---

	<p align="center">
	<img src="banner.png" alt="L1" style="width: 80%;">
	</p>

	# Learning Unit 1

	L1 (Learning Unit 1) is the first language model from [Lunit](https://www.lunit.io) and Lunit Consortium, purpose-built for the medical domain. Derived from [Gravity-16B-A3B-Base](https://huggingface.co/trillionlabs/Gravity-16B-A3B-Base), L1 is designed for clinical reasoning and decision support.

	### ✨ Key Highlights
	* 🩺 Medical-Domain Specialized: Developed specifically for clinical reasoning and medical decision support
	* ⚡ Efficient MoE: Only 3B parameters active per token out of 16.24B total — fast inference with high capacity
	* 💭 Thinking Model: Performs step-by-step reasoning in `<think>` tags before generating the final answer

	> Note: L1 reasons internally in `<think>...</think>` blocks before producing a response. This chain-of-thought step improves answer quality but consumes additional tokens, so set the generation limit accordingly (`max_tokens` for the SGLang server, `max_new_tokens` in Transformers; recommended: 2048+).
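
Because the reasoning and the final answer arrive in one string, downstream code usually needs to separate them. The helper below is a minimal sketch, not part of the L1 release: `split_reasoning` is a hypothetical name, and it assumes the `<think>` / `</think>` markers survive decoding as literal text, which may depend on the tokenizer and decoding settings. It also tolerates a missing opening tag, in case the chat template has already emitted it as part of the prompt.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Hypothetical helper: assumes the think markers appear literally in `text`.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        stripped = text.strip()
        if stripped.startswith("<think>"):
            # Generation stopped before the reasoning block was closed,
            # typically because the token limit was reached.
            return stripped[len("<think>"):].strip(), ""
        # No markers at all: treat the whole text as the answer.
        return "", stripped
    # Drop the opening tag if present; it may be absent when the chat
    # template already placed it in the prompt.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>Recall the definition first.</think>\nHere is the answer."
reasoning, answer = split_reasoning(raw)
print(answer)
```

An empty answer alongside non-empty reasoning suggests the output was truncated mid-thought and the token limit should be raised.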

	### 📋 Model Specifications

	- Type: Causal Language Model
	- Base Model: [Gravity-16B-A3B-Base](https://huggingface.co/trillionlabs/Gravity-16B-A3B-Base) from Trillion Labs and Lunit Consortium
	- Architecture: GravityMoE (Sparse Mixture-of-Experts with MLA)
	- Total Parameters: 16.24B
	- Active Parameters: 3B
	- Number of Layers: 28
	- Attention Heads: 16
	- KV Heads: 16
	- Hidden Size: 2048
	- MoE Intermediate Size: 1408
	- Routed Experts: 64 (top-8 selection)
	- Shared Experts: 1
	- Context Length: 32,768 tokens
	- Vocabulary Size: 151,552
	- Tokenizer: GLM-4.5
	- Precision: bf16
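
As a rough worked example of what these numbers imply, the sketch below derives the active-parameter fraction, the share of routed experts used per token, and the memory needed for the weights alone in bf16. It assumes 2 bytes per parameter and that all experts stay resident in memory even though only some are active per token; KV cache, activations, and framework overhead are not included, so real memory use will be higher.

```python
# Figures taken from the specification list above.
TOTAL_PARAMS = 16.24e9
ACTIVE_PARAMS = 3e9
ROUTED_EXPERTS = 64
TOP_K = 8

# Assumption: bf16 stores each parameter in 2 bytes.
BYTES_PER_PARAM_BF16 = 2

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS           # share of weights used per token
routed_fraction = TOP_K / ROUTED_EXPERTS                 # share of routed experts selected per token
weight_gb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e9    # weights only, in decimal GB

print(f"Active parameter fraction: {active_fraction:.1%}")
print(f"Routed experts used per token: {routed_fraction:.1%}")
print(f"bf16 weight footprint: {weight_gb:.2f} GB")
```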

	## 🚀 Quickstart

	### SGLang (Recommended)

	Install:
	```bash
	pip install "sglang[all] @ git+https://github.com/trillion-labs/sglang-gravity.git#subdirectory=python"
	```

	Launch server:
	```bash
	python -m sglang.launch_server \
	  --model-path learning-unit/L1-16B-A3B \
	  --port 9006 --host 0.0.0.0 \
	  --tp 1 --dtype bfloat16 --trust-remote-code \
	  --attention-backend triton \
	  --moe-runner-backend triton
	```

	Query:
	```bash
	curl -X POST http://localhost:9006/v1/chat/completions \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "learning-unit/L1-16B-A3B",
	    "messages": [
	      {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
	    ],
	    "max_tokens": 2048
	  }'
	```
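
The same request can be sent from Python. The following is a minimal sketch using only the standard library; `build_chat_request` and `query` are hypothetical helper names, and the response parsing assumes the server returns the usual OpenAI-style `choices[0].message.content` shape for this endpoint.

```python
import json
import urllib.request


def build_chat_request(prompt, model="learning-unit/L1-16B-A3B", max_tokens=2048):
    """Build the same JSON body as the curl example above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query(prompt, base_url="http://localhost:9006"):
    """POST the request to a running server and return the reply text.

    Assumption: the response follows the OpenAI chat-completions layout.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Requires the server from the previous step to be running:
# print(query("What are the diagnostic criteria for sepsis?"))
```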

	### Transformers

	Install:
	```bash
	pip install "transformers>=5.0" torch
	```

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "learning-unit/L1-16B-A3B"

	# Load the model in bf16 and let Accelerate place it on the available devices.
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

	# Render the conversation with the model's chat template.
	messages = [
	    {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Leave room for the <think> block as well as the final answer.
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=2048,
	    temperature=0.7,
	    do_sample=True,
	)
	# Keep only the newly generated tokens by dropping the prompt prefix.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```

	## 💬 Examples

	L1 is specialized for the medical domain and covers a wide range of clinical scenarios. Below are representative examples from real-world clinical use cases.

	### Medical Q&A

	> A 45-year-old woman with lupus nephritis on mycophenolate and prednisone develops fever, dry cough, and bilateral ground-glass opacities on chest CT. Her CD4 count is 180. What is your differential diagnosis and recommended workup?

	### Patient Education

	> I have diabetes and use insulin daily. What is the proper way to store insulin at home?

	### Clinical Documentation

	> Please draft an overnight progress note. Patient labs: RBC 4.5, WBC 8. Vitals: HR 82, BP 118/76, RR 15, Temp 37.1. Nurse reports stable overnight. Plan: continue antibiotics, recheck labs in the morning.

	### Emergency Triage

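	> Perform KTAS triage for the following emergency department patient and provide an initial diagnosis and differential diagnosis. A 78-year-old woman arrived at the emergency department by 119 ambulance. At around 22:00 she suddenly developed left-sided facial droop and slurred speech. She complains of headache and has a history of hypertension. Vital signs: BP 172/88, HR 92, RR 14, Temp 36.8, SpO2 98%, and she is alert. There is no limb weakness.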

	### Adverse Drug Reaction (ADR) Causality Assessment

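	> Assess the causality of the following patient's adverse drug reaction (ADR) using the WHO-UMC criteria. An 80-year-old woman hospitalized for bronchiectasis received moxifloxacin 400mg IV. During administration she newly developed generalized pruritus. After the drug was discontinued, the patient herself reported that the itching was subsiding, and she subsequently recovered. Rechallenge was not performed. She has no history of drug allergy, and no other concomitant medications or skin conditions that could cause pruritus were identified.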

	## 📊 Benchmark

	All benchmarks were evaluated using [CoEval](https://github.com/lunit-io/CoEval), Lunit's open-source medical LLM evaluation framework. Evaluations use greedy decoding (temperature=0). To reproduce these results:

	```bash
	git clone https://github.com/lunit-io/CoEval.git
	cd CoEval
	```

	Refer to the [CoEval Quickstart](https://github.com/lunit-io/CoEval#quickstart) for setup and evaluation instructions.
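
On the greedy decoding setting mentioned above: temperature 0 is conventionally handled as "always pick the highest-scoring token", which makes runs deterministic for a fixed model and prompt. The toy sketch below illustrates that convention on a hand-written logit vector; it is not CoEval code, and the function names and numbers are made up for illustration.

```python
import math


def greedy_pick(logits):
    """Return the index of the largest logit (the greedy choice)."""
    return max(range(len(logits)), key=logits.__getitem__)


def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax; requires temperature > 0."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [0.1, 2.0, -1.0]  # illustrative values only

# As the temperature shrinks, probability mass concentrates on the argmax,
# so sampling converges to the greedy choice.
for temperature in (1.0, 0.5, 0.05):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 4) for p in probs])

print("greedy index:", greedy_pick(toy_logits))
```

In Transformers, the equivalent of this setting is typically `do_sample=False` rather than a literal `temperature=0`.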

	### MCQA Benchmarks

	| Model | [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA) | [AttrBench](https://huggingface.co/datasets/osunlp/AttributionBench) | [MedQA](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options) | [CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA) | [HeadQA](https://huggingface.co/datasets/alesi12/head_qa_v2) | [MedMCQA](https://huggingface.co/datasets/lighteval/med_mcqa) | [MMLU-Pro (Health)](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) | [M-ARC](https://huggingface.co/datasets/mkieffer/M-ARC) | [MetaMedQA](https://huggingface.co/datasets/maximegmd/MetaMedQA) | [MedHallu](https://huggingface.co/datasets/UTAustin-AIHealth/MedHallu) | [MedCalc](https://huggingface.co/datasets/ncbi/MedCalc-Bench) | [MedBullets](https://huggingface.co/datasets/mkieffer/Medbullets) 4-opt | [MedBullets](https://huggingface.co/datasets/mkieffer/Medbullets) 5-opt | [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA)-R | [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA)-U | W.Avg |
	|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
	| GPT-OSS-120B | 78.00 | 76.10 | 91.10 | 91.00 | 88.40 | 74.80 | 74.60 | 40.00 | 76.50 | 83.50 | 30.30 | 84.70 | 82.10 | 35.60 | 32.90 | 79.43 |
	| GPT-OSS-20B | 75.80 | 74.80 | 83.90 | 84.80 | 83.30 | 65.40 | 70.50 | 31.00 | 70.10 | 81.30 | 29.20 | 73.40 | 70.50 | 24.70 | 21.20 | 73.38 |
	| Qwen3.5-122B | 76.40 | 55.68 | 87.80 | 86.40 | 84.00 | 74.40 | 73.00 | 59.00 | 73.90 | 37.50 | 53.70 | 79.20 | 79.50 | 35.90 | 35.30 | 75.08 |
	| MedGemma-27B | 73.40 | 74.80 | 84.40 | 85.00 | 83.80 | 71.90 | 73.00 | 48.00 | 69.60 | 81.40 | 24.10 | 73.70 | 68.80 | 19.10 | 20.50 | 73.99 |
	| Gemma4-26B-A4B | 76.40 | 72.00 | 81.80 | 84.50 | 82.30 | 67.30 | 73.50 | 67.00 | 71.50 | 86.50 | 45.60 | 73.70 | 67.50 | 45.10 | 39.20 | 75.34 |
	| L1-16B-A3B | 84.20 | 78.40 | 85.50 | 88.20 | 85.80 | 76.70 | 74.90 | 82.00 | 73.10 | 76.10 | 43.90 | 78.90 | 70.80 | 27.50 | 29.20 | 77.74 |

	### Chat Task

	| Model | [HealthBench-Consensus](https://github.com/openai/simple-evals) |
	|:---|:---:|
	| GPT-OSS-120B | 90.60 |
	| GPT-OSS-20B | 78.70 |
	| Qwen3.5-122B | 92.20 |
	| MedGemma-27B | 90.70 |
	| Gemma4-26B-A4B | 92.60 |
	| L1-16B-A3B | 93.50 |

	## 📝 Citation

	```bibtex
	@misc{lunit2026l1,
	  title={L1: The First Clinical Language Model by Lunit},
	  author={Lunit},
	  year={2026},
	  url={https://huggingface.co/learning-unit/L1-16B-A3B}
	}
	```

	## ⚠️ Limitations

	- Not a substitute for professional medical judgment. L1 may generate factually incorrect, incomplete, or outdated clinical information. All outputs should be verified by qualified healthcare professionals.
	- Thinking overhead. Chain-of-thought reasoning in `<think>` tags increases token consumption and latency compared to non-thinking models of similar size.
	- Context length. Maximum context length is 32,768 tokens.
	- No real-time knowledge. The model's knowledge is limited to its training data cutoff and does not reflect the latest medical guidelines or drug approvals.
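
The thinking overhead and the context limit interact: reasoning tokens count against the same window as the prompt. The sketch below checks a token budget before sending a request. It assumes, as is standard for causal language models, that the prompt and the generated tokens share the 32,768-token window; the helper names are hypothetical.

```python
CONTEXT_LENGTH = 32_768  # maximum context length stated above


def max_prompt_tokens(max_new_tokens, context_length=CONTEXT_LENGTH):
    """Largest prompt that still leaves room for `max_new_tokens` of output."""
    if not 0 <= max_new_tokens <= context_length:
        raise ValueError("max_new_tokens must be between 0 and the context length")
    return context_length - max_new_tokens


def fits_context(prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


# With the recommended 2048-token generation budget:
print(max_prompt_tokens(2048))
print(fits_context(prompt_tokens=31_000, max_new_tokens=2048))
```

The prompt token count can be obtained from the tokenizer output, for example the length of `input_ids` after applying the chat template.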

	## 🤝 Acknowledgements

	This work was supported by the Domain-Specific Foundation Model Project (인공지능 특화 파운데이션 모델 프로젝트), funded by the Ministry of Science and ICT (과학기술정보통신부) and managed by the National IT Industry Promotion Agency (NIPA).

	L1 is a collaborative effort by the following consortium members:

	Industry
	- Lunit
	- Trillion Labs
	- SK Biopharmaceuticals
	- Kakao Healthcare
	- AIGEN Sciences
	- D-Circle
	- Rebellions
	- Standigm

	Academia
	- Prof. Choi Yun-jae's Lab from KAIST
	- Prof. Hong Seung-hoon's Lab from KAIST
	- Prof. Jung Yu-seong's Lab from SNU
	- Prof. Kim Hyun-woo's Lab from KAIST
	- Prof. Kim Tae-gyun's Lab from KAIST
	- Prof. Ye Jong-cheol's Lab from KAIST

	Hospitals
	- NHIS Ilsan Hospital
	- Ewha Womans University Seoul Hospital
	- Keimyung University Dongsan Medical Center
	- Konyang University Hospital
	- Korea University Research & Business Foundation
	- Kyung Hee University Hospital at Gangdong
	- Kyung Hee University Medical Center
	- Pusan National University Yangsan Hospital
	- Yongin Severance Hospital

	<p align="center">
	<img src="consortium.png" alt="Consortium Members" style="width: 80%;">
	</p>

	## 📄 License

	This model is licensed under the [Apache 2.0 License](LICENSE).

	## 📬 Contact

	- Taesoo Kim (김태수) — [taesoo.kim@lunit.io](mailto:taesoo.kim@lunit.io)
	- Donggeun Yoo (유동근) — [dgyoo@lunit.io](mailto:dgyoo@lunit.io)