---
license: apache-2.0
language:
  - en
base_model:
  - trillionlabs/Gravity-16B-A3B-Base
tags:
  - medical
  - clinical
  - mixture-of-experts
  - conversational
  - sft
library_name: transformers
pipeline_tag: text-generation
---

<p align="center">
  <img src="banner.png" alt="L1" style="width: 80%;">
</p>

# Learning Unit 1

**L1** (Learning Unit 1) is the first language model from [Lunit](https://www.lunit.io) and the Lunit Consortium, purpose-built for the medical domain. Derived from [Gravity-16B-A3B-Base](https://huggingface.co/trillionlabs/Gravity-16B-A3B-Base), L1 is designed for clinical reasoning and decision support.

### ✨ Key Highlights
* 🩺 **Medical-Domain Specialized**: Developed specifically for clinical reasoning and medical decision support
* ⚑ **Efficient MoE**: Only 3B parameters active per token out of 16.24B total β€” fast inference with high capacity
* πŸ’­ **Thinking Model**: Performs step-by-step reasoning in `<think>` tags before generating the final answer

> **Note:** L1 reasons internally using `<think>...</think>` blocks before producing a response. This chain-of-thought process improves answer quality but consumes additional tokens. Set `max_tokens` accordingly (recommended: 2048+).
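
Because the reasoning and the final answer arrive in the same completion, downstream code often needs to separate them. The following is a minimal sketch, assuming the raw output contains at most one literal `<think>...</think>` block ahead of the answer; `split_thinking` is a hypothetical helper, not part of L1 or Transformers.

```python
import re

# Hypothetical helper: assumes the reasoning is wrapped in literal
# <think>...</think> tags, as described in the note above.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw model completion."""
    match = _THINK_RE.search(text)
    if match is None:
        # No complete reasoning block found: treat everything as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


raw = "<think>step-by-step reasoning</think>\nFinal answer."
reasoning, answer = split_thinking(raw)
print(answer)
```

If generation stops before the closing `</think>` tag, this sketch returns the whole text as the answer, which is one reason to keep the token budget generous.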

### 📋 Model Specifications

- Type: Causal Language Model
- Base Model: [Gravity-16B-A3B-Base](https://huggingface.co/trillionlabs/Gravity-16B-A3B-Base) from Trillion Labs and Lunit Consortium
- Architecture: GravityMoE (Sparse Mixture-of-Experts with MLA)
- Total Parameters: 16.24B
- Active Parameters: 3B
- Number of Layers: 28
- Attention Heads: 16
- KV Heads: 16
- Hidden Size: 2048
- MoE Intermediate Size: 1408
- Routed Experts: 64 (top-8 selection)
- Shared Experts: 1
- Context Length: 32,768 tokens
- Vocabulary Size: 151,552
- Tokenizer: GLM-4.5
- Precision: bf16
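
As a rough illustration of what these numbers imply, the arithmetic below derives the share of parameters active per token, the share of routed experts selected, and the size of the bf16 weights. It is a back-of-the-envelope sketch using only the figures listed above; the memory estimate covers weights only and ignores the KV cache, activations, and framework overhead.

```python
# Figures taken from the specification list above.
total_params = 16.24e9      # total parameters
active_params = 3e9         # parameters active per token
routed_experts = 64
experts_per_token = 8       # top-8 routing
bytes_per_param = 2         # bf16 stores each weight in 2 bytes

active_fraction = active_params / total_params
routed_fraction = experts_per_token / routed_experts
weight_gib = total_params * bytes_per_param / 2**30

print(f"active parameter share: {active_fraction:.1%}")
print(f"routed experts used per token: {routed_fraction:.1%}")
print(f"bf16 weights alone: ~{weight_gib:.1f} GiB")
```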

## 🚀 Quickstart

### SGLang (Recommended)

**Install:**
```bash
pip install "sglang[all] @ git+https://github.com/trillion-labs/sglang-gravity.git#subdirectory=python"
```

**Launch server:**
```bash
python -m sglang.launch_server \
  --model-path learning-unit/L1-16B-A3B \
  --port 9006 --host 0.0.0.0 \
  --tp 1 --dtype bfloat16 --trust-remote-code \
  --attention-backend triton \
  --moe-runner-backend triton
```

**Query:**
```bash
curl -X POST http://localhost:9006/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "learning-unit/L1-16B-A3B",
    "messages": [
      {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
    ],
    "max_tokens": 2048
  }'
```
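
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server was started with the launch command above (port 9006 on localhost) and returns an OpenAI-style chat completion body; `build_payload` and `ask` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request

# Assumed to match the launch command above (port 9006 on localhost).
BASE_URL = "http://localhost:9006/v1/chat/completions"
MODEL = "learning-unit/L1-16B-A3B"


def build_payload(prompt: str, max_tokens: int = 2048) -> dict:
    """Build the same JSON body as the curl example."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(prompt: str, url: str = BASE_URL) -> str:
    """POST the prompt and return the assistant message text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style responses carry the text under
    # choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(ask("What are the diagnostic criteria for sepsis?"))
```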

### Transformers

**Install:**
```bash
pip install "transformers>=5.0" torch accelerate
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "learning-unit/L1-16B-A3B"

# Load the weights in bf16 and spread them across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Format the conversation with the model's chat template.
messages = [
    {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Sample a response; 2048 new tokens leaves room for the <think> block.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)
# Drop the prompt tokens so that only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## 💬 Examples

L1 is specialized for the medical domain and covers a wide range of clinical scenarios. Below are representative examples from real-world clinical use cases.

### Medical Q&A

> A 45-year-old woman with lupus nephritis on mycophenolate and prednisone develops fever, dry cough, and bilateral ground-glass opacities on chest CT. Her CD4 count is 180. What is your differential diagnosis and recommended workup?

### Patient Education

> I have diabetes and use insulin daily. What is the proper way to store insulin at home?

### Clinical Documentation

> Please draft an overnight progress note. Patient labs: RBC 4.5, WBC 8. Vitals: HR 82, BP 118/76, RR 15, Temp 37.1. Nurse reports stable overnight. Plan: continue antibiotics, recheck labs in the morning.

### Emergency Triage

> *(Translated from Korean.)* Please perform KTAS triage for the following emergency department patient and provide an initial diagnosis and differential diagnoses. A 78-year-old woman was brought to the emergency department by 119 ambulance. At around 22:00 she suddenly developed left-sided facial droop and slurred speech. She complains of headache and has a history of hypertension. Vital signs: BP 172/88, HR 92, RR 14, Temp 36.8, SpO2 98%; she is alert. There is no limb weakness.

### Adverse Drug Reaction (ADR) Causality Assessment

> *(Translated from Korean.)* Please assess causality for the following patient's adverse drug reaction (ADR) using the WHO-UMC criteria. An 80-year-old woman hospitalized for bronchiectasis received moxifloxacin 400 mg IV. During administration she developed new generalized skin itching; after the drug was discontinued, the patient herself reported that the itching was subsiding, and she subsequently recovered. Rechallenge was not performed. She has no prior history of drug allergy, and no other concomitant medications or skin conditions that could cause itching were identified.

## 📊 Benchmark

All benchmarks were evaluated using [CoEval](https://github.com/lunit-io/CoEval), Lunit's open-source medical LLM evaluation framework. Evaluations use greedy decoding (temperature=0). To reproduce these results:

```bash
git clone https://github.com/lunit-io/CoEval.git
cd CoEval
```

Refer to the [CoEval Quickstart](https://github.com/lunit-io/CoEval#quickstart) for setup and evaluation instructions.

### MCQA Benchmarks

| Model | [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA) | [AttrBench](https://huggingface.co/datasets/osunlp/AttributionBench) | [MedQA](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options) | [CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA) | [HeadQA](https://huggingface.co/datasets/alesi12/head_qa_v2) | [MedMCQA](https://huggingface.co/datasets/lighteval/med_mcqa) | [MMLU-Pro (Health)](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) | [M-ARC](https://huggingface.co/datasets/mkieffer/M-ARC) | [MetaMedQA](https://huggingface.co/datasets/maximegmd/MetaMedQA) | [MedHallu](https://huggingface.co/datasets/UTAustin-AIHealth/MedHallu) | [MedCalc](https://huggingface.co/datasets/ncbi/MedCalc-Bench) | [MedBullets](https://huggingface.co/datasets/mkieffer/Medbullets) 4-opt | [MedBullets](https://huggingface.co/datasets/mkieffer/Medbullets) 5-opt | [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA)-R | [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA)-U | W.Avg |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| GPT-OSS-120B | 78.00 | 76.10 | 91.10 | 91.00 | 88.40 | 74.80 | 74.60 | 40.00 | 76.50 | 83.50 | 30.30 | 84.70 | 82.10 | 35.60 | 32.90 | 79.43 |
| GPT-OSS-20B | 75.80 | 74.80 | 83.90 | 84.80 | 83.30 | 65.40 | 70.50 | 31.00 | 70.10 | 81.30 | 29.20 | 73.40 | 70.50 | 24.70 | 21.20 | 73.38 |
| Qwen3.5-122B | 76.40 | 55.68 | 87.80 | 86.40 | 84.00 | 74.40 | 73.00 | 59.00 | 73.90 | 37.50 | 53.70 | 79.20 | 79.50 | 35.90 | 35.30 | 75.08 |
| MedGemma-27B | 73.40 | 74.80 | 84.40 | 85.00 | 83.80 | 71.90 | 73.00 | 48.00 | 69.60 | 81.40 | 24.10 | 73.70 | 68.80 | 19.10 | 20.50 | 73.99 |
| Gemma4-26B-A4B | 76.40 | 72.00 | 81.80 | 84.50 | 82.30 | 67.30 | 73.50 | 67.00 | 71.50 | 86.50 | 45.60 | 73.70 | 67.50 | 45.10 | 39.20 | 75.34 |
| L1-16B-A3B | 84.20 | 78.40 | 85.50 | 88.20 | 85.80 | 76.70 | 74.90 | 82.00 | 73.10 | 76.10 | 43.90 | 78.90 | 70.80 | 27.50 | 29.20 | 77.74 |
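
The card does not define the W.Avg column or list its weights. As a sanity check, the sketch below computes the plain (unweighted) mean of the L1-16B-A3B row; it comes out well below the reported 77.74, which is consistent with a non-uniform weighting (for example by benchmark size, though that is an assumption). `weighted_average` is a hypothetical helper, and no actual weights are implied.

```python
# L1-16B-A3B scores copied from the table above, in column order
# (PubMedQA ... MedXpertQA-U), excluding the W.Avg column.
l1_scores = [84.20, 78.40, 85.50, 88.20, 85.80, 76.70, 74.90, 82.00,
             73.10, 76.10, 43.90, 78.90, 70.80, 27.50, 29.20]


def weighted_average(scores, weights):
    """Weighted mean; with equal weights this reduces to the plain mean."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Equal weights give the unweighted mean of the 15 benchmark scores.
simple_mean = weighted_average(l1_scores, [1] * len(l1_scores))
print(f"unweighted mean: {simple_mean:.2f}")
```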

### Chat Task

| Model | [HealthBench-Consensus](https://github.com/openai/simple-evals) |
|:---|:---:|
| GPT-OSS-120B | 90.60 |
| GPT-OSS-20B | 78.70 |
| Qwen3.5-122B | 92.20 |
| MedGemma-27B | 90.70 |
| Gemma4-26B-A4B | 92.60 |
| L1-16B-A3B | 93.50 |

## 📝 Citation

```bibtex
@misc{lunit2026l1,
  title={L1: The First Clinical Language Model by Lunit},
  author={Lunit},
  year={2026},
  url={https://huggingface.co/learning-unit/L1-16B-A3B}
}
```

## ⚠️ Limitations

- **Not a substitute for professional medical judgment.** L1 may generate factually incorrect, incomplete, or outdated clinical information. All outputs should be verified by qualified healthcare professionals.
- **Thinking overhead.** Chain-of-thought reasoning in `<think>` tags increases token consumption and latency compared to non-thinking models of similar size.
- **Context length.** Maximum context length is 32,768 tokens.
- **No real-time knowledge.** The model's knowledge is limited to its training data cutoff and does not reflect the latest medical guidelines or drug approvals.
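
The thinking overhead and the context limit interact: tokens reserved for generation reduce what is left for the prompt. A small sketch of that budget, assuming prompt and generated tokens share the same 32,768-token window (typical for causal language models, but not stated explicitly in this card); the function names are hypothetical.

```python
CONTEXT_LENGTH = 32_768  # maximum context length stated above


def max_prompt_tokens(max_new_tokens: int = 2048) -> int:
    """Tokens left for the prompt after reserving the generation budget."""
    if not 0 < max_new_tokens < CONTEXT_LENGTH:
        raise ValueError("max_new_tokens must be between 1 and CONTEXT_LENGTH - 1")
    return CONTEXT_LENGTH - max_new_tokens


def fits(prompt_tokens: int, max_new_tokens: int = 2048) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens <= max_prompt_tokens(max_new_tokens)


print(max_prompt_tokens())  # 32768 - 2048 = 30720
```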

## 🤝 Acknowledgements

This work was supported by the Domain-Specific Foundation Model Project (인공지능 특화 파운데이션 모델 프로젝트), funded by the Ministry of Science and ICT (과학기술정보통신부) and managed by the National IT Industry Promotion Agency (NIPA).

L1 is a collaborative effort by the following consortium members:

**Industry**
- Lunit
- Trillion Labs
- SK Biopharmaceuticals
- Kakao Healthcare
- AIGEN Sciences
- D-Circle
- Rebellions
- Standigm

**Academia**
- Prof. Choi Yun-jae's Lab from KAIST
- Prof. Hong Seung-hoon's Lab from KAIST
- Prof. Jung Yu-seong's Lab from SNU
- Prof. Kim Hyun-woo's Lab from KAIST
- Prof. Kim Tae-gyun's Lab from KAIST
- Prof. Ye Jong-cheol's Lab from KAIST

**Hospitals**
- NHIS Ilsan Hospital
- Ewha Womans University Seoul Hospital
- Keimyung University Dongsan Medical Center
- Konyang University Hospital
- Korea University Research & Business Foundation
- Kyung Hee University Hospital at Gangdong
- Kyung Hee University Medical Center
- Pusan National University Yangsan Hospital
- Yongin Severance Hospital

<p align="center">
  <img src="consortium.png" alt="Consortium Members" style="width: 80%;">
</p>

## 📄 License

This model is licensed under the [Apache 2.0 License](LICENSE).

## 📬 Contact

- Taesoo Kim (김태수): [taesoo.kim@lunit.io](mailto:taesoo.kim@lunit.io)
- Donggeun Yoo (유동근): [dgyoo@lunit.io](mailto:dgyoo@lunit.io)