---
language:
- ko
license: gemma
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- darwin
- darwin-v8
- gemma4
- korean
- administrative-ai
- public-sector
- government
- multimodal
- image-text-to-text
- reasoning
- thinking
- conversational
- gpqa
- benchmark
- leaderboard
- k-ai
- k-ai-leaderboard
- vidraft
- jgos
- text-generation
- ffn-transfer
- model-merge
---

# JGOS-31B-Citizen


🏆 #1 on the K-AI Leaderboard · Korea's national Korean-language AI benchmark (leaderboard.aihub.or.kr)

**JGOS-31B-Citizen** is a Korean multimodal large language model **specialized for administrative & public-sector AI services**: civil-complaint response, public-document understanding, and government-domain question answering.

## Overview

JGOS-31B-Citizen is built on VIDRAFT's **Darwin V8** platform.

- **Base + FFN transfer, breeding & evolution (Darwin V8).** Starting from our in-house **gemma4-31b** base, the **feed-forward network (FFN)** blocks of multiple source models are extracted and grafted, then bred (merged) and evolved across **multiple generations** through the Darwin V8 pipeline to accumulate capability.
- **Korean administrative-domain fine-tuning.** The evolved model is further trained on **Korean-specialized datasets** to strengthen Korean comprehension, reasoning, and **administrative/public-sector domain** performance.

> The set of grafted source models, the number of evolution generations, the breeding strategy, dataset composition, and training configuration are proprietary and not disclosed.

## Specifications

| Item | Value |
|------|-------|
| Parameters | ~31B (dense) |
| Modality | Text + Image (multimodal) |
| Context length | up to 256K tokens |
| Base family | gemma4-31b (Gemma-compatible architecture) |
| Focus | Administrative & public-sector AI services |

## Highlights

- 🏆 **#1 on the K-AI Leaderboard**: Korea's national Korean-language AI benchmark (KMMLU-Pro · CLIcK · HLE · MuSR · Com2)
- **GPQA Diamond: 84.34%**

## Evaluation

### GPQA Diamond (198 questions)

| Method (test-time compute) | Score |
|----------------------------|-------|
| maj@8 + tie-retry + DELPHI + near-miss maj@32-64 (weighted vote) | **84.34%** (167/198) |

## Training Datasets

JGOS-31B-Citizen was trained using large-scale Korean corpora sourced from the **Korean AI Hub (AIHub)**, Korea's national AI data repository operated by NIA.
The following datasets were used to optimize performance on the **K-AI Leaderboard** benchmarks (KMMLU-Pro, CLIcK, HLE, MuSR, Com2):

| # | Dataset Name | AIHub Link |
|---|---|---|
| 1 | Medical and Legal Professional Books Corpus | [71487](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=71487) |
| 2 | Financial and Legal Document Machine Reading Comprehension | [71610](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=71610) |
| 3 | Large-scale Web-based Korean Corpus | [624](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=624) |
| 4 | Large-scale Book-based Korean Corpus | [653](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=653) |
| 5 | National Records Large-scale AI Learning Corpus | [71788](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=71788) |
| 6 | Korean Generation-based Common Sense Reasoning Dataset | [459](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=459) |
| 7 | Multi-session Dialogue Corpus | [pkg1](https://aihub.or.kr/aihubdata/data/view.do?currMenu=511&topMenu=100&aihubDataSe=dataPckage&dataPckageSn=1) |
| 8 | Essential Medical Knowledge Data (142GB) | [71875](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=71875) |
| 9 | Specialized Medical Knowledge Data (206GB) | [71874](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=71874) |
| 10 | Korean Dialogue Dataset | [272](https://aihub.or.kr/aihubdata/data/view.do?dataSetSn=272) |

> All datasets are publicly available via [AIHub](https://aihub.or.kr) (registration required).

## License

This model is built on a Gemma-family architecture and is distributed under the [**Gemma Terms of Use**](https://ai.google.dev/gemma/terms). By using this model, you agree to the Gemma license terms.
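
## Worked Example: Majority Voting and Score Arithmetic

The GPQA Diamond result in the Evaluation section is reported as 167/198 under a majority-vote style test-time compute scheme. The following is a minimal sketch, not the actual evaluation code: it illustrates only plain maj@k voting over sampled answers, flags ties (which a tie-retry step would presumably resample), and reproduces the percentage arithmetic. The function names `majority_vote` and `accuracy_percent` and the sample answers are hypothetical; the DELPHI and near-miss weighted-vote steps are not described in this card and are not modeled here.

```python
from collections import Counter


def majority_vote(samples):
    """Return (answer, is_tie) for a non-empty list of sampled answer labels.

    is_tie is True when the two most frequent answers have equal counts,
    the case in which a retry step might draw additional samples.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ranked = Counter(samples).most_common()
    top_answer, top_count = ranked[0]
    is_tie = len(ranked) > 1 and ranked[1][1] == top_count
    return top_answer, is_tie


def accuracy_percent(num_correct, num_questions):
    """Accuracy as a percentage rounded to two decimal places."""
    return round(100.0 * num_correct / num_questions, 2)


# Hypothetical maj@8 example: eight sampled answers for one question.
samples = ["B", "B", "C", "B", "A", "B", "C", "B"]
print(majority_vote(samples))      # ('B', False)

# The figures reported in the Evaluation table: 167 correct of 198.
print(accuracy_percent(167, 198))  # 84.34
```

The arithmetic confirms that 167 correct answers out of 198 questions corresponds to the 84.34% shown in the table.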