Author and Acknowledgments

Author: Toshiki Demizu (出水利樹), GitHub/Hugging Face ID: @demimomi
Affiliation: SoftBank Corp., MONET Technologies Inc.
Course: Large Language Model Development Lecture (Autumn 2025)

Special thanks to the instructors and collaborators at the Matsuo Laboratory and SoftBank Corp. for their guidance in practical LLM research and fine-tuning experiments.

Project Background

This model was developed as part of the Basic course of the Large Language Model Development Lecture (大規模言語モデル開発講座 基礎編) at the Matsuo Laboratory, The University of Tokyo.
The Basic course had 4,000 participants and a completion rate of 46.9% (note: the Advanced course had 3,800 participants). The course provides practical training in fine-tuning, preference optimization (DPO), and deployment of large-scale Japanese language models.
More details about the course can be found on the University of Tokyo WebLab page for the Large Language Model Course: https://weblab.t.u-tokyo.ac.jp/large-language-model/

This specific fine-tuning ("sarashina/dpo") was carried out on the base model sbintuitions/sarashina2.2-0.5b as part of the course's applied experiments.

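Model Summary

This model was built as a hands-on development assignment in the Large Language Model Development Lecture hosted by the Matsuo Laboratory at The University of Tokyo. It brings together the theory, implementation techniques, and evaluation methods covered in the course. Participants work through the whole process, from planning and design to training, tuning, and validation, to gain a deep understanding of the internal structure, training principles, and application design of large language models (LLMs). This model was completed as an original model that reflects that knowledge.

The base model is sbintuitions/sarashina2.2-0.5b. The Sarashina series is a family of Japanese-focused LLMs developed by SB Intuitions, a subsidiary of SoftBank Corp., and is notable for strong natural language understanding and reasoning in a lightweight model. The 0.5b class in particular runs fast even in memory-constrained environments and is easy to use for purposes ranging from research to prototyping. It is also well suited to fine-tuning, and the effects of preference tuning show up readily, which makes it a good base for experimental model development.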

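This model was preference-tuned with DPO (Direct Preference Optimization). DPO directly optimizes the quality and consistency of model outputs based on human preference data. Compared with conventional RLHF, it is simpler to implement and more efficient, and it reflects the intended style and response tendencies in the model more reliably. The tuning focused on the following:

・Natural, flowing dialogue

・Context retention and logical consistency

・Faithful and accurate responses to user instructions

The aim was a lightweight model that still delivers high practical utility.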
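As a rough illustration of the DPO objective described above, the per-example loss can be sketched in plain Python. This is a minimal sketch, not the training code used for this model: the function names, the example log-probabilities, and the value `beta=0.1` are assumptions made for illustration. The inputs are the summed log-probabilities that the policy being tuned and a frozen reference model assign to the preferred ("chosen") and dispreferred ("rejected") responses for the same prompt.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio)))."""
    # How much more (or less) likely each response is under the policy than under the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m), which avoids overflow for large |m|.
    return softplus(-margin)


# Hypothetical log-probabilities for one preference pair.
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)  # policy identical to reference
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)   # policy favors the chosen response
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls below that value as the policy raises the likelihood of the chosen response relative to the rejected one, which is the behavior the tuning is meant to encourage.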

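Improvements were also made from several angles, including training data design, prompt style adjustments, and tuning of evaluation metrics, with the aim of improving the user experience and keeping the model versatile. As a result, the model is flexible enough for a wide range of uses, from everyday assistant tasks to technical document generation and prototype development for research.

Developer: Toshiki Demizu (demimomi). Beyond the course assignment, the goal was an easy-to-use, high-performing small LLM that can also be carried into future research, business, and service validation. Although the model is small, it pursues both response quality and ease of operation, and its configuration is also useful as a foundation for lightweight model development and applied LLM research.

For details on the Large Language Model Development Lecture, including the latest techniques and lecture content it covers, see: https://weblab.t.u-tokyo.ac.jp/large-language-model/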

Model tree for demimomi/U-Tokyo-2025Basic

Base model: sbintuitions/sarashina2.2-1b
Finetuned: sbintuitions/sarashina-embedding-v2-1b
Finetuned (6): this model