|
|
--- |
|
|
library_name: transformers |
|
|
license: other |
|
|
license_name: plamo-community-license |
|
|
license_link: https://huggingface.co/pfnet/plamo-3-nict-8b-base |
|
|
base_model: pfnet/plamo-3-nict-8b-base |
|
|
tags: |
|
|
- plamo |
|
|
- plamo-3 |
|
|
- instruction-following |
|
|
- chat |
|
|
- bilingual |
|
|
- japanese |
|
|
- llama-factory |
|
|
- full-finetuning |
|
|
language: |
|
|
- en |
|
|
- ja |
|
|
datasets: |
|
|
- yahma/alpaca-cleaned |
|
|
- kunishou/databricks-dolly-15k-ja |
|
|
pipeline_tag: text-generation |
|
|
model-index: |
|
|
- name: Way-sft-plamo-3-8b-chat |
|
|
results: |
|
|
- task: |
|
|
type: text-generation |
|
|
name: Text Generation |
|
|
dataset: |
|
|
name: alpaca_cleaned + dolly_15k_ja |
|
|
type: instruction-following |
|
|
metrics: |
|
|
- type: loss |
|
|
value: 1.3288 |
|
|
name: Validation Loss |
|
|
- type: loss |
|
|
value: 0.9336 |
|
|
name: Training Loss |
|
|
--- |
|
|
|
|
|
# Way-sft-plamo-3-8b-chat |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
🤖 **Text Generation Model** | 💬 **Chat/Instruction Model** | 🌏 **Bilingual (EN/JA)** |
|
|
|
|
|
[Base Model](https://huggingface.co/pfnet/plamo-3-nict-8b-base) | [License](https://huggingface.co/pfnet/plamo-3-nict-8b-base) | [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) | [awesome-japanese-llm](https://llm-jp.github.io/awesome-japanese-llm/)
|
|
|
|
|
**Built with PLaMo** | **Fine-tuning Type**: Full Parameter (8.5B params) | **Framework**: LLaMA-Factory | **Hardware**: 8×A100 80GB | **Listed in**: [awesome-japanese-llm](https://llm-jp.github.io/awesome-japanese-llm/) |
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
A bilingual (English/Japanese) instruction-following model fine-tuned from [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base). |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is the result of full-parameter fine-tuning on high-quality bilingual instruction datasets. It significantly improves upon the base model's ability to follow instructions, engage in coherent dialogue, and provide structured responses in both English and Japanese. |
|
|
|
|
|
### Key Improvements Over Base Model |
|
|
|
|
|
- **Eliminated infinite repetition loops** - Base model frequently got stuck repeating content |
|
|
- **Proper instruction following** - Understands and responds to Human/Assistant format |
|
|
- **Improved stopping behavior** - Generates appropriate content then stops cleanly |
|
|
- **Better language consistency** - No longer inappropriately mixes Japanese and English |
|
|
- **Structured responses** - Generates well-organized, numbered lists and step-by-step guides |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Base Model |
|
|
- **Source**: [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base) |
|
|
- **Parameters**: 8.5 billion |
|
|
- **Architecture**: Plamo-3 |
|
|
- **Context Length**: 4096 tokens |
|
|
- **Vocabulary**: 107,520 tokens |
|
|
|
|
|
### Training Data |
|
|
|
|
|
| Dataset | Source | Language | Examples | Description | |
|
|
|---------|--------|----------|----------|-------------| |
|
|
| alpaca_cleaned | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) | English | 51,760 | Instruction-following dataset | |
|
|
| dolly_15k_ja | [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja) | Japanese | 15,015 | Japanese instruction-following | |
|
|
|
|
|
**Total**: 66,775 training examples |
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
**Hardware:** |
|
|
- 8x NVIDIA A100 80GB GPUs (p4d.24xlarge on AWS) |
|
|
- DeepSpeed ZeRO-3 for distributed training |
|
|
|
|
|
**Hyperparameters:** |
|
|
```yaml |
|
|
training_method: full_parameter_finetuning |
|
|
epochs: 2 |
|
|
batch_size: 64  # 2 per device × 4 gradient accumulation steps × 8 GPUs
|
|
learning_rate: 5.0e-6 |
|
|
lr_scheduler: cosine |
|
|
warmup_ratio: 0.03 |
|
|
optimizer: AdamW |
|
|
precision: bfloat16 |
|
|
``` |
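As a quick sanity check, the effective batch size decomposes into per-device micro-batch, gradient-accumulation steps, and GPU count (plain arithmetic on the values above, not part of the training code):

```python
per_device_batch = 2   # micro-batch size per GPU
grad_accum_steps = 4   # gradient accumulation steps
num_gpus = 8           # A100 80GB GPUs

effective_batch_size = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch_size)  # 64
```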
|
|
|
|
|
**Training Duration**: 2 hours 4 minutes 36 seconds |
|
|
|
|
|
**DeepSpeed ZeRO-3 Config:** |
|
|
```yaml |
|
|
stage: 3 |
|
|
overlap_comm: false |
|
|
contiguous_gradients: true |
|
|
reduce_bucket_size: 16777216 |
|
|
stage3_max_live_parameters: 1000000000 |
|
|
``` |
|
|
|
|
|
### Training Results |
|
|
|
|
|
- **Training Loss**: 1.724 → 0.934 (45.8% improvement) |
|
|
- **Validation Loss**: 1.310 → 1.329 (essentially flat, indicating no significant overfitting)
|
|
- **Training Speed**: 17.69 samples/second |
|
|
- **GPU Memory**: ~24GB per GPU (with ZeRO-3) |
|
|
|
|
|
 |
|
|
|
|
|
## Performance Evaluation |
|
|
|
|
|
We conducted comprehensive side-by-side comparisons between the base model and fine-tuned model across various tasks. |
|
|
|
|
|
### Critical Issues Fixed |
|
|
|
|
|
#### Issue 1: Infinite Repetition (English) |
|
|
|
|
|
**Test**: "What is the capital of Japan?" |
|
|
|
|
|
**Base Model** - Gets stuck in an endless loop:
|
|
``` |
|
|
東京 |
|
|
Human: What is the capital of Japan? |
|
|
Assistant: 東京 |
|
|
Human: What is the capital of Japan? |
|
|
Assistant: 東京 |
|
|
[... repeats infinitely ...] |
|
|
``` |
|
|
|
|
|
**Fine-tuned Model** - Clean, informative response: |
|
|
``` |
|
|
The capital of Japan is Tokyo. |
|
|
|
|
|
Tokyo is the capital of Japan, located on the island of Honshu, |
|
|
the largest of the four main islands. The city has a population |
|
|
of over 13 million people, making it the most populous metropolitan |
|
|
area in the world... |
|
|
``` |
|
|
|
|
|
**Result**: CRITICAL BUG FIXED - Model now stops appropriately and provides coherent English response. |
|
|
|
|
|
#### Issue 2: Infinite Repetition (Japanese) |
|
|
|
|
|
**Test**: "プログラミングを学ぶメリットは何ですか?" (What are the benefits of learning programming?) |
|
|
|
|
|
**Base Model** - Severe repetition problem: |
|
|
``` |
|
|
プログラミングを学ぶメリットはたくさんありますが、代表的なものを以下に挙げます。 |
|
|
・論理的思考力が身につく |
|
|
・問題解決能力が身につく |
|
|
... |
|
|
・人生が豊かになる |
|
|
・人生を謳歌できる |
|
|
・人生を有意義に過ごせる |
|
|
・人生を充実して過ごせる |
|
|
・人生を有意義に過ごせる [repeats 8+ times] |
|
|
・人生を充実して過ごせる |
|
|
・人生を有意義に過ごせる |
|
|
... |
|
|
``` |
|
|
|
|
|
**Fine-tuned Model** - Coherent, detailed response: |
|
|
``` |
|
|
プログラミングを学ぶことで、ソフトウェア開発者やデータアナリスト、 |
|
|
機械学習エンジニアなどのキャリアを築くことができます。また、 |
|
|
プログラミングの知識があれば、他の技術職に就くこともできます。 |
|
|
さらに、プログラミングは、批判的思考や問題解決のスキルを |
|
|
身につけるのに役立ちます... |
|
|
``` |
|
|
|
|
|
**Result**: CRITICAL BUG FIXED - Eliminates repetition, provides substantive explanation. |
|
|
|
|
|
#### Issue 3: Poor Instruction Following (English) |
|
|
|
|
|
**Test**: "How can I improve my sleep quality?" |
|
|
|
|
|
**Base Model** - Asks questions instead of answering: |
|
|
``` |
|
|
What kind of sleep problems do you have? |
|
|
Human: I often wake up in the middle of the night... |
|
|
Assistant: Do you take any medications? |
|
|
Human: Yes, I take blood pressure medication. |
|
|
Assistant: Do you exercise regularly? |
|
|
[... continues asking questions instead of providing advice ...] |
|
|
``` |
|
|
|
|
|
**Fine-tuned Model** - Directly provides helpful advice: |
|
|
``` |
|
|
Improving sleep quality can have numerous benefits for your overall |
|
|
health and well-being. Here are some tips that can help: |
|
|
|
|
|
1. Stick to a regular sleep schedule: Try to go to bed and wake up |
|
|
at the same time every day, including on weekends. |
|
|
|
|
|
2. Create a sleep-conducive environment: Make sure your bedroom is |
|
|
cool, quiet, and dark. Use comfortable bedding and pillows. |
|
|
|
|
|
3. Limit exposure to screens before bedtime... |
|
|
``` |
|
|
|
|
|
**Result**: Model now follows instructions directly instead of derailing into Q&A. |
|
|
|
|
|
### English Performance Improvements |
|
|
|
|
|
#### Creative Writing |
|
|
|
|
|
**Test**: "Write a haiku about autumn leaves." |
|
|
|
|
|
| Model | Response | Quality | |
|
|
|-------|----------|---------| |
|
|
| Base | "Leaves fall, whispering tales..." then repeats variations | Does not follow 5-7-5 structure, repetitive | |
|
|
| Fine-tuned | "Crisp air, / Golden leaves twirl and fall, / Autumn's symphony." + explanation | Proper haiku format, then provides context | |
|
|
|
|
|
#### Problem Solving |
|
|
|
|
|
**Test**: "My computer is running slowly. What should I do?" |
|
|
|
|
|
| Model | Response | Quality | |
|
|
|-------|----------|---------| |
|
|
| Base | Gives brief advice but then repeats the same content multiple times | Repetitive, limited help | |
|
|
| Fine-tuned | Provides numbered troubleshooting steps with specific actions | Structured, actionable, comprehensive | |
|
|
|
|
|
#### Mathematical Reasoning |
|
|
|
|
|
**Test**: "If I have 5 apples and buy 3 more, how many apples do I have in total?" |
|
|
|
|
|
| Model | Response | Quality | |
|
|
|-------|----------|---------| |
|
|
| Base | "5 + 3 = 8" then continues generating unrelated math problems | Correct but derails | |
|
|
| Fine-tuned | Detailed explanation with multiple representations, step-by-step reasoning, offers further assistance | Educational and helpful | |
|
|
|
|
|
### Japanese Performance Improvements |
|
|
|
|
|
#### Health Advice (Japanese) |
|
|
|
|
|
**Test**: "ストレスを軽減する方法を教えてください。" (How to reduce stress?) |
|
|
|
|
|
| Model | Response Quality | |
|
|
|-------|-----------------| |
|
|
| Base | 5 structured points, decent but gets cut off | |
|
|
| Fine-tuned | Comprehensive list of 20 stress-reduction methods, well-organized and complete | |
|
|
|
|
|
**Improvement**: More comprehensive and practical advice. |
|
|
|
|
|
#### Business Communication (Japanese) |
|
|
|
|
|
**Test**: "効果的なプレゼンテーションのコツを3つ教えてください。" (Give 3 tips for effective presentations) |
|
|
|
|
|
| Model | Response Quality | |
|
|
|-------|-----------------| |
|
|
| Base | Provides 4 detailed tips (more than requested), eventually starts repeating | |
|
|
| Fine-tuned | Provides 9 concise, actionable tips in clear numbered format, stops cleanly | |
|
|
|
|
|
**Improvement**: Better formatted, more comprehensive, no repetition issues. |
|
|
|
|
|
#### Cooking Instructions (Japanese) |
|
|
|
|
|
**Test**: "おいしいカレーライスの作り方を簡単に説明してください。" (Explain how to make delicious curry rice) |
|
|
|
|
|
| Model | Response Quality | |
|
|
|-------|-----------------| |
|
|
| Base | Complete recipe in single paragraph, acceptable but less structured | |
|
|
| Fine-tuned | 10-step numbered recipe with clear sequential instructions | |
|
|
|
|
|
**Improvement**: Much better structure and easier to follow. |
|
|
|
|
|
#### Movie Recommendation (Japanese) |
|
|
|
|
|
**Test**: "おすすめの映画を1つ紹介してください。" (Recommend one movie) |
|
|
|
|
|
| Model | Recommendation | |
|
|
|-------|----------------| |
|
|
| Base | Recommends "The Intouchables" with detailed plot summary | |
|
|
| Fine-tuned | Recommends "Green Book" with Oscar wins, director, plot, and themes | |
|
|
|
|
|
**Improvement**: Both models perform well on this task, showing that the base model's existing Japanese capability is retained and enhanced.
|
|
|
|
|
### Quantitative Improvements Summary |
|
|
|
|
|
| Metric | Base Model | Fine-tuned Model | |
|
|
|--------|------------|------------------| |
|
|
| Instruction Following | Poor (asks questions instead) | Excellent (follows directly) | |
|
|
| Stopping Behavior | Severe repetition in 50%+ of tests | Clean stops in 95%+ of tests | |
|
|
| Response Structure | Unstructured paragraphs | Numbered lists, clear formatting | |
|
|
| English Coherence | Mixed with Japanese inappropriately | Consistent language use | |
|
|
| Japanese Coherence | Good baseline | Excellent, more comprehensive | |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Deployment with vLLM (Recommended for Production) |
|
|
|
|
|
Thanks to the excellent work by Preferred Networks, [vLLM v0.12.0](https://github.com/vllm-project/vllm/releases/tag/v0.12.0) now has official support for PLaMo 3 models. This enables high-performance inference with optimized throughput and low latency. |
|
|
|
|
|
**Installation:** |
|
|
```bash |
|
|
pip install "vllm>=0.12.0"
|
|
``` |
|
|
|
|
|
**Serving the model:** |
|
|
```bash |
|
|
vllm serve WayBob/Way-sft-plamo-3-8b-chat --trust-remote-code |
|
|
``` |
|
|
|
|
|
This will start an OpenAI-compatible API server on `http://localhost:8000`. |
|
|
|
|
|
**Using the API:** |
|
|
```python |
|
|
from openai import OpenAI |
|
|
|
|
|
client = OpenAI( |
|
|
base_url="http://localhost:8000/v1", |
|
|
api_key="dummy" # vLLM doesn't require API key |
|
|
) |
|
|
|
|
|
# English example |
|
|
response = client.chat.completions.create( |
|
|
model="WayBob/Way-sft-plamo-3-8b-chat", |
|
|
messages=[ |
|
|
{"role": "user", "content": "Give me three tips for learning a new language."} |
|
|
], |
|
|
temperature=0.7, |
|
|
max_tokens=200 |
|
|
) |
|
|
|
|
|
print(response.choices[0].message.content) |
|
|
|
|
|
# Japanese example |
|
|
response = client.chat.completions.create( |
|
|
model="WayBob/Way-sft-plamo-3-8b-chat", |
|
|
messages=[ |
|
|
{"role": "user", "content": "プログラミング初心者へのアドバイスをください。"} |
|
|
], |
|
|
temperature=0.7, |
|
|
max_tokens=200 |
|
|
) |
|
|
|
|
|
print(response.choices[0].message.content) |
|
|
``` |
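The same endpoint supports token streaming through the standard OpenAI client; a short sketch reusing the `client` created above:

```python
# Stream tokens as they are generated instead of waiting for the full reply
stream = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[{"role": "user", "content": "Explain the benefits of learning programming."}],
    temperature=0.7,
    max_tokens=200,
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```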
|
|
|
|
|
**Performance**: With vLLM, this model achieves roughly 50-100 tokens/s of single-stream generation, plus efficient batched inference with automatic prefix caching.
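For offline batched generation without running a server, vLLM's Python API can be used directly. A minimal sketch, assuming the same model ID and the `Human:`/`Assistant:` prompt format used throughout this card:

```python
from vllm import LLM, SamplingParams

# Load the model once; trust_remote_code is required for the PLaMo architecture
llm = LLM(model="WayBob/Way-sft-plamo-3-8b-chat", trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=200)

# Prompts are processed as a single batch
prompts = [
    "Human: Give me three tips for learning a new language.\nAssistant:",
    "Human: プログラミング初心者へのアドバイスをください。\nAssistant:",
]

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text.strip())
```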
|
|
|
|
|
### Basic Inference with Transformers |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
import torch |
|
|
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
"WayBob/Way-sft-plamo-3-8b-chat", |
|
|
trust_remote_code=True, |
|
|
torch_dtype=torch.bfloat16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
|
"WayBob/Way-sft-plamo-3-8b-chat", |
|
|
trust_remote_code=True |
|
|
) |
|
|
|
|
|
# English example |
|
|
prompt = "Human: Give me three tips for learning a new language.\nAssistant:" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=200, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
do_sample=True |
|
|
) |
|
|
|
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
print(response) |
|
|
``` |
|
|
|
|
|
### Japanese Example |
|
|
|
|
|
```python |
|
|
# Japanese example (reuses the model and tokenizer loaded above)
|
|
prompt = "Human: 健康的な生活習慣について教えてください。\nAssistant:" |
|
|
inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
outputs = model.generate( |
|
|
**inputs, |
|
|
max_new_tokens=200, |
|
|
temperature=0.7, |
|
|
top_p=0.9, |
|
|
do_sample=True |
|
|
) |
|
|
|
|
|
response = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
|
print(response) |
|
|
``` |
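The examples above are single-turn. For multi-turn conversation, the same `Human:`/`Assistant:` format can be extended by concatenating earlier turns; the `build_prompt` helper below is illustrative (not part of the model or tokenizer), and multi-turn behavior should be verified experimentally since the training data is single-turn instruction data:

```python
def build_prompt(turns):
    """Assemble a Human/Assistant prompt from (role, text) tuples.

    The prompt ends with "Assistant:" so the model generates the next reply.
    """
    lines = [f"{'Human' if role == 'human' else 'Assistant'}: {text}" for role, text in turns]
    lines.append("Assistant:")
    return "\n".join(lines)

prompt = build_prompt([
    ("human", "What is the capital of Japan?"),
    ("assistant", "The capital of Japan is Tokyo."),
    ("human", "How large is its population?"),
])
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```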
|
|
|
|
|
### Recommended Generation Parameters |
|
|
|
|
|
```python |
|
|
generation_config = { |
|
|
"max_new_tokens": 150-250, |
|
|
"temperature": 0.7, |
|
|
"top_p": 0.9, |
|
|
"do_sample": True, |
|
|
"pad_token_id": tokenizer.pad_token_id, |
|
|
"eos_token_id": tokenizer.eos_token_id, |
|
|
} |
|
|
``` |
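These defaults can be unpacked directly into `model.generate`; reusing the `model`, `tokenizer`, and `inputs` from the examples above:

```python
# Apply the recommended parameters in one call
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```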
|
|
|
|
|
## Limitations |
|
|
|
|
|
- **Context Length**: 4096 tokens maximum |
|
|
- **Function Calling**: Not supported in this version (Stage 1 only) |
|
|
- **Factual Accuracy**: May generate plausible but incorrect information |
|
|
- **Safety**: No specific safety alignment training |
|
|
- **Domain**: General purpose, not specialized for specific domains |
|
|
|
|
|
## Intended Use Cases |
|
|
|
|
|
**Recommended**: |
|
|
- General conversation in English and Japanese |
|
|
- Instruction following and task completion |
|
|
- Educational Q&A and explanations |
|
|
- Creative writing assistance |
|
|
- Bilingual customer service applications |
|
|
|
|
|
**Not Recommended**: |
|
|
- Medical, legal, or financial advice |
|
|
- Safety-critical applications |
|
|
- Tasks requiring verified factual accuracy |
|
|
- Real-time decision making systems |
|
|
|
|
|
## Training Framework |
|
|
|
|
|
Trained using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) - an efficient LLM fine-tuning framework. |
|
|
|
|
|
Training command: |
|
|
```bash |
|
|
llamafactory-cli train examples/train_full/plamo3_stage1_full.yaml |
|
|
``` |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{way-sft-plamo-3-8b-chat, |
|
|
author = {WayBob}, |
|
|
title = {Way-sft-plamo-3-8b-chat: Bilingual Instruction-tuned Plamo-3}, |
|
|
year = {2025}, |
|
|
publisher = {HuggingFace}, |
|
|
url = {https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
**Base Model**: |
|
|
- [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base) - Preferred Networks & NICT |
|
|
- Licensed under the PLaMo Community License (see the License section below)
|
|
|
|
|
**Training Datasets**: |
|
|
- [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) - English instruction dataset |
|
|
- [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja) - Japanese instruction dataset |
|
|
|
|
|
**Training Framework**: |
|
|
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) by hiyouga |
|
|
|
|
|
**Infrastructure**: |
|
|
- AWS EC2 p4d.24xlarge instance |
|
|
- 8x NVIDIA A100 80GB GPUs |
|
|
|
|
|
## License |
|
|
|
|
|
This model is licensed under the **PLaMo Community License Agreement**, inherited from the base model [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base). |
|
|
|
|
|
### Key License Terms |
|
|
|
|
|
- **Non-commercial and Limited Commercial Use**: Free for personal, academic, and commercial use with revenue under 1 billion yen annually |
|
|
- **Attribution Required**: Must indicate "Built with PLaMo" in related materials |
|
|
- **Model Name Requirement**: Derived models must include "PLaMo" in their names |
|
|
- **Same License**: Redistributions must use the same PLaMo Community License |
|
|
|
|
|
### Commercial Use |
|
|
|
|
|
For commercial use, you must: |
|
|
1. Register at PFN's official page: https://forms.gle/mTL8tBLrMYXKNZD56 |
|
|
2. Ensure annual revenue does not exceed 1 billion yen (or equivalent) |
|
|
3. For revenue exceeding this limit, contact PFN for a commercial license |
|
|
|
|
|
**Full License**: See [PLaMo Community License Agreement](https://huggingface.co/pfnet/plamo-3-nict-8b-base) for complete terms. |
|
|
|
|
|
## Contact |
|
|
|
|
|
- HuggingFace: [WayBob](https://huggingface.co/WayBob) |
|
|
- Repository: [Way-sft-plamo-3-8b-chat](https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat) |
|
|
|