Way-sft-plamo-3-8b-chat
🤖 Text Generation Model | 💬 Chat/Instruction Model | 🌏 Bilingual (EN/JA)
Built with PLaMo | Fine-tuning Type: Full Parameter (8.5B params) | Framework: LLaMA-Factory | Hardware: 8×A100 80GB | Listed in: awesome-japanese-llm
A bilingual (English/Japanese) instruction-following model fine-tuned from pfnet/plamo-3-nict-8b-base.
Model Description
This model is the result of full-parameter fine-tuning on high-quality bilingual instruction datasets. It significantly improves upon the base model's ability to follow instructions, engage in coherent dialogue, and provide structured responses in both English and Japanese.
Key Improvements Over Base Model
- Eliminated infinite repetition loops - Base model frequently got stuck repeating content
- Proper instruction following - Understands and responds to Human/Assistant format
- Improved stopping behavior - Generates appropriate content then stops cleanly
- Better language consistency - No longer inappropriately mixes Japanese and English
- Structured responses - Generates well-organized, numbered lists and step-by-step guides
Training Details
Base Model
- Source: pfnet/plamo-3-nict-8b-base
- Parameters: 8.5 billion
- Architecture: Plamo-3
- Context Length: 4096 tokens
- Vocabulary: 107,520 tokens
Training Data
| Dataset | Source | Language | Examples | Description |
|---|---|---|---|---|
| alpaca_cleaned | yahma/alpaca-cleaned | English | 51,760 | Instruction-following dataset |
| dolly_15k_ja | kunishou/databricks-dolly-15k-ja | Japanese | 15,015 | Japanese instruction-following |
Total: 66,775 training examples
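For reference, both datasets can be pulled straight from the Hugging Face Hub with the datasets library. The sketch below only illustrates how the combined corpus adds up; LLaMA-Factory performs its own dataset loading during training, and the column selection assumes the standard Alpaca schema (instruction / input / output).

from datasets import load_dataset, concatenate_datasets

# Pull both instruction datasets from the Hugging Face Hub.
alpaca = load_dataset("yahma/alpaca-cleaned", split="train")                 # 51,760 English examples
dolly_ja = load_dataset("kunishou/databricks-dolly-15k-ja", split="train")   # 15,015 Japanese examples

# Keep only the columns shared by both datasets (names assumed from the
# Alpaca schema), then concatenate into a single training corpus.
cols = ["instruction", "input", "output"]
combined = concatenate_datasets([alpaca.select_columns(cols),
                                 dolly_ja.select_columns(cols)])
print(len(combined))  # 66775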
Training Configuration
Hardware:
- 8x NVIDIA A100 80GB GPUs (p4d.24xlarge on AWS)
- DeepSpeed ZeRO-3 for distributed training
Hyperparameters:
training_method: full_parameter_finetuning
epochs: 2
batch_size: 64 (2 per device × 4 accumulation × 8 GPUs)
learning_rate: 5.0e-6
lr_scheduler: cosine
warmup_ratio: 0.03
optimizer: AdamW
precision: bfloat16
Training Duration: 2 hours 4 minutes 36 seconds
DeepSpeed ZeRO-3 Config:
stage: 3
overlap_comm: false
contiguous_gradients: true
reduce_bucket_size: 16777216
stage3_max_live_parameters: 1000000000
Training Results
- Training Loss: 1.724 → 0.934 (45.8% improvement)
- Validation Loss: 1.310 → 1.329 (essentially flat; no sign of overfitting)
- Training Speed: 17.69 samples/second
- GPU Memory: ~24GB per GPU (with ZeRO-3)
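As a cross-check, the reported figures are mutually consistent: 66,775 examples × 2 epochs ÷ 17.69 samples/s ≈ 7,550 s (about 2 h 6 min), matching the logged duration of 2 h 4 min 36 s to within about 1%.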
Performance Evaluation
We conducted comprehensive side-by-side comparisons between the base model and fine-tuned model across various tasks.
Critical Issues Fixed
Issue 1: Infinite Repetition (English)
Test: "What is the capital of Japan?"
Base Model - Answers in Japanese ("東京", i.e. Tokyo) and gets stuck in an endless loop:
東京
Human: What is the capital of Japan?
Assistant: 東京
Human: What is the capital of Japan?
Assistant: 東京
[... repeats infinitely ...]
Fine-tuned Model - Clean, informative response:
The capital of Japan is Tokyo.
Tokyo is the capital of Japan, located on the island of Honshu,
the largest of the four main islands. The city has a population
of over 13 million people, making it the most populous metropolitan
area in the world...
Result: CRITICAL BUG FIXED - Model now stops appropriately and provides coherent English response.
Issue 2: Infinite Repetition (Japanese)
Test: "プログラミングを学ぶメリットは何ですか?" (What are the benefits of learning programming?)
Base Model - Severe repetition problem:
プログラミングを学ぶメリットはたくさんありますが、代表的なものを以下に挙げます。
・論理的思考力が身につく
・問題解決能力が身につく
...
・人生が豊かになる
・人生を謳歌できる
・人生を有意義に過ごせる
・人生を充実して過ごせる
・人生を有意義に過ごせる [repeats 8+ times]
・人生を充実して過ごせる
・人生を有意義に過ごせる
...
(English gloss: after opening with "There are many benefits to learning programming..." and listing items such as "acquire logical thinking skills", the bullets degrade into "live a meaningful life" / "live a fulfilling life" alternating endlessly.)
Fine-tuned Model - Coherent, detailed response:
プログラミングを学ぶことで、ソフトウェア開発者やデータアナリスト、
機械学習エンジニアなどのキャリアを築くことができます。また、
プログラミングの知識があれば、他の技術職に就くこともできます。
さらに、プログラミングは、批判的思考や問題解決のスキルを
身につけるのに役立ちます...
(English gloss: "Learning programming lets you build a career as a software developer, data analyst, or machine learning engineer. With programming knowledge you can also move into other technical roles. Furthermore, programming helps you develop critical thinking and problem-solving skills...")
Result: CRITICAL BUG FIXED - Eliminates repetition, provides substantive explanation.
Issue 3: Poor Instruction Following (English)
Test: "How can I improve my sleep quality?"
Base Model - Asks questions instead of answering:
What kind of sleep problems do you have?
Human: I often wake up in the middle of the night...
Assistant: Do you take any medications?
Human: Yes, I take blood pressure medication.
Assistant: Do you exercise regularly?
[... continues asking questions instead of providing advice ...]
Fine-tuned Model - Directly provides helpful advice:
Improving sleep quality can have numerous benefits for your overall
health and well-being. Here are some tips that can help:
1. Stick to a regular sleep schedule: Try to go to bed and wake up
at the same time every day, including on weekends.
2. Create a sleep-conducive environment: Make sure your bedroom is
cool, quiet, and dark. Use comfortable bedding and pillows.
3. Limit exposure to screens before bedtime...
Result: Model now follows instructions directly instead of derailing into Q&A.
English Performance Improvements
Creative Writing
Test: "Write a haiku about autumn leaves."
| Model | Response | Quality |
|---|---|---|
| Base | "Leaves fall, whispering tales..." then repeats variations | Does not follow 5-7-5 structure, repetitive |
| Fine-tuned | "Crisp air, / Golden leaves twirl and fall, / Autumn's symphony." + explanation | Proper haiku format, then provides context |
Problem Solving
Test: "My computer is running slowly. What should I do?"
| Model | Response | Quality |
|---|---|---|
| Base | Gives brief advice but then repeats the same content multiple times | Repetitive, limited help |
| Fine-tuned | Provides numbered troubleshooting steps with specific actions | Structured, actionable, comprehensive |
Mathematical Reasoning
Test: "If I have 5 apples and buy 3 more, how many apples do I have in total?"
| Model | Response | Quality |
|---|---|---|
| Base | "5 + 3 = 8" then continues generating unrelated math problems | Correct but derails |
| Fine-tuned | Detailed explanation with multiple representations, step-by-step reasoning, offers further assistance | Educational and helpful |
Japanese Performance Improvements
Health Advice (Japanese)
Test: "ストレスを軽減する方法を教えてください。" (How to reduce stress?)
| Model | Response Quality |
|---|---|
| Base | 5 structured points, decent but gets cut off |
| Fine-tuned | Comprehensive list of 20 stress-reduction methods, well-organized and complete |
Improvement: More comprehensive and practical advice.
Business Communication (Japanese)
Test: "効果的なプレゼンテーションのコツを3つ教えてください。" (Give 3 tips for effective presentations)
| Model | Response Quality |
|---|---|
| Base | Provides 4 detailed tips (more than requested), eventually starts repeating |
| Fine-tuned | Provides 9 concise, actionable tips in clear numbered format, stops cleanly |
Improvement: Better formatted, more comprehensive, no repetition issues.
Cooking Instructions (Japanese)
Test: "おいしいカレーライスの作り方を簡単に説明してください。" (Explain how to make delicious curry rice)
| Model | Response Quality |
|---|---|
| Base | Complete recipe in single paragraph, acceptable but less structured |
| Fine-tuned | 10-step numbered recipe with clear sequential instructions |
Improvement: Much better structure and easier to follow.
Movie Recommendation (Japanese)
Test: "おすすめの映画を1つ紹介してください。" (Recommend one movie)
| Model | Recommendation |
|---|---|
| Base | Recommends "The Intouchables" with detailed plot summary |
| Fine-tuned | Recommends "Green Book" with Oscar wins, director, plot, and themes |
Improvement: Both models perform well on this task, showing that the base model's existing Japanese capability is retained and enhanced.
Quantitative Improvements Summary
| Metric | Base Model | Fine-tuned Model |
|---|---|---|
| Instruction Following | Poor (asks questions instead) | Excellent (follows directly) |
| Stopping Behavior | Severe repetition in 50%+ of tests | Clean stops in 95%+ of tests |
| Response Structure | Unstructured paragraphs | Numbered lists, clear formatting |
| English Coherence | Mixed with Japanese inappropriately | Consistent language use |
| Japanese Coherence | Good baseline | Excellent, more comprehensive |
Usage
Deployment with vLLM (Recommended for Production)
Thanks to the excellent work by Preferred Networks, vLLM v0.12.0 now has official support for PLaMo 3 models. This enables high-performance inference with optimized throughput and low latency.
Installation:
pip install "vllm>=0.12.0"
Serving the model:
vllm serve WayBob/Way-sft-plamo-3-8b-chat --trust-remote-code
This will start an OpenAI-compatible API server on http://localhost:8000.
Using the API:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
    api_key="dummy"  # vLLM doesn't require an API key
)
# English example
response = client.chat.completions.create(
model="WayBob/Way-sft-plamo-3-8b-chat",
messages=[
{"role": "user", "content": "Give me three tips for learning a new language."}
],
temperature=0.7,
max_tokens=200
)
print(response.choices[0].message.content)
# Japanese example
response = client.chat.completions.create(
model="WayBob/Way-sft-plamo-3-8b-chat",
messages=[
{"role": "user", "content": "プログラミング初心者へのアドバイスをください。"}
],
temperature=0.7,
max_tokens=200
)
print(response.choices[0].message.content)
Performance: With vLLM, this model achieves ~50-100 tokens/s prompt processing and efficient batched inference with automatic prefix caching.
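For offline batch inference without running a server, vLLM's Python API can also be used directly. A minimal sketch, reusing the Human/Assistant prompt format shown in the Transformers examples below and the recommended sampling values:

from vllm import LLM, SamplingParams

# Load the model once; trust_remote_code is required for the PLaMo architecture.
llm = LLM(model="WayBob/Way-sft-plamo-3-8b-chat", trust_remote_code=True)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=200)

# Batched generation over several prompts at once.
prompts = [
    "Human: Give me three tips for learning a new language.\nAssistant:",
    "Human: プログラミング初心者へのアドバイスをください。\nAssistant:",
]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)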
Basic Inference with Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
"WayBob/Way-sft-plamo-3-8b-chat",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(
"WayBob/Way-sft-plamo-3-8b-chat",
trust_remote_code=True
)
# English example
prompt = "Human: Give me three tips for learning a new language.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=200,
temperature=0.7,
top_p=0.9,
do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
Japanese Example
# Japanese example
prompt = "Human: 健康的な生活習慣について教えてください。\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=200,
temperature=0.7,
top_p=0.9,
do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
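Multi-turn Prompts
For multi-turn conversations, the same Human/Assistant format can simply be extended. A small hypothetical helper (not part of the model repository) that renders an OpenAI-style message list into this format:

def build_prompt(messages):
    """Render a list of {"role", "content"} messages into the
    Human/Assistant format this model was fine-tuned on."""
    lines = []
    for m in messages:
        role = "Human" if m["role"] == "user" else "Assistant"
        lines.append(f"{role}: {m['content']}")
    lines.append("Assistant:")  # cue the model to produce the next turn
    return "\n".join(lines)

prompt = build_prompt([
    {"role": "user", "content": "What is the capital of Japan?"},
    {"role": "assistant", "content": "The capital of Japan is Tokyo."},
    {"role": "user", "content": "What is its population?"},
])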
Recommended Generation Parameters
generation_config = {
"max_new_tokens": 150-250,
"temperature": 0.7,
"top_p": 0.9,
"do_sample": True,
"pad_token_id": tokenizer.pad_token_id,
"eos_token_id": tokenizer.eos_token_id,
}
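These defaults can be unpacked directly into generation once the inputs are tokenized:

outputs = model.generate(**inputs, **generation_config)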
Limitations
- Context Length: 4096 tokens maximum
- Function Calling: Not supported in this version (Stage 1 only)
- Factual Accuracy: May generate plausible but incorrect information
- Safety: No specific safety alignment training
- Domain: General purpose, not specialized for specific domains
Intended Use Cases
Recommended:
- General conversation in English and Japanese
- Instruction following and task completion
- Educational Q&A and explanations
- Creative writing assistance
- Bilingual customer service applications
Not Recommended:
- Medical, legal, or financial advice
- Safety-critical applications
- Tasks requiring verified factual accuracy
- Real-time decision making systems
Training Framework
Trained using LLaMA-Factory - an efficient LLM fine-tuning framework.
Training command:
llamafactory-cli train examples/train_full/plamo3_stage1_full.yaml
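The exact contents of plamo3_stage1_full.yaml are not published here; the sketch below is a plausible reconstruction using standard LLaMA-Factory option names and the hyperparameters reported above (the deepspeed config path is an assumption):

model_name_or_path: pfnet/plamo-3-nict-8b-base
trust_remote_code: true
stage: sft
do_train: true
finetuning_type: full
dataset: alpaca_cleaned,dolly_15k_ja
cutoff_len: 4096
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 5.0e-6
num_train_epochs: 2
lr_scheduler_type: cosine
warmup_ratio: 0.03
bf16: true
deepspeed: examples/deepspeed/ds_z3_config.json  # assumed path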
Citation
@misc{way-sft-plamo-3-8b-chat,
  author    = {WayBob},
  title     = {Way-sft-plamo-3-8b-chat: Bilingual Instruction-tuned Plamo-3},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat}
}
Acknowledgments
Base Model:
- pfnet/plamo-3-nict-8b-base - Preferred Networks & NICT
- Licensed under Apache 2.0
Training Datasets:
- yahma/alpaca-cleaned - English instruction dataset
- kunishou/databricks-dolly-15k-ja - Japanese instruction dataset
Training Framework:
- LLaMA-Factory by hiyouga
Infrastructure:
- AWS EC2 p4d.24xlarge instance
- 8x NVIDIA A100 80GB GPUs
License
This model is licensed under the PLaMo Community License Agreement, inherited from the base model pfnet/plamo-3-nict-8b-base.
Key License Terms
- Non-commercial and Limited Commercial Use: Free for personal, academic, and commercial use with revenue under 1 billion yen annually
- Attribution Required: Must indicate "Built with PLaMo" in related materials
- Model Name Requirement: Derived models must include "PLaMo" in their names
- Same License: Redistributions must use the same PLaMo Community License
Commercial Use
For commercial use, you must:
- Register at PFN's official page: https://forms.gle/mTL8tBLrMYXKNZD56
- Ensure annual revenue does not exceed 1 billion yen (or equivalent)
- For revenue exceeding this limit, contact PFN for a commercial license
Full License: See PLaMo Community License Agreement for complete terms.
Contact
- HuggingFace: WayBob
- Repository: Way-sft-plamo-3-8b-chat
