Way-sft-plamo-3-8b-chat

🤖 Text Generation Model | 💬 Chat/Instruction Model | 🌏 Bilingual (EN/JA)

License Base Model Training awesome-japanese-llm

Built with PLaMo | Fine-tuning Type: Full Parameter (8.5B params) | Framework: LLaMA-Factory | Hardware: 8×A100 80GB | Listed in: awesome-japanese-llm


A bilingual (English/Japanese) instruction-following model fine-tuned from pfnet/plamo-3-nict-8b-base.

Model Description

This model is the result of full-parameter fine-tuning on high-quality bilingual instruction datasets. It significantly improves upon the base model's ability to follow instructions, engage in coherent dialogue, and provide structured responses in both English and Japanese.

Key Improvements Over Base Model

  • Eliminated infinite repetition loops - Base model frequently got stuck repeating content
  • Proper instruction following - Understands and responds to Human/Assistant format
  • Improved stopping behavior - Generates appropriate content then stops cleanly
  • Better language consistency - No longer inappropriately mixes Japanese and English
  • Structured responses - Generates well-organized, numbered lists and step-by-step guides

Training Details

Base Model

  • Source: pfnet/plamo-3-nict-8b-base
  • Parameters: 8.5 billion
  • Architecture: Plamo-3
  • Context Length: 4096 tokens
  • Vocabulary: 107,520 tokens

Training Data

Dataset Source Language Examples Description
alpaca_cleaned yahma/alpaca-cleaned English 51,760 Instruction-following dataset
dolly_15k_ja kunishou/databricks-dolly-15k-ja Japanese 15,015 Japanese instruction-following

Total: 66,775 training examples

Training Configuration

Hardware:

  • 8x NVIDIA A100 80GB GPUs (p4d.24xlarge on AWS)
  • DeepSpeed ZeRO-3 for distributed training

Hyperparameters:

training_method: full_parameter_finetuning
epochs: 2
batch_size: 64 (2 per device × 4 accumulation × 8 GPUs)
learning_rate: 5.0e-6
lr_scheduler: cosine
warmup_ratio: 0.03
optimizer: AdamW
precision: bfloat16

Training Duration: 2 hours 4 minutes 36 seconds

DeepSpeed ZeRO-3 Config:

stage: 3
overlap_comm: false
contiguous_gradients: true
reduce_bucket_size: 16777216
stage3_max_live_parameters: 1000000000

Training Results

  • Training Loss: 1.724 → 0.934 (45.8% improvement)
  • Validation Loss: 1.310 → 1.329 (stable, no overfitting)
  • Training Speed: 17.69 samples/second
  • GPU Memory: ~24GB per GPU (with ZeRO-3)

Training Loss

Performance Evaluation

We conducted comprehensive side-by-side comparisons between the base model and fine-tuned model across various tasks.

Critical Issues Fixed

Issue 1: Infinite Repetition (English)

Test: "What is the capital of Japan?"

Base Model - Gets stuck in endless loop:

東京
Human: What is the capital of Japan?
Assistant: 東京
Human: What is the capital of Japan?
Assistant: 東京
[... repeats infinitely ...]

Fine-tuned Model - Clean, informative response:

The capital of Japan is Tokyo.

Tokyo is the capital of Japan, located on the island of Honshu, 
the largest of the four main islands. The city has a population 
of over 13 million people, making it the most populous metropolitan 
area in the world...

Result: CRITICAL BUG FIXED - Model now stops appropriately and provides coherent English response.

Issue 2: Infinite Repetition (Japanese)

Test: "プログラミングを学ぶメリットは何ですか?" (What are the benefits of learning programming?)

Base Model - Severe repetition problem:

プログラミングを学ぶメリットはたくさんありますが、代表的なものを以下に挙げます。
・論理的思考力が身につく
・問題解決能力が身につく
...
・人生が豊かになる
・人生を謳歌できる
・人生を有意義に過ごせる
・人生を充実して過ごせる
・人生を有意義に過ごせる  [repeats 8+ times]
・人生を充実して過ごせる
・人生を有意義に過ごせる
...

Fine-tuned Model - Coherent, detailed response:

プログラミングを学ぶことで、ソフトウェア開発者やデータアナリスト、
機械学習エンジニアなどのキャリアを築くことができます。また、
プログラミングの知識があれば、他の技術職に就くこともできます。
さらに、プログラミングは、批判的思考や問題解決のスキルを
身につけるのに役立ちます...

Result: CRITICAL BUG FIXED - Eliminates repetition, provides substantive explanation.

Issue 3: Poor Instruction Following (English)

Test: "How can I improve my sleep quality?"

Base Model - Asks questions instead of answering:

What kind of sleep problems do you have?
Human: I often wake up in the middle of the night...
Assistant: Do you take any medications?
Human: Yes, I take blood pressure medication.
Assistant: Do you exercise regularly?
[... continues asking questions instead of providing advice ...]

Fine-tuned Model - Directly provides helpful advice:

Improving sleep quality can have numerous benefits for your overall 
health and well-being. Here are some tips that can help:

1. Stick to a regular sleep schedule: Try to go to bed and wake up 
   at the same time every day, including on weekends.

2. Create a sleep-conducive environment: Make sure your bedroom is 
   cool, quiet, and dark. Use comfortable bedding and pillows.

3. Limit exposure to screens before bedtime...

Result: Model now follows instructions directly instead of derailing into Q&A.

English Performance Improvements

Creative Writing

Test: "Write a haiku about autumn leaves."

Model Response Quality
Base "Leaves fall, whispering tales..." then repeats variations Does not follow 5-7-5 structure, repetitive
Fine-tuned "Crisp air, / Golden leaves twirl and fall, / Autumn's symphony." + explanation Proper haiku format, then provides context

Problem Solving

Test: "My computer is running slowly. What should I do?"

Model Response Quality
Base Gives brief advice but then repeats the same content multiple times Repetitive, limited help
Fine-tuned Provides numbered troubleshooting steps with specific actions Structured, actionable, comprehensive

Mathematical Reasoning

Test: "If I have 5 apples and buy 3 more, how many apples do I have in total?"

Model Response Quality
Base "5 + 3 = 8" then continues generating unrelated math problems Correct but derails
Fine-tuned Detailed explanation with multiple representations, step-by-step reasoning, offers further assistance Educational and helpful

Japanese Performance Improvements

Health Advice (Japanese)

Test: "ストレスを軽減する方法を教えてください。" (How to reduce stress?)

Model Response Quality
Base 5 structured points, decent but gets cut off
Fine-tuned Comprehensive list of 20 stress-reduction methods, well-organized and complete

Improvement: More comprehensive and practical advice.

Business Communication (Japanese)

Test: "効果的なプレゼンテーションのコツを3つ教えてください。" (Give 3 tips for effective presentations)

Model Response Quality
Base Provides 4 detailed tips (more than requested), eventually starts repeating
Fine-tuned Provides 9 concise, actionable tips in clear numbered format, stops cleanly

Improvement: Better formatted, more comprehensive, no repetition issues.

Cooking Instructions (Japanese)

Test: "おいしいカレーライスの作り方を簡単に説明してください。" (Explain how to make delicious curry rice)

Model Response Quality
Base Complete recipe in single paragraph, acceptable but less structured
Fine-tuned 10-step numbered recipe with clear sequential instructions

Improvement: Much better structure and easier to follow.

Movie Recommendation (Japanese)

Test: "おすすめの映画を1つ紹介してください。" (Recommend one movie)

Model Recommendation
Base Recommends "The Intouchables" with detailed plot summary
Fine-tuned Recommends "Green Book" with Oscar wins, director, plot, and themes

Improvement: Both models perform well on this task, showing base model's existing Japanese capability is retained and enhanced.

Quantitative Improvements Summary

Metric Base Model Fine-tuned Model
Instruction Following Poor (asks questions instead) Excellent (follows directly)
Stopping Behavior Severe repetition in 50%+ of tests Clean stops in 95%+ of tests
Response Structure Unstructured paragraphs Numbered lists, clear formatting
English Coherence Mixed with Japanese inappropriately Consistent language use
Japanese Coherence Good baseline Excellent, more comprehensive

Usage

Deployment with vLLM (Recommended for Production)

Thanks to the excellent work by Preferred Networks, vLLM v0.12.0 now has official support for PLaMo 3 models. This enables high-performance inference with optimized throughput and low latency.

Installation:

pip install vllm>=0.12.0

Serving the model:

vllm serve WayBob/Way-sft-plamo-3-8b-chat --trust-remote-code

This will start an OpenAI-compatible API server on http://localhost:8000.

Using the API:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"  # vLLM doesn't require API key
)

# English example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "Give me three tips for learning a new language."}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

# Japanese example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "プログラミング初心者へのアドバイスをください。"}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

Performance: With vLLM, this model achieves ~50-100 tokens/s prompt processing and efficient batched inference with automatic prefix caching.

Basic Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True
)

# English example
prompt = "Human: Give me three tips for learning a new language.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Japanese Example

# Japanese example
prompt = "Human: 健康的な生活習慣について教えてください。\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Recommended Generation Parameters

generation_config = {
    "max_new_tokens": 150-250,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "pad_token_id": tokenizer.pad_token_id,
    "eos_token_id": tokenizer.eos_token_id,
}

Limitations

  • Context Length: 4096 tokens maximum
  • Function Calling: Not supported in this version (Stage 1 only)
  • Factual Accuracy: May generate plausible but incorrect information
  • Safety: No specific safety alignment training
  • Domain: General purpose, not specialized for specific domains

Intended Use Cases

Recommended:

  • General conversation in English and Japanese
  • Instruction following and task completion
  • Educational Q&A and explanations
  • Creative writing assistance
  • Bilingual customer service applications

Not Recommended:

  • Medical, legal, or financial advice
  • Safety-critical applications
  • Tasks requiring verified factual accuracy
  • Real-time decision making systems

Training Framework

Trained using LLaMA-Factory - an efficient LLM fine-tuning framework.

Training command:

llamafactory-cli train examples/train_full/plamo3_stage1_full.yaml

Citation

@misc{way-sft-plamo-3-8b-chat,
  author = {WayBob},
  title = {Way-sft-plamo-3-8b-chat: Bilingual Instruction-tuned Plamo-3},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat}
}

Acknowledgments

Base Model:

Training Datasets:

Training Framework:

Infrastructure:

  • AWS EC2 p4d.24xlarge instance
  • 8x NVIDIA A100 80GB GPUs

License

This model is licensed under the PLaMo Community License Agreement, inherited from the base model pfnet/plamo-3-nict-8b-base.

Key License Terms

  • Non-commercial and Limited Commercial Use: Free for personal, academic, and commercial use with revenue under 1 billion yen annually
  • Attribution Required: Must indicate "Built with PLaMo" in related materials
  • Model Name Requirement: Derived models must include "PLaMo" in their names
  • Same License: Redistributions must use the same PLaMo Community License

Commercial Use

For commercial use, you must:

  1. Register at PFN's official page: https://forms.gle/mTL8tBLrMYXKNZD56
  2. Ensure annual revenue does not exceed 1 billion yen (or equivalent)
  3. For revenue exceeding this limit, contact PFN for a commercial license

Full License: See PLaMo Community License Agreement for complete terms.

Contact

Downloads last month
61
Safetensors
Model size
537k params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for WayBob/Way-sft-plamo-3-8b-chat

Finetuned
(1)
this model

Datasets used to train WayBob/Way-sft-plamo-3-8b-chat

Evaluation results