Instructions to use WayBob/Way-sft-plamo-3-8b-chat with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WayBob/Way-sft-plamo-3-8b-chat with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WayBob/Way-sft-plamo-3-8b-chat", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("WayBob/Way-sft-plamo-3-8b-chat", trust_remote_code=True, dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use WayBob/Way-sft-plamo-3-8b-chat with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WayBob/Way-sft-plamo-3-8b-chat"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WayBob/Way-sft-plamo-3-8b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/WayBob/Way-sft-plamo-3-8b-chat

SGLang

How to use WayBob/Way-sft-plamo-3-8b-chat with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WayBob/Way-sft-plamo-3-8b-chat" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WayBob/Way-sft-plamo-3-8b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WayBob/Way-sft-plamo-3-8b-chat" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WayBob/Way-sft-plamo-3-8b-chat",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use WayBob/Way-sft-plamo-3-8b-chat with Docker Model Runner:
```
docker model run hf.co/WayBob/Way-sft-plamo-3-8b-chat
```

Way-sft-plamo-3-8b-chat

🤖 Text Generation Model | 💬 Chat/Instruction Model | 🌏 Bilingual (EN/JA)

Built with PLaMo | Fine-tuning Type: Full Parameter (8.5B params) | Framework: LLaMA-Factory | Hardware: 8×A100 80GB | Listed in: awesome-japanese-llm

A bilingual (English/Japanese) instruction-following model fine-tuned from pfnet/plamo-3-nict-8b-base.

Model Description

This model is the result of full-parameter fine-tuning on high-quality bilingual instruction datasets. It significantly improves upon the base model's ability to follow instructions, engage in coherent dialogue, and provide structured responses in both English and Japanese.

Key Improvements Over Base Model

Eliminated infinite repetition loops - Base model frequently got stuck repeating content
Proper instruction following - Understands and responds to Human/Assistant format
Improved stopping behavior - Generates appropriate content then stops cleanly
Better language consistency - No longer inappropriately mixes Japanese and English
Structured responses - Generates well-organized, numbered lists and step-by-step guides

Training Details

Base Model

Source: pfnet/plamo-3-nict-8b-base
Parameters: 8.5 billion
Architecture: Plamo-3
Context Length: 4096 tokens
Vocabulary: 107,520 tokens

Training Data

Dataset	Source	Language	Examples	Description
alpaca_cleaned	yahma/alpaca-cleaned	English	51,760	Instruction-following dataset
dolly_15k_ja	kunishou/databricks-dolly-15k-ja	Japanese	15,015	Japanese instruction-following

Total: 66,775 training examples

Training Configuration

Hardware:

8x NVIDIA A100 80GB GPUs (p4d.24xlarge on AWS)
DeepSpeed ZeRO-3 for distributed training

Hyperparameters:

training_method: full_parameter_finetuning
epochs: 2
batch_size: 64 (2 per device × 4 accumulation × 8 GPUs)
learning_rate: 5.0e-6
lr_scheduler: cosine
warmup_ratio: 0.03
optimizer: AdamW
precision: bfloat16

Training Duration: 2 hours 4 minutes 36 seconds

DeepSpeed ZeRO-3 Config:

stage: 3
overlap_comm: false
contiguous_gradients: true
reduce_bucket_size: 16777216
stage3_max_live_parameters: 1000000000

Training Results

Training Loss: 1.724 → 0.934 (45.8% improvement)
Validation Loss: 1.310 → 1.329 (stable, no overfitting)
Training Speed: 17.69 samples/second
GPU Memory: ~24GB per GPU (with ZeRO-3)

Performance Evaluation

We conducted comprehensive side-by-side comparisons between the base model and fine-tuned model across various tasks.

Critical Issues Fixed

Issue 1: Infinite Repetition (English)

Test: "What is the capital of Japan?"

Base Model - Gets stuck in endless loop:

東京
Human: What is the capital of Japan?
Assistant: 東京
Human: What is the capital of Japan?
Assistant: 東京
[... repeats infinitely ...]

Fine-tuned Model - Clean, informative response:

The capital of Japan is Tokyo.

Tokyo is the capital of Japan, located on the island of Honshu, 
the largest of the four main islands. The city has a population 
of over 13 million people, making it the most populous metropolitan 
area in the world...

Result: CRITICAL BUG FIXED - Model now stops appropriately and provides coherent English response.

Issue 2: Infinite Repetition (Japanese)

Test: "プログラミングを学ぶメリットは何ですか？" (What are the benefits of learning programming?)

Base Model - Severe repetition problem:

プログラミングを学ぶメリットはたくさんありますが、代表的なものを以下に挙げます。
・論理的思考力が身につく
・問題解決能力が身につく
...
・人生が豊かになる
・人生を謳歌できる
・人生を有意義に過ごせる
・人生を充実して過ごせる
・人生を有意義に過ごせる  [repeats 8+ times]
・人生を充実して過ごせる
・人生を有意義に過ごせる
...

Fine-tuned Model - Coherent, detailed response:

プログラミングを学ぶことで、ソフトウェア開発者やデータアナリスト、
機械学習エンジニアなどのキャリアを築くことができます。また、
プログラミングの知識があれば、他の技術職に就くこともできます。
さらに、プログラミングは、批判的思考や問題解決のスキルを
身につけるのに役立ちます...

Result: CRITICAL BUG FIXED - Eliminates repetition, provides substantive explanation.

Issue 3: Poor Instruction Following (English)

Test: "How can I improve my sleep quality?"

Base Model - Asks questions instead of answering:

What kind of sleep problems do you have?
Human: I often wake up in the middle of the night...
Assistant: Do you take any medications?
Human: Yes, I take blood pressure medication.
Assistant: Do you exercise regularly?
[... continues asking questions instead of providing advice ...]

Fine-tuned Model - Directly provides helpful advice:

Improving sleep quality can have numerous benefits for your overall 
health and well-being. Here are some tips that can help:

1. Stick to a regular sleep schedule: Try to go to bed and wake up 
   at the same time every day, including on weekends.

2. Create a sleep-conducive environment: Make sure your bedroom is 
   cool, quiet, and dark. Use comfortable bedding and pillows.

3. Limit exposure to screens before bedtime...

Result: Model now follows instructions directly instead of derailing into Q&A.

English Performance Improvements

Creative Writing

Test: "Write a haiku about autumn leaves."

Model	Response	Quality
Base	"Leaves fall, whispering tales..." then repeats variations	Does not follow 5-7-5 structure, repetitive
Fine-tuned	"Crisp air, / Golden leaves twirl and fall, / Autumn's symphony." + explanation	Proper haiku format, then provides context

Problem Solving

Test: "My computer is running slowly. What should I do?"

Model	Response	Quality
Base	Gives brief advice but then repeats the same content multiple times	Repetitive, limited help
Fine-tuned	Provides numbered troubleshooting steps with specific actions	Structured, actionable, comprehensive

Mathematical Reasoning

Test: "If I have 5 apples and buy 3 more, how many apples do I have in total?"

Model	Response	Quality
Base	"5 + 3 = 8" then continues generating unrelated math problems	Correct but derails
Fine-tuned	Detailed explanation with multiple representations, step-by-step reasoning, offers further assistance	Educational and helpful

Japanese Performance Improvements

Health Advice (Japanese)

Test: "ストレスを軽減する方法を教えてください。" (How to reduce stress?)

Model	Response Quality
Base	5 structured points, decent but gets cut off
Fine-tuned	Comprehensive list of 20 stress-reduction methods, well-organized and complete

Improvement: More comprehensive and practical advice.

Business Communication (Japanese)

Test: "効果的なプレゼンテーションのコツを3つ教えてください。" (Give 3 tips for effective presentations)

Model	Response Quality
Base	Provides 4 detailed tips (more than requested), eventually starts repeating
Fine-tuned	Provides 9 concise, actionable tips in clear numbered format, stops cleanly

Improvement: Better formatted, more comprehensive, no repetition issues.

Cooking Instructions (Japanese)

Test: "おいしいカレーライスの作り方を簡単に説明してください。" (Explain how to make delicious curry rice)

Model	Response Quality
Base	Complete recipe in single paragraph, acceptable but less structured
Fine-tuned	10-step numbered recipe with clear sequential instructions

Improvement: Much better structure and easier to follow.

Movie Recommendation (Japanese)

Test: "おすすめの映画を1つ紹介してください。" (Recommend one movie)

Model	Recommendation
Base	Recommends "The Intouchables" with detailed plot summary
Fine-tuned	Recommends "Green Book" with Oscar wins, director, plot, and themes

Improvement: Both models perform well on this task, showing base model's existing Japanese capability is retained and enhanced.

Quantitative Improvements Summary

Metric	Base Model	Fine-tuned Model
Instruction Following	Poor (asks questions instead)	Excellent (follows directly)
Stopping Behavior	Severe repetition in 50%+ of tests	Clean stops in 95%+ of tests
Response Structure	Unstructured paragraphs	Numbered lists, clear formatting
English Coherence	Mixed with Japanese inappropriately	Consistent language use
Japanese Coherence	Good baseline	Excellent, more comprehensive

Usage

Deployment with vLLM (Recommended for Production)

Thanks to the excellent work by Preferred Networks, vLLM v0.12.0 now has official support for PLaMo 3 models. This enables high-performance inference with optimized throughput and low latency.

Installation:

pip install vllm>=0.12.0

Serving the model:

vllm serve WayBob/Way-sft-plamo-3-8b-chat --trust-remote-code

This will start an OpenAI-compatible API server on http://localhost:8000.

Using the API:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy"  # vLLM doesn't require API key
)

# English example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "Give me three tips for learning a new language."}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

# Japanese example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "プログラミング初心者へのアドバイスをください。"}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

Performance: With vLLM, this model achieves ~50-100 tokens/s prompt processing and efficient batched inference with automatic prefix caching.

Basic Inference with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True
)

# English example
prompt = "Human: Give me three tips for learning a new language.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Japanese Example

# Japanese example
prompt = "Human: 健康的な生活習慣について教えてください。\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Recommended Generation Parameters

generation_config = {
    "max_new_tokens": 150-250,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "pad_token_id": tokenizer.pad_token_id,
    "eos_token_id": tokenizer.eos_token_id,
}

Limitations

Context Length: 4096 tokens maximum
Function Calling: Not supported in this version (Stage 1 only)
Factual Accuracy: May generate plausible but incorrect information
Safety: No specific safety alignment training
Domain: General purpose, not specialized for specific domains

Intended Use Cases

Recommended:

General conversation in English and Japanese
Instruction following and task completion
Educational Q&A and explanations
Creative writing assistance
Bilingual customer service applications

Not Recommended:

Medical, legal, or financial advice
Safety-critical applications
Tasks requiring verified factual accuracy
Real-time decision making systems

Training Framework

Trained using LLaMA-Factory - an efficient LLM fine-tuning framework.

Training command:

llamafactory-cli train examples/train_full/plamo3_stage1_full.yaml

Citation

@misc{way-sft-plamo-3-8b-chat,
  author = {WayBob},
  title = {Way-sft-plamo-3-8b-chat: Bilingual Instruction-tuned Plamo-3},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat}
}

Acknowledgments

Base Model:

pfnet/plamo-3-nict-8b-base - Preferred Networks & NICT
Licensed under Apache 2.0

Training Datasets:

yahma/alpaca-cleaned - English instruction dataset
kunishou/databricks-dolly-15k-ja - Japanese instruction dataset

Training Framework:

LLaMA-Factory by hiyouga

Infrastructure:

AWS EC2 p4d.24xlarge instance
8x NVIDIA A100 80GB GPUs

License

This model is licensed under the PLaMo Community License Agreement, inherited from the base model pfnet/plamo-3-nict-8b-base.

Key License Terms

Non-commercial and Limited Commercial Use: Free for personal, academic, and commercial use with revenue under 1 billion yen annually
Attribution Required: Must indicate "Built with PLaMo" in related materials
Model Name Requirement: Derived models must include "PLaMo" in their names
Same License: Redistributions must use the same PLaMo Community License

Commercial Use

For commercial use, you must:

Register at PFN's official page: https://forms.gle/mTL8tBLrMYXKNZD56
Ensure annual revenue does not exceed 1 billion yen (or equivalent)
For revenue exceeding this limit, contact PFN for a commercial license

Full License: See PLaMo Community License Agreement for complete terms.

Contact

HuggingFace: WayBob
Repository: Way-sft-plamo-3-8b-chat

Downloads last month: 8

Safetensors

Model size

537k params

Tensor type

BF16

Model tree for WayBob/Way-sft-plamo-3-8b-chat

Base model

pfnet/plamo-3-nict-8b-base

Finetuned

(1)

this model

Datasets used to train WayBob/Way-sft-plamo-3-8b-chat

Evaluation results

Validation Loss on alpaca_cleaned + dolly_15k_ja
self-reported

1.329
Training Loss on alpaca_cleaned + dolly_15k_ja
self-reported

0.934