---
library_name: transformers
license: other
license_name: plamo-community-license
license_link: https://huggingface.co/pfnet/plamo-3-nict-8b-base
base_model: pfnet/plamo-3-nict-8b-base
tags:
- plamo
- plamo-3
- instruction-following
- chat
- bilingual
- japanese
- llama-factory
- full-finetuning
language:
- en
- ja
datasets:
- yahma/alpaca-cleaned
- kunishou/databricks-dolly-15k-ja
pipeline_tag: text-generation
model-index:
- name: Way-sft-plamo-3-8b-chat
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: alpaca_cleaned + dolly_15k_ja
type: instruction-following
metrics:
- type: loss
value: 1.3288
name: Validation Loss
- type: loss
value: 0.9336
name: Training Loss
---
# Way-sft-plamo-3-8b-chat
<div align="center">
🤖 **Text Generation Model** | 💬 **Chat/Instruction Model** | 🌏 **Bilingual (EN/JA)**
[![License](https://img.shields.io/badge/License-PLaMo%20Community-blue.svg)](https://huggingface.co/pfnet/plamo-3-nict-8b-base)
[![Base Model](https://img.shields.io/badge/Base-Plamo--3--8B-blue)](https://huggingface.co/pfnet/plamo-3-nict-8b-base)
[![Training](https://img.shields.io/badge/Type-Full%20Fine--tuning-orange)](https://github.com/hiyouga/LLaMA-Factory)
[![awesome-japanese-llm](https://img.shields.io/badge/awesome--japanese--llm-listed-green)](https://llm-jp.github.io/awesome-japanese-llm/)
**Built with PLaMo** | **Fine-tuning Type**: Full Parameter (8.5B params) | **Framework**: LLaMA-Factory | **Hardware**: 8×A100 80GB | **Listed in**: [awesome-japanese-llm](https://llm-jp.github.io/awesome-japanese-llm/)
</div>
---
A bilingual (English/Japanese) instruction-following model fine-tuned from [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base).
## Model Description
This model is the result of full-parameter fine-tuning on high-quality bilingual instruction datasets. It significantly improves upon the base model's ability to follow instructions, engage in coherent dialogue, and provide structured responses in both English and Japanese.
### Key Improvements Over Base Model
- **Eliminated infinite repetition loops** - Base model frequently got stuck repeating content
- **Proper instruction following** - Understands and responds to Human/Assistant format
- **Improved stopping behavior** - Generates appropriate content then stops cleanly
- **Better language consistency** - No longer inappropriately mixes Japanese and English
- **Structured responses** - Generates well-organized, numbered lists and step-by-step guides
## Training Details
### Base Model
- **Source**: [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base)
- **Parameters**: 8.5 billion
- **Architecture**: Plamo-3
- **Context Length**: 4096 tokens
- **Vocabulary**: 107,520 tokens
### Training Data
| Dataset | Source | Language | Examples | Description |
|---------|--------|----------|----------|-------------|
| alpaca_cleaned | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) | English | 51,760 | Instruction-following dataset |
| dolly_15k_ja | [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja) | Japanese | 15,015 | Japanese instruction-following |
**Total**: 66,775 training examples
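The exact SFT template is not reproduced in this card, but the usage examples below rely on a plain `Human:` / `Assistant:` prompt format, so each Alpaca/Dolly-style record was presumably flattened into a single training string along these lines (an illustrative sketch only; `format_example` and the sample values are hypothetical, and the actual LLaMA-Factory template may differ):
```python
def format_example(record: dict) -> str:
    """Flatten one Alpaca/Dolly-style record into a Human/Assistant training string."""
    instruction = record["instruction"]
    if record.get("input"):  # optional context field in the Alpaca schema
        instruction += "\n" + record["input"]
    return f"Human: {instruction}\nAssistant: {record['output']}"

print(format_example({
    "instruction": "Give me three tips for learning a new language.",
    "input": "",
    "output": "1. Practice a little every day. 2. Immerse yourself in media. 3. Speak with native speakers.",
}))
```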
### Training Configuration
**Hardware:**
- 8x NVIDIA A100 80GB GPUs (p4d.24xlarge on AWS)
- DeepSpeed ZeRO-3 for distributed training
**Hyperparameters:**
```yaml
training_method: full_parameter_finetuning
epochs: 2
batch_size: 64  # 2 per device × 4 gradient accumulation steps × 8 GPUs
learning_rate: 5.0e-6
lr_scheduler: cosine
warmup_ratio: 0.03
optimizer: AdamW
precision: bfloat16
```
**Training Duration**: 2 hours 4 minutes 36 seconds
**DeepSpeed ZeRO-3 Config:**
```yaml
stage: 3
overlap_comm: false
contiguous_gradients: true
reduce_bucket_size: 16777216
stage3_max_live_parameters: 1000000000
```
### Training Results
- **Training Loss**: 1.724 → 0.934 (45.8% reduction)
- **Validation Loss**: 1.310 → 1.329 (essentially flat; no sign of overfitting)
- **Training Speed**: 17.69 samples/second
- **GPU Memory**: ~24GB per GPU (with ZeRO-3)
![Training Loss](./training_loss.png)
## Performance Evaluation
We conducted comprehensive side-by-side comparisons between the base model and fine-tuned model across various tasks.
### Critical Issues Fixed
#### Issue 1: Infinite Repetition (English)
**Test**: "What is the capital of Japan?"
**Base Model** - Gets stuck in endless loop:
```
東京
Human: What is the capital of Japan?
Assistant: 東京
Human: What is the capital of Japan?
Assistant: 東京
[... repeats infinitely ...]
```
**Fine-tuned Model** - Clean, informative response:
```
The capital of Japan is Tokyo.
Tokyo is the capital of Japan, located on the island of Honshu,
the largest of the four main islands. The city has a population
of over 13 million people, making it the most populous metropolitan
area in the world...
```
**Result**: CRITICAL BUG FIXED - The model now stops appropriately and provides a coherent English response.
#### Issue 2: Infinite Repetition (Japanese)
**Test**: "プログラミングを学ぶメリットは何ですか?" (What are the benefits of learning programming?)
**Base Model** - Severe repetition problem:
```
プログラミングを学ぶメリットはたくさんありますが、代表的なものを以下に挙げます。
・論理的思考力が身につく
・問題解決能力が身につく
...
・人生が豊かになる
・人生を謳歌できる
・人生を有意義に過ごせる
・人生を充実して過ごせる
・人生を有意義に過ごせる [repeats 8+ times]
・人生を充実して過ごせる
・人生を有意義に過ごせる
...
```
**Fine-tuned Model** - Coherent, detailed response:
```
プログラミングを学ぶことで、ソフトウェア開発者やデータアナリスト、
機械学習エンジニアなどのキャリアを築くことができます。また、
プログラミングの知識があれば、他の技術職に就くこともできます。
さらに、プログラミングは、批判的思考や問題解決のスキルを
身につけるのに役立ちます...
```
**Result**: CRITICAL BUG FIXED - The model eliminates the repetition and provides a substantive explanation.
#### Issue 3: Poor Instruction Following (English)
**Test**: "How can I improve my sleep quality?"
**Base Model** - Asks questions instead of answering:
```
What kind of sleep problems do you have?
Human: I often wake up in the middle of the night...
Assistant: Do you take any medications?
Human: Yes, I take blood pressure medication.
Assistant: Do you exercise regularly?
[... continues asking questions instead of providing advice ...]
```
**Fine-tuned Model** - Directly provides helpful advice:
```
Improving sleep quality can have numerous benefits for your overall
health and well-being. Here are some tips that can help:
1. Stick to a regular sleep schedule: Try to go to bed and wake up
at the same time every day, including on weekends.
2. Create a sleep-conducive environment: Make sure your bedroom is
cool, quiet, and dark. Use comfortable bedding and pillows.
3. Limit exposure to screens before bedtime...
```
**Result**: Model now follows instructions directly instead of derailing into Q&A.
### English Performance Improvements
#### Creative Writing
**Test**: "Write a haiku about autumn leaves."
| Model | Response | Quality |
|-------|----------|---------|
| Base | "Leaves fall, whispering tales..." then repeats variations | Does not follow 5-7-5 structure, repetitive |
| Fine-tuned | "Crisp air, / Golden leaves twirl and fall, / Autumn's symphony." + explanation | Proper haiku format, then provides context |
#### Problem Solving
**Test**: "My computer is running slowly. What should I do?"
| Model | Response | Quality |
|-------|----------|---------|
| Base | Gives brief advice but then repeats the same content multiple times | Repetitive, limited help |
| Fine-tuned | Provides numbered troubleshooting steps with specific actions | Structured, actionable, comprehensive |
#### Mathematical Reasoning
**Test**: "If I have 5 apples and buy 3 more, how many apples do I have in total?"
| Model | Response | Quality |
|-------|----------|---------|
| Base | "5 + 3 = 8" then continues generating unrelated math problems | Correct but derails |
| Fine-tuned | Detailed explanation with multiple representations, step-by-step reasoning, offers further assistance | Educational and helpful |
### Japanese Performance Improvements
#### Health Advice (Japanese)
**Test**: "ストレスを軽減する方法を教えてください。" (How to reduce stress?)
| Model | Response Quality |
|-------|-----------------|
| Base | 5 structured points, decent but gets cut off |
| Fine-tuned | Comprehensive list of 20 stress-reduction methods, well-organized and complete |
**Improvement**: More comprehensive and practical advice.
#### Business Communication (Japanese)
**Test**: "効果的なプレゼンテーションのコツを3つ教えてください。" (Give 3 tips for effective presentations)
| Model | Response Quality |
|-------|-----------------|
| Base | Provides 4 detailed tips (more than requested), eventually starts repeating |
| Fine-tuned | Provides 9 concise, actionable tips in clear numbered format, stops cleanly |
**Improvement**: Better formatted, more comprehensive, no repetition issues.
#### Cooking Instructions (Japanese)
**Test**: "おいしいカレーライスの作り方を簡単に説明してください。" (Explain how to make delicious curry rice)
| Model | Response Quality |
|-------|-----------------|
| Base | Complete recipe in single paragraph, acceptable but less structured |
| Fine-tuned | 10-step numbered recipe with clear sequential instructions |
**Improvement**: Much better structure and easier to follow.
#### Movie Recommendation (Japanese)
**Test**: "おすすめの映画を1つ紹介してください。" (Recommend one movie)
| Model | Recommendation |
|-------|----------------|
| Base | Recommends "The Intouchables" with detailed plot summary |
| Fine-tuned | Recommends "Green Book" with Oscar wins, director, plot, and themes |
**Improvement**: Both models perform well on this task, showing that the base model's existing Japanese capability is retained and enhanced.
### Quantitative Improvements Summary
| Metric | Base Model | Fine-tuned Model |
|--------|------------|------------------|
| Instruction Following | Poor (asks questions instead) | Excellent (follows directly) |
| Stopping Behavior | Severe repetition in 50%+ of tests | Clean stops in 95%+ of tests |
| Response Structure | Unstructured paragraphs | Numbered lists, clear formatting |
| English Coherence | Mixed with Japanese inappropriately | Consistent language use |
| Japanese Coherence | Good baseline | Excellent, more comprehensive |
## Usage
### Deployment with vLLM (Recommended for Production)
Thanks to the excellent work by Preferred Networks, [vLLM v0.12.0](https://github.com/vllm-project/vllm/releases/tag/v0.12.0) now has official support for PLaMo 3 models. This enables high-performance inference with optimized throughput and low latency.
**Installation:**
```bash
pip install "vllm>=0.12.0"
```
**Serving the model:**
```bash
vllm serve WayBob/Way-sft-plamo-3-8b-chat --trust-remote-code
```
This will start an OpenAI-compatible API server on `http://localhost:8000`.
**Using the API:**
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy",  # vLLM doesn't require an API key
)

# English example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "Give me three tips for learning a new language."}
    ],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)

# Japanese example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "プログラミング初心者へのアドバイスをください。"}
    ],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)
```
**Performance**: With vLLM, this model achieves roughly 50-100 tokens/s of prompt processing and supports efficient batched inference with automatic prefix caching.
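For interactive applications, the same OpenAI-compatible endpoint supports token streaming. A minimal sketch (the prompt is only an illustrative placeholder), assuming the server started above is running on `localhost:8000`:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Stream the assistant's reply token by token instead of waiting for the full response.
stream = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[{"role": "user", "content": "Give me three tips for learning a new language."}],
    temperature=0.7,
    max_tokens=200,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```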
### Basic Inference with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True,
)

# English example
prompt = "Human: Give me three tips for learning a new language.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Japanese Example
```python
# Japanese example
prompt = "Human: 健康的な生活習慣について教えてください。\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
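In both examples, `tokenizer.decode(outputs[0], ...)` returns the prompt together with the generated continuation. To print only the assistant's reply, decode just the newly generated tokens; a small sketch, assuming the `inputs`, `outputs`, and `tokenizer` variables from the examples above:
```python
# outputs[0] = prompt tokens followed by newly generated tokens;
# slicing off the prompt keeps only the assistant's reply.
prompt_length = inputs["input_ids"].shape[1]
reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(reply.strip())
```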
### Recommended Generation Parameters
```python
generation_config = {
    "max_new_tokens": 200,  # 150-250 works well for most prompts
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "pad_token_id": tokenizer.pad_token_id,
    "eos_token_id": tokenizer.eos_token_id,
}
```
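The dictionary can be unpacked directly into `generate()`; a short usage sketch, assuming the `model`, `tokenizer`, and `inputs` objects from the Transformers example above:
```python
# Reuse the recommended settings for any prompt by unpacking the dict into generate().
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```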
## Limitations
- **Context Length**: 4096 tokens maximum (see the truncation sketch after this list)
- **Function Calling**: Not supported in this version (Stage 1 only)
- **Factual Accuracy**: May generate plausible but incorrect information
- **Safety**: No specific safety alignment training
- **Domain**: General purpose, not specialized for specific domains
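Because the context window tops out at 4096 tokens, long prompts should be truncated before generation. A minimal sketch, assuming the `tokenizer` and `model` from the usage examples above and a hypothetical `long_prompt` string:
```python
# Truncate overly long prompts to fit the 4096-token context window,
# leaving headroom for the tokens we intend to generate.
max_new_tokens = 200
inputs = tokenizer(
    long_prompt,  # hypothetical long input string
    return_tensors="pt",
    truncation=True,
    max_length=4096 - max_new_tokens,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
```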
## Intended Use Cases
**Recommended**:
- General conversation in English and Japanese
- Instruction following and task completion
- Educational Q&A and explanations
- Creative writing assistance
- Bilingual customer service applications
**Not Recommended**:
- Medical, legal, or financial advice
- Safety-critical applications
- Tasks requiring verified factual accuracy
- Real-time decision making systems
## Training Framework
Trained using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) - an efficient LLM fine-tuning framework.
Training command:
```bash
llamafactory-cli train examples/train_full/plamo3_stage1_full.yaml
```
## Citation
```bibtex
@misc{way-sft-plamo-3-8b-chat,
  author    = {WayBob},
  title     = {Way-sft-plamo-3-8b-chat: Bilingual Instruction-tuned Plamo-3},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat}
}
```
## Acknowledgments
**Base Model**:
- [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base) - Preferred Networks & NICT
- Licensed under the PLaMo Community License Agreement
**Training Datasets**:
- [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) - English instruction dataset
- [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja) - Japanese instruction dataset
**Training Framework**:
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) by hiyouga
**Infrastructure**:
- AWS EC2 p4d.24xlarge instance
- 8x NVIDIA A100 80GB GPUs
## License
This model is licensed under the **PLaMo Community License Agreement**, inherited from the base model [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base).
### Key License Terms
- **Non-commercial and Limited Commercial Use**: Free for personal, academic, and commercial use with revenue under 1 billion yen annually
- **Attribution Required**: Must indicate "Built with PLaMo" in related materials
- **Model Name Requirement**: Derived models must include "PLaMo" in their names
- **Same License**: Redistributions must use the same PLaMo Community License
### Commercial Use
For commercial use, you must:
1. Register at PFN's official page: https://forms.gle/mTL8tBLrMYXKNZD56
2. Ensure annual revenue does not exceed 1 billion yen (or equivalent)
3. For revenue exceeding this limit, contact PFN for a commercial license
**Full License**: See [PLaMo Community License Agreement](https://huggingface.co/pfnet/plamo-3-nict-8b-base) for complete terms.
## Contact
- HuggingFace: [WayBob](https://huggingface.co/WayBob)
- Repository: [Way-sft-plamo-3-8b-chat](https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat)