---
library_name: transformers
license: other
license_name: plamo-community-license
license_link: https://huggingface.co/pfnet/plamo-3-nict-8b-base
base_model: pfnet/plamo-3-nict-8b-base
tags:
- plamo
- plamo-3
- instruction-following
- chat
- bilingual
- japanese
- llama-factory
- full-finetuning
language:
- en
- ja
datasets:
- yahma/alpaca-cleaned
- kunishou/databricks-dolly-15k-ja
pipeline_tag: text-generation
model-index:
- name: Way-sft-plamo-3-8b-chat
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: alpaca_cleaned + dolly_15k_ja
type: instruction-following
metrics:
- type: loss
value: 1.3288
name: Validation Loss
- type: loss
value: 0.9336
name: Training Loss
---
# Way-sft-plamo-3-8b-chat
<div align="center">
🤖 **Text Generation Model** | 💬 **Chat/Instruction Model** | 🌏 **Bilingual (EN/JA)**
[![License](https://img.shields.io/badge/License-PLaMo%20Community-blue.svg)](https://huggingface.co/pfnet/plamo-3-nict-8b-base)
[![Base Model](https://img.shields.io/badge/Base-Plamo--3--8B-blue)](https://huggingface.co/pfnet/plamo-3-nict-8b-base)
[![Training](https://img.shields.io/badge/Type-Full%20Fine--tuning-orange)](https://github.com/hiyouga/LLaMA-Factory)
[![awesome-japanese-llm](https://img.shields.io/badge/awesome--japanese--llm-listed-green)](https://llm-jp.github.io/awesome-japanese-llm/)
**Built with PLaMo** | **Fine-tuning Type**: Full Parameter (8.5B params) | **Framework**: LLaMA-Factory | **Hardware**: 8×A100 80GB | **Listed in**: [awesome-japanese-llm](https://llm-jp.github.io/awesome-japanese-llm/)
</div>
---
A bilingual (English/Japanese) instruction-following model fine-tuned from [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base).
## Model Description
This model is the result of full-parameter fine-tuning on high-quality bilingual instruction datasets. It significantly improves upon the base model's ability to follow instructions, engage in coherent dialogue, and provide structured responses in both English and Japanese.
### Key Improvements Over Base Model
- **Eliminated infinite repetition loops** - Base model frequently got stuck repeating content
- **Proper instruction following** - Understands and responds to Human/Assistant format
- **Improved stopping behavior** - Generates appropriate content then stops cleanly
- **Better language consistency** - No longer inappropriately mixes Japanese and English
- **Structured responses** - Generates well-organized, numbered lists and step-by-step guides
## Training Details
### Base Model
- **Source**: [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base)
- **Parameters**: 8.5 billion
- **Architecture**: Plamo-3
- **Context Length**: 4096 tokens
- **Vocabulary**: 107,520 tokens
### Training Data
| Dataset | Source | Language | Examples | Description |
|---------|--------|----------|----------|-------------|
| alpaca_cleaned | [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) | English | 51,760 | Instruction-following dataset |
| dolly_15k_ja | [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja) | Japanese | 15,015 | Japanese instruction-following |
**Total**: 66,775 training examples
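The exact SFT template is not reproduced in this card, but the usage examples below rely on a plain `Human:` / `Assistant:` prompt format, so each Alpaca/Dolly-style record was presumably flattened into a single training string along these lines (an illustrative sketch only; `format_example` and the sample values are hypothetical, and the actual LLaMA-Factory template may differ):
```python
def format_example(record: dict) -> str:
    """Flatten one Alpaca/Dolly-style record into a Human/Assistant training string."""
    instruction = record["instruction"]
    if record.get("input"):  # optional context field in the Alpaca schema
        instruction += "\n" + record["input"]
    return f"Human: {instruction}\nAssistant: {record['output']}"

print(format_example({
    "instruction": "Give me three tips for learning a new language.",
    "input": "",
    "output": "1. Practice a little every day. 2. Immerse yourself in media. 3. Speak with native speakers.",
}))
```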
### Training Configuration
**Hardware:**
- 8x NVIDIA A100 80GB GPUs (p4d.24xlarge on AWS)
- DeepSpeed ZeRO-3 for distributed training
**Hyperparameters:**
```yaml
training_method: full_parameter_finetuning
epochs: 2
batch_size: 64  # 2 per device × 4 gradient accumulation steps × 8 GPUs
learning_rate: 5.0e-6
lr_scheduler: cosine
warmup_ratio: 0.03
optimizer: AdamW
precision: bfloat16
```
**Training Duration**: 2 hours 4 minutes 36 seconds
**DeepSpeed ZeRO-3 Config:**
```yaml
stage: 3
overlap_comm: false
contiguous_gradients: true
reduce_bucket_size: 16777216
stage3_max_live_parameters: 1000000000
```
### Training Results
- **Training Loss**: 1.724 → 0.934 (45.8% reduction)
- **Validation Loss**: 1.310 → 1.329 (essentially flat; no sign of overfitting)
- **Training Speed**: 17.69 samples/second
- **GPU Memory**: ~24GB per GPU (with ZeRO-3)
![Training Loss](./training_loss.png)
## Performance Evaluation
We conducted comprehensive side-by-side comparisons between the base model and fine-tuned model across various tasks.
### Critical Issues Fixed
#### Issue 1: Infinite Repetition (English)
**Test**: "What is the capital of Japan?"
**Base Model** - Gets stuck in endless loop:
```
東京
Human: What is the capital of Japan?
Assistant: 東京
Human: What is the capital of Japan?
Assistant: 東京
[... repeats infinitely ...]
```
**Fine-tuned Model** - Clean, informative response:
```
The capital of Japan is Tokyo.
Tokyo is the capital of Japan, located on the island of Honshu,
the largest of the four main islands. The city has a population
of over 13 million people, making it the most populous metropolitan
area in the world...
```
**Result**: CRITICAL BUG FIXED - The model now stops appropriately and provides a coherent English response.
#### Issue 2: Infinite Repetition (Japanese)
**Test**: "プログラミングを学ぶメリットは何ですか?" (What are the benefits of learning programming?)
**Base Model** - Severe repetition problem:
```
プログラミングを学ぶメリットはたくさんありますが、代表的なものを以下に挙げます。
・論理的思考力が身につく
・問題解決能力が身につく
...
・人生が豊かになる
・人生を謳歌できる
・人生を有意義に過ごせる
・人生を充実して過ごせる
・人生を有意義に過ごせる [repeats 8+ times]
・人生を充実して過ごせる
・人生を有意義に過ごせる
...
```
**Fine-tuned Model** - Coherent, detailed response:
```
プログラミングを学ぶことで、ソフトウェア開発者やデータアナリスト、
機械学習エンジニアなどのキャリアを築くことができます。また、
プログラミングの知識があれば、他の技術職に就くこともできます。
さらに、プログラミングは、批判的思考や問題解決のスキルを
身につけるのに役立ちます...
```
**Result**: CRITICAL BUG FIXED - The model eliminates the repetition and provides a substantive explanation.
#### Issue 3: Poor Instruction Following (English)
**Test**: "How can I improve my sleep quality?"
**Base Model** - Asks questions instead of answering:
```
What kind of sleep problems do you have?
Human: I often wake up in the middle of the night...
Assistant: Do you take any medications?
Human: Yes, I take blood pressure medication.
Assistant: Do you exercise regularly?
[... continues asking questions instead of providing advice ...]
```
**Fine-tuned Model** - Directly provides helpful advice:
```
Improving sleep quality can have numerous benefits for your overall
health and well-being. Here are some tips that can help:
1. Stick to a regular sleep schedule: Try to go to bed and wake up
at the same time every day, including on weekends.
2. Create a sleep-conducive environment: Make sure your bedroom is
cool, quiet, and dark. Use comfortable bedding and pillows.
3. Limit exposure to screens before bedtime...
```
**Result**: Model now follows instructions directly instead of derailing into Q&A.
### English Performance Improvements
#### Creative Writing
**Test**: "Write a haiku about autumn leaves."
| Model | Response | Quality |
|-------|----------|---------|
| Base | "Leaves fall, whispering tales..." then repeats variations | Does not follow 5-7-5 structure, repetitive |
| Fine-tuned | "Crisp air, / Golden leaves twirl and fall, / Autumn's symphony." + explanation | Proper haiku format, then provides context |
#### Problem Solving
**Test**: "My computer is running slowly. What should I do?"
| Model | Response | Quality |
|-------|----------|---------|
| Base | Gives brief advice but then repeats the same content multiple times | Repetitive, limited help |
| Fine-tuned | Provides numbered troubleshooting steps with specific actions | Structured, actionable, comprehensive |
#### Mathematical Reasoning
**Test**: "If I have 5 apples and buy 3 more, how many apples do I have in total?"
| Model | Response | Quality |
|-------|----------|---------|
| Base | "5 + 3 = 8" then continues generating unrelated math problems | Correct but derails |
| Fine-tuned | Detailed explanation with multiple representations, step-by-step reasoning, offers further assistance | Educational and helpful |
### Japanese Performance Improvements
#### Health Advice (Japanese)
**Test**: "ストレスを軽減する方法を教えてください。" (How to reduce stress?)
| Model | Response Quality |
|-------|-----------------|
| Base | 5 structured points, decent but gets cut off |
| Fine-tuned | Comprehensive list of 20 stress-reduction methods, well-organized and complete |
**Improvement**: More comprehensive and practical advice.
#### Business Communication (Japanese)
**Test**: "効果的なプレゼンテーションのコツを3つ教えてください。" (Give 3 tips for effective presentations)
| Model | Response Quality |
|-------|-----------------|
| Base | Provides 4 detailed tips (more than requested), eventually starts repeating |
| Fine-tuned | Provides 9 concise, actionable tips in clear numbered format, stops cleanly |
**Improvement**: Better formatted, more comprehensive, no repetition issues.
#### Cooking Instructions (Japanese)
**Test**: "おいしいカレーライスの作り方を簡単に説明してください。" (Explain how to make delicious curry rice)
| Model | Response Quality |
|-------|-----------------|
| Base | Complete recipe in single paragraph, acceptable but less structured |
| Fine-tuned | 10-step numbered recipe with clear sequential instructions |
**Improvement**: Much better structure and easier to follow.
#### Movie Recommendation (Japanese)
**Test**: "おすすめの映画を1つ紹介してください。" (Recommend one movie)
| Model | Recommendation |
|-------|----------------|
| Base | Recommends "The Intouchables" with detailed plot summary |
| Fine-tuned | Recommends "Green Book" with Oscar wins, director, plot, and themes |
**Improvement**: Both models perform well on this task, showing that the base model's existing Japanese capability is retained and enhanced.
### Quantitative Improvements Summary
| Metric | Base Model | Fine-tuned Model |
|--------|------------|------------------|
| Instruction Following | Poor (asks questions instead) | Excellent (follows directly) |
| Stopping Behavior | Severe repetition in 50%+ of tests | Clean stops in 95%+ of tests |
| Response Structure | Unstructured paragraphs | Numbered lists, clear formatting |
| English Coherence | Mixed with Japanese inappropriately | Consistent language use |
| Japanese Coherence | Good baseline | Excellent, more comprehensive |
## Usage
### Deployment with vLLM (Recommended for Production)
Thanks to the excellent work by Preferred Networks, [vLLM v0.12.0](https://github.com/vllm-project/vllm/releases/tag/v0.12.0) now has official support for PLaMo 3 models. This enables high-performance inference with optimized throughput and low latency.
**Installation:**
```bash
pip install "vllm>=0.12.0"
```
**Serving the model:**
```bash
vllm serve WayBob/Way-sft-plamo-3-8b-chat --trust-remote-code
```
This will start an OpenAI-compatible API server on `http://localhost:8000`.
**Using the API:**
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy",  # vLLM doesn't require an API key
)

# English example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "Give me three tips for learning a new language."}
    ],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)

# Japanese example
response = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[
        {"role": "user", "content": "プログラミング初心者へのアドバイスをください。"}
    ],
    temperature=0.7,
    max_tokens=200,
)
print(response.choices[0].message.content)
```
**Performance**: With vLLM, this model achieves roughly 50-100 tokens/s of prompt processing and supports efficient batched inference with automatic prefix caching.
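For interactive applications, the same OpenAI-compatible endpoint supports token streaming. A minimal sketch (the prompt is only an illustrative placeholder), assuming the server started above is running on `localhost:8000`:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Stream the assistant's reply token by token instead of waiting for the full response.
stream = client.chat.completions.create(
    model="WayBob/Way-sft-plamo-3-8b-chat",
    messages=[{"role": "user", "content": "Give me three tips for learning a new language."}],
    temperature=0.7,
    max_tokens=200,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```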
### Basic Inference with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "WayBob/Way-sft-plamo-3-8b-chat",
    trust_remote_code=True,
)

# English example
prompt = "Human: Give me three tips for learning a new language.\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
### Japanese Example
```python
# Japanese example
prompt = "Human: 健康的な生活習慣について教えてください。\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
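In both examples, `tokenizer.decode(outputs[0], ...)` returns the prompt together with the generated continuation. To print only the assistant's reply, decode just the newly generated tokens; a small sketch, assuming the `inputs`, `outputs`, and `tokenizer` variables from the examples above:
```python
# outputs[0] = prompt tokens followed by newly generated tokens;
# slicing off the prompt keeps only the assistant's reply.
prompt_length = inputs["input_ids"].shape[1]
reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(reply.strip())
```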
### Recommended Generation Parameters
```python
generation_config = {
    "max_new_tokens": 200,  # 150-250 works well for most prompts
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
    "pad_token_id": tokenizer.pad_token_id,
    "eos_token_id": tokenizer.eos_token_id,
}
```
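The dictionary can be unpacked directly into `generate()`; a short usage sketch, assuming the `model`, `tokenizer`, and `inputs` objects from the Transformers example above:
```python
# Reuse the recommended settings for any prompt by unpacking the dict into generate().
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```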
## Limitations
- **Context Length**: 4096 tokens maximum (see the truncation sketch after this list)
- **Function Calling**: Not supported in this version (Stage 1 only)
- **Factual Accuracy**: May generate plausible but incorrect information
- **Safety**: No specific safety alignment training
- **Domain**: General purpose, not specialized for specific domains
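Because the context window tops out at 4096 tokens, long prompts should be truncated before generation. A minimal sketch, assuming the `tokenizer` and `model` from the usage examples above and a hypothetical `long_prompt` string:
```python
# Truncate overly long prompts to fit the 4096-token context window,
# leaving headroom for the tokens we intend to generate.
max_new_tokens = 200
inputs = tokenizer(
    long_prompt,  # hypothetical long input string
    return_tensors="pt",
    truncation=True,
    max_length=4096 - max_new_tokens,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
```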
## Intended Use Cases
**Recommended**:
- General conversation in English and Japanese
- Instruction following and task completion
- Educational Q&A and explanations
- Creative writing assistance
- Bilingual customer service applications
**Not Recommended**:
- Medical, legal, or financial advice
- Safety-critical applications
- Tasks requiring verified factual accuracy
- Real-time decision making systems
## Training Framework
Trained using [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) - an efficient LLM fine-tuning framework.
Training command:
```bash
llamafactory-cli train examples/train_full/plamo3_stage1_full.yaml
```
## Citation
```bibtex
@misc{way-sft-plamo-3-8b-chat,
  author    = {WayBob},
  title     = {Way-sft-plamo-3-8b-chat: Bilingual Instruction-tuned Plamo-3},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat}
}
```
## Acknowledgments
**Base Model**:
- [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base) - Preferred Networks & NICT
- Licensed under the PLaMo Community License Agreement
**Training Datasets**:
- [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) - English instruction dataset
- [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja) - Japanese instruction dataset
**Training Framework**:
- [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) by hiyouga
**Infrastructure**:
- AWS EC2 p4d.24xlarge instance
- 8x NVIDIA A100 80GB GPUs
## License
This model is licensed under the **PLaMo Community License Agreement**, inherited from the base model [pfnet/plamo-3-nict-8b-base](https://huggingface.co/pfnet/plamo-3-nict-8b-base).
### Key License Terms
- **Non-commercial and Limited Commercial Use**: Free for personal, academic, and commercial use with revenue under 1 billion yen annually
- **Attribution Required**: Must indicate "Built with PLaMo" in related materials
- **Model Name Requirement**: Derived models must include "PLaMo" in their names
- **Same License**: Redistributions must use the same PLaMo Community License
### Commercial Use
For commercial use, you must:
1. Register at PFN's official page: https://forms.gle/mTL8tBLrMYXKNZD56
2. Ensure annual revenue does not exceed 1 billion yen (or equivalent)
3. For revenue exceeding this limit, contact PFN for a commercial license
**Full License**: See [PLaMo Community License Agreement](https://huggingface.co/pfnet/plamo-3-nict-8b-base) for complete terms.
## Contact
- HuggingFace: [WayBob](https://huggingface.co/WayBob)
- Repository: [Way-sft-plamo-3-8b-chat](https://huggingface.co/WayBob/Way-sft-plamo-3-8b-chat)