Generating Output with the LLM
Generate using the sample code: LoRa_inference.py in this repository.
Development Steps
- Base: model_id = "llm-jp/llm-jp-3-13b"
- SFT training on ichikara-instruction-003-001-1.json
- DPO training on cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental

Training scripts:
- SFT: LoRa.py
- DPO: DPO.py
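The SFT step above uses LoRA, which freezes the base weights W and learns only a low-rank update scaled by alpha/r. The actual training in LoRa.py runs on GPU via Unsloth/TRL; as an illustration of the idea only, here is a minimal pure-Python sketch of a LoRA-adapted linear layer (all matrices and values below are toy assumptions, not taken from the repository):

```python
# Minimal LoRA sketch: y = x @ (W + (alpha / r) * A @ B)
# Pure Python and illustrative only -- real training uses GPU tensor libraries.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_forward(x, W, A, B, alpha, r):
    """Apply frozen weight W plus the low-rank LoRA update (alpha / r) * A @ B.

    Shapes: x is (batch, d_in), W is (d_in, d_out),
    A is (d_in, r), B is (r, d_out).
    """
    scale = alpha / r
    delta = [[scale * v for v in row] for row in matmul(A, B)]
    W_eff = [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return matmul(x, W_eff)

# LoRA zero-initializes B, so before any training the adapted layer
# behaves exactly like the frozen base layer.
x = [[1.0, 2.0]]
W = [[0.5, -1.0], [2.0, 0.0]]
A = [[0.1], [0.3]]   # d_in x r, with rank r = 1
B = [[0.0, 0.0]]     # r x d_out, zero-initialized
print(lora_forward(x, W, A, B, alpha=16, r=1))  # matches x @ W: [[4.5, -1.0]]
```

Because only A and B (a tiny fraction of the 13B parameters) receive gradients, LoRA makes SFT of a model this size feasible on limited hardware.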
Uploaded model
- Developed by: hiro877
- License: apache-2.0
- Finetuned from model: llm-jp/llm-jp-3-13b
This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
DPO Trained Model
This repository contains a DPO (Direct Preference Optimization) trained model. The model was fine-tuned with SFT (Supervised Fine-Tuning) on the Ichikara Instruction dataset and subsequently aligned with DPO using cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental. It is optimized for generating high-quality chatbot responses in Japanese.
🚀 Model Overview
- Model Name: DPO Trained Model (1215 version)
- Base Model: llm-jp/llm-jp-3-13b
- Training Steps:
  - SFT (Supervised Fine-Tuning) using Ichikara Instruction (License: CC-BY-NC-SA 4.0)
  - DPO training using cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (License: CC-BY 4.0)
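The DPO step optimizes the policy directly on preference pairs, without a separate reward model: it minimizes -log σ(β[(log π(y_chosen) − log π_ref(y_chosen)) − (log π(y_rejected) − log π_ref(y_rejected))]). The repository's DPO.py presumably delegates this to TRL; purely as an illustration of the objective, here is a small sketch with made-up log-probability values:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    response under the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Before training, policy == reference, so both margins vanish and the
# loss equals -log(0.5) = log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931

# Raising the chosen response's likelihood relative to the reference
# pushes the loss below log(2).
print(dpo_loss(-8.0, -12.0, -10.0, -12.0) < math.log(2))  # True
```

The β parameter (0.1 here is just a common default, not the value used in this repository) controls how strongly the policy is kept close to the reference model.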
🔧 How to Use
Load the model using Hugging Face Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hiro877/dpo_trained_model_1215")
tokenizer = AutoTokenizer.from_pretrained("hiro877/dpo_trained_model_1215")
```

Generate text:

```python
input_text = "こんにちは、今日はどうされましたか?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
📊 Training Details
1. Datasets Used
Ichikara Instruction
- Source: Ichikara Instruction Dataset
- Purpose: Supervised fine-tuning (SFT)
- License: CC-BY-NC-SA 4.0
cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental
- Source: CyberAgent Dataset
- Purpose: DPO training
- License: CC-BY 4.0
2. Pre-trained Model
- Base Model: llm-jp/llm-jp-3-13b
- License: Apache License 2.0
🔒 License
This repository is licensed under CC-BY-NC-SA 4.0.
You are free to share and adapt the material under the following terms:
- Attribution: Provide appropriate credit.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: Distribute your contributions under the same license.
For more details, see the full license text here: CC-BY-NC-SA 4.0.
⚙️ Acknowledgements
- Special thanks to CyberAgent and Ichikara for providing high-quality datasets.
- The base model was developed by llm-jp.
🔄 Update Log
[2024-12-15] Initial Release
- Fine-tuned with Ichikara Instruction (SFT)
- Trained with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (DPO)
Model tree for hiro877/dpo_trained_model_1215
- Base model: llm-jp/llm-jp-3-13b