Generating Output with the LLM
Generate using the sample code: LoRa_inference.py in this repository.
Development Steps
- Base: model_id = "llm-jp/llm-jp-3-13b"
- SFT training on ichikara-instruction-003-001-1.json
- DPO training on cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental

Training scripts:
- SFT: LoRa.py
- DPO: DPO.py
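The SFT step above uses LoRA, which freezes the base weights W and learns only a low-rank update scaled by alpha/r. The actual training in LoRa.py runs on GPU via Unsloth/TRL; as an illustration of the idea only, here is a minimal pure-Python sketch of a LoRA-adapted linear layer (all matrices and values below are toy assumptions, not taken from the repository):

```python
# Minimal LoRA sketch: y = x @ (W + (alpha / r) * A @ B)
# Pure Python and illustrative only -- real training uses GPU tensor libraries.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_forward(x, W, A, B, alpha, r):
    """Apply frozen weight W plus the low-rank LoRA update (alpha / r) * A @ B.

    Shapes: x is (batch, d_in), W is (d_in, d_out),
    A is (d_in, r), B is (r, d_out).
    """
    scale = alpha / r
    delta = [[scale * v for v in row] for row in matmul(A, B)]
    W_eff = [[W[i][j] + delta[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return matmul(x, W_eff)

# LoRA zero-initializes B, so before any training the adapted layer
# behaves exactly like the frozen base layer.
x = [[1.0, 2.0]]
W = [[0.5, -1.0], [2.0, 0.0]]
A = [[0.1], [0.3]]   # d_in x r, with rank r = 1
B = [[0.0, 0.0]]     # r x d_out, zero-initialized
print(lora_forward(x, W, A, B, alpha=16, r=1))  # matches x @ W: [[4.5, -1.0]]
```

Because only A and B (a tiny fraction of the 13B parameters) receive gradients, LoRA makes SFT of a model this size feasible on limited hardware.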
Uploaded model
- Developed by: hiro877
- License: apache-2.0
- Finetuned from model: llm-jp/llm-jp-3-13b
This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
DPO Trained Model
This repository contains a DPO (Direct Preference Optimization) trained model. The model was fine-tuned with SFT (Supervised Fine-Tuning) on the Ichikara Instruction dataset and subsequently aligned with DPO using cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental. It is optimized for generating high-quality chatbot responses in Japanese.
🚀 Model Overview
- Model Name: DPO Trained Model (1215 version)
- Base Model: llm-jp/llm-jp-3-13b
- Training Steps:
  - SFT (Supervised Fine-Tuning) using Ichikara Instruction (License: CC-BY-NC-SA 4.0)
  - DPO training using cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (License: CC-BY 4.0)
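The DPO step optimizes the policy directly on preference pairs, without a separate reward model: it minimizes -log σ(β[(log π(y_chosen) − log π_ref(y_chosen)) − (log π(y_rejected) − log π_ref(y_rejected))]). The repository's DPO.py presumably delegates this to TRL; purely as an illustration of the objective, here is a small sketch with made-up log-probability values:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of the chosen or rejected
    response under the trainable policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# Before training, policy == reference, so both margins vanish and the
# loss equals -log(0.5) = log(2).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931

# Raising the chosen response's likelihood relative to the reference
# pushes the loss below log(2).
print(dpo_loss(-8.0, -12.0, -10.0, -12.0) < math.log(2))  # True
```

The β parameter (0.1 here is just a common default, not the value used in this repository) controls how strongly the policy is kept close to the reference model.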
🔧 How to Use
Load the model using Hugging Face Transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hiro877/dpo_trained_model_1215")
tokenizer = AutoTokenizer.from_pretrained("hiro877/dpo_trained_model_1215")
```

Generate text:

```python
input_text = "こんにちは、今日はどうされましたか?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
📊 Training Details
1. Datasets Used
Ichikara Instruction
- Source: Ichikara Instruction Dataset
- Purpose: Supervised fine-tuning (SFT)
- License: CC-BY-NC-SA 4.0
cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental
- Source: CyberAgent Dataset
- Purpose: DPO training
- License: CC-BY 4.0
2. Pre-trained Model
- Base Model: llm-jp/llm-jp-3-13b
- License: Apache License 2.0
🔒 License
This repository is licensed under CC-BY-NC-SA 4.0.
You are free to share and adapt the material under the following terms:
- Attribution: Provide appropriate credit.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: Distribute your contributions under the same license.
For more details, see the full license text here: CC-BY-NC-SA 4.0.
⚙️ Acknowledgements
- Special thanks to CyberAgent and Ichikara for providing high-quality datasets.
- The base model was developed by llm-jp.
🔄 Update Log
[2024-12-15] Initial Release
- Fine-tuned with Ichikara Instruction (SFT)
- Trained with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (DPO)
Model tree for hiro877/dpo_trained_model_1215
- Base model: llm-jp/llm-jp-3-13b