How to Generate Outputs with the LLM

Generate with the sample code

Use LoRa_inference.py in this repository.

Development Steps

  1. Base model: model_id = "llm-jp/llm-jp-3-13b"
  2. SFT training with ichikara-instruction-003-001-1.json
  3. DPO training with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental
    SFT: LoRa.py
    DPO: DPO.py
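The DPO step above consumes preference pairs. Below is a minimal sketch of converting pairwise-comparison records into the (prompt, chosen, rejected) triples that DPO training expects. The field names (`response_a`, `winner`, etc.) are hypothetical; check the actual schema of cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental before using this.

```python
# Convert pairwise-comparison records into DPO preference triples.
# Field names here are illustrative; the real dataset schema may differ.

def to_dpo_triples(records):
    """Map comparison records to {prompt, chosen, rejected} dicts."""
    triples = []
    for rec in records:
        # The winning response becomes "chosen", the other "rejected".
        if rec["winner"] == "a":
            chosen, rejected = rec["response_a"], rec["response_b"]
        else:
            chosen, rejected = rec["response_b"], rec["response_a"]
        triples.append({
            "prompt": rec["prompt"],
            "chosen": chosen,
            "rejected": rejected,
        })
    return triples

records = [
    {"prompt": "こんにちは", "response_a": "こんにちは!", "response_b": "やあ", "winner": "a"},
]
print(to_dpo_triples(records)[0]["chosen"])  # こんにちは!
```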

Uploaded model

  • Developed by: hiro877
  • License: apache-2.0
  • Finetuned from model: llm-jp/llm-jp-3-13b

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.


DPO Trained Model

This repository contains a DPO (Direct Preference Optimization) trained model. The model was fine-tuned using Ichikara Instruction for SFT (Supervised Fine-Tuning) and subsequently trained with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental for DPO training. It is optimized for generating high-quality chatbot responses in Japanese.
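For reference, DPO optimizes the policy directly against a frozen reference model, with no explicit reward model. A minimal numeric sketch of the per-pair loss (the sequence log-probabilities below are made-up scalars for illustration, not outputs of this model):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    # Implicit reward of each response: beta * (log pi(y|x) - log pi_ref(y|x))
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Loss shrinks as the chosen response out-scores the rejected one.
    return -math.log(1 / (1 + math.exp(-(chosen_reward - rejected_reward))))

# Example with made-up log-probabilities: the policy prefers the chosen
# response more strongly than the reference does, so loss < log(2).
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0)
```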


🚀 Model Overview

Base model llm-jp/llm-jp-3-13b, fine-tuned with SFT on Ichikara Instruction and then with DPO on cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental, targeting high-quality Japanese chatbot responses.


🔧 How to Use

  1. Load the model using Hugging Face Transformers:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model = AutoModelForCausalLM.from_pretrained("hiro877/dpo_trained_model_1215")
    tokenizer = AutoTokenizer.from_pretrained("hiro877/dpo_trained_model_1215")
    
  2. Generate text:

    input_text = "こんにちは、今日はどうされましたか?"
    inputs = tokenizer(input_text, return_tensors="pt")
    outputs = model.generate(inputs["input_ids"], max_length=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
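Note that `max_length` counts the prompt tokens as well as the continuation; `max_new_tokens` bounds only the generated text. A sketch of sampling-based decoding settings that can be passed as `model.generate(**inputs, **gen_kwargs)` (the values are illustrative, not tuned for this model):

```python
# Illustrative decoding settings; values are assumptions, not tuned defaults.
gen_kwargs = {
    "max_new_tokens": 128,     # bound only the continuation, not the prompt
    "do_sample": True,         # sample instead of greedy decoding
    "temperature": 0.7,        # soften the next-token distribution
    "top_p": 0.9,              # nucleus sampling cutoff
    "repetition_penalty": 1.1, # discourage verbatim loops
}
```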
    

📊 Training Details

1. Datasets Used

  1. Ichikara Instruction

  2. cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental

2. Pre-trained Model

  • llm-jp/llm-jp-3-13b

🔒 License

This repository is licensed under CC-BY-NC-SA 4.0.
You are free to share and adapt the material under the following terms:

  1. Attribution: Provide appropriate credit.
  2. NonCommercial: You may not use the material for commercial purposes.
  3. ShareAlike: Distribute your contributions under the same license.

For more details, see the full CC-BY-NC-SA 4.0 license text.


⚙️ Acknowledgements

  • Special thanks to CyberAgent and Ichikara for providing high-quality datasets.
  • The base model was developed by llm-jp.

🔄 Update Log

[2024-12-15] Initial Release

  • Fine-tuned with Ichikara Instruction (SFT)
  • Trained with cyberagent/chatbot-arena-ja-calm2-7b-chat-experimental (DPO)