---
base_model: Unsloth/qwen3-8b
library_name: peft
model_name: output_orpo
tags:
- base_model:adapter:Unsloth/qwen3-8b
- lora
- orpo
- transformers
- trl
- unsloth
licence: license
pipeline_tag: text-generation
---

# Model Card for output_orpo

This model is a LoRA (PEFT) adapter fine-tuned from [Unsloth/qwen3-8b](https://huggingface.co/Unsloth/qwen3-8b). It was trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# The generated card left the model id as "None". "output_orpo" (the model_name above) is assumed
# to be the local output directory holding the LoRA adapter; replace it with the actual path or Hub repo id.
generator = pipeline("text-generation", model="output_orpo", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).

### Framework versions

- PEFT: 0.18.0
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.9.0+cu128
- Datasets: 4.3.0
- Tokenizers: 0.22.1

## Citations

Cite ORPO as:

```bibtex
@article{hong2024orpo,
    title        = {{ORPO: Monolithic Preference Optimization without Reference Model}},
    author       = {Jiwoo Hong and Noah Lee and James Thorne},
    year         = 2024,
    eprint       = {arXiv:2403.07691}
}
```

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

Please analyze this:

```text
unsloth@e150a2d13ef8:/workspace$ git clone https://github.com/gitpullpull/Introspective_Temperature
Cloning into 'Introspective_Temperature'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 24 (delta 6), reused 21 (delta 3), pack-reused 0 (from 0)
Receiving objects: 100% (24/24), 9.12 MiB | 20.25 MiB/s, done.
Resolving deltas: 100% (6/6), done.
unsloth@e150a2d13ef8:/workspace$ cd Introspective_Temperature/
unsloth@e150a2d13ef8:/workspace/Introspective_Temperature$ bash run_job.sh
=== Job Started at 20251218_140407 ===
Logs will be saved to: training_log_20251218_140407.txt
[1/2] Installing dependencies...
Requirement already satisfied: pandas in /opt/conda/lib/python3.11/site-packages (2.3.3)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.11/site-packages (3.10.8)
Requirement already satisfied: huggingface_hub in /opt/conda/lib/python3.11/site-packages (0.36.0)
Requirement already satisfied: numpy>=1.23.2 in /opt/conda/lib/python3.11/site-packages (from pandas) (2.2.6)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.11/site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /opt/conda/lib/python3.11/site-packages (from pandas) (2025.2)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (4.61.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (1.4.9)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (25.0)
Requirement already satisfied: pillow>=8 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (11.3.0)
Requirement already satisfied: pyparsing>=3 in /opt/conda/lib/python3.11/site-packages (from matplotlib) (3.2.5)
Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (3.20.0)
Requirement already satisfied: fsspec>=2023.5.0 in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (2025.3.0)
Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (6.0.3)
Requirement already satisfied: requests in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (2.32.5)
Requirement already satisfied: tqdm>=4.42.1 in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (4.67.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (4.15.0)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /opt/conda/lib/python3.11/site-packages (from huggingface_hub) (1.2.0)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests->huggingface_hub) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests->huggingface_hub) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests->huggingface_hub) (2.6.2)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests->huggingface_hub) (2025.11.12)
[2/2] Starting Train-ORPO.py...
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
TMA benchmarks will be running without grid constant TMA descriptor.
🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth: FBGEMM on the current GPU cannot load - will switch to Triton kernels
[unsloth_zoo.log|WARNING]Unsloth: Failed to import trl openenv: No module named 'trl.experimental.openenv'
Run Timestamp: 20251218_140435
Hugging Face logged in.
Target Repo: gitpullpull/Introspective_Temperature_test
Loading model: Unsloth/qwen3-8b...
==((====))==  Unsloth 2025.12.4: Fast Qwen3 patching. Transformers: 4.57.1. vLLM: 0.11.2.
   \\   /|    NVIDIA GeForce RTX 5090. Num GPUs = 1. Max memory: 31.367 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 12.0. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
The following TP rules were not applied on any of the layers: {'layers.*.self_attn.q_proj': 'colwise', 'layers.*.self_attn.k_proj': 'colwise', 'layers.*.self_attn.v_proj': 'colwise', 'layers.*.self_attn.o_proj': 'rowwise', 'layers.*.mlp.gate_proj': 'colwise', 'layers.*.mlp.up_proj': 'colwise', 'layers.*.mlp.down_proj': 'rowwise'}
The following layers were not sharded: lm_head.weight, model.embed_tokens.weight, model.layers.*.self_attn.k_norm.weight, model.layers.*.post_attention_layernorm.weight, model.layers.*.self_attn.q_norm.weight, model.norm.weight, model.layers.*.input_layernorm.weight
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.46it/s]
Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.12.4 patched 36 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
Generating train split: 2897 examples [00:00, 48596.01 examples/s]
Filter: 100%|██████████| 2897/2897 [00:00<00:00, 45470.16 examples/s]
Map: 100%|██████████| 2897/2897 [00:00<00:00, 5254.68 examples/s]
Map (num_proc=64): 100%|██████████| 2897/2897 [00:07<00:00, 373.47 examples/s]
Map (num_proc=64): 100%|██████████| 2897/2897 [00:16<00:00, 170.57 examples/s]
Map (num_proc=64): 100%|██████████| 2897/2897 [00:16<00:00, 174.46 examples/s]
Starting Lion 8bit ORPO training (LR: 1e-06)...
The model is already on multiple devices. Skipping the move to device specified in `args`.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,897 | Num Epochs = 1 | Total steps = 363
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 8 x 1) = 8
 "-____-"     Trainable parameters = 87,293,952 of 8,278,029,312 (1.05% trained)
  0%|          | 0/363 [00:00
```
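The training banner reports 363 optimizer steps for 2,897 examples. Below is a minimal sketch of that arithmetic. It assumes, as the reported numbers imply, that the trainer counts a final partial gradient-accumulation group as a full step, so the division rounds up. The helper name `effective_batch_and_steps` is illustrative and does not come from the training script.

```python
import math


def effective_batch_and_steps(num_examples: int, per_device_batch: int,
                              grad_accum: int, num_gpus: int,
                              epochs: int = 1) -> tuple[int, int]:
    """Return (effective batch size, total optimizer steps).

    Hypothetical helper: assumes a trailing partial accumulation group
    still triggers an optimizer step (i.e. ceil division per epoch).
    """
    # Samples consumed per optimizer step across all devices
    effective_batch = per_device_batch * grad_accum * num_gpus
    # Round up so the leftover examples still produce one (smaller) update
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


if __name__ == "__main__":
    # Values taken from the banner: 2,897 examples, batch 1, accumulation 8, 1 GPU
    print(effective_batch_and_steps(2897, 1, 8, 1))
```

With one example per device, eight accumulation steps and one GPU, the effective batch is 8. Then 2,897 / 8 = 362.125, which rounds up to 363.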
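The trainable-parameter count in the banner (87,293,952) matches a LoRA rank of 32 applied to all seven linear projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) in each of the 36 decoder layers. The sketch below checks this. It assumes Qwen3-8B's published dimensions: hidden size 4096, MLP intermediate size 12288, and 32 query heads plus 8 key/value heads of dimension 128. The rank and module list are inferred from the arithmetic, not read from the training script.

```python
# Assumed Qwen3-8B dimensions; the layer count of 36 matches the log
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128


def lora_params_per_layer(r: int) -> int:
    """LoRA parameters added to one decoder layer at rank r (assumed module set)."""
    q_out = NUM_HEADS * HEAD_DIM      # query projection width
    kv_out = NUM_KV_HEADS * HEAD_DIM  # key/value projection width (grouped-query attention)
    shapes = {                        # (in_features, out_features) of each nn.Linear
        "q_proj": (HIDDEN, q_out),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (q_out, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    # LoRA on Linear(in, out) adds A (r x in) and B (out x r): r * (in + out) weights
    return sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())


def lora_trainable(r: int) -> int:
    return NUM_LAYERS * lora_params_per_layer(r)


TRAINABLE_FROM_LOG = 87_293_952
TOTAL_FROM_LOG = 8_278_029_312

if __name__ == "__main__":
    # Solve for the rank implied by the logged count (parameters scale linearly in r)
    print("implied rank:", TRAINABLE_FROM_LOG / lora_trainable(1))
    print("trained %:", round(100 * TRAINABLE_FROM_LOG / TOTAL_FROM_LOG, 2))
    print("frozen base params:", TOTAL_FROM_LOG - TRAINABLE_FROM_LOG)
```

Subtracting the adapter from the reported total leaves 8,190,735,360 parameters for the frozen base model.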
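For reference, the ORPO objective cited in the model card adds an odds-ratio penalty to the usual SFT loss.

- The total loss is L = L_SFT + λ·L_OR.
- The odds-ratio term is L_OR = -log σ(log odds(y_w|x) - log odds(y_l|x)).
- The odds are odds(y|x) = P(y|x) / (1 - P(y|x)), where P is the length-normalized (geometric-mean per-token) likelihood.

The sketch below evaluates this for a single preference pair from mean per-token log-probabilities. It is a scalar illustration, not TRL's batched `ORPOTrainer` implementation. TRL exposes the weight λ as `beta` in `ORPOConfig`; the value 0.1 here is only an illustrative default.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_odds(mean_logp: float) -> float:
    """log(p / (1 - p)) with p = exp(mean_logp); requires mean_logp < 0."""
    # log(1 - p) computed as log1p(-exp(mean_logp)) for precision
    return mean_logp - math.log1p(-math.exp(mean_logp))


def orpo_loss(chosen_mean_logp: float, rejected_mean_logp: float,
              lam: float = 0.1) -> float:
    """Single-pair ORPO loss (illustrative sketch, not TRL's code)."""
    # SFT term: mean per-token negative log-likelihood of the chosen response
    l_sft = -chosen_mean_logp
    # Odds-ratio term: penalize when the rejected response's odds approach the chosen one's
    z = log_odds(chosen_mean_logp) - log_odds(rejected_mean_logp)
    l_or = -log_sigmoid(z)
    return l_sft + lam * l_or


if __name__ == "__main__":
    # Chosen response more likely than rejected: small odds-ratio penalty
    print(orpo_loss(-0.8, -1.6))
```

When the chosen and rejected responses are equally likely, the odds-ratio term equals log 2. As the rejected response becomes less likely than the chosen one, that term shrinks toward zero. Because the penalty depends only on the policy model's own likelihoods, ORPO needs no separate reference model.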