---
library_name: transformers
license: other
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- llama-factory
- full
- generated_from_trainer
model-index:
- name: swe-xml
  results: []
---

# swe-xml

This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) on the swe-xml dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1605

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 4
- total_eval_batch_size: 4
- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.2309        | 0.0156 | 100  | 0.2555          |
| 0.2066        | 0.0311 | 200  | 0.2431          |
| 0.2334        | 0.0467 | 300  | 0.2352          |
| 0.2611        | 0.0622 | 400  | 0.2318          |
| 0.2485        | 0.0778 | 500  | 0.2280          |
| 0.2496        | 0.0933 | 600  | 0.2243          |
| 0.2798        | 0.1089 | 700  | 0.2193          |
| 0.2143        | 0.1244 | 800  | 0.2171          |
| 0.2127        | 0.1400 | 900  | 0.2187          |
| 0.1501        | 0.1555 | 1000 | 0.2137          |
| 0.1507        | 0.1711 | 1100 | 0.2100          |
| 0.3055        | 0.1866 | 1200 | 0.2101          |
| 0.1649        | 0.2022 | 1300 | 0.2087          |
| 0.1152        | 0.2177 | 1400 | 0.2055          |
| 0.1799        | 0.2333 | 1500 | 0.2038          |
| 0.1547        | 0.2488 | 1600 | 0.2037          |
| 0.2323        | 0.2644 | 1700 | 0.1994          |
| 0.1962        | 0.2799 | 1800 | 0.1943          |
| 0.1785        | 0.2955 | 1900 | 0.1958          |
| 0.1977        | 0.3110 | 2000 | 0.1913          |
| 0.1919        | 0.3266 | 2100 | 0.1889          |
| 0.1463        | 0.3421 | 2200 | 0.1894          |
| 0.1946        | 0.3577 | 2300 | 0.1892          |
| 0.1867        | 0.3733 | 2400 | 0.1869          |
| 0.1452        | 0.3888 | 2500 | 0.1855          |
| 0.1442        | 0.4044 | 2600 | 0.1839          |
| 0.1449        | 0.4199 | 2700 | 0.1840          |
| 0.109         | 0.4355 | 2800 | 0.1816          |
| 0.1445        | 0.4510 | 2900 | 0.1804          |
| 0.1717        | 0.4666 | 3000 | 0.1797          |
| 0.1591        | 0.4821 | 3100 | 0.1795          |
| 0.1177        | 0.4977 | 3200 | 0.1793          |
| 0.221         | 0.5132 | 3300 | 0.1781          |
| 0.148         | 0.5288 | 3400 | 0.1780          |
| 0.1365        | 0.5443 | 3500 | 0.1779          |
| 0.2491        | 0.5599 | 3600 | 0.1728          |
| 0.108         | 0.5754 | 3700 | 0.1722          |
| 0.1334        | 0.5910 | 3800 | 0.1728          |
| 0.1057        | 0.6065 | 3900 | 0.1714          |
| 0.1513        | 0.6221 | 4000 | 0.1702          |
| 0.0988        | 0.6376 | 4100 | 0.1697          |
| 0.2126        | 0.6532 | 4200 | 0.1681          |
| 0.2117        | 0.6687 | 4300 | 0.1687          |
| 0.2683        | 0.6843 | 4400 | 0.1671          |
| 0.1124        | 0.6998 | 4500 | 0.1649          |
| 0.2138        | 0.7154 | 4600 | 0.1651          |
| 0.2013        | 0.7309 | 4700 | 0.1638          |
| 0.0985        | 0.7465 | 4800 | 0.1646          |
| 0.1566        | 0.7621 | 4900 | 0.1638          |
| 0.1004        | 0.7776 | 5000 | 0.1641          |
| 0.1242        | 0.7932 | 5100 | 0.1632          |
| 0.1069        | 0.8087 | 5200 | 0.1623          |
| 0.1956        | 0.8243 | 5300 | 0.1616          |
| 0.1319        | 0.8398 | 5400 | 0.1616          |
| 0.0767        | 0.8554 | 5500 | 0.1611          |
| 0.1163        | 0.8709 | 5600 | 0.1610          |
| 0.0927        | 0.8865 | 5700 | 0.1607          |
| 0.1271        | 0.9020 | 5800 | 0.1607          |
| 0.0913        | 0.9176 | 5900 | 0.1604          |
| 0.1398        | 0.9331 | 6000 | 0.1603          |
| 0.1328        | 0.9487 | 6100 | 0.1605          |
| 0.1169        | 0.9642 | 6200 | 0.1603          |
| 0.1498        | 0.9798 | 6300 | 0.1604          |
| 0.1662        | 0.9953 | 6400 | 0.1603          |

### Framework versions

- Transformers 4.46.1
- Pytorch 2.6.0+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
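
The batch-size hyperparameters and the last row of the training results table can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that `total_train_batch_size` equals `train_batch_size × num_devices × gradient accumulation steps` and that the logged epoch fraction is simply `step / steps_per_epoch`. The function names are hypothetical, and the resulting dataset size is a rough estimate because the card does not state it and the epoch column is rounded to four decimals.

```python
# Values copied from the hyperparameter list in this card.
TRAIN_BATCH_SIZE = 1        # per-device batch size
NUM_DEVICES = 4
TOTAL_TRAIN_BATCH_SIZE = 4

# Last row of the training results table: epoch fraction and step.
LAST_EPOCH = 0.9953
LAST_STEP = 6400


def implied_grad_accum(total_batch: int, per_device: int, devices: int) -> int:
    """Gradient accumulation steps implied by the reported batch sizes (assumed relation)."""
    return total_batch // (per_device * devices)


def estimate_steps_per_epoch(epoch: float, step: int) -> int:
    """Approximate optimizer steps in one epoch, from a single logged row."""
    return round(step / epoch)


grad_accum = implied_grad_accum(TOTAL_TRAIN_BATCH_SIZE, TRAIN_BATCH_SIZE, NUM_DEVICES)
steps_per_epoch = estimate_steps_per_epoch(LAST_EPOCH, LAST_STEP)

# Rough number of training examples seen in one epoch (an estimate, not a reported figure).
approx_train_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"implied gradient accumulation steps: {grad_accum}")
print(f"approximate steps per epoch: {steps_per_epoch}")
print(f"approximate training examples per epoch: {approx_train_examples}")
```

Under these assumptions, the reported batch sizes imply no gradient accumulation, and the single epoch covers on the order of 25,000 training examples.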