Text Generation
PEFT
Safetensors
Transformers
gemma2
axolotl
lora
conversational
text-generation-inference
4-bit precision
bitsandbytes
Instructions for using AiAF/rp-2b with libraries, inference providers, notebooks, and local apps:
- Libraries
- PEFT
How to use AiAF/rp-2b with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the Gemma-2 2B instruct base model, then attach the rp-2b LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base_model, "AiAF/rp-2b")
- Transformers
How to use AiAF/rp-2b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAF/rp-2b")
# Gemma 2 is a text-only model, so use the causal-LM auto class. With peft installed,
# Transformers resolves the adapter's base model and applies the adapter on top.
model = AutoModelForCausalLM.from_pretrained("AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
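The Transformers snippet above builds its prompt with `tokenizer.apply_chat_template(..., add_generation_prompt=True)`. As a rough illustration of what that produces, here is a plain-Python rendering of the layout used by the stock Gemma-2 chat template: `<start_of_turn>`/`<end_of_turn>` markers, the assistant role renamed to `model`, and no system role. It assumes rp-2b kept that template, which this page does not confirm, and `render_gemma_prompt` is a hypothetical helper. The tokenizer's own template stays authoritative: `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns the real string for comparison.

```python
def render_gemma_prompt(messages, add_generation_prompt=True, bos="<bos>"):
    """Hypothetical helper: render chat messages in the stock Gemma-2 prompt layout."""
    if messages and messages[0]["role"] == "system":
        # The stock Gemma-2 template rejects a system turn.
        raise ValueError("System role not supported")
    parts = [bos]
    for i, message in enumerate(messages):
        # Turns must alternate user / assistant, starting with the user.
        if (message["role"] == "user") != (i % 2 == 0):
            raise ValueError("Conversation roles must alternate user/assistant")
        # Gemma calls the assistant turn "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Leave an open model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_prompt([{"role": "user", "content": "Who are you?"}])
print(repr(prompt))
# '<bos><start_of_turn>user\nWho are you?<end_of_turn>\n<start_of_turn>model\n'
```

The trailing open `model` turn is what `add_generation_prompt=True` adds; without it the model would be asked to continue the user's turn.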
- Local Apps
- vLLM
How to use AiAF/rp-2b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AiAF/rp-2b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/AiAF/rp-2b
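The vLLM curl call above sends a JSON body to an OpenAI-compatible `/v1/chat/completions` route on port 8000; the SGLang commands that follow use the same route on port 30000. A minimal standard-library Python equivalent might look like this. `build_chat_request` and `extract_reply` are hypothetical helpers, and the response shape (`choices[0].message.content`) assumes the server follows the OpenAI chat-completions format.

```python
import json
import urllib.request


def build_chat_request(prompt, model="AiAF/rp-2b", base_url="http://localhost:8000"):
    """Build (without sending) the same POST the curl example makes."""
    payload = {
        "model": model,  # same model id the curl example sends
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body):
    """Return the assistant text from an OpenAI-style chat completion response."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


req = build_chat_request("What is the capital of France?")
print(req.get_method(), req.full_url)
# POST http://localhost:8000/v1/chat/completions
```

With a server running, `extract_reply(urllib.request.urlopen(req).read())` should return the model's answer; for the SGLang server, pass `base_url="http://localhost:30000"`.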
- SGLang
How to use AiAF/rp-2b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AiAF/rp-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AiAF/rp-2b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use AiAF/rp-2b with Docker Model Runner:
docker model run hf.co/AiAF/rp-2b
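The PEFT snippet near the top keeps the adapter separate from the base model. A tool that wants a single set of full weights can use a merged model instead; PEFT's `merge_and_unload()` folds each LoRA pair into its base weight as W' = W + (alpha / r) · B · A. The pure-Python sketch below shows that arithmetic on toy matrices and checks that the merged layer gives the same output as the unmerged base-plus-adapter path. The rank and alpha values are illustrative only: rp-2b's actual LoRA settings are not given on this page, and variants such as DoRA or rsLoRA scale differently.

```python
# Toy sketch of the standard LoRA merge: W_merged = W + (alpha / r) * (B @ A)
# Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank update into the base weight."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_forward(W, A, B, alpha, r, x):
    """Unmerged path: base output plus the scaled low-rank correction."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + (alpha / r) * l for b, l in zip(base, low_rank)]


W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]   # d_out = 2, d_in = 3
A = [[0.5, -1.0, 0.0]]   # r = 1
B = [[2.0], [0.0]]       # d_out x r
alpha, r = 2, 1          # toy values, not rp-2b's
x = [1.0, 2.0, 3.0]

merged = merge_lora(W, A, B, alpha, r)
print(matvec(merged, x), lora_forward(W, A, B, alpha, r, x))
# [1.0, -1.0] [1.0, -1.0]
```

In PEFT itself this corresponds to something like `PeftModel.from_pretrained(base_model, "AiAF/rp-2b").merge_and_unload()` followed by `save_pretrained(...)` on the result; the output directory is your choice.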
Training in progress, step 700
- adapter_model.safetensors +1 -1
- debug.log +104 -1
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:59badcbc1d668a371853f284e39f9ca33e2fe2af68b773148163044bb0f70bdd
 size 102264160
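Two figures on this page can be cross-checked. The debug.log hunk header reports 25,559,040 trainable parameters out of 2,639,900,928 (0.9682%), and the LFS pointer gives the adapter file as 102,264,160 bytes. If the adapter tensors were stored as 32-bit floats (an assumption; the page does not state the dtype), the tensor data takes 25,559,040 × 4 = 102,236,160 bytes, leaving 28,000 bytes for the safetensors header (an 8-byte little-endian length followed by JSON). At 2 bytes per value the remainder would be about 51 MB, implausibly large for a header that only lists tensor names, dtypes, shapes and offsets. The sketch below does this arithmetic and includes a small header reader so the dtype can be checked on a downloaded copy rather than assumed; the helper names are hypothetical.

```python
import io
import json
import math
import struct

# Figures copied from this commit page.
TRAINABLE = 25_559_040    # debug.log hunk header: trainable params
TOTAL = 2_639_900_928     # debug.log hunk header: all params
FILE_SIZE = 102_264_160   # LFS pointer: size of adapter_model.safetensors


def trainable_percent(trainable, total):
    return 100.0 * trainable / total


def leftover_bytes(dtype_bytes, n_params=TRAINABLE, file_size=FILE_SIZE):
    """File bytes not accounted for by tensor data if every value took dtype_bytes."""
    return file_size - n_params * dtype_bytes


def read_safetensors_header(f):
    """Read a safetensors header: u64 little-endian length, then that many bytes of JSON."""
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header_len, header


def values_per_dtype(header):
    """Total element count per dtype, e.g. {"F32": 25559040}."""
    counts = {}
    for info in header.values():
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + math.prod(info["shape"])
    return counts


print(f"{trainable_percent(TRAINABLE, TOTAL):.4f}%")   # 0.9682%
print(leftover_bytes(4), leftover_bytes(2))             # 28000 51146080

# Self-check on a tiny synthetic file holding one F32 tensor with two values.
meta = json.dumps({"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}).encode()
blob = struct.pack("<Q", len(meta)) + meta + struct.pack("<2f", 1.0, 2.0)
print(values_per_dtype(read_safetensors_header(io.BytesIO(blob))[1]))  # {'F32': 2}
```

On a local copy, `with open("adapter_model.safetensors", "rb") as f: print(values_per_dtype(read_safetensors_header(f)[1]))` shows whether the tensors really are F32.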
debug.log CHANGED
@@ -1702,4 +1702,107 @@ trainable params: 25,559,040 || all params: 2,639,900,928 || trainable%: 0.9682
65%|▋| 651/1000 [11:27<20:16, 3.49s/it]
65%|▋| 652/1000 [11:28<15:29, 2.67s/it]
65%|▋| 653/1000 [11:29<12:08, 2.10s/it]
65%|▋| 654/1000 [11:29<09:50, 1.71s/it]
66%|▋| 655/1000 [11:30<08:09, 1.42s/it]
66%|▋| 656/1000 [11:31<06:58, 1.22s/it]
66%|▋| 657/1000 [11:32<06:09, 1.08s/it]
66%|▋| 658/1000 [11:32<05:34, 1.02it/s]
66%|▋| 659/1000 [11:33<05:04, 1.12it/s]
66%|▋| 660/1000 [11:34<04:47, 1.18it/s]
66%|▋| 661/1000 [11:35<04:32, 1.24it/s]
66%|▋| 662/1000 [11:35<04:19, 1.30it/s]
66%|▋| 663/1000 [11:36<04:18, 1.30it/s]
66%|▋| 664/1000 [11:37<04:19, 1.30it/s]
66%|▋| 665/1000 [11:38<04:21, 1.28it/s]
67%|▋| 666/1000 [11:38<04:19, 1.29it/s]
67%|▋| 667/1000 [11:39<04:14, 1.31it/s]
67%|▋| 668/1000 [11:40<04:13, 1.31it/s]
67%|▋| 669/1000 [11:41<04:14, 1.30it/s]
67%|▋| 670/1000 [11:41<04:12, 1.30it/s]
67%|▋| 671/1000 [11:42<04:11, 1.31it/s]
67%|▋| 672/1000 [11:43<04:09, 1.31it/s]
67%|▋| 673/1000 [11:44<04:07, 1.32it/s]
67%|▋| 674/1000 [11:44<04:02, 1.34it/s]
68%|▋| 675/1000 [11:45<04:00, 1.35it/s]
68%|▋| 676/1000 [11:46<04:03, 1.33it/s]
68%|▋| 677/1000 [11:47<04:03, 1.33it/s]
68%|▋| 678/1000 [11:47<04:02, 1.33it/s]
68%|▋| 679/1000 [11:48<04:02, 1.32it/s]
68%|▋| 680/1000 [11:49<04:04, 1.31it/s]
68%|▋| 681/1000 [11:50<03:59, 1.33it/s]
68%|▋| 682/1000 [11:50<03:56, 1.34it/s]
68%|▋| 683/1000 [11:51<03:56, 1.34it/s]
68%|▋| 684/1000 [11:52<03:48, 1.38it/s]
68%|▋| 685/1000 [11:53<03:49, 1.37it/s]
69%|▋| 686/1000 [11:53<03:53, 1.35it/s]
69%|▋| 687/1000 [11:54<03:57, 1.32it/s]
69%|▋| 688/1000 [11:55<03:48, 1.36it/s]
69%|▋| 689/1000 [11:56<03:52, 1.34it/s]
69%|▋| 690/1000 [11:56<03:49, 1.35it/s]
69%|▋| 691/1000 [11:57<03:46, 1.36it/s]
69%|▋| 692/1000 [11:58<03:50, 1.34it/s]
69%|▋| 693/1000 [11:59<03:52, 1.32it/s]
69%|▋| 694/1000 [11:59<03:45, 1.35it/s]
70%|▋| 695/1000 [12:00<03:45, 1.35it/s]
70%|▋| 696/1000 [12:01<03:42, 1.36it/s]
70%|▋| 697/1000 [12:01<03:39, 1.38it/s]
70%|▋| 698/1000 [12:02<03:39, 1.38it/s]
70%|▋| 699/1000 [12:03<03:42, 1.35it/s]
70%|▋| 700/1000 [12:04<03:40, 1.36it/s]
[2026-03-30 14:47:18,171] [INFO] [axolotl.core.trainers.base.evaluate:401] [PID:37135] Running evaluation step...
0%|   | 0/100 [00:00<?, ?it/s]
3%|   | 3/100 [00:00<00:03, 29.32it/s]
6%|▏  | 6/100 [00:00<00:05, 17.31it/s]
8%|▏  | 8/100 [00:00<00:05, 17.08it/s]
10%|▎  | 10/100 [00:00<00:05, 16.26it/s]
12%|▎  | 12/100 [00:00<00:05, 17.18it/s]
14%|▍  | 14/100 [00:00<00:05, 17.01it/s]
16%|▍  | 16/100 [00:00<00:04, 16.97it/s]
18%|▌  | 18/100 [00:01<00:04, 17.16it/s]
20%|▌  | 20/100 [00:01<00:04, 17.47it/s]
22%|▋  | 22/100 [00:01<00:04, 17.03it/s]
24%|▋  | 24/100 [00:01<00:04, 17.66it/s]
26%|▊  | 26/100 [00:01<00:04, 17.05it/s]
28%|▊  | 28/100 [00:01<00:04, 17.07it/s]
30%|▉  | 30/100 [00:01<00:04, 16.62it/s]
32%|▉  | 32/100 [00:01<00:04, 16.72it/s]
34%|█  | 34/100 [00:01<00:03, 16.86it/s]
37%|█  | 37/100 [00:02<00:03, 17.25it/s]
39%|█▏ | 39/100 [00:02<00:03, 17.24it/s]
41%|█▏ | 41/100 [00:02<00:03, 17.45it/s]
44%|█▎ | 44/100 [00:02<00:03, 18.13it/s]
46%|█▍ | 46/100 [00:02<00:03, 17.25it/s]
48%|█▍ | 48/100 [00:02<00:02, 17.67it/s]
50%|█▌ | 50/100 [00:02<00:02, 17.02it/s]
52%|█▌ | 52/100 [00:03<00:02, 17.00it/s]
54%|█▌ | 54/100 [00:03<00:02, 16.32it/s]
56%|█▋ | 56/100 [00:03<00:02, 16.62it/s]
58%|█▋ | 58/100 [00:03<00:02, 16.50it/s]
60%|█▊ | 60/100 [00:03<00:02, 16.98it/s]
62%|█▊ | 62/100 [00:03<00:02, 17.41it/s]
64%|█▉ | 64/100 [00:03<00:02, 17.57it/s]
66%|█▉ | 66/100 [00:03<00:02, 16.85it/s]
68%|██ | 68/100 [00:03<00:01, 17.41it/s]
70%|██ | 70/100 [00:04<00:01, 16.83it/s]
72%|██▏| 72/100 [00:04<00:01, 17.35it/s]
74%|██▏| 74/100 [00:04<00:01, 16.46it/s]
77%|██▎| 77/100 [00:04<00:01, 17.09it/s]
79%|██▎| 79/100 [00:04<00:01, 17.50it/s]
81%|██▍| 81/100 [00:04<00:01, 17.24it/s]
84%|██▌| 84/100 [00:04<00:00, 18.46it/s]
86%|██▌| 86/100 [00:04<00:00, 17.71it/s]
89%|██▋| 89/100 [00:05<00:00, 17.91it/s]
91%|██▋| 91/100 [00:05<00:00, 18.25it/s]
93%|██▊| 93/100 [00:05<00:00, 17.15it/s]
95%|██▊| 95/100 [00:05<00:00, 16.80it/s]
97%|██▉| 97/100 [00:05<00:00, 16.88it/s]
70%|▋| 700/1000 [12:10<03:40, 1.36it/s]
[2026-03-30 14:47:24,258] [INFO] [axolotl.core.trainers.base._save:722] [PID:37135] Saving model checkpoint to /workspace/data/axolotl-outputs/sft/gemma-2-2b-it-rp-sft-qlora/checkpoint-700
70%|▋| 701/1000 [12:13<16:05, 3.23s/it]
70%|▋| 702/1000 [12:13<12:18, 2.48s/it]
70%|▋| 703/1000 [12:14<09:46, 1.97s/it]
70%|▋| 704/1000 [12:15<07:54, 1.60s/it]
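Each training line follows tqdm's `n/total [elapsed<remaining, rate]` layout, and the rate switches from s/it to it/s once a step takes under a second. The remaining time is roughly the steps left divided by the displayed rate: at step 700, 300 steps at 1.36 it/s is about 220 s, matching the logged 03:40. The jump to 16:05 at step 701 most likely reflects the evaluation and checkpoint pause feeding into tqdm's smoothed rate rather than a real slowdown. The parser below is a sketch; its regular expression was written against the lines in this log and is not part of tqdm's API.

```python
import re

# Matches the tail of lines such as "70%|▋| 700/1000 [12:04<03:40, 1.36it/s]".
# Lines with unknown values ("?") deliberately do not match.
PROGRESS = re.compile(
    r"(?P<n>\d+)/(?P<total>\d+) \[(?P<elapsed>[\d:]+)<(?P<remaining>[\d:]+),"
    r"\s*(?P<value>[\d.]+)(?P<unit>it/s|s/it)\]"
)


def to_seconds(clock):
    """Convert 'MM:SS' or 'H:MM:SS' to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_progress(line):
    """Return the fields of one progress line, or None if it does not match."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    value = float(m["value"])
    # Normalize to steps per second whichever unit tqdm chose to display.
    rate = value if m["unit"] == "it/s" else 1.0 / value
    return {
        "n": int(m["n"]),
        "total": int(m["total"]),
        "elapsed_s": to_seconds(m["elapsed"]),
        "remaining_s": to_seconds(m["remaining"]),
        "rate": rate,
    }


def estimate_remaining(p):
    """Steps left divided by the displayed (smoothed) rate, in seconds."""
    return (p["total"] - p["n"]) / p["rate"]


step700 = parse_progress("70%|▋| 700/1000 [12:04<03:40, 1.36it/s]")
print(int(estimate_remaining(step700)), step700["remaining_s"])  # 220 220
```

For the slower line at step 651 (3.49 s/it), the same estimate lands within a couple of seconds of the logged 20:16, since tqdm rounds the displayed rate.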
|