Text Generation
PEFT
Safetensors
Transformers
gemma2
axolotl
lora
conversational
text-generation-inference
4-bit precision
bitsandbytes
Instructions to use AiAF/rp-2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use AiAF/rp-2b with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base_model, "AiAF/rp-2b")
- Transformers
How to use AiAF/rp-2b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAF/rp-2b")
model = AutoModelForCausalLM.from_pretrained("AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use AiAF/rp-2b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AiAF/rp-2b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AiAF/rp-2b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/AiAF/rp-2b
- SGLang
How to use AiAF/rp-2b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "AiAF/rp-2b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AiAF/rp-2b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "AiAF/rp-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "AiAF/rp-2b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use AiAF/rp-2b with Docker Model Runner:
docker model run hf.co/AiAF/rp-2b
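The curl calls in the vLLM and SGLang instructions can also be issued from Python. The following is a minimal sketch using only the standard library, assuming one of those servers is already listening locally; the helper names `build_chat_request` and `ask` are hypothetical, and the response is assumed to follow the OpenAI chat-completions shape that both servers advertise.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first returned choice."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumed response shape: {"choices": [{"message": {"content": ...}}]}
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("http://localhost:8000", "AiAF/rp-2b", "What is the capital of France?")` would target the vLLM example; port 30000 matches the SGLang example.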
Training in progress, step 750
- adapter_model.safetensors +1 -1
- debug.log +104 -1
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:f145aad3e393aacb1ea6687fe5c794bd1505c6b68c50e5038c6eac34efa7e4d6
 size 102264160
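The change to adapter_model.safetensors is a change to a Git LFS pointer, which records only a hash and a byte count for the real file. As a rough sketch, a downloaded adapter could be checked against the new pointer as follows; the function names are hypothetical, and any local file path passed to `matches_pointer` is an assumption rather than something this commit specifies.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file has the size and SHA-256 digest the pointer records."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so a ~100 MB adapter is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# The new side of the pointer, copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f145aad3e393aacb1ea6687fe5c794bd1505c6b68c50e5038c6eac34efa7e4d6
size 102264160
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algorithm"], pointer["size"])
```

Comparing the size first is a cheap early exit: a truncated download fails before any hashing is done.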
debug.log
CHANGED
@@ -1805,4 +1805,107 @@ trainable params: 25,559,040 || all params: 2,639,900,928 || trainable%: 0.9682
70%|█| 701/1000 [12:13<16:05, 3.23s/it]
70%|█| 702/1000 [12:13<12:18, 2.48s/it]
70%|█| 703/1000 [12:14<09:46, 1.97s/it]
70%|█| 704/1000 [12:15<07:54, 1.60s/it]
70%|█| 705/1000 [12:16<06:40, 1.36s/it]
71%|█| 706/1000 [12:17<05:45, 1.18s/it]
71%|█| 707/1000 [12:17<05:04, 1.04s/it]
71%|█| 708/1000 [12:18<04:37, 1.05it/s]
71%|█| 709/1000 [12:19<04:20, 1.12it/s]
71%|█| 710/1000 [12:19<04:03, 1.19it/s]
71%|█| 711/1000 [12:20<03:53, 1.24it/s]
71%|█| 712/1000 [12:21<03:46, 1.27it/s]
71%|█| 713/1000 [12:22<03:45, 1.27it/s]
71%|█| 714/1000 [12:23<03:43, 1.28it/s]
72%|█| 715/1000 [12:23<03:40, 1.29it/s]
72%|█| 716/1000 [12:24<03:38, 1.30it/s]
72%|█| 717/1000 [12:25<03:35, 1.31it/s]
72%|█| 718/1000 [12:25<03:30, 1.34it/s]
72%|█| 719/1000 [12:26<03:29, 1.34it/s]
72%|█| 720/1000 [12:27<03:26, 1.36it/s]
72%|█| 721/1000 [12:28<03:27, 1.35it/s]
72%|█| 722/1000 [12:28<03:25, 1.35it/s]
72%|█| 723/1000 [12:29<03:26, 1.34it/s]
72%|█| 724/1000 [12:30<03:28, 1.33it/s]
72%|█| 725/1000 [12:31<03:24, 1.35it/s]
73%|█| 726/1000 [12:31<03:16, 1.39it/s]
73%|█| 727/1000 [12:32<03:15, 1.39it/s]
73%|█| 728/1000 [12:33<03:17, 1.38it/s]
73%|█| 729/1000 [12:34<03:18, 1.36it/s]
73%|█| 730/1000 [12:34<03:23, 1.33it/s]
73%|█| 731/1000 [12:35<03:21, 1.34it/s]
73%|█| 732/1000 [12:36<03:23, 1.32it/s]
73%|█| 733/1000 [12:37<03:19, 1.34it/s]
73%|█| 734/1000 [12:37<03:17, 1.35it/s]
74%|█| 735/1000 [12:38<03:17, 1.34it/s]
74%|█| 736/1000 [12:39<03:16, 1.34it/s]
74%|█| 737/1000 [12:40<03:16, 1.34it/s]
74%|█| 738/1000 [12:40<03:15, 1.34it/s]
74%|█| 739/1000 [12:41<03:17, 1.32it/s]
74%|█| 740/1000 [12:42<03:18, 1.31it/s]
74%|█| 741/1000 [12:43<03:18, 1.31it/s]
74%|█| 742/1000 [12:43<03:12, 1.34it/s]
74%|█| 743/1000 [12:44<03:11, 1.34it/s]
74%|█| 744/1000 [12:45<03:09, 1.35it/s]
74%|█| 745/1000 [12:46<03:09, 1.35it/s]
75%|█| 746/1000 [12:46<03:08, 1.35it/s]
75%|█| 747/1000 [12:47<03:07, 1.35it/s]
75%|█| 748/1000 [12:48<03:07, 1.35it/s]
75%|█| 749/1000 [12:49<03:10, 1.32it/s]
75%|█| 750/1000 [12:49<03:08, 1.33it/s]
[2026-03-30 14:48:03,796] [INFO] [axolotl.core.trainers.base.evaluate:401] [PID:37135] Running evaluation step...
0%| | 0/100 [00:00<?, ?it/s]
3%| | 3/100 [00:00<00:03, 28.04it/s]
6%|█ | 6/100 [00:00<00:05, 16.57it/s]
8%|█ | 8/100 [00:00<00:05, 16.62it/s]
10%|█ | 10/100 [00:00<00:05, 15.99it/s]
12%|█ | 12/100 [00:00<00:05, 16.99it/s]
14%|█ | 14/100 [00:00<00:05, 16.89it/s]
16%|█ | 16/100 [00:00<00:04, 16.99it/s]
18%|█ | 18/100 [00:01<00:04, 17.21it/s]
20%|█ | 20/100 [00:01<00:04, 17.51it/s]
22%|█ | 22/100 [00:01<00:04, 17.05it/s]
24%|█ | 24/100 [00:01<00:04, 17.68it/s]
26%|█ | 26/100 [00:01<00:04, 17.06it/s]
28%|█ | 28/100 [00:01<00:04, 17.10it/s]
30%|█ | 30/100 [00:01<00:04, 16.71it/s]
32%|█ | 32/100 [00:01<00:04, 16.79it/s]
34%|█ | 34/100 [00:01<00:03, 16.91it/s]
37%|█ | 37/100 [00:02<00:03, 17.30it/s]
39%|██ | 39/100 [00:02<00:03, 17.32it/s]
41%|██ | 41/100 [00:02<00:03, 17.54it/s]
44%|██ | 44/100 [00:02<00:03, 18.21it/s]
46%|██ | 46/100 [00:02<00:03, 17.31it/s]
48%|██ | 48/100 [00:02<00:02, 17.71it/s]
50%|██ | 50/100 [00:02<00:02, 17.06it/s]
52%|██ | 52/100 [00:03<00:02, 17.04it/s]
54%|██ | 54/100 [00:03<00:02, 16.32it/s]
56%|██ | 56/100 [00:03<00:02, 16.65it/s]
58%|██ | 58/100 [00:03<00:02, 16.54it/s]
60%|██ | 60/100 [00:03<00:02, 17.01it/s]
62%|██ | 62/100 [00:03<00:02, 17.45it/s]
64%|██ | 64/100 [00:03<00:02, 17.67it/s]
66%|██ | 66/100 [00:03<00:02, 16.87it/s]
68%|██ | 68/100 [00:03<00:01, 17.43it/s]
70%|██ | 70/100 [00:04<00:01, 16.84it/s]
72%|███| 72/100 [00:04<00:01, 17.34it/s]
74%|███| 74/100 [00:04<00:01, 16.45it/s]
77%|███| 77/100 [00:04<00:01, 17.09it/s]
79%|███| 79/100 [00:04<00:01, 17.51it/s]
81%|███| 81/100 [00:04<00:01, 16.77it/s]
84%|███| 84/100 [00:04<00:00, 18.11it/s]
86%|███| 86/100 [00:04<00:00, 17.51it/s]
89%|███| 89/100 [00:05<00:00, 17.36it/s]
91%|███| 91/100 [00:05<00:00, 17.85it/s]
93%|███| 93/100 [00:05<00:00, 16.91it/s]
95%|███| 95/100 [00:05<00:00, 16.63it/s]
97%|███| 97/100 [00:05<00:00, 16.76it/s]
75%|█| 750/1000 [12:55<03:08, 1.33it/s]
[2026-03-30 14:48:09,865] [INFO] [axolotl.core.trainers.base._save:722] [PID:37135] Saving model checkpoint to /workspace/data/axolotl-outputs/sft/gemma-2-2b-it-rp-sft-qlora/checkpoint-750
75%|█| 751/1000 [12:59<13:40, 3.29s/it]
75%|█| 752/1000 [12:59<10:27, 2.53s/it]
75%|█| 753/1000 [13:00<08:14, 2.00s/it]
75%|█| 754/1000 [13:01<06:34, 1.60s/it]
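The hunk header for debug.log carries the adapter's parameter summary, and its `trainable%` figure can be recomputed from the two counts on the same line. A small sketch, assuming the line keeps the `trainable params: ... || all params: ... || trainable%: ...` layout shown there; the function name `parse_param_summary` is hypothetical.

```python
import re

# Summary text copied from the debug.log hunk header.
HUNK_CONTEXT = (
    "trainable params: 25,559,040 || all params: 2,639,900,928 || trainable%: 0.9682"
)


def parse_param_summary(line: str) -> dict:
    """Extract the trainable count, total count, and reported percentage."""
    match = re.search(
        r"trainable params: ([\d,]+) \|\| all params: ([\d,]+) \|\| trainable%: ([\d.]+)",
        line,
    )
    if match is None:
        raise ValueError("no parameter summary found")
    return {
        "trainable": int(match.group(1).replace(",", "")),
        "total": int(match.group(2).replace(",", "")),
        "reported_pct": float(match.group(3)),
    }


summary = parse_param_summary(HUNK_CONTEXT)
computed_pct = 100 * summary["trainable"] / summary["total"]
print(f"computed {computed_pct:.4f}% vs reported {summary['reported_pct']}%")
```

As a rough cross-check, 25,559,040 parameters at 4 bytes each would occupy 102,236,160 bytes, which is close to the 102,264,160-byte adapter file in this commit. That would be consistent with float32 storage plus a small header, but it is an inference and not something the log states.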
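The progress lines in debug.log switch between `s/it` and `it/s` as the step time drops below one second, which makes them awkward to compare directly. The sketch below normalises a line to seconds per iteration and derives a remaining-time estimate from it. It assumes the tqdm-style `step/total [elapsed<remaining, rate]` layout seen in the log, and the names `parse_progress` and `eta_seconds` are hypothetical. Because the displayed rate is rounded, the estimate may differ from the logged remaining time by about a second.

```python
import re

PROGRESS = re.compile(
    r"(?P<step>\d+)/(?P<total>\d+) "
    r"\[(?P<elapsed>[\d:]+)<(?P<remaining>[\d:?]+), +"
    r"(?P<rate>[\d.]+)(?P<unit>s/it|it/s)\]"
)


def parse_progress(line: str) -> dict:
    """Parse one tqdm-style progress line into counts and seconds per iteration."""
    match = PROGRESS.search(line)
    if match is None:
        # Lines with an unknown rate (for example '?it/s') also end up here.
        raise ValueError(f"not a parsable progress line: {line!r}")
    rate = float(match.group("rate"))
    # Normalise both units to seconds per iteration.
    seconds_per_it = rate if match.group("unit") == "s/it" else 1.0 / rate
    return {
        "step": int(match.group("step")),
        "total": int(match.group("total")),
        "elapsed": match.group("elapsed"),
        "seconds_per_it": seconds_per_it,
    }


def eta_seconds(progress: dict) -> float:
    """Estimate the remaining time from the steps left and the current step time."""
    return (progress["total"] - progress["step"]) * progress["seconds_per_it"]


before_eval = parse_progress("750/1000 [12:49<03:08, 1.33it/s]")
after_save = parse_progress("751/1000 [12:59<13:40, 3.29s/it]")
print(round(eta_seconds(before_eval)), round(eta_seconds(after_save)))
```

The two sample lines are taken from either side of the step-750 evaluation and checkpoint save, where the smoothed step time temporarily rises before settling again.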