Text Generation
PEFT
Safetensors
Transformers
gemma2
axolotl
lora
conversational
text-generation-inference
4-bit precision
bitsandbytes
Instructions to use AiAF/rp-2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use AiAF/rp-2b with PEFT:
```python
# Load the Gemma 2 2B instruction-tuned base model, then attach the LoRA adapter
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")
model = PeftModel.from_pretrained(base_model, "AiAF/rp-2b")
```
- Transformers
How to use AiAF/rp-2b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AiAF/rp-2b")
model = AutoModelForCausalLM.from_pretrained("AiAF/rp-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Format the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use AiAF/rp-2b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "AiAF/rp-2b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/AiAF/rp-2b
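The curl call above can also be made from Python without extra dependencies. This is a minimal sketch using only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and it assumes the server returns the usual OpenAI-style response shape with the reply at `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST that the curl example sends to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the vLLM server running, `chat("http://localhost:8000", "AiAF/rp-2b", "What is the capital of France?")` mirrors the curl request; the SGLang commands below expose the same path on port 30000. Splitting request construction from sending keeps the payload easy to inspect without a live server.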
- SGLang
How to use AiAF/rp-2b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AiAF/rp-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AiAF/rp-2b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AiAF/rp-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use AiAF/rp-2b with Docker Model Runner:
docker model run hf.co/AiAF/rp-2b
Training in progress, step 800
- adapter_model.safetensors +1 -1
- debug.log +105 -1
adapter_model.safetensors (changed)

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:cd804fe5a6a07ca92c0d9df3ee8901a99a952af466c85b5d67804f3b9b5754fc
 size 102264160
```
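Because the Git LFS pointer records the object's SHA-256 digest and byte size, a downloaded copy of `adapter_model.safetensors` can be checked against it. A minimal standard-library sketch; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and it assumes the pointer uses the `sha256` oid form shown above.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# Pointer text for adapter_model.safetensors as of the "step 800" commit.
ADAPTER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cd804fe5a6a07ca92c0d9df3ee8901a99a952af466c85b5d67804f3b9b5754fc
size 102264160
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer_text: str) -> bool:
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    # Cheap size check first; hashing a ~100 MB file takes noticeably longer.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so the whole file never sits in memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

For example, a file fetched with `huggingface_hub.hf_hub_download(repo_id="AiAF/rp-2b", filename="adapter_model.safetensors", revision=...)` can be passed as `Path(...)` together with `ADAPTER_POINTER`. Later training commits replace the adapter, so this digest should only match the revision from this commit.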
debug.log (changed)

@@ -1908,4 +1908,108 @@ trainable params: 25,559,040 || all params: 2,639,900,928 || trainable%: 0.9682
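The `trainable%` figure in the hunk context is trainable parameters divided by all parameters. The short check below reproduces it and, assuming the LoRA weights are stored as float32 (4 bytes each, an assumption the source does not state), compares the parameter count with the adapter's LFS `size` of 102,264,160 bytes.

```python
# Numbers copied from the debug.log hunk context and the adapter's LFS pointer.
TRAINABLE_PARAMS = 25_559_040
ALL_PARAMS = 2_639_900_928
ADAPTER_FILE_BYTES = 102_264_160

# trainable% = 100 * trainable / all
trainable_pct = 100 * TRAINABLE_PARAMS / ALL_PARAMS
print(f"trainable%: {trainable_pct:.4f}")  # → trainable%: 0.9682

# Assumption (not stated in the source): LoRA weights are saved as float32.
BYTES_PER_PARAM = 4
tensor_bytes = TRAINABLE_PARAMS * BYTES_PER_PARAM
remainder = ADAPTER_FILE_BYTES - tensor_bytes
print(f"float32 tensor bytes: {tensor_bytes:,}")  # → float32 tensor bytes: 102,236,160
print(f"remaining file bytes: {remainder:,}")  # → remaining file bytes: 28,000
```

Under that assumption the tensors account for all but 28,000 bytes of the file, the kind of overhead a safetensors JSON header could plausibly occupy. Treat it as a consistency check, not proof of the stored dtype.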
```text
75%|█| 751/1000 [12:59<13:40, 3.29s/it]
75%|█| 752/1000 [12:59<10:27, 2.53s/it]
75%|█| 753/1000 [13:00<08:14, 2.00s/it]
75%|█| 754/1000 [13:01<06:34, 1.60s/it]
76%|█| 755/1000 [13:01<05:24, 1.33s/it]
76%|█| 756/1000 [13:02<04:42, 1.16s/it]
76%|█| 757/1000 [13:03<04:11, 1.04s/it]
76%|█| 758/1000 [13:04<03:48, 1.06it/s]
76%|█| 759/1000 [13:04<03:31, 1.14it/s]
76%|█| 760/1000 [13:05<03:21, 1.19it/s]
76%|█| 761/1000 [13:06<03:17, 1.21it/s]
76%|█| 762/1000 [13:07<03:08, 1.26it/s]
76%|█| 763/1000 [13:07<03:05, 1.28it/s]
76%|█| 764/1000 [13:08<03:03, 1.28it/s]
76%|█| 765/1000 [13:09<03:01, 1.29it/s]
77%|█| 766/1000 [13:10<03:01, 1.29it/s]
77%|█| 767/1000 [13:10<02:56, 1.32it/s]
77%|█| 768/1000 [13:11<02:55, 1.32it/s]
77%|█| 769/1000 [13:12<02:53, 1.33it/s]
77%|█| 770/1000 [13:13<02:55, 1.31it/s]
77%|█| 771/1000 [13:13<02:55, 1.31it/s]
77%|█| 772/1000 [13:14<02:52, 1.32it/s]
77%|█| 773/1000 [13:15<02:50, 1.33it/s]
77%|█| 774/1000 [13:16<02:49, 1.33it/s]
78%|█| 775/1000 [13:16<02:46, 1.35it/s]
78%|█| 776/1000 [13:17<02:45, 1.36it/s]
78%|█| 777/1000 [13:18<02:45, 1.34it/s]
78%|█| 778/1000 [13:19<02:45, 1.34it/s]
78%|█| 779/1000 [13:19<02:47, 1.32it/s]
78%|█| 780/1000 [13:20<02:46, 1.33it/s]
78%|█| 781/1000 [13:21<02:43, 1.34it/s]
78%|█| 782/1000 [13:22<02:40, 1.36it/s]
78%|█| 783/1000 [13:22<02:41, 1.34it/s]
78%|█| 784/1000 [13:23<02:40, 1.35it/s]
78%|█| 785/1000 [13:24<02:40, 1.34it/s]
79%|█| 786/1000 [13:25<02:37, 1.36it/s]
79%|█| 787/1000 [13:25<02:39, 1.34it/s]
79%|█| 788/1000 [13:26<02:36, 1.35it/s]
79%|█| 789/1000 [13:27<02:35, 1.36it/s]
79%|█| 790/1000 [13:28<02:38, 1.33it/s]
79%|█| 791/1000 [13:28<02:33, 1.36it/s]
79%|█| 792/1000 [13:29<02:33, 1.35it/s]
79%|█| 793/1000 [13:30<02:31, 1.36it/s]
79%|█| 794/1000 [13:31<02:31, 1.36it/s]
80%|█| 795/1000 [13:31<02:31, 1.36it/s]
80%|█| 796/1000 [13:32<02:30, 1.36it/s]
80%|█| 797/1000 [13:33<02:30, 1.35it/s]
80%|█| 798/1000 [13:33<02:29, 1.35it/s]
80%|█| 799/1000 [13:34<02:30, 1.33it/s]
80%|█| 800/1000 [13:35<02:28, 1.34it/s]
[2026-03-30 14:48:49,462] [INFO] [axolotl.core.trainers.base.evaluate:401] [PID:37135] Running evaluation step...
0%| | 0/100 [00:00<?, ?it/s]
3%| | 3/100 [00:00<00:05, 18.22it/s]
5%|█ | 5/100 [00:00<00:05, 15.88it/s]
7%|█ | 7/100 [00:00<00:05, 16.81it/s]
9%|█ | 9/100 [00:00<00:05, 15.81it/s]
11%|█ | 11/100 [00:00<00:05, 16.34it/s]
13%|█ | 13/100 [00:00<00:05, 16.63it/s]
15%|█ | 15/100 [00:00<00:04, 17.30it/s]
17%|█ | 17/100 [00:01<00:04, 16.87it/s]
19%|█ | 19/100 [00:01<00:04, 17.69it/s]
21%|█ | 21/100 [00:01<00:04, 16.98it/s]
23%|█ | 23/100 [00:01<00:04, 17.49it/s]
25%|█ | 25/100 [00:01<00:04, 17.09it/s]
27%|█ | 27/100 [00:01<00:04, 17.18it/s]
29%|█ | 29/100 [00:01<00:04, 16.48it/s]
31%|█ | 31/100 [00:01<00:04, 17.11it/s]
33%|█ | 33/100 [00:01<00:04, 16.51it/s]
36%|█ | 36/100 [00:02<00:03, 17.86it/s]
38%|██ | 38/100 [00:02<00:03, 17.47it/s]
40%|██ | 40/100 [00:02<00:03, 17.54it/s]
42%|██ | 42/100 [00:02<00:03, 18.02it/s]
44%|██ | 44/100 [00:02<00:03, 17.95it/s]
46%|██ | 46/100 [00:02<00:03, 14.58it/s]
48%|██ | 48/100 [00:02<00:03, 15.62it/s]
50%|██ | 50/100 [00:02<00:03, 15.61it/s]
52%|██ | 52/100 [00:03<00:03, 15.99it/s]
54%|██ | 54/100 [00:03<00:02, 15.68it/s]
56%|██ | 56/100 [00:03<00:02, 16.17it/s]
58%|██ | 58/100 [00:03<00:02, 16.20it/s]
60%|██ | 60/100 [00:03<00:02, 16.75it/s]
62%|██ | 62/100 [00:03<00:02, 17.22it/s]
64%|██ | 64/100 [00:03<00:02, 17.50it/s]
66%|██ | 66/100 [00:03<00:02, 16.78it/s]
68%|██ | 68/100 [00:04<00:01, 17.35it/s]
70%|██ | 70/100 [00:04<00:01, 16.79it/s]
72%|███| 72/100 [00:04<00:01, 17.31it/s]
74%|███| 74/100 [00:04<00:01, 16.43it/s]
77%|███| 77/100 [00:04<00:01, 17.05it/s]
79%|███| 79/100 [00:04<00:01, 17.48it/s]
81%|███| 81/100 [00:04<00:01, 17.23it/s]
84%|███| 84/100 [00:04<00:00, 18.43it/s]
86%|███| 86/100 [00:05<00:00, 17.73it/s]
89%|███| 89/100 [00:05<00:00, 17.95it/s]
91%|███| 91/100 [00:05<00:00, 18.29it/s]
93%|███| 93/100 [00:05<00:00, 17.19it/s]
95%|███| 95/100 [00:05<00:00, 16.82it/s]
97%|███| 97/100 [00:05<00:00, 16.89it/s]
80%|█| 800/1000 [13:41<02:28, 1.34it/s]
[2026-03-30 14:48:55,637] [INFO] [axolotl.core.trainers.base._save:722] [PID:37135] Saving model checkpoint to /workspace/data/axolotl-outputs/sft/gemma-2-2b-it-rp-sft-qlora/checkpoint-800
80%|█| 801/1000 [13:44<10:45, 3.25s/it]
80%|█| 802/1000 [13:45<08:15, 2.50s/it]
80%|█| 803/1000 [13:46<06:28, 1.97s/it]
80%|█| 804/1000 [13:46<05:12, 1.59s/it]
```
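The training and evaluation lines in this log are tqdm progress bars. To pull step counts and throughput out of `debug.log`, a small regex-based parser is enough. This is a sketch that assumes tqdm's `n/total [elapsed<remaining, rate]` layout exactly as it appears here; `parse_tqdm` and `Progress` are hypothetical names. tqdm reports `s/it` when an iteration takes longer than a second (steps 751 to 757 above) and `it/s` otherwise, so the parser normalizes both to iterations per second.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the counter/timing part of a tqdm line, e.g.
#   "80%|█| 800/1000 [13:35<02:28, 1.34it/s]"
#   "0%| | 0/100 [00:00<?, ?it/s]"
TQDM_RE = re.compile(
    r"(?P<n>\d+)/(?P<total>\d+) "
    r"\[(?P<elapsed>[\d:]+)<(?P<remaining>[\d:]+|\?), "
    r"(?P<rate>[\d.]+|\?)(?P<unit>it/s|s/it)\]"
)


@dataclass
class Progress:
    n: int
    total: int
    elapsed_s: int
    remaining_s: Optional[int]  # None while tqdm prints "?"
    its_per_sec: Optional[float]  # None while tqdm prints "?"


def to_seconds(stamp: str) -> int:
    """Convert tqdm's [H:]MM:SS stamp to seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_tqdm(line: str) -> Optional[Progress]:
    """Extract progress fields from one log line, or None if it has no tqdm bar."""
    m = TQDM_RE.search(line)
    if m is None:
        return None
    rate = None
    if m["rate"] != "?":
        value = float(m["rate"])
        # Convert seconds-per-iteration into iterations-per-second.
        rate = value if m["unit"] == "it/s" else 1.0 / value
    remaining = None if m["remaining"] == "?" else to_seconds(m["remaining"])
    return Progress(int(m["n"]), int(m["total"]), to_seconds(m["elapsed"]), remaining, rate)
```

At step 800 the parsed rate is 1.34 it/s, so a naive estimate for the remaining 200 steps is 200 / 1.34 ≈ 149 s, close to the 02:28 tqdm printed; the displayed rate is rounded, so an exact match is not expected.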
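The log shows Axolotl saving `checkpoint-800` under `/workspace/data/axolotl-outputs/sft/gemma-2-2b-it-rp-sft-qlora/`. When picking the most recent checkpoint from such a directory (for example to resume or evaluate), the step should be compared numerically, because as strings `checkpoint-1000` sorts before `checkpoint-800`. A standard-library sketch; `latest_checkpoint` is a hypothetical helper and assumes the `checkpoint-<step>` naming seen in the log. Transformers ships a comparable utility, `transformers.trainer_utils.get_last_checkpoint`.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints follow the "checkpoint-<step>" naming seen in the log.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Path) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, if any."""
    best: Optional[tuple[int, Path]] = None
    for child in output_dir.iterdir():
        m = CHECKPOINT_RE.match(child.name)
        # Only directories count; compare the step as an integer, not a string.
        if m and child.is_dir():
            step = int(m.group(1))
            if best is None or step > best[0]:
                best = (step, child)
    return None if best is None else best[1]
```

Called on the output directory above, it would return the `checkpoint-800` path if that were the highest step present, which could then be passed to `Trainer.train(resume_from_checkpoint=...)`.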