{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 0.13333333333333333,
  "eval_steps": 500,
  "global_step": 100,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1369.5,
      "completions/clipped_ratio": 0.03125,
      "completions/max_length": 1369.5,
      "completions/max_terminated_length": 967.4,
      "completions/mean_length": 349.9875,
      "completions/mean_terminated_length": 310.6795928955078,
      "completions/min_length": 19.7,
      "completions/min_terminated_length": 19.7,
      "epoch": 0.013333333333333334,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 175.513671875,
      "kl": 0.406684020392967,
      "learning_rate": 9.000000000000001e-07,
      "loss": 0.0004,
      "num_tokens": 73366.0,
      "reward": -1.9828404784202576,
      "reward_std": 0.43738200813531875,
      "rewards/reward_func/mean": -1.9828404903411865,
      "rewards/reward_func/std": 0.5390992075204849,
      "step": 10
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1114.4,
      "completions/clipped_ratio": 0.025,
      "completions/max_length": 1114.4,
      "completions/max_terminated_length": 878.8,
      "completions/mean_length": 285.65,
      "completions/mean_terminated_length": 253.98928833007812,
      "completions/min_length": 21.2,
      "completions/min_terminated_length": 21.2,
      "epoch": 0.02666666666666667,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 84.8028335571289,
      "kl": 0.06926110610365868,
      "learning_rate": 1.9000000000000002e-06,
      "loss": 0.0001,
      "num_tokens": 135618.0,
      "reward": -2.0299901008605956,
      "reward_std": 0.34289126843214035,
      "rewards/reward_func/mean": -2.0299901247024534,
      "rewards/reward_func/std": 0.36671974062919616,
      "step": 20
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1320.9,
      "completions/clipped_ratio": 0.0375,
      "completions/max_length": 1320.9,
      "completions/max_terminated_length": 1137.8,
      "completions/mean_length": 378.5125,
      "completions/mean_terminated_length": 332.92625885009767,
      "completions/min_length": 32.1,
      "completions/min_terminated_length": 32.1,
      "epoch": 0.04,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 171.62295532226562,
      "kl": 0.6327452264726162,
      "learning_rate": 2.9e-06,
      "loss": 0.0006,
      "num_tokens": 212112.0,
      "reward": -1.8967698693275452,
      "reward_std": 0.5322311833500862,
      "rewards/reward_func/mean": -1.8967698216438293,
      "rewards/reward_func/std": 0.6287578240036964,
      "step": 30
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1158.5,
      "completions/clipped_ratio": 0.04375,
      "completions/max_length": 1158.5,
      "completions/max_terminated_length": 814.9,
      "completions/mean_length": 302.51875,
      "completions/mean_terminated_length": 245.66839904785155,
      "completions/min_length": 21.4,
      "completions/min_terminated_length": 21.4,
      "epoch": 0.05333333333333334,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 155.87843322753906,
      "kl": 20.71284626722336,
      "learning_rate": 3.900000000000001e-06,
      "loss": 0.0207,
      "num_tokens": 278219.0,
      "reward": -1.8949542403221131,
      "reward_std": 0.5414254635572433,
      "rewards/reward_func/mean": -1.8949542760848999,
      "rewards/reward_func/std": 0.6462407022714615,
      "step": 40
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1438.0,
      "completions/clipped_ratio": 0.05625,
      "completions/max_length": 1438.0,
      "completions/max_terminated_length": 957.8,
      "completions/mean_length": 368.88125,
      "completions/mean_terminated_length": 298.481379699707,
      "completions/min_length": 25.8,
      "completions/min_terminated_length": 25.8,
      "epoch": 0.06666666666666667,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 246.18365478515625,
      "kl": 81.723886013031,
      "learning_rate": 4.9000000000000005e-06,
      "loss": 0.0817,
      "num_tokens": 354520.0,
      "reward": -1.9963937759399415,
      "reward_std": 0.4492200046777725,
      "rewards/reward_func/mean": -1.9963937878608704,
      "rewards/reward_func/std": 0.5674424737691879,
      "step": 50
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1460.3,
      "completions/clipped_ratio": 0.0625,
      "completions/max_length": 1460.3,
      "completions/max_terminated_length": 968.9,
      "completions/mean_length": 354.225,
      "completions/mean_terminated_length": 275.43447265625,
      "completions/min_length": 24.4,
      "completions/min_terminated_length": 24.4,
      "epoch": 0.08,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 101.76862335205078,
      "kl": 7.8479931354522705,
      "learning_rate": 4.9000000000000005e-06,
      "loss": 0.0078,
      "num_tokens": 428700.0,
      "reward": -2.0827497005462647,
      "reward_std": 0.33130341917276385,
      "rewards/reward_func/mean": -2.0827497005462647,
      "rewards/reward_func/std": 0.4056150123476982,
      "step": 60
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1161.2,
      "completions/clipped_ratio": 0.03125,
      "completions/max_length": 1161.2,
      "completions/max_terminated_length": 900.7,
      "completions/mean_length": 321.36875,
      "completions/mean_terminated_length": 282.4824447631836,
      "completions/min_length": 11.5,
      "completions/min_terminated_length": 11.5,
      "epoch": 0.09333333333333334,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 140.64456176757812,
      "kl": 15.888973093032837,
      "learning_rate": 4.7888888888888894e-06,
      "loss": 0.0159,
      "num_tokens": 497455.0,
      "reward": -1.948601198196411,
      "reward_std": 0.4984253913164139,
      "rewards/reward_func/mean": -1.9486011862754822,
      "rewards/reward_func/std": 0.587457275390625,
      "step": 70
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1319.1,
      "completions/clipped_ratio": 0.075,
      "completions/max_length": 1319.1,
      "completions/max_terminated_length": 917.3,
      "completions/mean_length": 363.85,
      "completions/mean_terminated_length": 271.0082809448242,
      "completions/min_length": 31.9,
      "completions/min_terminated_length": 31.9,
      "epoch": 0.10666666666666667,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 151.1342010498047,
      "kl": 612.0117300510407,
      "learning_rate": 4.677777777777778e-06,
      "loss": 0.612,
      "num_tokens": 572911.0,
      "reward": -2.036101925373077,
      "reward_std": 0.3775564029812813,
      "rewards/reward_func/mean": -2.0361019134521485,
      "rewards/reward_func/std": 0.44103380143642423,
      "step": 80
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1144.8,
      "completions/clipped_ratio": 0.025,
      "completions/max_length": 1144.8,
      "completions/max_terminated_length": 842.0,
      "completions/mean_length": 297.375,
      "completions/mean_terminated_length": 264.98167114257814,
      "completions/min_length": 13.7,
      "completions/min_terminated_length": 13.7,
      "epoch": 0.12,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 76.42510223388672,
      "kl": 29.60899884700775,
      "learning_rate": 4.566666666666667e-06,
      "loss": 0.0296,
      "num_tokens": 637651.0,
      "reward": -2.0117875218391417,
      "reward_std": 0.48156112879514695,
      "rewards/reward_func/mean": -2.011787533760071,
      "rewards/reward_func/std": 0.6340822532773018,
      "step": 90
    },
    {
      "clip_ratio/high_max": 0.0,
      "clip_ratio/high_mean": 0.0,
      "clip_ratio/low_mean": 0.0,
      "clip_ratio/low_min": 0.0,
      "clip_ratio/region_mean": 0.0,
      "completion_length": 1019.2,
      "completions/clipped_ratio": 0.0375,
      "completions/max_length": 1019.2,
      "completions/max_terminated_length": 805.0,
      "completions/mean_length": 292.1625,
      "completions/mean_terminated_length": 243.4895050048828,
      "completions/min_length": 21.2,
      "completions/min_terminated_length": 21.2,
      "epoch": 0.13333333333333333,
      "frac_reward_zero_std": 0.0,
      "grad_norm": 52.52836990356445,
      "kl": 5.190238785743714,
      "learning_rate": 4.455555555555555e-06,
      "loss": 0.0052,
      "num_tokens": 701457.0,
      "reward": -2.0103227376937864,
      "reward_std": 0.3981830716133118,
      "rewards/reward_func/mean": -2.010322690010071,
      "rewards/reward_func/std": 0.46229155659675597,
      "step": 100
    }
  ],
  "logging_steps": 10,
  "max_steps": 500,
  "num_input_tokens_seen": 701457,
  "num_train_epochs": 1,
  "save_steps": 100,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 0.0,
  "train_batch_size": 16,
  "trial_name": null,
  "trial_params": null
}
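
The `log_history` entries above can be cross-checked with a short script. The sketch below copies the `step`, `learning_rate`, `kl`, and `loss` values from the file and tests two hypotheses that are inferred from the numbers, not stated anywhere in the file: that the learning rate follows a linear warmup over 50 steps to a peak of 5e-6 and then decays linearly to zero at `max_steps` (500), and that the logged `loss` is roughly 0.001 times the logged `kl`. `PEAK_LR`, `WARMUP_STEPS`, and `KL_COEF` are assumed values chosen to fit the log, and the helper names are hypothetical.

```python
import math

# (step, learning_rate, kl, loss) copied from log_history.
LOG = [
    (10, 9.000000000000001e-07, 0.406684020392967, 0.0004),
    (20, 1.9000000000000002e-06, 0.06926110610365868, 0.0001),
    (30, 2.9e-06, 0.6327452264726162, 0.0006),
    (40, 3.900000000000001e-06, 20.71284626722336, 0.0207),
    (50, 4.9000000000000005e-06, 81.723886013031, 0.0817),
    (60, 4.9000000000000005e-06, 7.8479931354522705, 0.0078),
    (70, 4.7888888888888894e-06, 15.888973093032837, 0.0159),
    (80, 4.677777777777778e-06, 612.0117300510407, 0.612),
    (90, 4.566666666666667e-06, 29.60899884700775, 0.0296),
    (100, 4.455555555555555e-06, 5.190238785743714, 0.0052),
]

MAX_STEPS = 500      # "max_steps" in the file
PEAK_LR = 5e-6       # assumption: inferred from the logged values
WARMUP_STEPS = 50    # assumption: inferred from the logged values
KL_COEF = 1e-3       # assumption: inferred from the loss/kl ratio


def linear_warmup_decay(updates_done: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if updates_done < WARMUP_STEPS:
        return PEAK_LR * updates_done / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - updates_done)
    return PEAK_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


def lr_matches_schedule() -> bool:
    # The value logged at step N lines up with the schedule evaluated at N - 1.
    return all(
        math.isclose(lr, linear_warmup_decay(step - 1), rel_tol=1e-6)
        for step, lr, _, _ in LOG
    )


def loss_tracks_kl() -> bool:
    # "loss" is stored with four decimals, so allow half a unit of rounding.
    return all(abs(loss - KL_COEF * kl) <= 5.1e-5 for _, _, kl, loss in LOG)


if __name__ == "__main__":
    print("learning rate fits warmup + linear decay:", lr_matches_schedule())
    print("loss is about 0.001 * kl:", loss_tracks_kl())
```

Both checks pass on the ten logged steps. That is only consistency with the assumed values; the actual scheduler type and KL coefficient would have to be confirmed from the training arguments, which this file does not include.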