---
language: en
license: mit
---

# M-r1_distill_baseline-rl

## Model Details

- **Training Method**: Reinforcement learning (RL) with VeRL
- **Stage Name**: rl
- **Experiment**: r1_distill_baseline
- **RL Framework**: VeRL (Volcano Engine Reinforcement Learning for LLMs)

## Training Configuration

## Experiment Tracking

🔗 **View complete experiment details**: [TAUR-dev/D-ExpTracker__r1_distill_baseline__v1](https://huggingface.co/datasets/TAUR-dev/D-ExpTracker__r1_distill_baseline__v1)

## Usage

Load the tokenizer and model with `transformers`:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and the RL-stage checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("TAUR-dev/M-r1_distill_baseline-rl")
model = AutoModelForCausalLM.from_pretrained("TAUR-dev/M-r1_distill_baseline-rl")
```
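
The snippet above only loads the weights. A minimal sketch of running generation follows. It assumes the checkpoint behaves like a standard decoder-only causal language model. The helper names `load_model` and `complete` are hypothetical, and so are the plain-text prompt and the `max_new_tokens=512` default, because this card does not document a prompt format or generation settings. If the tokenizer ships a chat template, formatting prompts with `tokenizer.apply_chat_template` is likely a better fit.

```python
MODEL_ID = "TAUR-dev/M-r1_distill_baseline-rl"


def load_model(model_id=MODEL_ID):
    """Hypothetical helper: load tokenizer and model from the Hub."""
    # Imported lazily so `complete` can be used without importing transformers here.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def complete(prompt, tokenizer, model, max_new_tokens=512):
    """Hypothetical helper: generate a continuation and return only the new text.

    Assumes a raw-text prompt; the expected prompt format is not documented.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    # For decoder-only models, generate() returns prompt ids followed by new ids.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_len = len(inputs["input_ids"][0])
    # Drop the echoed prompt tokens so only the model's continuation is decoded.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Example usage (downloads the checkpoint):
# tokenizer, model = load_model()
# print(complete("What is 17 * 24?", tokenizer, model))
```

Slicing off the prompt tokens before decoding keeps the returned string limited to what the model generated. This matters for reasoning-style models whose outputs can be long.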