---
language: en
license: mit
---

# M-1117_newmodels__qwen7b_R1Distill_ct3arg-rl

## Model Details

- **Training Method**: Reinforcement learning (RL) with VeRL
- **Stage Name**: rl
- **Experiment**: 1117_newmodels__qwen7b_R1Distill_ct3arg
- **RL Framework**: VeRL (Volcano Engine Reinforcement Learning for LLMs)

## Training Configuration

## Experiment Tracking

🔗 **View complete experiment details**: https://huggingface.co/datasets/TAUR-dev/D-ExpTracker__1117_newmodels__qwen7b_R1Distill_ct3arg__v1

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hugging Face Hub repository for this checkpoint
model_id = "TAUR-dev/M-1117_newmodels__qwen7b_R1Distill_ct3arg-rl"
# Load the tokenizer and the causal language model weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
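
The model name suggests this checkpoint derives from a DeepSeek-R1-Distill Qwen 7B model. Such models typically write a reasoning trace that ends with a `</think>` tag before the final answer. Assuming this checkpoint keeps that convention (this card does not confirm it), a small helper such as the hypothetical `split_reasoning` below could separate the two parts of the decoded output. It is a sketch for illustration, not part of the released code.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Hypothetical helper: assumes the reasoning trace ends with a closing
    </think> tag, as in DeepSeek-R1-Distill outputs. If the tag is absent,
    the whole text is treated as the answer.
    """
    marker = "</think>"
    # Split on the last closing tag so any earlier mentions stay in the reasoning.
    head, sep, tail = text.rpartition(marker)
    if not sep:
        # No closing tag found: everything is the final answer.
        return "", text.strip()
    # The opening <think> may live in the prompt template rather than the
    # generated text, so drop it only if it is present.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4.")
print(answer)  # The answer is 4.
```

Splitting on the last `</think>` rather than the first keeps the helper tolerant of outputs that mention the tag inside the reasoning itself.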