---
language: en
license: mit
---

# M-1110_star__oursfixed_alltask-rl

## Model Details

- **Training Method**: VeRL Reinforcement Learning (RL)
- **Stage Name**: rl
- **Experiment**: 1110_star__oursfixed_alltask
- **RL Framework**: VeRL (Volcano Engine Reinforcement Learning for LLMs)

## Training Configuration

## Experiment Tracking

🔗 **View complete experiment details**: https://huggingface.co/datasets/TAUR-dev/D-ExpTracker__1110_star__oursfixed_alltask__v1

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download (or load from the local cache) the tokenizer and the RL-trained weights.
tokenizer = AutoTokenizer.from_pretrained("TAUR-dev/M-1110_star__oursfixed_alltask-rl")
model = AutoModelForCausalLM.from_pretrained("TAUR-dev/M-1110_star__oursfixed_alltask-rl")
```
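
Once the tokenizer and model are loaded, text generation might look like the sketch below. It relies only on the standard `transformers` calls (`tokenizer(...)`, `model.generate`, `tokenizer.decode`). The helper names `generate_text` and `demo`, the example prompt, and the `max_new_tokens` default are illustrative assumptions and do not come from this model card. The card does not say whether the model expects a chat template, so the sketch passes a plain-text prompt.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=256):
    """Generate a continuation for `prompt` and return only the new text.

    Hypothetical helper: the name and defaults are assumptions, not part
    of the model card.
    """
    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Generate up to `max_new_tokens` tokens after the prompt.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The output sequence starts with the prompt tokens; slice them off.
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def demo(prompt="What is 12 * 7?"):
    """Load the model from the Hub and answer an illustrative prompt."""
    # Imported lazily so that defining these helpers does not require
    # transformers to be installed or any weights to be downloaded.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TAUR-dev/M-1110_star__oursfixed_alltask-rl"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return generate_text(model, tokenizer, prompt)
```

Calling `print(demo())` downloads the weights on first use and prints the model's continuation. If the model turns out to ship a chat template, formatting the prompt with `tokenizer.apply_chat_template` before generation may give better results.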