isaiahbjork/cot-logic-reasoning
How to use alibidaran/GRPO_LLAMA3-instructive_reasoning1 with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("alibidaran/GRPO_LLAMA3-instructive_reasoning1", dtype="auto")

How to use alibidaran/GRPO_LLAMA3-instructive_reasoning1 with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alibidaran/GRPO_LLAMA3-instructive_reasoning1 to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for alibidaran/GRPO_LLAMA3-instructive_reasoning1 to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for alibidaran/GRPO_LLAMA3-instructive_reasoning1 to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="alibidaran/GRPO_LLAMA3-instructive_reasoning1",
max_seq_length=2048,
)

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.
We evaluate the model on different tasks from the MMLU dataset. Here are the results on 100 random samples from MMLU.
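The card does not say how the 100 samples were drawn or scored. As a rough sketch only, an evaluation of this kind could pick 100 distinct indices with a fixed seed and compute multiple-choice accuracy over them. The names `sample_indices`, `accuracy`, and `predict`, the seed value, and the toy data below are all hypothetical and are not taken from the author's evaluation code.

```python
import random
from typing import Callable, Sequence


def sample_indices(n_total: int, n_samples: int = 100, seed: int = 0) -> list[int]:
    """Pick distinct indices uniformly at random; the fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(range(n_total), min(n_samples, n_total))


def accuracy(
    questions: Sequence[str],
    answers: Sequence[int],
    predict: Callable[[str], int],
    indices: Sequence[int],
) -> float:
    """Fraction of sampled questions where the predicted choice equals the reference answer."""
    correct = sum(predict(questions[i]) == answers[i] for i in indices)
    return correct / len(indices)


# Toy stand-in for an MMLU split (assumption): 500 questions with 4 answer choices each.
questions = [f"q{i}" for i in range(500)]
answers = [i % 4 for i in range(500)]

# Hypothetical predictors: one always answers choice 0, one is always right.
always_first = lambda q: 0
oracle = lambda q: int(q[1:]) % 4

idx = sample_indices(len(questions))
print(f"always-first accuracy: {accuracy(questions, answers, always_first, idx):.2f}")
print(f"oracle accuracy: {accuracy(questions, answers, oracle, idx):.2f}")
```

In a real run, `predict` would wrap a call to the model and map its output to one of the answer choices. With only 100 samples, the reported accuracy carries a noticeable sampling error, so small differences between tasks should be read with caution.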