alibidaran
/

GRPO_LLAMA3-instructive_reasoning1

text-generation-inference

Model card Files Files and versions

Uploaded model

Developed by: alibidaran
License: apache-2.0
Finetuned from model : unsloth/meta-llama-3.1-8b-instruct-unsloth-bnb-4bit

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

Evalution results

We are using MMLU dataset in different tasks. Here are the results of using 100 random samples of MMLU dataset.

Professional Psychology : 76%
Manengment: 74%
sociology: 75%

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train alibidaran/GRPO_LLAMA3-instructive_reasoning1