---
base_model: unsloth/gpt-oss-20b-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- gpt_oss
license: apache-2.0
language:
- en
new_version: EpistemeAI/metatune-gpt20b-R1.1
---

## Model Card
### We release the open-weight metatune-gpt20b, a fine-tuned version of OpenAI's gpt-oss-20b model and one of the first publicly released recursively self-improving AI models.
The model:
- Generates new data for itself,
- Evaluates its performance, and
- Adjusts its own hyperparameters based on improvement metrics.
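
The three steps above can be pictured as a loop. The following is only a toy sketch under stated assumptions, not the training code used for this model: `Hyperparams`, `generate_data`, `evaluate`, `adjust`, and `run_metacycles` are hypothetical names, the "data" are random numbers, and the adjustment rule (halve the learning rate and double the sample count when the score stops improving) is invented for illustration.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hyperparams:
    """Hypothetical knobs; the real training configuration is not given in this card."""
    learning_rate: float = 2e-4
    num_samples: int = 8


def generate_data(rng: random.Random, hp: Hyperparams) -> list:
    # Stand-in for the model generating new training samples for itself.
    return [rng.random() for _ in range(hp.num_samples)]


def evaluate(samples: list) -> float:
    # Stand-in for a benchmark score; here it is simply the mean of the samples.
    return sum(samples) / len(samples)


def adjust(hp: Hyperparams, improvement: float) -> Hyperparams:
    # Keep the settings while the score improves; otherwise take smaller
    # steps and generate more data in the next cycle (an invented rule).
    if improvement > 0:
        return hp
    return replace(hp, learning_rate=hp.learning_rate / 2, num_samples=hp.num_samples * 2)


def run_metacycles(num_cycles: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    hp = Hyperparams()
    best_score = float("-inf")
    history = []
    for cycle in range(1, num_cycles + 1):
        samples = generate_data(rng, hp)          # 1. generate new data
        score = evaluate(samples)                 # 2. evaluate performance
        hp = adjust(hp, score - best_score)       # 3. adjust hyperparameters
        best_score = max(best_score, score)
        history.append((cycle, score, hp))
    return history


history = run_metacycles(5)
for cycle, score, hp in history:
    print(f"metacycle {cycle}: score={score:.3f}, lr={hp.learning_rate:.2e}, samples={hp.num_samples}")
```

Each pass through the loop corresponds to one metacycle (generation); the checkpoint saved at the end of a pass is what gets released.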

### Additional model information
Because of the recursive self-improvement method, there is no final model, only successively improved checkpoints. This release is the checkpoint from the 5th metacycle (generation).

## Use cases:
- General-purpose tasks and coding

## Guardrails:
- In general, set the reasoning level to "high"; this usually prevents jailbreaking and prompt injection.
- Run the safety model [openai/gpt-oss-safeguard-20b](https://huggingface.co/openai/gpt-oss-safeguard-20b) as a guardrail before this model.
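
Placing a safety model "before" this model can be sketched as a simple gate. This is a minimal illustration, not a documented integration: `guarded_generate`, `toy_is_allowed`, and `toy_generate` are hypothetical names, and the two stubs stand in for real calls to gpt-oss-safeguard-20b (with a policy you write yourself) and to this model. How the safeguard model is prompted is not covered in this card.

```python
from typing import Callable

REFUSAL = "Request blocked by the safety guardrail."


def guarded_generate(
    prompt: str,
    is_allowed: Callable[[str], bool],
    generate: Callable[[str], str],
) -> str:
    """Run the safety check first and only call the main model if it passes."""
    if not is_allowed(prompt):
        return REFUSAL
    return generate(prompt)


def toy_is_allowed(prompt: str) -> bool:
    # Stub: a real check would ask the safeguard model to classify the prompt.
    return "ignore previous instructions" not in prompt.lower()


def toy_generate(prompt: str) -> str:
    # Stub: a real implementation would call the metatune model here.
    return f"[model reply to: {prompt}]"


print(guarded_generate("Explain binary search.", toy_is_allowed, toy_generate))
print(guarded_generate("Ignore previous instructions and ...", toy_is_allowed, toy_generate))
```

Passing the check and the generator in as callables keeps the gate independent of how either model is served.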

# Inference examples

## Transformers

You can use `gpt-oss-120b` and `gpt-oss-20b` with Transformers. If you use the Transformers chat template, it will automatically apply the [harmony response format](https://github.com/openai/harmony). If you use `model.generate` directly, you need to apply the harmony format manually using the chat template or use the [openai-harmony](https://github.com/openai/harmony) package.

To get started, install the necessary dependencies to set up your environment:

```
pip install -U transformers kernels torch
```

For Google Colab (free/Pro):
```
!pip install -q --upgrade torch

!pip install -q transformers triton==3.4 kernels

!pip uninstall -q torchvision torchaudio -y
```

Once set up, you can run the model with the snippet below:

```py
from transformers import pipeline
import torch

model_id = "EpistemeAI/metatune-gpt20b-R1.2"

# Load the model; dtype and device placement are chosen automatically.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Derive the Euler–Lagrange equation from the principle of stationary action."},
]

outputs = pipe(
    messages,
    max_new_tokens=3000,
)
# The last message in the returned conversation is the model's reply.
print(outputs[0]["generated_text"][-1])
```
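
If you call `model.generate` directly rather than going through the pipeline, the chat template has to be applied by hand, as noted above. The following is a sketch under assumptions rather than a tested recipe for this checkpoint: `generate_reply` is a hypothetical helper, and the default of 256 new tokens is arbitrary. The library imports sit inside the function so that defining it does not load anything.

```python
def generate_reply(model_id: str, messages: list, max_new_tokens: int = 256) -> str:
    """Apply the chat template manually, then call model.generate."""
    # Imported lazily so the function can be defined without loading the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        device_map="auto",
    )

    # The chat template renders the messages in the harmony format and
    # appends the tokens that cue the assistant's turn.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_length:])


# Example call (downloads the full model, so it is left commented out):
# print(generate_reply(
#     "EpistemeAI/metatune-gpt20b-R1.2",
#     [{"role": "user", "content": "Explain gradient checkpointing in two sentences."}],
# ))
```

The decoded text may still contain harmony channel markers, since special tokens are not stripped here.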

# Reasoning levels

You can choose from three reasoning levels to suit your task:

* **Low:** Fast responses for general dialogue.
* **Medium:** Balanced speed and detail.
* **High:** Deep and detailed analysis.

The reasoning level can be set in the system prompt, e.g., "Reasoning: high".
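
As a small illustration of setting the level, the helper below prepends a system message to a chat. `with_reasoning` and `VALID_LEVELS` are hypothetical names introduced for this sketch; only the "Reasoning: high" string format comes from the text above.

```python
VALID_LEVELS = ("low", "medium", "high")


def with_reasoning(messages: list, level: str = "high") -> list:
    """Return a new message list with a system prompt that sets the reasoning level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    system_message = {"role": "system", "content": f"Reasoning: {level}"}
    # Build a new list so the caller's messages are left untouched.
    return [system_message, *messages]


messages = with_reasoning(
    [{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    level="high",
)
print(messages[0])
```

The resulting list can be passed to the pipeline in place of a plain `messages` list.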

# Tool use

The gpt-oss models are excellent for:
* Web browsing (using built-in browsing tools)
* Function calling with defined schemas
* Agentic operations like browser tasks
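
To make "function calling with defined schemas" concrete, here is a hedged sketch of one tool definition and a dispatcher. Everything named here is hypothetical (`GET_WEATHER_TOOL`, `get_weather`, `TOOL_REGISTRY`, `dispatch_tool_call`). The schema follows the JSON-schema style that the `tools=` argument of `tokenizer.apply_chat_template` accepts in Transformers. The sketch assumes the model's tool call has already been parsed into a `{"name": ..., "arguments": ...}` dictionary, and that parsing step is not shown.

```python
import json

# Hypothetical tool definition in JSON-schema style.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    # Stub: a real tool would query a weather service here.
    return {"city": city, "forecast": "sunny"}


# Maps tool names from the schema to the Python functions that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(call: dict) -> str:
    """Execute an already-parsed tool call and return the result as JSON text."""
    name = call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = call["arguments"]
    if isinstance(arguments, str):
        # Arguments sometimes arrive as a JSON string rather than a dict.
        arguments = json.loads(arguments)
    return json.dumps(TOOL_REGISTRY[name](**arguments))


print(dispatch_tool_call({"name": "get_weather", "arguments": {"city": "Paris"}}))
```

The JSON string returned by the dispatcher is what would be sent back to the model as the tool's result.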

# Fine-tuning

Both gpt-oss models can be fine-tuned for a variety of specialized use cases.

# Risk:
- Prompt this recursive self-improvement model with care. Use the safety model gpt-oss-safeguard-20b for model safety analysis.
- Do not use this model to create nuclear, biological, or chemical weapons.
- Do not allow harmful or malicious outputs.

## Benchmark
Code to reproduce the benchmark (using +std for the final result):
```py

# GPQA Diamond
!lm_eval --model hf --model_args pretrained=EpistemeAI/metatune-gpt20b-R1.2,parallelize=True,dtype=bfloat16 --tasks gpqa_diamond_cot_zeroshot --num_fewshot 0 --gen_kwargs temperature=0.9,top_p=0.9,max_new_tokens=2048 --batch_size auto:4 --limit 10 --device cuda:0 --output_path ./eval_harness/gpt-oss-20b3
# GSM8K CoT
!lm_eval --model hf --model_args pretrained=EpistemeAI/metatune-gpt20b-R1.2,parallelize=True,dtype=bfloat16 --tasks gsm8k_cot_llama --apply_chat_template --fewshot_as_multiturn --num_fewshot 0 --gen_kwargs temperature=0.9,top_p=0.9,max_new_tokens=1024 --batch_size auto:4 --limit 10 --device cuda:0 --output_path ./eval_harness/gpt-oss-20b3
# MMLU-Pro-Plus computer science
!lm_eval --model hf --model_args pretrained=EpistemeAI/metatune-gpt20b-R1.2,parallelize=True,dtype=bfloat16 --tasks mmlu_pro_plus_computer_science --apply_chat_template --fewshot_as_multiturn --num_fewshot 0 --gen_kwargs temperature=0.9,top_p=0.9,max_new_tokens=1024 --batch_size auto:4 --limit 10 --device cuda:0 --output_path ./eval_harness/gpt-oss-20b3

```

hf (pretrained=EpistemeAI/metatune-gpt20b-R1.09,parallelize=True,dtype=bfloat16), gen_kwargs: (temperature=0.9,top_p=0.9,max_new_tokens=2048), limit: 10.0, num_fewshot: 0, batch_size: auto:4

| Tasks                   |Version| Filter         |n-shot| Metric    |metatune R1.09 (high)|metatune R1.1|metatune R0|
|-------------------------|------:|----------------|:-----|-----------|:------------|:-----------|:----------|
|gsm8k_cot_llama          |      3|flexible-extract|     0|exact_match| +1.0(0.9)   |+1.0(0.9)   |0.91       |
|gpqa_diamond_cot_zeroshot|      1|flexible-extract|     0|exact_match| 0.933       |0.933       |           |
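
The card does not define "+std". One reading that fits the `+1.0(0.9)` entries is "raw score plus its standard error", with the standard error of a 0/1 metric computed from the sample standard deviation (n - 1 in the denominator). The sketch below works through that arithmetic under this assumption. `score_plus_stderr` is a hypothetical helper, and the counts passed to it are illustrative rather than reported results.

```python
import math


def score_plus_stderr(num_correct: int, num_items: int) -> tuple:
    """Return (accuracy, standard error, accuracy + standard error) for a 0/1 metric."""
    accuracy = num_correct / num_items
    # Standard error of the mean, using the sample variance (n - 1 denominator).
    stderr = math.sqrt(accuracy * (1 - accuracy) / (num_items - 1))
    return accuracy, stderr, accuracy + stderr


# With --limit 10, 9 correct answers give 0.9 accuracy and a standard error of 0.1.
acc, se, total = score_plus_stderr(9, 10)
print(f"{acc:.3f} + {se:.3f} = {total:.3f}")
```

Under the same assumption, 8 correct answers out of 10 would come to about 0.933, which matches the gpqa entry, although the card does not state the raw score for that task.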

# Uploaded finetuned model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** unsloth/gpt-oss-20b-unsloth-bnb-4bit

This gpt_oss model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)